Hadoop

Formats: Classroom, Online and Onsite

Outline:

Admin –

1) Introduction to Big Data and Hadoop

• What is Big Data?

• What are the challenges for processing big data?

• What technologies support big data?

• What is Hadoop and Why Hadoop?

• History and Use Cases of Hadoop

• Hadoop Ecosystem

• HDFS

• Map Reduce

2) Understanding the Cluster

• Typical workflow

• Writing files to HDFS

• Reading files from HDFS

• Rack Awareness

• Daemons

3) Best Practices for Cluster Setup

• Best Practices

• How to choose the right Hadoop distribution

• How to choose the right hardware

4) Cluster Setup

• Install a pseudo-distributed cluster

• Install a multi-node cluster

• Configuration

• Set up a cluster in the cloud (EC2)

• Tools

• Security

• Benchmarking the cluster

5) Routine Admin procedures

• Metadata & Data Backups

• File system check (fsck)

• File system Balancer

• Commissioning and decommissioning nodes

• Upgrading

• Recovering a failed NameNode

6) Monitoring the Cluster

• Using the Web user interfaces

• Hadoop Log files

• Setting the log levels

• Monitoring with Nagios

• Monitoring with Ganglia

7) PIG

8) HIVE

9) HBASE

10) Sqoop

11) Oozie

Developer –

1) Understanding the Cluster

• Typical workflow

• Writing files to HDFS

• Reading files from HDFS

• Rack Awareness

• Daemons

2) Map Reduce

• Before Map Reduce

• Map Reduce Overview

• Word Count Problem

• Word Count Flow and Solution

• Map Reduce Flow

• Algorithms for simple problems

• Algorithms for complex problems

3) Developing the Map Reduce Application

• Data Types

• File Formats

• Explain the Driver, Mapper and Reducer code

• Configuring development environment - Eclipse

• Writing Unit Tests

• Running locally

• Running on Cluster

4) Anatomy of Map Reduce Job run

• Job Submission

• Job Initialization

• Task Assignment

• Job Completion

• Job Scheduling

• Job Failures

• Shuffle and sort

• Oozie Workflows

5) Map Reduce Types and Formats

• MapReduce Types

• Input Formats

• Output Formats

6) Map Reduce Features

• Counters

• Sorting

• Joins: Map Side and Reduce Side

• Side Data Distribution

• MapReduce Combiner

• MapReduce Partitioner

• MapReduce Distributed Cache

7) Hive and PIG

• When to Use Pig and Hive

• Fundamentals and Concepts

8) HBASE

• CAP Theorem

• HBase Architecture and Concepts

• Programming