Hadoop
Formats: Classroom, Online and Onsite
Outline:
Admin –
1) Introduction to Big Data and Hadoop
• What is Big Data?
• What are the challenges of processing big data?
• What technologies support big data?
• What is Hadoop, and why Hadoop?
• History and Use Cases of Hadoop
• Hadoop Ecosystem
• HDFS
• MapReduce
2) Understanding the Cluster
• Typical workflow
• Writing files to HDFS
• Reading files from HDFS
• Rack Awareness
• Daemons
3) Best Practices for Cluster Setup
• Best Practices
• How to choose the right Hadoop distribution
• How to choose the right hardware
4) Cluster Setup
• Install a pseudo-distributed cluster
• Install a multi-node cluster
• Configuration
• Set up a cluster in the cloud (Amazon EC2)
• Tools
• Security
• Benchmarking the cluster
5) Routine Admin Procedures
• Metadata & Data Backups
• File system check (fsck)
• File system Balancer
• Commissioning and decommissioning nodes
• Upgrading
• Recovering a failed NameNode
6) Monitoring the Cluster
• Using the Web user interfaces
• Hadoop Log files
• Setting the log levels
• Monitoring with Nagios
• Monitoring with Ganglia
7) Pig
8) Hive
9) HBase
10) Sqoop
11) Oozie
Developer –
1) Understanding the Cluster
• Typical workflow
• Writing files to HDFS
• Reading files from HDFS
• Rack Awareness
• Daemons
2) MapReduce
• Before MapReduce
• MapReduce Overview
• Word Count Problem
• Word Count Flow and Solution
• MapReduce Flow
• Algorithms for simple problems
• Algorithms for complex problems
3) Developing a MapReduce Application
• Data Types
• File Formats
• The Driver, Mapper and Reducer code explained
• Configuring the development environment (Eclipse)
• Writing unit tests
• Running locally
• Running on Cluster
4) Anatomy of a MapReduce Job Run
• Job Submission
• Job Initialization
• Task Assignment
• Job Completion
• Job Scheduling
• Job Failures
• Shuffle and sort
• Oozie Workflows
5) MapReduce Types and Formats
• MapReduce Types
• Input Formats
• Output Formats
6) MapReduce Features
• Counters
• Sorting
• Joins: Map-Side and Reduce-Side
• Side Data Distribution
• MapReduce Combiner
• MapReduce Partitioner
• MapReduce Distributed Cache
7) Hive and Pig
• When to Use Pig and Hive
• Fundamentals and Concepts
8) HBase
• CAP Theorem
• HBase Architecture and Concepts
• Programming