Hadoop: A Tool for Your Big Data Problems

Module 1: Tutorial Introduction

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. Problem Scope
    1. Challenges at Large Scale
    2. Moore’s Law
  5. The Hadoop Approach
    1. Comparison to Existing Techniques
    2. Data Distribution
    3. MapReduce: Isolated Processes
    4. Flat Scalability
  6. The Rest of the Tutorial

Module 2: HDFS

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. Distributed File System Basics
  5. Configuring HDFS
  6. Interacting With HDFS
    1. Common Example Operations
    2. HDFS Command Reference
    3. DFSAdmin Command Reference
  7. Using HDFS in MapReduce
  8. Using HDFS Programmatically
  9. HDFS Permissions and Security
  10. Additional HDFS Tasks
    1. Rebalancing Blocks
    2. Copying Large Sets of Files
    3. Decommissioning Nodes
    4. Verifying File System Health
    5. Rack Awareness
  11. HDFS Web Interface
  12. References

Module 3: Getting Started With Hadoop

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. Prerequisites
  5. A Virtual Machine Hadoop Environment
    1. Installing VMware Player
    2. Setting up the Virtual Environment
    3. Virtual Machine User Accounts
    4. Running a Hadoop Job
    5. Accessing the VM via ssh
    6. Shutting Down the VM
  6. Getting Started With Eclipse
    1. Downloading and Installing
    2. Installing the Hadoop MapReduce Plugin
    3. Making a Copy of Hadoop
    4. Running Eclipse
    5. Configuring the MapReduce Plugin
  7. Interacting With HDFS
    1. Using the Command Line
    2. Using the MapReduce Plugin For Eclipse
  8. Running a Sample Program
    1. Creating the Project
    2. Creating the Source Files
    3. Launching the Job
  9. References & Resources
  10. Complete Tools List

Module 4: MapReduce

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. Prerequisites
  5. MapReduce Basics
    1. Functional Programming Concepts
    2. List Processing
    3. Mapping Lists
    4. Reducing Lists
    5. Putting them Together in MapReduce
    6. An Example Application: Word Count
    7. The Driver Method
  6. MapReduce Data Flow
    1. A Closer Look
    2. Additional MapReduce Functionality
    3. Fault Tolerance
  7. Checkpoint
  8. More Tips
    1. Chaining Jobs
    2. Troubleshooting: Debugging MapReduce
    3. Listing and Killing Jobs
  9. Additional Language Support
    1. Pipes
    2. Hadoop Streaming
  10. Conclusions
  11. Solution to Inverted Index Code

Module 5: Advanced MapReduce

  1. Writable Types
  2. Custom Key Types
  3. Using Custom Types
  4. Faster Comparison Operations
  5. Final Writable Notes
  6. Custom File Formats
  7. Alternate Data Sources

Module 6: Topics

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. ZooKeeper
    1. Motivation
    2. Data Storage in ZooKeeper
    3. ZooKeeper Applications
    4. Distributed Consensus
  5. Pig
    1. Motivation
    2. Pig Latin
    3. Pig Latin Data Types
    4. Loading Data Into Pig
    5. Pig Latin Operators
    6. Setting Up Pig
  6. References

Module 7: Managing a Hadoop Cluster

  1. Introduction
  2. Goals for this Module
  3. Outline
  4. Basic Setup
    1. Java Requirements
    2. Operating System
    3. Downloading and Installing Hadoop
  5. Important Directories
  6. Selecting Machines
  7. Cluster Configurations
    1. Small Clusters: 2-10 Nodes
    2. Medium Clusters: 10-40 Nodes
    3. Large Clusters: Multiple Racks
  8. Performance Monitoring
    1. Ganglia
    2. Nagios
  9. Additional Tips
  10. References & Resources

Module 8: Pig Tutorial

Author: Srini

Experienced software developer. Skills in Development, Coding, Testing and Debugging. Good Data analytic skills (Data Warehousing and BI). Also skills in Mainframe.