How MapReduce Process Works in Hadoop

Hadoop supports the MapReduce model, which was introduced by Google. MapReduce processes data in two phases: map and reduce.

Hadoop MapReduce Components

  • Map: an ingestion and transformation step. All input records are processed in parallel.
  • Reduce: an aggregation and summarization step. All records associated with the same key are processed together by a single reducer.
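The two phases above can be sketched with a minimal word-count simulation in plain Python. This is only a conceptual sketch, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, and the in-memory grouping stands in for Hadoop's distributed shuffle.

```python
from collections import defaultdict

# Map phase: each input record is transformed independently into
# (key, value) pairs -- here, (word, 1) for every word in a line.
# In Hadoop, these calls would run in parallel across the cluster.
def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word, 1)

# Shuffle: group all values that share a key, so that each key can
# be handled by a single reducer.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the grouped values for each key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop mapreduce", "hadoop hdfs"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'mapreduce': 1, 'hdfs': 1}
```

Note that the map step touches each record independently, while the reduce step only ever sees all the values for one key at once; that separation is what lets Hadoop distribute both phases across many machines.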

Hadoop Framework

  • The Hadoop framework is open-source software from the Apache Software Foundation.
  • Hadoop Core, the flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing paradigm.
  • HBase builds on Hadoop Core to provide a scalable, distributed database. Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.
  • ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.
  • Hive is a data warehouse infrastructure built on Hadoop Core that provides data summarization, ad-hoc querying, and analysis of datasets.
  • HDFS: the Hadoop Distributed File System.

Author: Srini

Experienced software developer with skills in development, coding, testing, and debugging; strong data analytics skills (data warehousing and BI); also skilled in Mainframe.

One thought

  1. I’ve seen your blog post “Mainframe-How to Modernize Batch Process”. I’m contributing to an open-source project whose goal is to reproduce a batch execution environment (like on MF) on open systems, in the cloud. It’s called “JEM, the BBE”, and you can find it here:
    Hadoop integration is planned as well!
    Let’s hope that could be interesting!


Comments are closed.