Hadoop- MapReduce Concept

Hadoop supports the MapReduce programming model, which was introduced by Google.

MapReduce processes data in two steps:

Map: an ingestion and transformation step. All input records are processed in parallel.

Reduce: an aggregation and summarization step. All records associated with a given key are processed together by a single entity.
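The two steps above can be sketched in plain Python with a word-count example. This is only an illustration of the model, not the Hadoop API: the function names `map_phase` and `reduce_phase` and the sample records are made up for this sketch.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: transform each input record into (word, 1) pairs,
    one pair per word. In Hadoop these calls run in parallel."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: group the pairs by key, then aggregate each
    key's values together in a single place (here, a sum)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

records = ["hadoop map reduce", "map reduce model", "hadoop"]
counts = reduce_phase(map_phase(records))
# counts: {"hadoop": 2, "map": 2, "reduce": 2, "model": 1}
```

In real Hadoop the framework handles the grouping (the "shuffle") between the two steps, so the programmer only writes the map and reduce functions.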

The Hadoop framework is open-source software maintained by the Apache Software Foundation. Its main sub-projects are:

  • Hadoop Core, our flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing metaphor.
  • HBase builds on Hadoop Core to provide a scalable, distributed database.
  • Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.

  • ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.

  • Hive is a data warehouse infrastructure built on Hadoop Core that provides data summarization, ad hoc querying, and analysis of datasets.

HDFS: Hadoop Distributed File System.

Keep watching this space for more info on Hadoop.

