Hadoop: MapReduce Concept

Hadoop supports the MapReduce model, which was introduced by Google.

MapReduce processes data in two steps.

Map: An ingestion and transformation step. All input records are initially processed in parallel.

Reduce: An aggregation and summarization step. All associated records are processed together by a single entity.
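The two steps can be sketched with a small word count. This is only an in-memory illustration in plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and the grouping step between map and reduce is an assumption about how associated records end up together.

```python
from collections import defaultdict


def map_phase(record):
    """Map: transform one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key so that one reducer sees all associated records."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values that belong to one key."""
    return key, sum(values)


def word_count(records):
    # Each record is mapped independently of the others;
    # on a real cluster these calls would run in parallel.
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, word_count(["big data", "big cluster"]) counts "big" twice and "data" and "cluster" once each. The map calls never look at each other's input, which is what makes the first step easy to run in parallel.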

The Hadoop framework is open-source software from the Apache Software Foundation. Its sub-projects include:

  • Hadoop Core, the flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing metaphor.
  • HBase builds on Hadoop Core to provide a scalable, distributed database.
  • Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.

  • ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.

  • Hive is a data warehouse infrastructure built on Hadoop Core that provides data summarization, ad hoc querying, and analysis of datasets.

HDFS stands for Hadoop Distributed File System.

Watch this space for more information on Hadoop.

Author: Srini

Experienced software developer with skills in development, coding, testing, and debugging. Good data analytics skills (data warehousing and BI), plus mainframe experience.

One thought

  1. I’ve seen your blog post about “Mainframe-How to Modernize Batch Process”. I’m contributing to an open source project that aims to reproduce a batch execution environment (like the one on the mainframe) on open systems, in the cloud. It’s called “JEM, the BBE” and you can find it here: http://www.pepstock.org.
    Hadoop integration is planned as well!
    I hope you find it interesting!
