Hadoop supports the MapReduce model, which was introduced by Google. MapReduce processes data in two steps: Map and Reduce.
Hadoop MapReduce Components
- Map: An ingestion and transformation step. Initially, all input records are processed in parallel.
- Reduce: An aggregation and summarization step. All associated records (those sharing the same key) are processed together by a single entity.
- The Hadoop framework is open-source software from the Apache Software Foundation.
- Hadoop Core, the project's flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing model.
- HBase builds on Hadoop Core to provide a scalable, distributed database.
- Pig is a high-level data-flow language and execution framework for parallel computation. It is also built on top of Hadoop Core.
- ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.
- Hive is a data warehouse infrastructure built on Hadoop Core that provides data summarization, ad-hoc querying, and analysis of datasets.
- HDFS: The Hadoop Distributed File System, the storage layer provided by Hadoop Core.
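
The Map and Reduce steps described above can be illustrated with a minimal word-count sketch. This is plain Python that simulates the model in a single process; it does not use the Hadoop API. The function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical and chosen only for illustration. In a real Hadoop job the map calls would run in parallel across the cluster, and the framework itself would group the intermediate pairs by key before the reduce step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: transform one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values associated with a single key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially (illustrative only)."""
    # Each record is mapped independently, which is what makes
    # the map step parallelizable on a real cluster.
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    print(run_job(lines))
```

Because each record is mapped on its own and each key is reduced on its own, both steps can be distributed across machines without changing the logic of map_phase or reduce_phase.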