Hadoop supports the MapReduce model, which was introduced by Google.
MapReduce processes data in two steps.
Map: This is the ingestion and transformation step. Initially, all input records are processed in parallel.
Reduce: This is the aggregation and summarization step. All records associated with the same key are processed together by a single entity.
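
To make the two steps concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the in-memory grouping only stands in for the work a real cluster does across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: transform one input line into (word, 1) pairs.

    Each record is handled independently, which is why a real
    framework can run this step in parallel.
    """
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key so all records for a key end up together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values associated with a single key."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["Hadoop supports MapReduce", "MapReduce runs on Hadoop"]
    print(word_count(lines))
```

The grouping step in the middle is what lets each reduce call see every value for its key, which matches the "all associated records processed together" description above.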
The Hadoop framework is open-source software from Apache. It includes the following sub-projects:
- Hadoop Core, the flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing model.
- HBase builds on Hadoop Core to provide a scalable, distributed database.
- Pig is a high-level data-flow language and execution framework for parallel computation. It is built on top of Hadoop Core.
- ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.
- Hive is a data warehouse infrastructure built on Hadoop Core that provides data summarization, ad hoc querying, and analysis of datasets.
HDFS stands for Hadoop Distributed File System.
Watch this space for more information on Hadoop.