Big Data – Introduction to Hadoop
Hadoop is a MapReduce framework for processing large datasets in parallel on clusters of commodity hardware. It is cheaper than conventional alternatives because it is an open-source solution that runs on commodity hardware, and it is faster on massive data volumes because the processing is done in parallel.
A complete Hadoop MapReduce based solution may have the following layers:
- Hadoop Core – HDFS
- MapReduce API
- Data Access
- Tools and libraries
Hadoop works by splitting files into blocks and distributing them across a number of nodes in a cluster. It then ships packaged code to those nodes so the data is processed in parallel, close to where it is stored. This means the data can be processed more quickly than it could be with a conventional single-machine architecture.
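To make the model concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce steps that Hadoop runs in parallel across a cluster. This is an illustration of the MapReduce programming model only, not the Hadoop API; the word-count job and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical word-count job illustrating the MapReduce model.
# In Hadoop, each block would be mapped on a separate node in parallel;
# here we simply loop over the blocks in one process.

def map_phase(block):
    # Emit a (word, 1) pair for every word in this block of input.
    return [(word, 1) for word in block.split()]

def shuffle(pairs):
    # Group emitted values by key, as Hadoop's shuffle/sort step does.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word to produce the final result.
    return {key: sum(values) for key, values in grouped.items()}

def word_count(blocks):
    pairs = []
    for block in blocks:  # in Hadoop, these map tasks run in parallel
        pairs.extend(map_phase(block))
    return reduce_phase(shuffle(pairs))

blocks = ["big data big cluster", "data data cluster"]
print(word_count(blocks))  # {'big': 2, 'data': 3, 'cluster': 2}
```

The key property the sketch shows is that each map task sees only its own block, so adding nodes adds throughput; only the shuffle step requires moving data between tasks.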