Introduction to Hadoop and MapReduce


Hadoop is a MapReduce framework for processing large datasets in parallel on clusters of commodity hardware. It is cheaper than traditional alternatives because it is an open-source solution that runs on commodity machines, and it is faster on massive data volumes because the processing is done in parallel.

A complete Hadoop MapReduce-based solution may include the following layers (a short HDFS example follows the list):

  1. Hadoop Core – HDFS
  2. MapReduce API
  3. Data Access
  4. Tools and libraries
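
To make the first layer concrete, here is a minimal sketch of writing and reading a file through the HDFS FileSystem API. The class name HdfsExample and the path /tmp/hello.txt are illustrative only, and the snippet assumes a Hadoop client configuration (core-site.xml) is available on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath;
        // on a real cluster this points at the HDFS NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // /tmp/hello.txt is an arbitrary example path.
        Path file = new Path("/tmp/hello.txt");

        // Write a small file; HDFS splits larger files into blocks
        // and replicates them across DataNodes automatically.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}
```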

Hadoop works by splitting files into blocks and distributing them across a number of nodes in a cluster. It then ships packaged code to those nodes so that each one processes its share of the data in parallel. This means the data can be processed more quickly than it could be using a conventional architecture.
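
To make the programming model concrete, below is a sketch of the canonical word-count job written against the MapReduce API, following the standard example from the Hadoop documentation. The mapper runs where the data blocks live; the framework shuffles and groups its output by key before the reducer sums the counts. Input and output directories are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on each node against its local block of input,
    // emitting (word, 1) for every word it sees.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework groups all counts for the same word
    // together, so the reducer just sums them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, a job like this would typically be launched with: hadoop jar wordcount.jar WordCount <input dir> <output dir>.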


How do Hadoop and SQL compare? Watch this video for more information.

Check out coursehunt.net for more courses on technology, and http://www.coursehunt.net/?query=hadoop for Hadoop-specific courses.

Read the complete document in this SlideShare presentation.

Hadoop MapReduce Fundamentals from coursehunt (SlideShare presentation – best viewed in full screen)

Video series on Hadoop MapReduce
