1. Quiz: HDFS

Is there a problem? > https://youtu.be/6F8-cCUbRU8

2. Quiz: Data Redundancy

Any problem now (when the NameNode fails)?

3. NameNode Standby

The active NameNode works as before, but a standby NameNode can be configured to take over if the active one fails.

4. HDFS Demo

  • hadoop fs commands work much like their Unix counterparts
  • You can read instructions on how to access and run the virtual machines here
hadoop fs -ls
hadoop fs -put purchases.txt
hadoop fs -ls
hadoop fs -tail purchases.txt
hadoop fs -mv purchases.txt newname.txt
hadoop fs -rm newname.txt
hadoop fs -mkdir myinput
hadoop fs -put purchases.txt myinput
hadoop fs -ls myinput

5. MapReduce

6. Real World Example

7. Quiz: Hashtables

Hashtables (key -> value): what problems could arise?

8. Distributed Work

9. Summary of MapReduce

Note: Hadoop takes care of the Shuffle and Sort phase. You do not have to sort the keys in your reducer code; they arrive already sorted.
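As a rough illustration of the note above, here is a minimal Python sketch of a mapper and a reducer in the style used with Hadoop Streaming. It assumes a hypothetical input format of one store<TAB>amount record per line, and simulate_shuffle_and_sort is only a local stand-in for what Hadoop does between the two phases. The point is that the reducer never sorts anything: it simply groups consecutive records that share a key.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'key<TAB>value' record per well-formed input line.

    Assumption: each input line looks like 'store<TAB>amount'.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed lines
        store, amount = fields
        yield f"{store}\t{amount}"


def reducer(sorted_lines):
    """Sum the values per key, relying on the input already being sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    # groupby only merges *adjacent* equal keys, which is enough
    # because the records arrive in key order.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total:.2f}"


def simulate_shuffle_and_sort(mapped):
    """Local stand-in for Hadoop's Shuffle and Sort phase: order records by key."""
    return sorted(mapped, key=lambda rec: rec.split("\t", 1)[0])
```

In an actual streaming job, each function would live in its own script, read from sys.stdin, and print its records to standard output; Hadoop would perform the sorting step itself.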

10. Quiz: Sort Final Result

Final results in sorted order?

11. Quiz: Multiple Reducers

There are 4 intermediate keys: Apple, Banana, Carrot, Grape. Which keys go to the first reducer?

We cannot tell in advance; one reducer might even get none of the keys. See a nice overview of partitioning in Hadoop.
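To see why the assignment is hard to predict, here is a small Python sketch of hash partitioning. Hadoop's default partitioner takes the key's hash code, masks off the sign bit, and takes the remainder modulo the number of reducers. The sketch uses Java's String.hashCode formula as a stand-in hash and assumes ASCII keys; Hadoop's own key types hash their bytes differently, so the buckets printed here will not necessarily match a real cluster.

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer (ASCII keys assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap around like a 32-bit int
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key, num_reducers):
    """Pick a reducer index: (hash & Integer.MAX_VALUE) % number of reducers."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def assign(keys, num_reducers):
    """Group keys by the reducer they would be sent to; some lists may stay empty."""
    buckets = {r: [] for r in range(num_reducers)}
    for key in keys:
        buckets[partition(key, num_reducers)].append(key)
    return buckets


if __name__ == "__main__":
    for reducer_id, keys in assign(["Apple", "Banana", "Carrot", "Grape"], 2).items():
        print(reducer_id, keys)
```

Because the reducer is chosen from a hash rather than from alphabetical order, nothing guarantees an even split, and a reducer can end up with no keys at all.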

12. Daemons of MapReduce

  • Job Tracker
  • Task Trackers

13. Running a Job

Running a MapReduce job with the VM alias:

hs {mapper script} {reducer script} {input_file} {output directory}

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input myinput -output joboutput

hadoop fs -get joboutput/part-00000 mylocalfile.txt
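The long streaming command above follows a fixed pattern, which is presumably what the hs alias fills in from its four arguments. As a sketch under that assumption, the hypothetical Python helper below (streaming_command is not part of Hadoop) assembles the same argument list from a mapper, a reducer, an input path, and an output directory. The default jar path is copied from the command above and may differ on other installations.

```python
# Jar path copied from the command in these notes; adjust for other setups.
STREAMING_JAR = (
    "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/"
    "hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar"
)


def streaming_command(mapper, reducer, input_path, output_dir, jar=STREAMING_JAR):
    """Build the argument list for a Hadoop Streaming job (hypothetical helper)."""
    return [
        "hadoop", "jar", jar,
        "-mapper", mapper,
        "-reducer", reducer,
        "-file", mapper,    # ship the mapper script with the job
        "-file", reducer,   # ship the reducer script with the job
        "-input", input_path,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    cmd = streaming_command("mapper.py", "reducer.py", "myinput", "joboutput")
    print(" ".join(cmd))
    # On a machine with Hadoop installed, the job could be launched with:
    #   import subprocess; subprocess.run(cmd, check=True)
```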

14. Simplifying Things

15. A Different Application

16. Other Problems

17. Virtual Machine Setup

You can read instructions on how to download and run the virtual machines here.

Information on how to transfer files back and forth to the virtual machine can be found here.

For step-by-step instructions on how to load data into HDFS, please re-watch HDFS Demo. For a reminder of how to run a MapReduce job, please re-watch Simplifying Things.

18. Conclusion

See more in the free Chapter 6 of Tom White’s essential text, Hadoop: The Definitive Guide.