1. Introduction

2. Quiz: Input Data

How to find total sales/store?

    KEY             VALUE

3. Quiz: Defensive Mapper Code

# Your task is to make sure that this mapper code does not fail on corrupt data lines,
# but instead just ignores them and continues working
import sys

def mapper():
    # read standard input line by line
    for line in sys.stdin:
        # strip off extra whitespace, split on tab and put the data in an array
        data = line.strip().split("\t")

        # This is the place you need to do some defensive programming
        # what if there are not exactly 6 fields in that line?
        # YOUR CODE HERE
        
        # this next line is called 'multiple assignment' in Python
        # this is not really necessary, we could access the data
        # with data[2] and data[5], but we do this for conveniency
        # and to make the code easier to read
        date, time, store, item, cost, payment = data
        
        # Now print out the data that will be passed to the reducer
        print "{0}\t{1}".format(store, cost)
        
test_text = """2013-10-09\t13:22\tMiami\tBoots\t99.95\tVisa
2013-10-09\t13:22\tNew York\tDVD\t9.50\tMasterCard
2013-10-09 13:22:59 I/O Error
^d8x28orz28zoijzu1z1zp1OHH3du3ixwcz114<f
1\t2\t3"""

# This function allows you to test the mapper with the provided test string
def main():
	import StringIO
	sys.stdin = StringIO.StringIO(test_text)
	mapper()
	sys.stdin = sys.__stdin__
# my answer
if len(data) != 6:
    continue

# solution
if len(data) == 6:
    date, time, store, item, cost, payment = data
    print "{0}\t{1}".format(store, cost)

4. Quiz: Between

What happens between mapper and reducer?

5. Quiz: Reducing

Keeping track

6. Quiz: Reducer Code

code: https://gist.github.com/chenghan/7456549

import sys

salesTotal = 0
oldKey = None

for line in sys.stdin:
    data = line.strip().split("\t")
    if len(data) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data
    if oldKey and oldKey != thisKey:
      print oldKey, "\t", salesTotal
      oldKey = thisKey
      salesTotal = 0

     oldKey = thisKey
     salesTotal += float(thisSale)

Are we done?

We are not done because we have not processed the last key. 아래 코드를 추가해줘야 함.

if oldKey != None:
   print oldKey, "\t", salesTotal

7. Putting It All Together

8. Conclusion

See more in the free Chapter 3 of Tom White’s essential text, Hadoop: The Definitive Guide