Word Count (Map Reduce)
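This problem is mainly a way to see what the two developer-defined stages, map and reduce, actually do inside the MapReduce framework. Broadly, the mapper function (in practice, a mapper machine) should not use extra space that grows linearly with the input, such as a hash table or an array, to process the data; it simply emits one pair per word as it reads. In the reducer, values is a temporary collection holding the results that the various map machines computed for the same key.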
class WordCount:

    # @param {str} line a text, for example "Bye Bye see you next"
    def mapper(self, key, line):
        # Emit (word, 1) for every word; no counting state is kept here.
        # Please use 'yield key, value'
        for word in line.split():
            yield (word, 1)

    # @param key is a word emitted by the mapper
    # @param values is the collection of values emitted for that key
    def reducer(self, key, values):
        # Sum the per-occurrence counts for this key.
        # Please use 'yield key, value'
        result = 0
        for value in values:
            result += value
        yield (key, result)
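
To make the hand-off between the two stages concrete, here is a minimal single-process sketch of what the framework does around the class above. The run_local function is a hypothetical helper, not part of any real MapReduce API, and it assumes the mapper key is just the line number. It holds the grouped values in a dictionary only because it stands in for the framework's shuffle step; the mapper itself still keeps no state. A compact copy of WordCount is included so the sketch runs on its own.

```python
from collections import defaultdict


class WordCount:
    # Compact copy of the class above so this sketch is self-contained.
    def mapper(self, key, line):
        for word in line.split():
            yield (word, 1)

    def reducer(self, key, values):
        yield (key, sum(values))


def run_local(job, lines):
    """Hypothetical single-process driver: map, shuffle, then reduce."""
    # Shuffle: collect every value the mappers emit under its key.
    # On a real cluster the framework does this across machines.
    grouped = defaultdict(list)
    for line_no, line in enumerate(lines):
        for word, count in job.mapper(line_no, line):
            grouped[word].append(count)

    # Reduce: each key reaches the reducer once, with all of its values.
    results = {}
    for word, counts in grouped.items():
        for out_key, out_value in job.reducer(word, counts):
            results[out_key] = out_value
    return results


counts = run_local(WordCount(), ["Bye Bye see you next", "see you"])
print(counts)  # {'Bye': 2, 'see': 2, 'you': 2, 'next': 1}
```

Each input line plays the role of one mapper's share of the data, and the grouped dictionary is the "temporary collection" that the reducer receives as values.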