
The simplest WordCount program:
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Adding an in-mapper combiner:
https://jinwooooo.github.io/jinwooooo-blog/hadoop-in-mapper-combiner/

A Hadoop pitfall:
A reused Text cannot substitute for String as a HashMap key. Text's equals() and hashCode() are content-based, but the usual mapper pattern recycles a single Text instance with set(), so every entry placed in the map shares that one key object. Each call to set() then silently rewrites the keys already stored inside the map and breaks the HashMap's invariants, so the map never behaves with String-keyed semantics.
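The corruption can be reproduced without Hadoop at all. The sketch below uses a hypothetical MutableKey class (an assumption standing in for Text: content-based equals()/hashCode(), mutable in place via set()) and mimics the broken mapper loop:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Text: equals()/hashCode() are
// content-based, and set() mutates the object in place.
final class MutableKey {
    private String value;
    MutableKey(String v) { value = v; }
    void set(String v) { value = v; }
    String get() { return value; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
    }
    @Override public int hashCode() { return value.hashCode(); }
}

public class MutableKeyDemo {
    // Mimics the broken mapper: one shared key object, mutated per token.
    static Map<MutableKey, Integer> countBroken(String... tokens) {
        Map<MutableKey, Integer> count = new HashMap<>();
        MutableKey word = new MutableKey("");
        for (String t : tokens) {
            word.set(t); // also rewrites every key already stored in count
            count.put(word, count.containsKey(word) ? count.get(word) + 1 : 1);
        }
        return count;
    }

    public static void main(String[] args) {
        Map<MutableKey, Integer> count = countBroken("hello", "world", "hello");
        // Every stored key is the same shared object, so all of them now
        // read the last token -- "world" has vanished from the key set:
        for (MutableKey k : count.keySet()) {
            System.out.println(k.get()); // prints "hello" for every entry
        }
    }
}
```

Because every entry's key is the same object, whatever the map reports afterwards reflects only the last token passed to set(), regardless of what was inserted earlier.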

Map<Text, Integer> count = new HashMap<Text, Integer>();
private Text word = new Text();

// local aggregation -- broken: every key in the map is this one Text object
public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken()); // also rewrites every key already stored in count
        if (count.containsKey(word)) {
            count.put(word, count.get(word) + 1);
        }
        else {
            count.put(word, 1);
        }
    }
}

This works instead: key the local map on String, and (for in-mapper combining) emit the aggregated counts once, in cleanup():

Map<String, Integer> count = new HashMap<String, Integer>();

// local aggregation
public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        String wd = itr.nextToken();
        if (count.containsKey(wd)) {
            count.put(wd, count.get(wd) + 1);
        }
        else {
            count.put(wd, 1);
        }
    }
}

// flush the locally aggregated counts at the end of the map task
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    for (Map.Entry<String, Integer> e : count.entrySet()) {
        context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
}
