Writing the Mapper and Reducer
In this step, we'll dive into the heart of Hadoop MapReduce and create our own Mapper and Reducer classes.
Explore the Data File
First, use the su - hadoop command to switch to the hadoop user. The data file data.txt is stored in the /user/hadoop/input directory of HDFS and contains the lines of several people's conversations. Use the following command to view its contents:
hdfs dfs -cat /user/hadoop/input/data.txt
Custom Mapper
Next, we'll create a custom Mapper class called WordCountMapper that extends Mapper. This Mapper processes the input data and emits key-value pairs. Note that the data to process is the content of each line of conversation, not the names of the speakers. Refer to the following example to complete the map method in WordCountMapper.java under /home/hadoop/.
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // Convert the Text object to a String
    String line = value.toString();
    // Tokenize the string using StringTokenizer (splits on whitespace by default)
    StringTokenizer tokenizer = new StringTokenizer(line);
    // Iterate through each token and write it to the context
    while (tokenizer.hasMoreTokens()) {
        // Set the current word (word is a reusable Text field of the class)
        word.set(tokenizer.nextToken().trim());
        // Emit the word with a count of 1 (one is an IntWritable field of the class)
        context.write(word, one);
    }
}
Then compile the code for Java 8 with the following command:
javac -source 8 -target 8 -cp $HADOOP_HOME/share/hadoop/common/hadoop-common-3.3.6.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar:. WordCountMapper.java
Custom Reducer
Finally, we'll create a custom Reducer class called WordCountReducer that extends Reducer. This Reducer aggregates the values for each key and emits the final result. Refer to the following example to complete the reduce method in WordCountReducer.java under /home/hadoop/.
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    // Initialize a variable to store the sum of counts for the current word
    int sum = 0;
    // Iterate through all counts for the current word and accumulate the total
    for (IntWritable value : values) {
        sum += value.get();
    }
    // Set the final count (result is a reusable IntWritable field of the class)
    result.set(sum);
    // Write the word and its final count to the context
    context.write(key, result);
}
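The body of reduce is a plain accumulation over the values grouped under one key. The same logic can be checked without Hadoop types by substituting int for IntWritable (the class name SumDemo and the sum helper are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class SumDemo {
    // Mirror of the reduce loop: add up all counts for a single word
    static int sum(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three occurrences of one word, each counted once by the Mapper
        System.out.println(sum(Arrays.asList(1, 1, 1)));
        // prints 3
    }
}
```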
Then compile the code for Java 8 with the following command:
javac -source 8 -target 8 -cp $HADOOP_HOME/share/hadoop/common/hadoop-common-3.3.6.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar:. WordCountReducer.java
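Taken together, the Mapper tokenizes each line into (word, 1) pairs and the Reducer sums the counts grouped per word. The end-to-end effect can be simulated in plain Java, with a HashMap standing in for Hadoop's shuffle-and-group phase (the class name WordCountDemo is illustrative, not part of the lab files):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountDemo {
    // Simulate map (tokenize) plus shuffle/reduce (group and sum) in one pass
    static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken().trim();
            // merge() increments the existing count, or starts it at 1
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("a b a"));
    }
}
```

This sketch collapses the two phases into one method; the real job keeps them separate so the work can be distributed across nodes.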