Write Mapper
In this step, you will write the Mapper class to process the input data and emit intermediate key-value pairs.
Open the terminal and follow the steps below to get started.
Switch to the hadoop user. The login shell started by this command also places you in that user's home directory:
su - hadoop
Create a Java file for the Mapper class:
nano /home/hadoop/SpaceBattleMapper.java
Then, add the following code to the SpaceBattleMapper.java file:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
import java.io.IOException;

public class SpaceBattleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the input line into words
        String[] words = value.toString().split("\\s+");
        // Emit a key-value pair for each word
        for (String w : words) {
            word.set(w);
            context.write(word, one);
        }
    }
}
Tip: You can copy the code from the prompt box on the right and paste it into the open nano editor with Ctrl + Shift + V. Press Ctrl + O to save the file, then Enter to confirm when nano prompts for the file name. Finally, press Ctrl + X to exit the editor.
The SpaceBattleMapper class extends the Mapper class from the Hadoop framework. It processes input data in the form of key-value pairs, where the key is a LongWritable representing the byte offset of the line in the input file, and the value is a Text object representing the line of text.
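To make the input key concrete, the following is a minimal plain-Java sketch (no Hadoop dependency) that computes the byte offset at which each line starts. It assumes UTF-8 text with '\n' line endings and that the job reads plain text files; the class name LineOffsetSketch and the sample text are illustrative only and are not part of the lab.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineOffsetSketch {
    // Returns the byte offset at which each line of the text starts,
    // assuming UTF-8 encoding and '\n' line terminators.
    static List<Long> lineOffsets(String text) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            offsets.add(offset);
            // Advance past the line's bytes plus one byte for the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "ship one\nship two\nlaser\n";
        String[] lines = sample.split("\n");
        List<Long> offsets = lineOffsets(sample);
        // Print each (key, value) pair roughly as the mapper would receive it.
        for (int i = 0; i < lines.length; i++) {
            System.out.println(offsets.get(i) + "\t" + lines[i]);
        }
    }
}
```

In this sketch each call to map would correspond to one printed row: the number is the key and the line is the value. The mapper in this step ignores the key and only uses the line text.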
The class defines two private fields:
- one: An IntWritable object with a constant value of 1. This is used as the value in the emitted key-value pairs.
- word: A Text object used to store each word extracted from the input line.
The map method is overridden to provide the specific mapping logic:
- The input Text value is converted to a string and split into words based on whitespace.
- For each word in the array, the word object is set to that word, and a key-value pair is emitted with the word as the key and one as the value. This is done using the context.write method.
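The two steps above can be tried outside Hadoop with a small plain-Java sketch. It uses the same split("\\s+") call as the mapper but collects the pairs in a list instead of calling context.write. The class name MapStepSketch, the helper mapLine, and the sample lines are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapStepSketch {
    // Mimics the body of map(): split the line on whitespace and
    // pair every resulting token with the integer 1.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            pairs.add(new SimpleEntry<>(w, 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // One pair is produced per token, including repeated tokens.
        for (Map.Entry<String, Integer> p : mapLine("fleet attacks fleet")) {
            System.out.println(p.getKey() + "\t" + p.getValue());
        }
        // Leading whitespace makes split() return an empty first token.
        System.out.println(mapLine("  fleet").size());
    }
}
```

One detail worth knowing: with this split pattern, a line that starts with whitespace (or is empty) produces an empty first token, so the mapper as written would emit an empty key for it. If the input may contain such lines, skipping tokens where w.isEmpty() is one possible guard.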
This Mapper class is designed to emit a key-value pair for each word in the input data, with the word as the key and the integer 1 as the value. This setup is commonly used in word count applications, where the goal is to count the occurrences of each word in a dataset.
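As a rough illustration of where those (word, 1) pairs lead, the sketch below simulates the whole word count in plain Java: it tokenizes each line the way the mapper does and then sums the 1s per word, which is the job a reducer would typically perform. This is an assumption about the later stages, not code from this step; the class name WordCountSketch and the sample lines are hypothetical, and the empty-token check is an addition that the mapper above does not have.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Simulates map + reduce in memory: each token contributes 1,
    // and contributions for the same word are summed.
    static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String w : line.split("\\s+")) {
                if (w.isEmpty()) {
                    continue; // skip the empty token produced by leading whitespace
                }
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"fleet attacks fleet", "fleet retreats"};
        // Print each word with its total, sorted by word.
        countWords(lines).forEach((word, count) ->
                System.out.println(word + "\t" + count));
    }
}
```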