Implement the Driver
In this step, we will create a Driver class to configure and run the MapReduce job.
First, create a Java file for the Driver class:
touch /home/hadoop/WordLengthDriver.java
Then, add the following code to the WordLengthDriver.java file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordLengthDriver {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordLengthDriver <input> <output>");
            System.exit(1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Word Length");

        // Configure the job: the JAR to ship, the mapper/reducer classes,
        // and the key/value types they produce.
        job.setJarByClass(WordLengthDriver.class);
        job.setMapperClass(WordLengthMapper.class);
        job.setReducerClass(WordLengthReducer.class);
        job.setOutputKeyClass(CompositeKey.class);
        job.setOutputValueClass(Text.class);

        // Input and output paths come from the command-line arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and block until it finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
In the above code, we create a WordLengthDriver class that serves as the entry point for our MapReduce job. The main method takes two command-line arguments: the input path and the output path for the job.
Inside the main method, we create a Configuration object and a Job object. We then configure the job by setting the mapper and reducer classes, the output key and value classes, and the input and output paths.
Finally, we submit the job and wait for its completion. If the job completes successfully, we exit with a status code of 0; otherwise, we exit with a status code of 1.
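As an optional variation, the same configuration can be wrapped in Hadoop's Tool/ToolRunner helpers so that generic options such as -D property=value are parsed from the command line for you. The sketch below is illustrative only; the class name WordLengthTool is made up for this example, and the WordLengthDriver above is all you need for the rest of this step.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical alternative driver: same job setup, but run through ToolRunner.
public class WordLengthTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordLengthTool <input> <output>");
            return 1;
        }
        // getConf() returns the Configuration that ToolRunner has already
        // populated with any -D options from the command line.
        Job job = Job.getInstance(getConf(), "Word Length");
        job.setJarByClass(WordLengthTool.class);
        job.setMapperClass(WordLengthMapper.class);
        job.setReducerClass(WordLengthReducer.class);
        job.setOutputKeyClass(CompositeKey.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordLengthTool(), args));
    }
}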
To compile the classes, package them into a JAR, and run the job, use the following commands:
javac -source 8 -target 8 -classpath "/home/hadoop/:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/*" -d /home/hadoop /home/hadoop/WordLengthMapper.java /home/hadoop/CompositeKey.java /home/hadoop/WordLengthReducer.java /home/hadoop/WordLengthDriver.java
jar cvf word-length.jar *.class
hadoop jar word-length.jar WordLengthDriver /input /output
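The first command compiles the classes against the Hadoop client libraries, the second packages them into word-length.jar, and the third submits the job, reading from /input and writing to /output in HDFS. If /input does not exist in your environment yet, you can create it and upload a local text file first (the file name words.txt below is only a placeholder):
hadoop fs -mkdir -p /input
hadoop fs -put words.txt /input
Also note that the job will fail if the /output directory already exists; remove it with hadoop fs -rm -r /output before re-running.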
Finally, we can check the results by running the following command:
hadoop fs -cat /output/*
Example output:
A:3 Amr
A:6 AADzCv
A:10 AlGyQumgIl
...
h:7 hgQUIhA
h:8 hyrjMGbY, hSElGKux
h:10 hmfHJjCkwB
...
z:6 zkpRCN
z:8 zfMHRbtk
z:9 zXyUuLHma