Write the Driver
In this step, we will create a Driver class that ties together the Mapper, Partitioner, and Reducer classes and runs the MapReduce job.
First, create a Java file for the Driver class:
touch /home/hadoop/OlympicDriver.java
Then, add the following code to the OlympicDriver.java
file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class OlympicDriver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Olympic Partitioner");
job.setJarByClass(OlympicDriver.class);
job.setMapperClass(OlympicMapper.class);
job.setPartitionerClass(OlympicPartitioner.class);
job.setReducerClass(OlympicReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The OlympicDriver
class is the entry point for the MapReduce job. It sets up the job configuration, specifies the Mapper, Partitioner, and Reducer classes, and configures the input and output paths.
In the main
method, we create a new Configuration
object and a Job
instance with the job name "Olympic Partitioner". We set the Mapper, Partitioner, and Reducer classes using the corresponding setter methods.
We also set the output key and value classes to Text
. The input and output paths are specified using command-line arguments passed to the driver.
Finally, we call the waitForCompletion
method on the Job
instance to run the MapReduce job and exit with an appropriate status code (0 for success, 1 for failure).
To run the job, you need to compile the Java classes and create a jar file. Then, you can execute the jar file using the following command:
javac -source 8 -target 8 -classpath "/home/hadoop/:/home/hadoop/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar:/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar:/home/hadoop/hadoop/share/hadoop/common/lib/*" -d /home/hadoop /home/hadoop/OlympicMapper.java /home/hadoop/OlympicPartitioner.java /home/hadoop/OlympicReducer.java /home/hadoop/OlympicDriver.java
jar cvf olympic.jar *.class
hadoop jar olympic.jar OlympicDriver /input /output
Finally, we can check the results by running the following command:
hadoop fs -cat /output/*
Example output:
Event_1 Athlete_17,Athlete_18,Athlete_79,Athlete_71,Athlete_77,Athlete_75,Athlete_19,Athlete_24,Athlete_31,Athlete_32,Athlete_39,Athlete_89,Athlete_88,Athlete_87,Athlete_100,Athlete_13,Athlete_52,Athlete_53,Athlete_58
Event_2 Athlete_1,Athlete_97,Athlete_96,Athlete_85,Athlete_81,Athlete_80,Athlete_72,Athlete_68,Athlete_64,Athlete_61,Athlete_54,Athlete_48,Athlete_47,Athlete_43,Athlete_28,Athlete_23,Athlete_21,Athlete_15,Athlete_12,Athlete_3
Event_3 Athlete_11,Athlete_55,Athlete_8,Athlete_46,Athlete_42,Athlete_41,Athlete_40,Athlete_38,Athlete_33,Athlete_92,Athlete_29,Athlete_27,Athlete_25,Athlete_93,Athlete_22,Athlete_20,Athlete_98,Athlete_14,Athlete_69,Athlete_99,Athlete_66,Athlete_65
Event_4 Athlete_90,Athlete_50,Athlete_37,Athlete_36,Athlete_91,Athlete_74,Athlete_73,Athlete_63,Athlete_26,Athlete_78,Athlete_5,Athlete_62,Athlete_60,Athlete_59,Athlete_82,Athlete_4,Athlete_51,Athlete_86,Athlete_2,Athlete_94,Athlete_7,Athlete_95
Event_5 Athlete_34,Athlete_76,Athlete_57,Athlete_56,Athlete_30,Athlete_16,Athlete_6,Athlete_10,Athlete_83,Athlete_84,Athlete_70,Athlete_45,Athlete_44,Athlete_49,Athlete_9,Athlete_67,Athlete_35