# Handling Large Volumes of Flight Data in Hadoop
## Understanding Flight Data
Flight data typically includes information such as:
- Departure and arrival times
- Flight routes
- Aircraft details
- Passenger numbers
- Weather conditions
- Fuel consumption
- Maintenance records
This data can be generated in large volumes, especially for major airlines and airports.
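For the examples in this article, assume each record is one CSV line whose first three columns are the origin airport, destination airport, and flight duration in minutes. This layout is a hypothetical one chosen for illustration; real feeds will have their own schemas:

```csv
origin,destination,duration
BOS,JFK,64.0
JFK,LAX,362.5
```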
## Storing Flight Data in HDFS
To store and manage large volumes of flight data, we can use the Hadoop Distributed File System (HDFS). HDFS provides a scalable and fault-tolerant storage solution for big data applications.
Here's an example of how you can upload flight data to HDFS using the Hadoop CLI:
```bash
# Create an HDFS directory for flight data
hdfs dfs -mkdir /flight_data

# Upload a CSV file containing flight data to HDFS
hdfs dfs -put flight_data.csv /flight_data
```
## Processing Flight Data with MapReduce
Once the flight data is stored in HDFS, we can use the MapReduce programming model to process and analyze the data. Here's an example of a simple MapReduce job to calculate the average flight duration for each route:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlightDurationAnalysis {

    public static class FlightDurationMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Expected CSV layout: origin,destination,duration,...
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            String route = fields[0] + "-" + fields[1];
            try {
                double duration = Double.parseDouble(fields[2]);
                context.write(new Text(route), new DoubleWritable(duration));
            } catch (NumberFormatException e) {
                // skip header rows and non-numeric durations
            }
        }
    }

    public static class FlightDurationReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all durations seen for this route, then divide by the count.
            double totalDuration = 0;
            int count = 0;
            for (DoubleWritable value : values) {
                totalDuration += value.get();
                count++;
            }
            context.write(key, new DoubleWritable(totalDuration / count));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Flight Duration Analysis");
        job.setJarByClass(FlightDurationAnalysis.class);
        job.setMapperClass(FlightDurationMapper.class);
        job.setReducerClass(FlightDurationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /flight_data
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
This MapReduce job reads flight data from HDFS, calculates the average flight duration for each route, and writes the results back to HDFS.
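Before submitting the job to a cluster (typically with something like `hadoop jar flight-analysis.jar FlightDurationAnalysis /flight_data /flight_output`, where the jar name is hypothetical), it can be useful to sanity-check the averaging arithmetic locally. The following sketch is not part of the MapReduce job; it simply mirrors the mapper and reducer logic in plain Java, under the same assumed `origin,destination,duration` CSV layout:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalAverageCheck {

    // Mirror of the MapReduce logic: group durations by "origin-destination"
    // and average them, all in memory.
    static Map<String, Double> averageDurations(List<String> csvLines) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // route -> {sum, count}
        for (String line : csvLines) {
            String[] f = line.split(",");
            if (f.length < 3) {
                continue; // skip malformed lines, like the mapper
            }
            double duration;
            try {
                duration = Double.parseDouble(f[2]);
            } catch (NumberFormatException e) {
                continue; // skip header rows and bad values
            }
            double[] sc = acc.computeIfAbsent(f[0] + "-" + f[1], k -> new double[2]);
            sc[0] += duration;
            sc[1] += 1;
        }
        Map<String, Double> avg = new LinkedHashMap<>();
        acc.forEach((route, sc) -> avg.put(route, sc[0] / sc[1]));
        return avg;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "origin,destination,duration", // header is skipped
                "BOS,JFK,60.0",
                "BOS,JFK,70.0",
                "JFK,LAX,360.0");
        System.out.println(averageDurations(sample));
        // {BOS-JFK=65.0, JFK-LAX=360.0}
    }
}
```

If the local results look right, the same input should produce matching per-route averages when run through the full MapReduce job.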