Working with Built-in Writable Types
Hadoop ships with a set of built-in implementations of the Writable interface that can be used directly in your Hadoop applications. These built-in Writable types cover the most common data formats and simplify the process of integrating your data with Hadoop's processing pipelines.
Commonly Used Built-in Writable Types
Hadoop provides several built-in Writable implementations that are commonly used in big data processing. Some of the most notable built-in Writable types are:
IntWritable: Represents an integer value.
LongWritable: Represents a long integer value.
FloatWritable: Represents a floating-point value.
DoubleWritable: Represents a double-precision floating-point value.
Text: Represents a string of text stored as UTF-8 (the Writable counterpart of java.lang.String).
BytesWritable: Represents a byte array.
These built-in Writable types can be used directly in your Hadoop applications, and they provide efficient serialization and deserialization capabilities.
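To make that serialization behavior concrete, the following sketch writes an IntWritable and a Text to an in-memory byte stream and reads them back using the write() and readFields() methods defined by the Writable interface. The class name WritableRoundTrip is purely illustrative.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize an IntWritable and a Text into an in-memory buffer
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        new IntWritable(42).write(out);
        new Text("hadoop").write(out);
        out.close();

        // Deserialize them back, in the same order they were written
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        IntWritable number = new IntWritable();
        Text word = new Text();
        number.readFields(in);
        word.readFields(in);

        System.out.println(number.get() + " " + word); // prints: 42 hadoop
    }
}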
Using Built-in Writable Types
To use a built-in Writable type in your Hadoop application, you simply create an instance of the desired type and set its value. For example, to use an IntWritable, you can do the following:
IntWritable intWritable = new IntWritable(42);
Once you have an instance of the Writable type, you can use it in your Hadoop MapReduce jobs, HDFS file operations, or other data processing tasks.
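As one illustration of an HDFS file operation, here is a minimal sketch that writes IntWritable keys and Text values into a SequenceFile; the output path and record contents are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/numbers.seq"); // illustrative output path

        IntWritable key = new IntWritable();
        Text value = new Text();

        // Append a few (IntWritable, Text) records to the SequenceFile
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int i = 1; i <= 3; i++) {
                key.set(i);
                value.set("record-" + i);
                writer.append(key, value);
            }
        }
    }
}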
Here's an example of how you might use an IntWritable in a Hadoop MapReduce job:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count");

        // Set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Set the mapper and reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Mapper class: emits a (word, 1) pair for every word in the input line
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] words = value.toString().split("\\s+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    context.write(new Text(word), new IntWritable(1));
                }
            }
        }
    }

    // Reducer class: sums the counts for each unique word
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options before invoking run()
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}
In this example, we use the IntWritable type to represent the count of each word in the input data. The mapper emits (word, 1) pairs, and the reducer sums up the counts for each unique word.
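A common refinement, though not required for correctness, is to reuse Writable instances rather than allocate new ones on every call, since the framework serializes the key and value as soon as context.write is invoked. A minimal sketch of the mapper written in that style:
public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused output objects; their contents are serialized on each write
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}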
By understanding and utilizing the built-in Writable types, you can quickly and efficiently integrate your data with Hadoop's processing capabilities, laying the foundation for more complex data handling tasks.