In Linux, managing and manipulating text files is a common task. Two powerful utilities for this are join and split. The join command merges lines from two files based on a common field, while split breaks a large file into smaller, more manageable pieces.
Joining Files by a Common Field
The join command is the standard tool for merging two files on a shared key. By default, it pairs lines from two files that are sorted on their first field, combining lines whose first fields are identical.
For example, imagine you have two files you want to merge:
file1.txt
1 John
2 Jane
3 Mary
file2.txt
1 Doe
2 Doe
3 Sue
Using the join command, you can combine them easily:
$ join file1.txt file2.txt
1 John Doe
2 Jane Doe
3 Mary Sue
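As you can see, the files were joined on the common first field (1, 2, 3). The join command prints that field once, followed by the remaining fields from file1.txt and then those from file2.txt. For join to work correctly, both files must be sorted on their join fields, using the same collating order (locale) that join uses; otherwise matching lines can be missed.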
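Real data is rarely pre-sorted, so a common pattern is to sort each file on its join field before calling join. The script below is a minimal sketch assuming a POSIX shell with the standard sort and join utilities; the file names names.txt and surnames.txt are hypothetical, and everything happens in a throwaway temporary directory. It sets LC_ALL=C so that sort and join agree on ordering, and uses sort -k 1b,1 to sort on the first field only.

```shell
#!/bin/sh
# Sketch: sort both inputs on the join field, then join them.
# names.txt and surnames.txt are hypothetical example files.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit
cd "$workdir"

# Use one locale for both sort and join so they agree on ordering.
export LC_ALL=C

# Deliberately unsorted inputs.
printf '3 Mary\n1 John\n2 Jane\n' > names.txt
printf '2 Doe\n3 Sue\n1 Doe\n' > surnames.txt

# -k 1b,1 sorts on the first field only, ignoring leading blanks.
sort -k 1b,1 names.txt > names.sorted
sort -k 1b,1 surnames.txt > surnames.sorted

join names.sorted surnames.sorted > joined.txt
cat joined.txt
```

Despite the shuffled input, this should print the same three joined lines as the earlier example.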
Specifying Different Join Fields
What if the common field is not the first column? You can tell join which fields to use. Consider these files:
file1.txt
John 1
Jane 2
Mary 3
file2.txt
1 Doe
2 Doe
3 Sue
Here, we need to join on the second field of file1.txt and the first field of file2.txt. The command would be:
$ join -1 2 -2 1 file1.txt file2.txt
1 John Doe
2 Jane Doe
3 Mary Sue
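The -1 2 option selects field 2 of the first file as its join field, and -2 1 selects field 1 of the second file. The join field is still printed first on each output line, and file1.txt must now be sorted on its second field rather than its first.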
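The sorting rule follows the join field, so when file1.txt joins on its second field it has to be sorted on that field. Here is a minimal sketch, again assuming a POSIX shell and a temporary working directory, with file1.txt deliberately shuffled and LC_ALL=C set so sort and join use the same ordering:

```shell
#!/bin/sh
# Sketch: join field 2 of file1.txt against field 1 of file2.txt.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit
cd "$workdir"

export LC_ALL=C

printf 'Mary 3\nJohn 1\nJane 2\n' > file1.txt   # shuffled on purpose
printf '1 Doe\n2 Doe\n3 Sue\n' > file2.txt

# Sort each file on the field that join will compare.
sort -k 2b,2 file1.txt > file1.sorted
sort -k 1b,1 file2.txt > file2.sorted

join -1 2 -2 1 file1.sorted file2.sorted > joined.txt
cat joined.txt
```

Sorting file1.txt on its first field instead (the names) would not match the order join expects for -1 2, which is why the -k argument mirrors the -1 argument.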
Splitting Large Files
Where join combines two files, the split command works in the other direction: it divides one large file into smaller ones:
$ split somefile
By default, this command splits somefile into pieces of 1000 lines each (the last piece may be shorter), named xaa, xab, and so on. You can customize this behavior, for example, by choosing a different line count with the -l option or splitting by size in bytes with the -b option.
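To see the default and customized behavior side by side, the sketch below generates a hypothetical 2500-line somefile with seq (present on GNU and BSD systems, though not required by POSIX), splits it with the defaults, splits it again with -l and -b, and checks that concatenating the pieces with cat rebuilds the original. The trailing part_ and chunk_ arguments are output prefixes that replace the default x; the sizes chosen (500 lines, 4096 bytes) are arbitrary.

```shell
#!/bin/sh
# Sketch: split a generated file, inspect the pieces, then rebuild it.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up the scratch directory on exit
cd "$workdir"
export LC_ALL=C   # predictable (alphabetical) glob ordering of piece names

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > somefile

# Defaults: 1000-line pieces named xaa, xab, xac (the last holds 500 lines).
split somefile
wc -l xa?

# -l: 500-line pieces, with the prefix part_ instead of x.
split -l 500 somefile part_

# -b: pieces of at most 4096 bytes, with the prefix chunk_.
split -b 4096 somefile chunk_

# Concatenating the pieces in name order reproduces the original file.
cat part_* > rebuilt_lines
cat chunk_* > rebuilt_bytes
cmp somefile rebuilt_lines
cmp somefile rebuilt_bytes
echo "both rebuilds match somefile"
```

Because the two-letter suffixes sort alphabetically, a glob such as part_* expands in the right order for cat, at least in the C locale set here. Note that -b cuts at byte boundaries, so a piece may end in the middle of a line.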