Advanced uniq Options and Practical Applications
Now that you understand the basic usage of uniq and how to combine it with sort, let's explore some additional options of the uniq command that make it even more powerful for data processing tasks.
Counting Occurrences with -c
The -c option counts the number of occurrences of each line. This is useful when you want to know how many times each unique line appears in your file:
sort ~/project/duel_log.txt | uniq -c
You should see output like this:
2 potion
2 shield
2 sword
The number at the start of each line is the count for that line, so each item appears twice in our original file.
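A common follow-up to counting is ranking lines by frequency. The sketch below is one way to do that, under some assumptions: the sample data is hypothetical (it is not the lab's duel_log.txt), it is written to a temporary file with mktemp, and the variable names are illustrative. It pipes the counts from uniq -c into a second, numeric, reverse sort.

```shell
# Hypothetical sample data written to a temporary file (not the lab's duel_log.txt).
tmpfile=$(mktemp)
printf '%s\n' sword shield sword potion sword shield > "$tmpfile"

# Count each distinct line, then sort numerically (-n) in reverse (-r)
# so the most frequent line comes first.
ranked=$(sort "$tmpfile" | uniq -c | sort -rn)
echo "$ranked"

# The first row holds the most frequent line; awk prints its second column (the text).
top=$(echo "$ranked" | head -n 1 | awk '{print $2}')
echo "Most frequent: $top"

rm -f "$tmpfile"
```

The first sort groups identical lines so uniq can count them; the second sort orders the counted result. The exact padding in front of each count may differ between uniq implementations.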
Finding Only Duplicate Lines with -d
If you're only interested in finding duplicate lines (lines that appear more than once), you can use the -d option:
sort ~/project/duel_log.txt | uniq -d
Output:
potion
shield
sword
Since every item in our file appears more than once, all of them are listed, and each is printed a single time.
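Because uniq -d prints nothing when there are no repeated lines, its output can serve as a simple duplicate check in a script. The following is a minimal sketch, assuming a hypothetical list of names in a temporary file; the names and the status variable are illustrative only.

```shell
# Hypothetical input: a list of names in which "bob" occurs twice.
tmpfile=$(mktemp)
printf '%s\n' alice bob carol bob > "$tmpfile"

# uniq -d prints one copy of every line that occurs more than once.
dupes=$(sort "$tmpfile" | uniq -d)

# An empty result means no line was repeated.
if [ -n "$dupes" ]; then
    status="duplicates found"
    echo "Duplicate entries:"
    echo "$dupes"
else
    status="no duplicates"
    echo "No duplicate entries."
fi

rm -f "$tmpfile"
```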
Finding Only Unique Lines with -u
Let's create a new file with more varied content to better demonstrate the uniq command:
echo -e "apple\napple\napple\nbanana\ncherry\ncherry\ngrape" > ~/project/fruits.txt
Let's examine this file:
cat ~/project/fruits.txt
Output:
apple
apple
apple
banana
cherry
cherry
grape
Now let's use the -u option to find entries that appear exactly once:
sort ~/project/fruits.txt | uniq -u
Output:
banana
grape
This shows that "banana" and "grape" appear only once in our file.
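To see how the three modes relate, it can help to run them side by side on the same data. The sketch below reuses the fruit list from this step but writes it to a temporary file, which is an assumption made so the example does not touch ~/project; the variable names are illustrative.

```shell
# Same fruit list as above, written to a temporary file for this sketch.
tmpfile=$(mktemp)
printf '%s\n' apple apple apple banana cherry cherry grape > "$tmpfile"

# Plain uniq: one copy of every distinct line.
all_distinct=$(sort "$tmpfile" | uniq)

# -d: only lines that occur more than once (one copy each).
repeated=$(sort "$tmpfile" | uniq -d)

# -u: only lines that occur exactly once.
singletons=$(sort "$tmpfile" | uniq -u)

echo "distinct:   $(echo "$all_distinct" | tr '\n' ' ')"
echo "repeated:   $(echo "$repeated" | tr '\n' ' ')"
echo "singletons: $(echo "$singletons" | tr '\n' ' ')"

rm -f "$tmpfile"
```

Every distinct line lands in exactly one of the -d and -u results, so together they cover the same lines as plain uniq.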
Real-world Application: Log Analysis
Let's create a simple log file to simulate a real-world application:
echo -e "INFO: System started\nERROR: Connection failed\nINFO: User logged in\nWARNING: Low memory\nERROR: Connection failed\nINFO: System started" > ~/project/system.log
Now, let's analyze this log file to find out which messages appear and how many times each one occurs:
sort ~/project/system.log | uniq -c
Output should be similar to:
2 ERROR: Connection failed
2 INFO: System started
1 INFO: User logged in
1 WARNING: Low memory
This gives you a quick overview of the distinct messages in your log file and how often each one occurs.
You can also extract just the message types (INFO, ERROR, WARNING) using the cut command. Here, -d: sets the colon as the field delimiter and -f1 keeps only the first field:
cut -d: -f1 ~/project/system.log | sort | uniq -c
Output:
2 ERROR
3 INFO
1 WARNING
This analysis shows that out of 6 log entries, 3 are INFO messages, 2 are ERROR messages, and 1 is a WARNING message.
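The same pipeline can be extended to rank message types by frequency or to focus on a single type. The following is a sketch rather than a definitive approach: it recreates the sample log in a temporary file (an assumption, so that ~/project/system.log is left untouched), and the variable names are illustrative.

```shell
# Recreate the sample log in a temporary file for this sketch.
logfile=$(mktemp)
printf '%s\n' \
    "INFO: System started" \
    "ERROR: Connection failed" \
    "INFO: User logged in" \
    "WARNING: Low memory" \
    "ERROR: Connection failed" \
    "INFO: System started" > "$logfile"

# Rank message types by frequency, most common first.
level_ranking=$(cut -d: -f1 "$logfile" | sort | uniq -c | sort -rn)
echo "$level_ranking"

# The first row is the most common type; awk prints its second column.
top_level=$(echo "$level_ranking" | head -n 1 | awk '{print $2}')
echo "Most common type: $top_level"

# Keep only ERROR lines, then count each distinct error message.
error_summary=$(grep '^ERROR:' "$logfile" | sort | uniq -c | sort -rn)
echo "$error_summary"

rm -f "$logfile"
```

Filtering with grep before sorting keeps the summary limited to the lines you care about, which matters more as log files grow.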
These examples demonstrate how combining simple commands like sort, uniq, and cut can create powerful data processing pipelines in Linux.