Advanced uniq Options and Practical Applications
Now that you understand the basic usage of uniq and how to combine it with sort, let's explore some additional options of the uniq command that make it even more powerful for data processing tasks.
Counting Occurrences with -c
The -c option prefixes each distinct line with the number of times it occurs. Because uniq only compares adjacent lines, sort the input first so that identical lines end up next to each other and are counted together:
sort ~/project/duel_log.txt | uniq -c
You should see output like this:
2 potion
2 shield
2 sword
This shows that each item appears twice in our original file.
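When the counts differ, it often helps to rank lines by frequency. The following is a minimal sketch that pipes the counts through a second sort; the sample data and the variable names (tmpfile, ranked) are assumptions made for illustration and are not the contents of duel_log.txt:

```shell
# Hypothetical sample data written to a temporary file
tmpfile=$(mktemp)
printf '%s\n' sword shield sword potion sword shield > "$tmpfile"

# Count each distinct line, then sort the counts numerically (-n), largest first (-r)
ranked=$(sort "$tmpfile" | uniq -c | sort -rn)
echo "$ranked"

rm -f "$tmpfile"
```

With this sample input, sword (3 occurrences) is listed first, followed by shield (2) and potion (1).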
Finding Only Duplicate Lines with -d
If you're only interested in finding duplicate lines (lines that appear more than once), you can use the -d option, which prints one copy of each repeated line:
sort ~/project/duel_log.txt | uniq -d
Output:
potion
shield
sword
Since all items in our file have duplicates, all of them are listed in the output.
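The sort step matters here because uniq compares only adjacent lines. The sketch below, which uses hypothetical sample data and variable names chosen for illustration, shows that -d can miss a duplicate when the repeated lines are not next to each other:

```shell
# Hypothetical sample data: "sword" appears twice, but not on adjacent lines
tmpfile=$(mktemp)
printf '%s\n' sword shield sword > "$tmpfile"

# Without sort, uniq sees no adjacent repeats and reports nothing
unsorted=$(uniq -d "$tmpfile")

# With sort, the two "sword" lines become adjacent and are reported
sorted=$(sort "$tmpfile" | uniq -d)

echo "without sort: '$unsorted'"
echo "with sort:    '$sorted'"

rm -f "$tmpfile"
```

The first result is empty, while the second contains sword.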
Finding Lines That Appear Only Once with -u
Let's create a new file with more varied content to better demonstrate the uniq command:
echo -e "apple\napple\napple\nbanana\ncherry\ncherry\ngrape" > ~/project/fruits.txt
Let's examine this file:
cat ~/project/fruits.txt
Output:
apple
apple
apple
banana
cherry
cherry
grape
Now let's use the -u option to find entries that appear exactly once:
sort ~/project/fruits.txt | uniq -u
Output:
banana
grape
This shows that "banana" and "grape" appear only once in our file.
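To see how -u relates to -d, the sketch below runs both on the same fruit list, together with plain uniq. The variable names are assumptions chosen for illustration; the data matches fruits.txt but is kept in a shell variable so the example stands alone:

```shell
# Same contents as fruits.txt, held in a variable for a self-contained example
fruits=$(printf '%s\n' apple apple apple banana cherry cherry grape)

once=$(echo "$fruits" | sort | uniq -u)      # lines that occur exactly once
repeated=$(echo "$fruits" | sort | uniq -d)  # one copy of each line that occurs more than once
distinct=$(echo "$fruits" | sort | uniq)     # one copy of every distinct line

echo "once:     $(echo "$once" | tr '\n' ' ')"
echo "repeated: $(echo "$repeated" | tr '\n' ' ')"
echo "distinct: $(echo "$distinct" | tr '\n' ' ')"
```

Every distinct line lands in exactly one of the first two groups: banana and grape occur once, while apple and cherry are repeated.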
Real-world Application: Log Analysis
Let's create a simple log file to simulate a real-world application:
echo -e "INFO: System started\nERROR: Connection failed\nINFO: User logged in\nWARNING: Low memory\nERROR: Connection failed\nINFO: System started" > ~/project/system.log
Now, let's analyze this log file to find out which distinct messages appear and how many times each one occurs:
sort ~/project/system.log | uniq -c
Output should be similar to:
2 ERROR: Connection failed
2 INFO: System started
1 INFO: User logged in
1 WARNING: Low memory
This gives you a quick overview of the distinct messages in your log file and their frequencies.
You can also extract just the message types (INFO, ERROR, WARNING) using the cut command. Here -d: sets the colon as the field delimiter and -f1 keeps only the first field:
cut -d: -f1 ~/project/system.log | sort | uniq -c
Output:
2 ERROR
3 INFO
1 WARNING
This analysis shows that out of 6 log entries, 3 are INFO messages, 2 are ERROR messages, and 1 is a WARNING message.
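The same building blocks can be rearranged for other questions about the log. The following is a sketch under the assumption that every line starts with its level followed by a colon, as in the sample above; the temporary file and variable names are hypothetical:

```shell
# Recreate the sample log in a temporary file so the example stands alone
log=$(mktemp)
printf '%s\n' \
  'INFO: System started' \
  'ERROR: Connection failed' \
  'INFO: User logged in' \
  'WARNING: Low memory' \
  'ERROR: Connection failed' \
  'INFO: System started' > "$log"

# Rank message types from most to least frequent
levels=$(cut -d: -f1 "$log" | sort | uniq -c | sort -rn)
echo "$levels"

# Count the distinct messages for a single level only
errors=$(grep '^ERROR' "$log" | sort | uniq -c)
echo "$errors"

rm -f "$log"
```

The first pipeline lists INFO ahead of ERROR and WARNING, and the second narrows the count to the ERROR lines, which here all share the message "Connection failed".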
These examples demonstrate how combining simple commands like sort, uniq, and cut can create powerful data processing pipelines in Linux.