## Practical Examples

### Real-World Stream Splitting Scenarios

#### 1. Log File Analysis
```bash
## Split an Apache log by IP address and count requests per IP
cat access.log | awk '{print $1}' | sort | uniq -c
```
```mermaid
graph LR
    A[Log File] --> B[Split by IP]
    B --> C[Count Occurrences]
```
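If you only care about the busiest clients, the same split-and-count pipeline can be ranked and trimmed. This is a common extension of the example above rather than part of it; `access.log` is assumed to be the same Apache log.

```bash
## Rank IPs by request count and keep the top 10
cat access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 10
```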
#### 2. CSV Data Processing
```bash
## Extract specific columns from CSV
cat employees.csv | cut -d',' -f2,4 | head -n 5
```
| Scenario | Command | Purpose |
|----------|---------|---------|
| Name Extraction | `cut -d',' -f1` | Get first column |
| Salary Filter | `awk -F',' '$3 > 50000'` | Filter high earners |
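To try the table's commands without a real dataset, you can generate a throwaway file first. The file name `sample_employees.csv` and its column layout (name, department, salary, email) are assumptions for illustration only.

```bash
## Hypothetical CSV: name,department,salary,email
printf '%s\n' \
  'alice,engineering,72000,alice@example.com' \
  'bob,sales,48000,bob@example.com' > sample_employees.csv

cut -d',' -f1 sample_employees.csv          ## Name extraction
awk -F',' '$3 > 50000' sample_employees.csv ## Salary filter
```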
#### 3. System Configuration Parsing
```bash
## Split and process /etc/passwd
cat /etc/passwd | awk -F':' '{print "User: " $1 " UID: " $3}'
```
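The same colon-delimited split can also be done in the shell itself with `IFS` and `read`, which avoids spawning `awk` when only a couple of fields are needed. This is a minimal sketch of that alternative, not part of the original example.

```bash
## Split each /etc/passwd record on ':' using bash built-ins
while IFS=':' read -r user _ uid _; do
    echo "User: $user UID: $uid"
done < /etc/passwd
```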
#### 4. Network Configuration Splitting
```bash
## Split network interface details
ip addr show | grep inet | awk '{print $2}'
```
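The `awk '{print $2}'` output still carries the CIDR prefix (for example `192.168.1.10/24`), and `grep inet` matches IPv6 lines as well. One further split with `cut` isolates the address, and matching `inet ` with a trailing space keeps only IPv4; this refinement is a sketch, not part of the original lab.

```bash
## Keep only IPv4 addresses and drop the /prefix length
ip addr show | awk '/inet / {print $2}' | cut -d'/' -f1
```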
### Advanced Stream Manipulation
```bash
## Complex stream processing pipeline
cat server.log | \
  grep 'ERROR' | \
  cut -d':' -f2- | \
  sort | \
  uniq -c | \
  sort -nr
```
```mermaid
graph TD
    A[Log File] --> B[Filter Errors]
    B --> C[Extract Message]
    C --> D[Sort]
    D --> E[Count Unique]
    E --> F[Rank Errors]
```
### Efficient Splitting Techniques

- Use `awk` for complex transformations
- Prefer `cut` for simple column extraction
- Leverage `sed` for regex-based splitting
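A quick way to see the trade-off between the three tools is to run them against the same record. The sample line and its field layout are made up for illustration.

```bash
line='alice:x:1001:1001:Alice:/home/alice:/bin/bash'

## cut: simple positional extraction
echo "$line" | cut -d':' -f1,7

## awk: extraction plus logic in a single pass
echo "$line" | awk -F':' '{print $1 " logs into " $7}'

## sed: regex-based splitting and rewriting
echo "$line" | sed 's/:.*:/ -> /'
```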
### LabEx Recommended Workflow
In LabEx Linux environments:
- Start with simple splitting methods
- Progressively add complexity
- Validate output at each transformation stage
#### Example Workflow
```bash
## Step-by-step data processing
cat raw_data.txt |
  tr ',' '\n' |  ## Convert commas to newlines
  sort |         ## Sort entries
  uniq |         ## Remove duplicates
  grep -v '^$'   ## Remove empty lines
```
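One way to validate output at each transformation stage, as recommended above, is to capture intermediate results with `tee` while the pipeline keeps running. The stage file names below are arbitrary placeholders.

```bash
## Inspect each stage without breaking the pipeline
cat raw_data.txt |
  tr ',' '\n' | tee stage1_split.txt |
  sort        | tee stage2_sorted.txt |
  uniq > final_output.txt

wc -l stage1_split.txt stage2_sorted.txt final_output.txt
```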
### Error Handling Strategies
```bash
## Robust splitting with error checking
## Letting awk open the file directly makes the exit status reflect a missing or unreadable input
awk '{print $1}' input.txt 2>/dev/null || \
  echo "Processing failed" >&2
```
### Best Practices
- Always validate input data
- Use error redirection
- Test splitting logic incrementally
- Consider memory and performance constraints
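A minimal sketch combining the first three practices: validate the input before splitting it, keep errors on stderr, and fail early. The file name and field choice are placeholders.

```bash
#!/bin/bash
## Validate input, then split, with errors sent to stderr
input="data.csv"

if [ ! -r "$input" ]; then
    echo "Cannot read $input" >&2
    exit 1
fi

cut -d',' -f1 "$input" 2>/dev/null || {
    echo "Splitting failed" >&2
    exit 1
}
```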