Combining Pipes

Now let's build some real-world pipelines. These patterns come up constantly in DevOps and development.

The Top-N Pattern

Find the most common items:

Terminal
$cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -10
   1523 192.168.1.1
    892 10.0.0.5
    654 172.16.0.1
    ...

Breaking it down:

  1. cut -d' ' -f1 - extract first field (IP address)
  2. sort - group identical IPs together (uniq only collapses adjacent duplicate lines)
  3. uniq -c - count occurrences
  4. sort -rn - sort by count, highest first
  5. head -10 - top 10

This Pattern is Gold

sort | uniq -c | sort -rn | head is one of the most useful patterns in Linux. Memorize it.
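It works anywhere you need a frequency count. For example, to see your most-used commands (assuming an interactive bash session with no timestamps in the history format):

Terminal
$history | awk '{print $2}' | sort | uniq -c | sort -rn | head
(your top commands by usage)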

Log Analysis Pipelines

Errors Per Hour

Terminal
$grep 'ERROR' app.log | cut -d' ' -f1-3 | cut -d':' -f1 | uniq -c
     45 Jan 14 10
     23 Jan 14 11
     67 Jan 14 12
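This assumes syslog-style timestamps like Jan 14 10:23:45 at the start of each line (a hypothetical format). Inspect a few matching lines first so the cut fields line up with your log:

Terminal
$grep 'ERROR' app.log | head -3
(check the timestamp layout before counting)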

Most Common Error Types

Terminal
$grep 'ERROR' app.log | sed 's/.*ERROR: //' | sort | uniq -c | sort -rn | head -5
     89 Connection timeout
     45 Invalid input
     23 Auth failed

Process Analysis

Top CPU Users

Terminal
$ps aux | sort -k3 -rn | head -10
(top 10 by CPU %)
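Note that sort treats the ps header like any other line, so it lands somewhere in the middle of the output. On Linux with procps ps, you can let ps do the sorting and keep the header on top:

Terminal
$ps aux --sort=-%cpu | head -11
(header plus top 10 by CPU %)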

Memory by User

Terminal
$ps aux | awk '{arr[$1]+=$4} END {for (i in arr) print arr[i], i}' | sort -rn | head
45.2 user
23.1 root
5.4 mysql
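The awk stage sums column 4 (%MEM) into an array keyed by column 1 (the username), then the END block prints each total. The same logic spelled out, skipping the ps header row (a more readable variant of the one-liner above):

# sum %MEM per user; NR > 1 skips the ps header line
ps aux | awk '
  NR > 1 { mem[$1] += $4 }
  END    { for (u in mem) print mem[u], u }
' | sort -rn | head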

File System Analysis

Largest Files

Terminal
$find . -type f -exec ls -la {} \; | sort -k5 -rn | head -10
(top 10 largest files)
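Spawning ls once per file is slow on big trees. GNU find can print sizes directly (assuming GNU findutils, standard on Linux):

Terminal
$find . -type f -printf '%s %p\n' | sort -rn | head -10
(size in bytes, then path)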

File Types Count

Terminal
$find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn | head
    234 js
    156 ts
     89 json
     45 md
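Files without an extension slip through as their full path, since the sed has no dot to strip. Restricting find to names containing a dot avoids that:

Terminal
$find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -rn | head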

Network Analysis

Active Connections by Port

Terminal
$ss -tuln | grep LISTEN | awk '{print $5}' | sed 's/.*://' | sort | uniq -c
      1 22
      1 80
      1 443
      1 3000

sed 's/.*://' takes everything after the last colon, so it also handles IPv6 addresses like [::]:80, where cut -d':' would pick the wrong field.

Connections Per IP

Terminal
$ss -tn | grep ESTAB | awk '{print $5}' | cut -d':' -f1 | sort | uniq -c | sort -rn
     15 192.168.1.100
      8 10.0.0.5

Text Processing Pipelines

Extract and Transform

Terminal
$cat data.json | jq -r '.users[].email' | sort -u
(unique emails from JSON)
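This assumes data.json has a top-level users array of objects with an email field — a hypothetical shape. You can verify the idea on inline JSON:

Terminal
$echo '{"users":[{"email":"a@x.io"},{"email":"b@x.io"},{"email":"a@x.io"}]}' | jq -r '.users[].email' | sort -u
a@x.io
b@x.io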

Format Conversion

Terminal
$cat data.csv | tail -n +2 | cut -d',' -f1,3 | tr ',' '\t'
(CSV to TSV, skip header)
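cut splits on every comma, so this breaks on quoted fields like "Doe, Jane". For simple unquoted CSVs, a single awk stage does the same job (same no-quoting assumption):

Terminal
$awk -F',' 'NR > 1 {print $1 "\t" $3}' data.csv
(CSV to TSV, skip header)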

Building Incrementally

When building complex pipelines, build step by step:

# Start simple, verify each step
cat access.log | head

# Add filter
cat access.log | grep '500' | head

# Add extraction
cat access.log | grep '500' | cut -d' ' -f1 | head

# Add counting
cat access.log | grep '500' | cut -d' ' -f1 | sort | uniq -c

# Add sorting
cat access.log | grep '500' | cut -d' ' -f1 | sort | uniq -c | sort -rn

# Finalize
cat access.log | grep '500' | cut -d' ' -f1 | sort | uniq -c | sort -rn | head -10

Debug with head

Add | head at any point to see intermediate output without overwhelming your terminal.

xargs - Bridging Pipes

When commands don't read from stdin:

Terminal
$find . -name '*.tmp' | xargs rm
(delete all .tmp files)
$cat urls.txt | xargs curl -O
(download all URLs)

xargs reads items from stdin (split on whitespace and newlines by default) and passes them to the command as arguments.
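Two refinements worth knowing with GNU findutils and xargs: NUL delimiters make the pipeline safe for filenames containing spaces, and -P runs jobs in parallel:

Terminal
$find . -name '*.tmp' -print0 | xargs -0 rm
(space-safe deletion)
$cat urls.txt | xargs -n1 -P4 curl -O
(download 4 URLs at a time)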

Knowledge Check

What does `sort | uniq -c | sort -rn | head` do?

Key Takeaways

  • Build pipelines incrementally, testing each step
  • sort | uniq -c | sort -rn | head = top N pattern
  • Use head to debug intermediate steps
  • xargs bridges pipes to commands expecting arguments
  • Complex analysis is just simple commands combined

Next: the tee command for splitting output.