Filter HTTP Traffic in Tshark

Introduction

In this lab, you will learn to filter and analyze HTTP traffic using Tshark, Wireshark's command-line tool. You'll practice capturing web traffic on port 80 and isolating HTTP requests with specific filtering techniques.

The exercises will guide you through extracting HTTP methods (GET/POST) and formatting output in JSON for structured analysis. These skills are essential for network troubleshooting and traffic inspection tasks.

Capture HTTP with -f "tcp port 80"

In this step, you will learn how to capture HTTP traffic using the capture filter -f "tcp port 80". HTTP (Hypertext Transfer Protocol) is the foundation of data communication on the World Wide Web, and unencrypted HTTP typically uses TCP port 80. Restricting the capture to that port isolates HTTP traffic from other network protocols, making web communications easier to analyze.

Before we begin, let's understand some basics:

Network ports are like doors where specific types of network traffic enter and exit
Port 80 is the standard port assigned to HTTP traffic
Wireshark can filter traffic based on these port numbers

First, let's open Wireshark in the LabEx VM environment. Follow these steps carefully:

Open a terminal in Xfce desktop (you can find it in the Applications menu > System > Terminal)
Navigate to the default working directory where we'll store our capture files:

cd ~/project

Start Wireshark with the capture filter for HTTP traffic:

sudo wireshark -k -f "tcp port 80"

Let's break down the command options:

-k: This tells Wireshark to start capturing packets immediately
-f "tcp port 80": This is our capture filter that instructs Wireshark to only record TCP traffic destined for or coming from port 80

Now we need some HTTP traffic to analyze. Open another terminal window (you can right-click on the desktop and select "Open Terminal") and generate test traffic with:

curl http://example.com

In the Wireshark window, you'll see captured packets showing:

Your computer's HTTP request to example.com (usually starting with "GET / HTTP/1.1")
The web server's response (usually containing "HTTP/1.1 200 OK")

Each packet shows important details like:

Source and destination IP addresses
Protocol (HTTP, or TCP for the handshake and acknowledgment packets on the same port)
Packet size
Timing information

For beginners: Wireshark acts like a microscope for network traffic. The filter tcp port 80 works like a specialized lens that shows only traffic to or from port 80, ignoring everything else on the network. This focused view helps you understand how web clients and servers communicate without getting overwhelmed by other network activity.
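Because this lab centers on Tshark, it may help to see the same capture filter on the command line. The following is a minimal sketch, not part of the original lab steps: capture_http is a hypothetical helper name, and the default file name http_capture.pcapng and the packet count of 20 are arbitrary values chosen for illustration. It assumes Tshark is installed and that the default capture interface is the one carrying your traffic.

```shell
# capture_http: hypothetical helper that records port-80 traffic to a file.
#   $1 = output file (assumed default: http_capture.pcapng)
#   $2 = number of packets to stop after (assumed default: 20)
capture_http() {
    # -f sets the capture filter, -c stops after N packets, -w writes a file.
    sudo tshark -f "tcp port 80" -c "${2:-20}" -w "${1:-http_capture.pcapng}"
}

# Example call, commented out so that loading the helper does not start a capture:
# capture_http http_capture.pcapng 20
```

Stopping after a fixed packet count keeps the capture from running indefinitely; run curl http://example.com in a second terminal while the helper is capturing.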

Filter Requests with -Y "http.request"

In this step, you will learn how to filter HTTP requests using Wireshark's display filter -Y "http.request". This filter helps you focus specifically on HTTP request packets, excluding responses and other network traffic. Understanding HTTP requests is fundamental to web traffic analysis, as they represent the initial messages clients send to servers.

Building on the previous step where we captured HTTP traffic, let's now filter for only HTTP requests:

First, ensure you're in the default working directory where we'll be working with our capture files:

cd ~/project

Run Wireshark with the display filter for HTTP requests:

sudo wireshark -k -Y "http.request"

The -Y option applies a display filter, which is different from the capture filter -f used in step 1. A capture filter limits which packets are recorded, whereas a display filter only limits which of the captured packets are shown. Because no capture filter is given here, Wireshark records all traffic on the interface but displays only the packets that contain HTTP requests.
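Since display filters operate on packets that have already been captured, they can also be applied to a saved file. The sketch below illustrates this with Tshark's -r option; show_requests is a hypothetical helper name, and http_capture.pcapng is an assumed file name for a capture you have saved and can read.

```shell
# show_requests: hypothetical helper that prints only the HTTP requests
# found in an existing capture file.
#   $1 = capture file to read (assumed default: http_capture.pcapng)
show_requests() {
    # -r reads packets from a file instead of a live interface;
    # -Y then hides everything except HTTP requests.
    tshark -r "${1:-http_capture.pcapng}" -Y "http.request"
}

# Example call, commented out because it needs an existing capture file:
# show_requests http_capture.pcapng
```

The capture file itself is left unchanged, so the same file can be re-read with a different display filter.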

To generate test traffic that we can analyze, open another terminal and run these common HTTP client commands:

curl http://example.com
wget http://example.com

For beginners: The display filter http.request specifically matches HTTP request packets. This is useful when you want to analyze only the requests being sent from clients to servers, ignoring the server responses. The filter syntax is part of Wireshark's powerful display filter language that lets you precisely select which packets to view based on protocol-specific criteria.

In the Wireshark window, you should now see only HTTP request packets from the curl and wget commands. Each packet will display important HTTP protocol information including:

The HTTP method (GET, POST, etc.) which indicates the type of request
The requested URI showing the specific resource being accessed
HTTP version (like HTTP/1.1) showing the protocol version
Host information identifying the target server
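The details listed above correspond to the Tshark field names http.request.method, http.request.uri, http.request.version, and http.host. As a rough sketch of how they could be combined into one readable line per request, the snippet below assumes the tab-separated layout that -T fields prints by default. The sample lines are made up to stand in for live output, so the snippet runs without a capture.

```shell
# Sample lines standing in for live Tshark output (tab-separated columns:
# method, URI, version, host). A live run might look like:
#   sudo tshark -Y "http.request" -T fields -e http.request.method \
#     -e http.request.uri -e http.request.version -e http.host
sample_requests=$(printf 'GET\t/\tHTTP/1.1\texample.com\nGET\t/\tHTTP/1.1\texample.com\n')

# Rebuild a readable request line from the four columns.
request_lines=$(printf '%s\n' "$sample_requests" |
    awk -F '\t' '{ printf "%s http://%s%s (%s)\n", $1, $4, $2, $3 }')
echo "$request_lines"
```

With the two sample rows (one for curl, one for wget), each line reads GET http://example.com/ (HTTP/1.1).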

Extract Method with -e http.request.method

In this step, we'll focus specifically on extracting HTTP request methods from network traffic using Wireshark's command-line tool, tshark. HTTP methods are the verbs that indicate the desired action to be performed on a resource, such as GET for retrieving data or POST for submitting data.

Before we begin, let's understand what we're working with:

HTTP methods are fundamental components of web communications
Tshark allows us to examine these methods directly from captured network packets
The -e flag lets us extract specific fields from the packet data

Let's walk through the process step by step:

First, change to the working directory used throughout this lab:

cd ~/project

Now we'll run the tshark command to extract HTTP methods from live traffic:

sudo tshark -Y "http.request" -T fields -e http.request.method

Breaking down this command:

sudo: Gives us necessary permissions to capture network traffic
tshark: The command-line version of Wireshark
-Y "http.request": Applies a display filter to show only HTTP requests
-T fields: Specifies we want field-based output (rather than full packets)
-e http.request.method: Tells tshark to extract just the HTTP method field

To see this in action, we'll generate some test traffic from another terminal window:

curl -X GET http://example.com
curl -X POST http://example.com
curl -X DELETE http://example.com

Each of these curl commands sends a request with a different HTTP method to example.com, and tshark captures the request and prints its method. The -X flag in curl specifies which HTTP method to use; curl uses GET by default, so -X GET is optional.

After running these commands, you should see output similar to:

GET
POST
DELETE

This output shows exactly which HTTP methods were used in the captured traffic, making it easy to analyze web request patterns. The method names appear in the order they were captured by tshark.
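To analyze request patterns across many packets, the method list can be summarized with standard text tools. The following is a minimal sketch using a made-up sample list rather than live output; in practice you might first save the list to a file (for example by adding -a duration:30 so the capture stops on its own and redirecting the output to a hypothetical methods.txt).

```shell
# Sample method list standing in for Tshark output; one method per line.
sample_methods=$(printf 'GET\nPOST\nDELETE\nGET\n')

# Count occurrences of each method, most frequent first.
method_counts=$(printf '%s\n' "$sample_methods" | sort | uniq -c | sort -rn)
echo "$method_counts"
```

Each output line holds a count followed by a method name, with GET listed first here because it appears twice in the sample.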

Display in JSON with -T json

In this step, we'll explore how to format captured HTTP traffic data as JSON using Wireshark's Tshark utility. JSON (JavaScript Object Notation) is a lightweight data format that's easy for both humans to read and machines to parse. This makes it ideal for analyzing network traffic programmatically.

Before we begin, let's understand why JSON output is valuable:

Structured data organization
Easy integration with other tools and scripts
Standardized format for data exchange

First, ensure you're in the default working directory where we'll run our commands:

cd ~/project

Now let's run Tshark to capture HTTP requests and output them in JSON format. This command combines filtering with JSON formatting:

sudo tshark -Y "http.request" -T json -e http.request.method -e http.host -e http.request.uri

Let's break down what each part of this command does:

-Y "http.request": This filter tells Tshark to only show HTTP request packets
-T json: Specifies that we want the output in JSON format
-e <field>: Each -e option extracts one field from each HTTP request:
- http.request.method: The HTTP method used (GET, POST, etc.)
- http.host: The website domain being accessed
- http.request.uri: The specific path or resource being requested

To generate test traffic that we can capture, open a second terminal window and run these curl commands:

curl http://example.com
curl http://example.org/sample

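When you run the Tshark command while this test traffic is generated, you'll see output structured like this (simplified: actual output includes additional metadata keys such as _type and _score, and the _index value is derived from the capture date):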

[
  {
    "_index": "packets-1",
    "_source": {
      "layers": {
        "http.request.method": ["GET"],
        "http.host": ["example.com"],
        "http.request.uri": ["/"]
      }
    }
  },
  {
    "_index": "packets-2",
    "_source": {
      "layers": {
        "http.request.method": ["GET"],
        "http.host": ["example.org"],
        "http.request.uri": ["/sample"]
      }
    }
  }
]

Notice how each HTTP request becomes a separate JSON object with clearly labeled fields. This structure makes it simple to identify:

Which website was accessed
What type of request was made
Which specific page or resource was requested

The JSON format is particularly useful when you want to save this data for later analysis or feed it into other tools that can process JSON data automatically.
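As one illustration of feeding this JSON into another tool, the sketch below uses jq to flatten each request into a tab-separated line. It assumes jq is installed (it is not part of Tshark), and sample_requests.json is a hypothetical file containing a trimmed version of the output shown above.

```shell
# Sample JSON trimmed to the fields used in this step; real Tshark output
# carries additional metadata keys.
cat > sample_requests.json <<'EOF'
[
  {"_source": {"layers": {
    "http.request.method": ["GET"],
    "http.host": ["example.com"],
    "http.request.uri": ["/"]}}},
  {"_source": {"layers": {
    "http.request.method": ["GET"],
    "http.host": ["example.org"],
    "http.request.uri": ["/sample"]}}}
]
EOF

# Print method, host, and URI as tab-separated columns, one request per line.
if command -v jq >/dev/null 2>&1; then
    summary=$(jq -r '.[]._source.layers
        | [.["http.request.method"][0], .["http.host"][0], .["http.request.uri"][0]]
        | @tsv' sample_requests.json)
    echo "$summary"
else
    echo "jq is not installed; skipping the summary" >&2
fi
```

Each field is an array in Tshark's JSON, which is why the filter takes element [0] before joining the values.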

Summary

In this lab, you have learned to filter and analyze HTTP traffic using Wireshark's Tshark command-line tool. The exercises covered capturing HTTP traffic with -f "tcp port 80" and generating test traffic using curl for practical analysis.

You also practiced filtering HTTP requests with -Y "http.request" and extracting specific data like HTTP methods using -e http.request.method. The lab demonstrated output formatting in JSON with -T json, equipping you with key techniques for efficient network traffic inspection.