How to analyze Nmap scan results in XML format

Introduction

In cybersecurity, analyzing network scan results is essential for maintaining a secure infrastructure. Nmap (Network Mapper) is one of the most widely used tools for network discovery and security auditing. This tutorial walks you through interpreting Nmap scan results in XML format, so that you can process them with scripts as well as read them by eye.

By the end of this lab, you will know how to run Nmap scans with XML output, understand the structure of the XML data, extract valuable information using both command-line tools and Python scripts, and identify potential security concerns from the scan results.

Installing Nmap and Running a Basic XML Scan

What is Nmap?

Nmap (Network Mapper) is a free and open-source utility for network discovery and security auditing. Security professionals worldwide use it to discover live hosts on a network, find open ports, identify the services and versions behind those ports, and flag potential security issues.

Installing Nmap

Let's begin by installing Nmap on our system. Open a terminal window and enter the following commands:

sudo apt update
sudo apt install nmap -y

After installation completes, verify that Nmap is installed correctly by checking its version:

nmap --version

You should see output similar to this:

Nmap version 7.80 ( https://nmap.org )
Platform: x86_64-pc-linux-gnu
Compiled with: liblua-5.3.3 openssl-1.1.1f libssh2-1.8.0 libz-1.2.11 libpcre-8.39 nmap-libpcap-1.9.1 nmap-libdnet-1.12 ipv6
Compiled without:
Available nsock engines: epoll poll select

Running a Basic Nmap Scan with XML Output

Nmap can save its scan results in XML format, which provides a structured way to analyze the data programmatically. Let's run a basic scan of our local machine and save the results in XML format:

sudo nmap -A -T4 -oX ~/project/localhost_scan.xml localhost

This command uses the following options:

-A: Enables OS detection, version detection, script scanning, and traceroute
-T4: Sets the timing template to "aggressive"
-oX: Writes the scan results in XML format to the file path that follows
localhost: The target to scan (our own machine)
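If you later want to launch the same scan from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of the lab's required steps: build_scan_command is a hypothetical helper name, and the sketch only builds and prints the command rather than running it.

```python
import shlex
from pathlib import Path


def build_scan_command(target, output_path, timing=4, aggressive=True):
    """Return the nmap argument list for a scan that writes XML output.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    command = ["nmap"]
    if aggressive:
        command.append("-A")          # OS/version detection, scripts, traceroute
    command.append(f"-T{timing}")     # timing template, e.g. -T4
    command += ["-oX", str(output_path), target]
    return command


command = build_scan_command(
    "localhost", Path("~/project/localhost_scan.xml").expanduser()
)
print(shlex.join(command))
# To actually run it (requires nmap and suitable privileges):
#   subprocess.run(["sudo", *command], check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems if a target or path ever contains spaces.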

The scan may take a minute or two to complete. When finished, you'll see a summary of the scan results in the terminal.

Viewing the XML Scan Results

Let's examine the XML file we just created:

cat ~/project/localhost_scan.xml

The output will be a structured XML document containing detailed information about the scan. It might look overwhelming at first, but we'll learn how to interpret it in the next steps.

Let's also check the basic structure of the XML file using the head command:

head -n 20 ~/project/localhost_scan.xml

This shows the first 20 lines of the XML file, giving us a glimpse of its structure.

Examining the XML Output Structure

Understanding the Nmap XML Format

The Nmap XML output follows a hierarchical structure that organizes scan information in a logical manner. Let's explore the main elements of this structure:

<nmaprun>: The root element that contains all scan information
<scaninfo>: Details about the scan type and parameters
<host>: Information about each scanned host
- <status>: Whether the host is up or down
- <address>: IP and MAC addresses
- <hostnames>: DNS names
- <ports>: Details about scanned ports
  - <port>: Information about a specific port
    - <state>: Whether the port is open, closed, or filtered
    - <service>: Service information if available
- <os>: Operating system detection results
- <times>: Round-trip timing statistics for the host
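To see this hierarchy without reading raw XML, you can print just the element names with indentation. The sketch below uses a small hand-written sample (SAMPLE_XML is an assumption that mimics Nmap's layout, not real scan output) and a hypothetical helper named outline.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mimics the layout of Nmap XML (not real scan output).
SAMPLE_XML = """
<nmaprun scanner="nmap" args="nmap -A -T4 -oX scan.xml localhost" version="7.80">
  <scaninfo type="syn" protocol="tcp"/>
  <host>
    <status state="up"/>
    <address addr="127.0.0.1" addrtype="ipv4"/>
    <hostnames><hostname name="localhost"/></hostnames>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="8.2p1"/>
      </port>
    </ports>
  </host>
</nmaprun>
"""


def outline(element, depth=0, max_depth=4):
    """Return a list of indented tag names describing the element tree."""
    lines = ["  " * depth + f"<{element.tag}>"]
    if depth < max_depth:
        for child in element:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


root = ET.fromstring(SAMPLE_XML)
print("\n".join(outline(root)))
```

To try it on the real scan, replace ET.fromstring(SAMPLE_XML) with ET.parse(path).getroot(), where path points at your XML file.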

Using Command-Line Tools to Extract Information

XML files can be difficult to read in their raw form. Let's use some command-line tools to extract specific information from our scan results.

First, let's count how many open ports were found using grep with the -c option:

grep -c "state=\"open\"" ~/project/localhost_scan.xml

This command counts the lines that contain state="open". Nmap normally writes each <port> element on a single line, so the result matches the number of open ports.

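Next, let's identify the open ports and their services using grep with the -A option to show lines after the match:

grep -A 3 "state=\"open\"" ~/project/localhost_scan.xml

This shows each line containing an open port along with the 3 lines that follow it. Nmap usually writes a port's <state> and <service> elements on the same line, so the matching line itself often carries the service details, and the following lines add context such as script output or neighboring ports.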
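Text matching with grep depends on how the XML happens to be laid out across lines. As a rough alternative, the sketch below reads the same information from the parsed tree, so line breaks no longer matter. The sample data and the helper name open_ports are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap XML (not real scan output).
SAMPLE_XML = """
<nmaprun>
  <host>
    <address addr="127.0.0.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/><service name="http"/></port>
      <port protocol="tcp" portid="631"><state state="open"/><service name="ipp"/></port>
    </ports>
  </host>
</nmaprun>
"""


def open_ports(root):
    """Return (portid, protocol, service name) for every port in the open state."""
    results = []
    for port in root.iter("port"):
        state = port.find("state")
        if state is None or state.get("state") != "open":
            continue
        service = port.find("service")
        name = service.get("name", "") if service is not None else ""
        results.append((port.get("portid"), port.get("protocol"), name))
    return results


found = open_ports(ET.fromstring(SAMPLE_XML))
print(f"Open ports: {len(found)}")
for port_id, protocol, name in found:
    print(f"{port_id}/{protocol} {name}")
```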

We can also use xmllint to format the XML file for better readability. Let's install it first:

sudo apt install libxml2-utils -y

Now, let's format the XML file:

xmllint --format ~/project/localhost_scan.xml > ~/project/formatted_scan.xml

Let's look at the formatted file:

head -n 50 ~/project/formatted_scan.xml

This displays the first 50 lines of the formatted XML file, which should be much easier to read.
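If xmllint is not available, Python's standard library can produce similar indentation. This is a minimal sketch using xml.dom.minidom on a compact, hand-written sample (COMPACT_XML is an assumption, not real scan output); it is not a drop-in replacement for xmllint.

```python
from xml.dom import minidom

# Compact single-line sample, similar to how unformatted XML often looks.
COMPACT_XML = '<nmaprun><host><status state="up"/></host></nmaprun>'

# toprettyxml re-serializes the document with one element per line.
pretty = minidom.parseString(COMPACT_XML).toprettyxml(indent="  ")
print(pretty)
```

Note that input which already contains indentation may come out with extra blank lines, because minidom keeps the existing whitespace as text nodes.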

Finally, let's extract specific information about the host status using xmllint with XPath:

xmllint --xpath "//host/status/@state" ~/project/localhost_scan.xml

This command uses XPath to extract the state attribute of all status elements under host elements.
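Python's xml.etree.ElementTree supports a limited subset of XPath, which is enough for queries like this one. It cannot select an attribute directly (there is no equivalent of /@state), so the sketch below selects the elements and then reads the attribute. The sample XML is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap XML (not real scan output).
SAMPLE_XML = """
<nmaprun>
  <host>
    <status state="up"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/></port>
    </ports>
  </host>
</nmaprun>
"""

root = ET.fromstring(SAMPLE_XML)

# Rough equivalent of //host/status/@state: find elements, then read the attribute.
host_states = [status.get("state") for status in root.findall("./host/status")]

# Attribute predicates are supported: only <state> elements with state="open".
open_states = root.findall(".//port/state[@state='open']")

print(host_states)
print(len(open_states))
```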

Parsing Nmap XML with Python

Introduction to XML Parsing with Python

Python provides powerful libraries for parsing XML files. In this step, we'll create a simple Python script to parse our Nmap scan results and display them in a more readable format.

Creating a Basic XML Parser

Let's create a Python script that uses the xml.etree.ElementTree module to parse the Nmap XML file. This module is included in the Python standard library, so we don't need to install anything additional.

Create a new file called parse_nmap.py in the project directory:

nano ~/project/parse_nmap.py

Copy and paste the following code into the editor:

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import sys

def parse_nmap_xml(xml_file):
    try:
        ## Parse the XML file
        tree = ET.parse(xml_file)
        root = tree.getroot()

        ## Print scan information
        print("Nmap Scan Report")
        print("=" * 50)
        print(f"Scan started at: {root.get('startstr')}")
        print(f"Nmap version: {root.get('version')}")
        print(f"Nmap command: {root.get('args')}")
        print("=" * 50)

        ## Process each host in the scan
        for host in root.findall('host'):
            ## Get host addresses
            for addr in host.findall('address'):
                if addr.get('addrtype') == 'ipv4':
                    ip_address = addr.get('addr')
                    print(f"\nHost: {ip_address}")

            ## Get hostname if available
            hostnames = host.find('hostnames')
            if hostnames is not None:
                for hostname in hostnames.findall('hostname'):
                    print(f"Hostname: {hostname.get('name')}")

            ## Get host status
            status = host.find('status')
            if status is not None:
                print(f"Status: {status.get('state')}")

            ## Process ports
            ports = host.find('ports')
            if ports is not None:
                print("\nOpen Ports:")
                print("-" * 50)
                print(f"{'PORT':<10}{'STATE':<10}{'SERVICE':<15}{'VERSION'}")
                print("-" * 50)

                for port in ports.findall('port'):
                    port_id = port.get('portid')
                    protocol = port.get('protocol')

                    ## Get port state
                    state = port.find('state')
                    port_state = state.get('state') if state is not None else "unknown"

                    ## Skip ports that are not open (closed, filtered, etc.)
                    if port_state != "open":
                        continue

                    ## Get service information
                    service = port.find('service')
                    if service is not None:
                        service_name = service.get('name', '')
                        service_product = service.get('product', '')
                        service_version = service.get('version', '')
                        service_info = f"{service_product} {service_version}".strip()
                    else:
                        service_name = ""
                        service_info = ""

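                    ## Build the PORT column first so it lines up with the 10-character header
                    port_label = f"{port_id}/{protocol}"
                    print(f"{port_label:<10}{port_state:<10}{service_name:<15}{service_info}")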

            ## Get OS detection information (named os_info to avoid shadowing the os module)
            os_info = host.find('os')
            if os_info is not None:
                print("\nOS Detection:")
                for osmatch in os_info.findall('osmatch'):
                    print(f"OS: {osmatch.get('name')} (Accuracy: {osmatch.get('accuracy')}%)")

    except ET.ParseError as e:
        print(f"Error parsing XML file: {e}")
        return False
    except Exception as e:
        print(f"Error: {e}")
        return False

    return True

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <nmap_xml_file>")
        sys.exit(1)

    xml_file = sys.argv[1]
    if not parse_nmap_xml(xml_file):
        sys.exit(1)

Save the file by pressing Ctrl+O, then Enter, and exit nano with Ctrl+X.

Now, make the script executable:

chmod +x ~/project/parse_nmap.py

Running the Parser

Let's run our Python script on the Nmap XML file we created earlier:

python3 ~/project/parse_nmap.py ~/project/localhost_scan.xml

You should see a nicely formatted output of the scan results, including:

Basic scan information
Host details
Open ports and services
Operating system detection results if available

This formatted output is much easier to read than the raw XML file and highlights the most important information from the scan.

Understanding the Parser Code

Let's review what our Python script does:

It uses xml.etree.ElementTree to parse the XML file
It extracts general scan information from the root element
For each host found in the scan:
- It extracts IP addresses and hostnames
- It determines if the host is up or down
- It lists all open ports, including port number, protocol, service name, and version
- It extracts OS detection information if available

This structured approach allows us to focus on the most relevant information while ignoring the XML complexity.
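Printing is convenient for reading, but other tools usually need structured data. As one possible variation, the sketch below collects the same fields into dictionaries and serializes them as JSON. The function name hosts_to_dicts and the sample XML are assumptions for illustration, not part of the script above.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap XML (not real scan output).
SAMPLE_XML = """
<nmaprun>
  <host>
    <status state="up"/>
    <address addr="127.0.0.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="80"><state state="closed"/><service name="http"/></port>
    </ports>
  </host>
</nmaprun>
"""


def hosts_to_dicts(root):
    """Convert each <host> element into a plain dictionary."""
    hosts = []
    for host in root.findall("host"):
        status = host.find("status")
        open_ports = []
        for port in host.findall("./ports/port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue
            service = port.find("service")
            open_ports.append({
                "port": int(port.get("portid")),
                "protocol": port.get("protocol"),
                "service": service.get("name", "") if service is not None else "",
            })
        hosts.append({
            "addresses": [a.get("addr") for a in host.findall("address")],
            "status": status.get("state") if status is not None else "unknown",
            "open_ports": open_ports,
        })
    return hosts


hosts = hosts_to_dicts(ET.fromstring(SAMPLE_XML))
print(json.dumps(hosts, indent=2))
```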

Extracting Security-Relevant Information

Security Insights from Nmap Scans

Now that we can parse Nmap XML data, let's extend our script to extract security-relevant information. This includes:

Identifying potentially risky open ports
Detecting outdated service versions
Summarizing security concerns

Let's create an enhanced version of our parser that focuses on security analysis.

Creating a Security Analysis Script

Create a new file called security_analysis.py:

nano ~/project/security_analysis.py

Copy and paste the following code:

#!/usr/bin/env python3
import xml.etree.ElementTree as ET
import sys
import datetime

## Define potentially risky ports
HIGH_RISK_PORTS = {
    '21': 'FTP - File Transfer Protocol (often unencrypted)',
    '23': 'Telnet - Unencrypted remote access',
    '25': 'SMTP - Email transfer (may allow relay)',
    '445': 'SMB - Windows file sharing (potential target for worms)',
    '3389': 'RDP - Remote Desktop Protocol (target for brute force)',
    '1433': 'MSSQL - Microsoft SQL Server',
    '3306': 'MySQL - Database access',
    '5432': 'PostgreSQL - Database access'
}

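## Services with known security issues.
## Each pattern is plain text compared against "product version" as reported
## by Nmap, so these entries are illustrative rather than exhaustive.
## Patterns name the product explicitly: a bare '1' would match nearly any version string.
OUTDATED_SERVICES = {
    'ssh': [
        {'version': 'OpenSSH 1.', 'reason': 'SSHv1-era release; SSHv1 has known vulnerabilities'},
        {'version': 'OpenSSH 7', 'reason': 'Older OpenSSH versions have multiple CVEs'}
    ],
    'http': [
        {'version': 'Apache httpd 2.2', 'reason': 'Apache 2.2.x is end-of-life'},
        {'version': 'Apache httpd 2.4.1', 'reason': 'Apache 2.4.1x releases predate 2.4.30 and have known vulnerabilities'},
        {'version': 'nginx 1.14', 'reason': 'Older nginx versions have security issues'}
    ]
}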

def analyze_security(xml_file):
    try:
        ## Parse the XML file
        tree = ET.parse(xml_file)
        root = tree.getroot()

        ## Prepare the report
        report = []
        report.append("NMAP SECURITY ANALYSIS REPORT")
        report.append("=" * 50)
        report.append(f"Report generated on: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        report.append(f"Scan started at: {root.get('startstr')}")
        report.append(f"Scan command: {root.get('args')}")
        report.append("=" * 50)

        ## Track security findings
        high_risk_services = []
        potentially_outdated = []
        exposed_services = []

        ## Process each host in the scan
        for host in root.findall('host'):
            ## Get host addresses
            ip_address = None
            for addr in host.findall('address'):
                if addr.get('addrtype') == 'ipv4':
                    ip_address = addr.get('addr')

            hostname = "Unknown"
            hostnames = host.find('hostnames')
            if hostnames is not None:
                hostname_elem = hostnames.find('hostname')
                if hostname_elem is not None:
                    hostname = hostname_elem.get('name')

            report.append(f"\nHOST: {ip_address} ({hostname})")
            report.append("-" * 50)

            ## Process ports
            ports = host.find('ports')
            if ports is None:
                report.append("No port information available")
                continue

            open_ports = 0
            for port in ports.findall('port'):
                port_id = port.get('portid')
                protocol = port.get('protocol')

                ## Get port state
                state = port.find('state')
                if state is None or state.get('state') != "open":
                    continue

                open_ports += 1

                ## Get service information
                service = port.find('service')
                if service is None:
                    service_name = "unknown"
                    service_product = ""
                    service_version = ""
                else:
                    service_name = service.get('name', 'unknown')
                    service_product = service.get('product', '')
                    service_version = service.get('version', '')

                service_full = f"{service_product} {service_version}".strip()

                ## Check if this is a high-risk port
                if port_id in HIGH_RISK_PORTS:
                    high_risk_services.append(f"{ip_address}:{port_id} ({service_name}) - {HIGH_RISK_PORTS[port_id]}")

                ## Check for outdated services. Match on the start of the string:
                ## a substring test would flag any version that merely contains the pattern.
                if service_name in OUTDATED_SERVICES:
                    for outdated in OUTDATED_SERVICES[service_name]:
                        if service_full.startswith(outdated['version']):
                            potentially_outdated.append(f"{ip_address}:{port_id} - {service_name} {service_full} - {outdated['reason']}")

                ## Track all exposed services
                exposed_services.append(f"{ip_address}:{port_id}/{protocol} - {service_name} {service_full}")

            report.append(f"Open ports: {open_ports}")

        ## Add security findings to report
        report.append("\nSECURITY FINDINGS")
        report.append("=" * 50)

        ## High-risk services
        report.append("\nHIGH-RISK SERVICES")
        report.append("-" * 50)
        if high_risk_services:
            for service in high_risk_services:
                report.append(service)
        else:
            report.append("No high-risk services detected")

        ## Potentially outdated services
        report.append("\nPOTENTIALLY OUTDATED SERVICES")
        report.append("-" * 50)
        if potentially_outdated:
            for service in potentially_outdated:
                report.append(service)
        else:
            report.append("No potentially outdated services detected")

        ## Exposed services inventory
        report.append("\nEXPOSED SERVICES INVENTORY")
        report.append("-" * 50)
        if exposed_services:
            for service in exposed_services:
                report.append(service)
        else:
            report.append("No exposed services detected")

        ## Write the report to a file
        report_file = "security_report.txt"
        with open(report_file, 'w') as f:
            f.write('\n'.join(report))

        print(f"Security analysis complete. Report saved to {report_file}")

        ## Display a summary
        print("\nSummary:")
        print(f"- High-risk services: {len(high_risk_services)}")
        print(f"- Potentially outdated services: {len(potentially_outdated)}")
        print(f"- Total exposed services: {len(exposed_services)}")

    except ET.ParseError as e:
        print(f"Error parsing XML file: {e}")
        return False
    except Exception as e:
        print(f"Error: {e}")
        return False

    return True

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"Usage: {sys.argv[0]} <nmap_xml_file>")
        sys.exit(1)

    xml_file = sys.argv[1]
    if not analyze_security(xml_file):
        sys.exit(1)

Save the file by pressing Ctrl+O, then Enter, and exit nano with Ctrl+X.

Make the script executable:

chmod +x ~/project/security_analysis.py

Running the Security Analysis

Let's run our security analysis script on the Nmap XML file:

cd ~/project
./security_analysis.py localhost_scan.xml

The script will analyze the scan results and generate a security report focused on potential vulnerabilities, saving it to a file called security_report.txt in the current directory.

Let's look at the content of the report:

cat ~/project/security_report.txt

Understanding the Security Analysis

The security analysis script performs several important functions:

High-Risk Port Identification: It identifies commonly exploited ports like FTP (21), Telnet (23), and RDP (3389), which are frequent targets for attackers.
Outdated Service Detection: It checks for older versions of services like SSH, Apache, and nginx that may have known security vulnerabilities.
Exposed Services Inventory: It creates a complete inventory of all open ports and services, which is valuable for security auditing.
Finding Categorization: It groups findings into high-risk services, potentially outdated services, and a full inventory, which helps prioritize security improvements.

This type of analysis is crucial for security professionals to identify potential vulnerabilities in a network before attackers can exploit them.
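Matching version strings as text is simple but imprecise. One possible refinement is to compare versions numerically. The sketch below is an illustration under stated assumptions: version_tuple and is_older_than are hypothetical helpers, and the 2.4.30 threshold is taken from the reason text in the script, not from a vulnerability database.

```python
import re


def version_tuple(version_text):
    """Return the leading dotted number of a version string as a tuple of ints.

    "8.2p1" becomes (8, 2); a string with no leading number gives None.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(version_text, minimum):
    """True when the parsed version is strictly lower than the minimum tuple."""
    found = version_tuple(version_text)
    return found is not None and found < minimum


# Text matching misses 2.4.29 even though it is older than 2.4.30.
text_match = "Apache httpd 2.4.1" in "Apache httpd 2.4.29"
numeric_match = is_older_than("2.4.29", (2, 4, 30))
print(text_match, numeric_match)
```

Keep in mind that distributions often backport security fixes without changing the upstream version number, so a numeric comparison is still only a hint that a service deserves a closer look.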

Extending the Analysis

In a real-world scenario, you might want to extend this analysis by:

Adding more high-risk ports to the detection list
Updating the outdated service definitions with the latest vulnerability information
Integrating with vulnerability databases to check for known CVEs (Common Vulnerabilities and Exposures)
Adding recommendations for remediation of detected issues
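As a starting point for the last item, remediation hints can live in a lookup table next to the port definitions. This is a minimal sketch: REMEDIATION_HINTS, DEFAULT_HINT, and with_recommendations are hypothetical names, and the advice strings are generic examples rather than a complete policy.

```python
# Hypothetical hint table keyed by port number (as strings, like HIGH_RISK_PORTS).
REMEDIATION_HINTS = {
    "21": "Prefer SFTP or FTPS, or restrict FTP to trusted networks.",
    "23": "Replace Telnet with SSH.",
    "3389": "Restrict RDP to a VPN or allow-listed addresses.",
}

DEFAULT_HINT = "Confirm the service is needed; otherwise close or firewall the port."


def with_recommendations(findings, hints=REMEDIATION_HINTS):
    """Pair each (ip_address, port_id) finding with a remediation hint."""
    return [
        f"{ip_address}:{port_id} - {hints.get(port_id, DEFAULT_HINT)}"
        for ip_address, port_id in findings
    ]


for line in with_recommendations([("127.0.0.1", "23"), ("127.0.0.1", "8080")]):
    print(line)
```

Keeping the hints in data rather than in code makes it easier to update them as your environment or policies change.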

The ability to programmatically analyze Nmap XML data is a powerful skill for cybersecurity professionals, as it allows for automated vulnerability assessment and integration with larger security monitoring systems.

Summary

Congratulations on completing this lab on analyzing Nmap scan results in XML format. You have learned several important skills:

Installing and Running Nmap: You learned how to install Nmap and run scans with XML output, providing a foundation for network reconnaissance.
Understanding XML Structure: You explored the structure of Nmap XML files and used command-line tools to extract specific information, giving you the ability to quickly analyze scan results.
Parsing XML with Python: You created a Python script to parse and display Nmap scan results in a readable format, demonstrating how to programmatically work with structured data.
Security Analysis: You extended your Python skills to analyze scan results for security concerns, identifying potentially risky services and generating a comprehensive security report.

These skills are essential for cybersecurity professionals who need to perform network assessments, vulnerability scans, and security audits. The ability to automate the analysis of Nmap results allows for more efficient and thorough security monitoring.

You can further enhance these skills by:

Exploring more advanced Nmap scanning techniques
Integrating scan results with other security tools
Creating more sophisticated analysis algorithms
Developing visualization tools for scan data

Remember that network scanning should only be performed on networks you own or have explicit permission to scan, as unauthorized scanning may be illegal and unethical.