Regular Expressions Mastery

Introduction

In this challenge, you will master the use of regular expressions (regex) in Python by completing a series of complex sub-challenges. Regular expressions are powerful tools for text processing and manipulation, allowing you to match and manipulate strings based on specific patterns.

Your task is to create a Python program to complete the following sub-challenges. Each sub-challenge will test your understanding of regex and your ability to apply this knowledge in various text processing scenarios.


Web Log Analysis

Problem Statement

In this sub-challenge, you will analyze a web server log file to extract useful information about the visitors and their activities. You will use regex to parse the log entries and extract specific information.

Requirements

  1. Complete the parse_log_file function in web_log_analysis.py, which parses the log file into a list of strings, where each string represents a log entry.
  2. Use regex to extract the following information from each log entry:
    a. IP address
    b. Timestamp
    c. HTTP method (GET, POST, etc.)
    d. Requested resource (URL)
    e. HTTP response status code
    f. User agent string
  3. Complete the analyze_log_entries function in web_log_analysis.py, which stores the extracted information in a dictionary.
  4. Calculate and display the following statistics:
    a. Number of unique IP addresses
    b. Most common HTTP methods
    c. Most requested resources
    d. Distribution of response status codes
    e. Top user agents
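
The extraction and counting steps above can be sketched as follows. This is a minimal illustration, not the reference solution: the field order is inferred from the sample entries, and the names LOG_PATTERN, extract_fields, and summarize are hypothetical helpers rather than functions from web_log_analysis.py.

```python
import re
from collections import Counter

# Assumed field order, inferred from the sample entries:
# IP, two placeholder fields, [timestamp], "METHOD resource protocol",
# status code, response size, "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<resource>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<user_agent>[^"]*)"'
)


def extract_fields(entry):
    """Return the named fields of one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(entry)
    return match.groupdict() if match else None


def summarize(entries):
    """Build the statistics dictionary from a list of raw log entries."""
    parsed = [f for f in map(extract_fields, entries) if f is not None]
    return {
        "unique_ips": len({f["ip"] for f in parsed}),
        "http_methods": dict(Counter(f["method"] for f in parsed)),
        "requested_resources": dict(Counter(f["resource"] for f in parsed)),
        "status_codes": dict(Counter(f["status"] for f in parsed)),
        "user_agents": dict(Counter(f["user_agent"] for f in parsed)),
    }
```

Named groups keep the pattern readable and let groupdict() return every field at once. If you need a ranking instead of raw counts, Counter.most_common() returns the items ordered from most to least frequent.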

Example

We have a sample_log.txt file for you:

127.0.0.1 - - [10/May/2023:00:00:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"
127.0.0.2 - - [10/May/2023:00:00:01 +0000] "POST /register HTTP/1.1" 500 4096 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"

If you complete these two functions, open the terminal and input:

python web_log_analysis.py

Output:

step1 = [
   '127.0.0.1 - - [10/May/2023:00:00:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"\n',
   '127.0.0.2 - - [10/May/2023:00:00:01 +0000] "POST /register HTTP/1.1" 500 4096 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"\n'
        ]

step2 = {
   'unique_ips': 2,
   'http_methods':
      {
         'GET': 1,
         'POST': 1
      },
   'requested_resources':
      {
         '/about.html': 1,
         '/register': 1
      },
   'status_codes':
      {
      '200': 1, '500': 1
      },
   'user_agents':
      {
         'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0': 1, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36': 1
      }
   }

Email Obfuscation

Problem Statement

In this sub-challenge, you will create a Python function to obfuscate email addresses in a given text. The purpose of this function is to prevent email addresses from being easily harvested by spambots while still being readable by humans.

Requirements

  1. Complete the obfuscate_emails function in email_obfuscation.py, which takes a string as input and returns a new string with email addresses obfuscated.
  2. Use regex to identify email addresses in the input string.
  3. Replace the @ symbol with [at] and the . symbol with [dot] in the email addresses.
  4. Ensure that the function works with various email formats and edge cases.
  5. Test the function with a sample text containing multiple email addresses.
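
One way to approach the steps above is to match each address with a pattern and rewrite it in a replacement callback. The sketch below is an illustration under assumptions: the pattern is a deliberately simple approximation of email syntax, the replacements are padded with spaces to match the sample output shown in the example, and EMAIL_PATTERN and obfuscate are hypothetical names, not the contents of email_obfuscation.py.

```python
import re

# Simplified email pattern (an assumption, not full RFC 5322 syntax):
# a local part, "@", then at least two dot-separated domain labels.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def obfuscate(text):
    """Replace '@' and '.' inside each matched email address only."""

    def rewrite(match):
        address = match.group(0)
        return address.replace("@", " [at] ").replace(".", " [dot] ")

    return EMAIL_PATTERN.sub(rewrite, text)
```

Passing a function to sub() confines the replacement to the matched address, so a sentence-ending period after the address is left alone.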

Example

If you complete the obfuscate_emails function, open the terminal and input:

python email_obfuscation.py

Output:

"You can reach me at john [dot] doe [at] gmail [dot] com or johndoe [at] mycompany [dot] com."

Password Validator

Problem Statement

In this sub-challenge, you will create a Python function to validate the strength of a password. The function should check if the password meets specific criteria, such as length, character types, and forbidden patterns.

Requirements

  1. Complete the is_password_valid function in password_validator.py, which takes a string as input and returns a Boolean value indicating whether the password is valid or not.
  2. Use regex to check if the password meets the following criteria:
    a. At least 12 characters long.
    b. Contains at least one lowercase letter, one uppercase letter, one digit, and one special character.
  3. Test the function with a variety of passwords to ensure it works correctly.
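
The criteria above map naturally onto lookahead assertions, one per required character class. This is a minimal sketch, assuming that a "special character" means any character that is not an ASCII letter, a digit, or whitespace; PASSWORD_PATTERN and is_strong are hypothetical names chosen for illustration.

```python
import re

# Each (?=...) lookahead checks one requirement without consuming input;
# .{12,} then enforces the minimum length.
PASSWORD_PATTERN = re.compile(
    r"(?=.*[a-z])"            # at least one lowercase letter
    r"(?=.*[A-Z])"            # at least one uppercase letter
    r"(?=.*\d)"               # at least one digit
    r"(?=.*[^A-Za-z0-9\s])"   # at least one special character (assumed definition)
    r".{12,}"                 # at least 12 characters in total
)


def is_strong(password):
    """Return True when the whole password satisfies every criterion."""
    return PASSWORD_PATTERN.fullmatch(password) is not None
```

Using fullmatch() rather than search() ensures the length check applies to the entire string, not to a substring of it.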

Example

If you complete the is_password_valid function, open the terminal and input:

python password_validator.py

Output:

True
False

Extracting Information from Configuration File

Problem Statement

In this sub-challenge, you will create a Python function to extract key-value pairs from a configuration file. The configuration file uses the INI file format, with sections and key-value pairs separated by an equals sign. You will use regex to parse the configuration file and return the extracted information in a dictionary.

Requirements

  1. Complete the parse_config_file function in extracting_information.py, which takes a string representing the contents of a configuration file and returns a dictionary containing the extracted key-value pairs.
  2. Use regex to identify sections, keys, and values in the input string.
  3. Store the extracted information in a dictionary.
  4. Test the function with a sample configuration file containing multiple sections and key-value pairs.
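
The steps above can be sketched with two patterns, one for section headers and one for key-value lines. This is an illustration under assumptions: lines starting with # or ; are treated as comments, pairs that appear before any section are ignored, and SECTION_PATTERN, PAIR_PATTERN, and parse_ini are hypothetical names rather than the contents of extracting_information.py.

```python
import re

SECTION_PATTERN = re.compile(r"^\[([^\]]+)\]\s*$")              # e.g. [Database]
PAIR_PATTERN = re.compile(r"^([^=\s][^=]*?)\s*=\s*(.*?)\s*$")   # e.g. port = 3306


def parse_ini(text):
    """Return {section: {key: value}} with all values kept as strings."""
    config = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (assumed comment markers).
        if not line or line.startswith(("#", ";")):
            continue
        section = SECTION_PATTERN.match(line)
        if section:
            current = config.setdefault(section.group(1), {})
            continue
        pair = PAIR_PATTERN.match(line)
        if pair and current is not None:
            current[pair.group(1)] = pair.group(2)
    return config
```

Values stay as strings, which matches the example output where the port appears as '3306' rather than as an integer.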

Example

If you complete the parse_config_file function, open the terminal and input:

python extracting_information.py

Output:

{
    'Application':
        {
        'name': 'MyApplication',
        'version': '2.0.1'
        },
    'Database':
        {'host': 'localhost', 'port': '3306', 'username': 'dbadmin', 'password': 'mysecretpassword', 'database': 'mydatabase'
        }
}

Counting Occurrences of Words in Text File

Problem Statement

In this sub-challenge, you will create a Python function to count the occurrences of each word in a given text file. The function should be case-insensitive and ignore punctuation marks. The output should be a dictionary with words as keys and their occurrences as values, sorted in descending order of occurrence.

Requirements

  1. Complete the count_word_occurrences function in counting_occurrences.py, which takes a file path as input and returns a dictionary containing the occurrences of each word in the text file.
  2. Use regex to identify words in the input text file.
  3. Count the occurrences of each word in a case-insensitive manner and ignore punctuation marks.
  4. Sort the output dictionary in descending order of occurrence.
  5. Test the function with a sample text file containing multiple lines and varying capitalization.
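
The counting and sorting steps above can be sketched as follows. This is a minimal illustration that works on a string instead of a file path, so reading the file is left out; the word pattern (ASCII letters with an optional apostrophe part) and the name count_words are assumptions made for the example.

```python
import re
from collections import Counter

# A "word" here is a run of ASCII letters, optionally followed by an
# apostrophe part such as "n't" (an assumed definition).
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(text):
    """Return word counts sorted by descending frequency."""
    words = WORD_PATTERN.findall(text.lower())
    # most_common() orders by count; dict() keeps that order (Python 3.7+).
    return dict(Counter(words).most_common())
```

Lowercasing before matching handles case-insensitivity, and punctuation is dropped simply because the pattern never matches it.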

Example

We have a sample_text.txt file for you:

This is a short text.

If you complete the count_word_occurrences function, open the terminal and input:

python counting_occurrences.py

Output:

{
    'this': 1,
    'is': 1,
    'a': 1,
    'short': 1,
    'text': 1
}

Extracting URLs from a Web Page

Problem Statement

In this sub-challenge, you will create a Python function to extract all the URLs from an HTML web page. The function should return a list of unique URLs found on the web page. You will use regex to identify the URLs in the input HTML string.

Requirements

  1. Complete the extract_urls function in extracting_url.py, which takes an HTML string as input and returns a list of unique URLs found in the web page.
  2. Use regex to identify URLs in the input HTML string.
  3. Filter out duplicate URLs.
  4. Test the function with a sample HTML string containing multiple URLs in various HTML tags.
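
The steps above can be sketched by collecting quoted http(s) URLs from href and src attributes and then removing duplicates while keeping first-seen order. This is an illustration under assumptions: only those two attributes are inspected, regex-based HTML handling is fragile for real pages, and URL_ATTR_PATTERN and unique_urls are hypothetical names rather than the contents of extracting_url.py.

```python
import re

# Capture an absolute http(s) URL inside a quoted href or src attribute.
URL_ATTR_PATTERN = re.compile(
    r"""(?:href|src)\s*=\s*["'](https?://[^"'\s>]+)["']""",
    re.IGNORECASE,
)


def unique_urls(html):
    """Return URLs in order of first appearance, without duplicates."""
    # dict.fromkeys() keeps insertion order, unlike converting to a set.
    return list(dict.fromkeys(URL_ATTR_PATTERN.findall(html)))
```

Deduplicating with dict.fromkeys() rather than set() keeps the result order stable, which makes the output easier to compare against an expected list.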

Example

If you complete the extract_urls function, open the terminal and input:

python extracting_url.py

Output:

[
    'https://www.google.com/',
    'https://www.apple.com/',
    'https://www.amazon.com/'
]

Summary

By completing this Python challenge, you will have gained valuable experience in using regular expressions to perform complex text-processing tasks. This knowledge will be useful in various real-world applications, such as data analysis, web scraping, and cybersecurity.
