Extract Usernames from Text with Python


Introduction

In this project, you will learn how to extract usernames from text using Python. This is a common task in social media and instant messaging applications, where the @ character is often used to mention someone.

👀 Preview

## Example 1
>>> from parse_username import after_at
>>> text = "@LabEx @labex I won in the @ competition"
>>> print(after_at(text))
['LabEx', 'labex']
## Example 2
>>> text = "@LabEx@labex I won in the @ competition"
>>> print(after_at(text))
['LabEx', 'labex']
## Example 3
>>> text = "@labex @LabEx I won in the @LabEx competition"
>>> print(after_at(text))
['LabEx', 'labex']
## Example 4
>>> text = "@!LabEx @labex I won in the competition"
>>> print(after_at(text))
['labex']
## Example 5
>>> text = "I won in the competition@"
>>> print(after_at(text))
[]
## Example 6
>>> text = "LabEx@!"
>>> print(after_at(text))
[]
## Example 7
>>> text = "@!@LabEx @labex I won in the @LabEx competition @experiment"
>>> print(after_at(text))
['LabEx', 'experiment', 'labex']

🎯 Tasks

In this project, you will learn:

  • How to implement the after_at function to extract usernames from a given text
  • How to handle edge cases and optimize the performance of the function
  • How to test the function with various input scenarios

🏆 Achievements

After completing this project, you will be able to:

  • Understand how to use Python to parse and extract relevant information from text
  • Develop a robust and efficient function to extract usernames from text
  • Apply your problem-solving skills to enhance the functionality of the function
  • Test your code thoroughly to ensure it works as expected

Implement the after_at Function

In this step, you will implement the after_at function to extract usernames from a given text.

  1. Open the parse_username.py file in your code editor.

  2. Locate the after_at function definition.

  3. The function should take a string text as input, which may be empty.

  4. Inside the function, initialize an empty list called usernames to store the extracted usernames.

  5. Find the index of the first occurrence of the @ character in the text using the find() method, and store it in the at_index variable.

  6. While the at_index is not -1 (meaning the @ character was found):

    • Initialize an empty string called username.
    • Iterate over the characters in the text string starting from the index after the @ character.
    • For each character, check if it is alphanumeric or an underscore, using the isalnum() method and a direct comparison with "_" (isalnum() alone rejects the underscore).
    • If the character is valid, append it to the username string.
    • If the character is not valid, break out of the loop.
    • If the username is not empty, append it to the usernames list.
    • Find the next occurrence of the @ character in the text string starting from the index after the previous @ character.
  7. After the loop, remove any duplicate usernames from the usernames list using the set() function.

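  8. Sort the unique usernames in descending order by how often each one was mentioned, using the sorted() function with a custom key function. Take the counts from the list as it was before deduplication, because after set() every username appears exactly once.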

  9. Return the sorted usernames list.
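
Steps 5 and 6 rely on two string behaviors that are easy to check in isolation: find() returns -1 when nothing is found and accepts a start index, and isalnum() does not treat the underscore as valid. The short sketch below is only an illustration; the names sample and at_positions are made up for it and are not part of parse_username.py.

```python
sample = "@LabEx@labex I won in the @ competition"

# Collect the index of every "@" by restarting find() just past the last hit.
at_positions = []
at_index = sample.find("@")
while at_index != -1:  # find() returns -1 once no further "@" exists
    at_positions.append(at_index)
    at_index = sample.find("@", at_index + 1)

print(at_positions)  # [0, 6, 26]

# isalnum() accepts letters and digits but rejects "_", which is why the
# underscore needs its own comparison in the character check.
print([char.isalnum() for char in "a7_!"])  # [True, True, False, False]
print([char.isalnum() or char == "_" for char in "a7_!"])  # [True, True, True, False]
```

The third "@" (index 26) is followed by a space, so it would produce an empty username and be skipped by the function.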

Your completed after_at function should look like this:

def after_at(text):
    usernames = []
    at_index = text.find("@")  ## Index of the first "@", or -1 if there is none
    while at_index != -1:  ## Continue loop until no more "@" symbols are found
        username = ""
        ## Iterate over the characters that directly follow the "@" symbol
        for char in text[at_index + 1 :]:
            ## isalnum() covers letters and digits; "_" is checked separately
            if char.isalnum() or char == "_":
                username += char  ## Add the character to the username
            else:
                break  ## Any other character ends the username
        if username:  ## Ignore an "@" with no valid character after it
            usernames.append(username)
        ## Find the next "@" symbol, starting just past the current one
        at_index = text.find("@", at_index + 1)

    ## Remove duplicates and sort by occurrence count in descending order.
    ## set() has no guaranteed order, so ties are broken by name to keep the
    ## result the same on every run. The tie-break rule is an assumption; it
    ## matches the expected outputs shown in the examples.
    return sorted(set(usernames), key=lambda name: (-usernames.count(name), name))
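
As a point of comparison, the same extraction can be sketched with the standard re and collections modules. This is an alternative under stated assumptions, not the project's reference solution: the name after_at_regex is hypothetical, the pattern treats \w as roughly the same character set as isalnum() plus the underscore, and ties between equally frequent usernames are broken by plain string order, a rule the examples are consistent with but the project does not state.

```python
import re
from collections import Counter

# "@" followed by one or more word characters (letters, digits, underscore).
MENTION = re.compile(r"@(\w+)")


def after_at_regex(text):
    """Return unique usernames, most-mentioned first (hypothetical helper)."""
    counts = Counter(MENTION.findall(text))
    # Negative count sorts frequent names first; the name itself breaks ties
    # (assumed rule, chosen so the output is the same on every run).
    return sorted(counts, key=lambda name: (-counts[name], name))


print(after_at_regex("@!@LabEx @labex I won in the @LabEx competition @experiment"))
# ['LabEx', 'experiment', 'labex']
```

Counter tallies each username in a single pass, which avoids calling list.count() once per unique name on long texts.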

Test the after_at Function

In this step, you will test the after_at function with the provided examples.

  1. Open the parse_username.py file in your code editor.
  2. Locate the if __name__ == "__main__": block at the bottom of the file.
  3. Inside the block, add the following code to test the after_at function (the if line is shown only for context; keep the calls indented beneath it):
if __name__ == "__main__":
    ## Example 1
    print(after_at("@LabEx @labex I won in the @ competition"))
    ## Example 2
    print(after_at("@LabEx@labex I won in the @ competition"))
    ## Example 3
    print(after_at("@labex @LabEx I won in the @LabEx competition"))
    ## Example 4
    print(after_at("@!LabEx @labex I won in the competition"))
    ## Example 5
    print(after_at("I won in the competition@"))
    ## Example 6
    print(after_at("LabEx@!"))
    ## Example 7
    print(after_at("@!@LabEx @labex I won in the @LabEx competition @experiment"))
  4. Save the parse_username.py file.
  5. Run the parse_username.py file in your terminal or command prompt using the following command:
python parse_username.py
  6. Verify that the output matches the expected results provided in the challenge:
['LabEx', 'labex']
['LabEx', 'labex']
['LabEx', 'labex']
['labex']
[]
[]
['LabEx', 'experiment', 'labex']
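
Comparing printed lists by eye gets error-prone as the number of examples grows. One possible way to automate the comparison is sketched below; CASES and find_failures are hypothetical names introduced for this illustration, and the expected values are copied from Examples 1 to 7.

```python
# (input text, expected result) pairs copied from Examples 1-7
CASES = [
    ("@LabEx @labex I won in the @ competition", ["LabEx", "labex"]),
    ("@LabEx@labex I won in the @ competition", ["LabEx", "labex"]),
    ("@labex @LabEx I won in the @LabEx competition", ["LabEx", "labex"]),
    ("@!LabEx @labex I won in the competition", ["labex"]),
    ("I won in the competition@", []),
    ("LabEx@!", []),
    (
        "@!@LabEx @labex I won in the @LabEx competition @experiment",
        ["LabEx", "experiment", "labex"],
    ),
]


def find_failures(extract):
    """Return (text, expected, actual) for every case that extract gets wrong."""
    failures = []
    for text, expected in CASES:
        actual = extract(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


# Demo with a deliberately wrong extractor that always returns an empty list.
# Only Examples 5 and 6 expect [], so the other five cases are reported.
print(len(find_failures(lambda text: [])))  # 5
```

Inside parse_username.py, calling find_failures(after_at) from the if __name__ == "__main__": block should return an empty list when every example passes.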

Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.
