How to Convert a Python List to a Set While Preserving the Original Order

Introduction

Python's built-in data structures provide flexible ways to manage and manipulate data. In this tutorial, we will explore how to convert a Python list to a set while preserving the original order of the elements. Because a set itself has no order, the practical result of each technique is a list of unique elements in first-occurrence order. This is particularly useful when you need to remove duplicates from a list but the sequence of the remaining elements still matters.

By the end of this tutorial, you will understand the differences between lists and sets in Python and learn multiple techniques to convert a list to a set while maintaining the original order of elements.


Understanding Lists and Sets in Python

Before diving into converting lists to sets, let's understand the basic properties of these two data structures in Python.

Python Lists

Lists in Python are ordered collections that can store elements of different data types. They allow duplicate values and maintain the insertion order of elements.

Let's create a simple Python file to demonstrate lists. Open the code editor and create a new file named list_demo.py in the /home/labex/project directory:

## Lists in Python
my_list = [1, 2, 3, 2, 4, 5, 3]

print("Original list:", my_list)
print("Length of list:", len(my_list))
print("First element:", my_list[0])
print("Last element:", my_list[-1])
print("First 3 elements:", my_list[:3])
print("Does list contain duplicates?", len(my_list) != len(set(my_list)))

Now run this file in the terminal:

python3 list_demo.py

You should see output similar to this:

Original list: [1, 2, 3, 2, 4, 5, 3]
Length of list: 7
First element: 1
Last element: 3
First 3 elements: [1, 2, 3]
Does list contain duplicates? True

Python Sets

Sets are unordered collections of unique elements. When you convert a list to a set, duplicate elements are automatically removed, but the original order is not preserved.

Let's create another file named set_demo.py to explore sets:

## Sets in Python
my_list = [1, 2, 3, 2, 4, 5, 3]
my_set = set(my_list)

print("Original list:", my_list)
print("Converted to set:", my_set)
print("Length of list:", len(my_list))
print("Length of set:", len(my_set))
print("Does set maintain order?", list(my_set) == [1, 2, 3, 4, 5])

Run this file:

python3 set_demo.py

The output will show:

Original list: [1, 2, 3, 2, 4, 5, 3]
Converted to set: {1, 2, 3, 4, 5}
Length of list: 7
Length of set: 5
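Does set maintain order? True

Notice that the set removed all duplicates. In this run the order happens to match the original because of how CPython stores small integers, but that is an implementation detail and not a guarantee. Sets in Python are inherently unordered, and for other values, such as strings, the iteration order usually differs from the original list.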
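
To make the "unordered" property concrete, here is a minimal sketch (the variable names are illustrative and not part of the tutorial files). It compares two lists that hold the same values in a different order, first as lists and then as sets.

```python
# Two lists with the same values but a different order and one duplicate
first = [3, 1, 2, 3]
second = [1, 2, 3]

# Lists compare element by element, so order and duplicates matter
lists_equal = first == second

# Sets compare by membership only, so order and duplicates are ignored
sets_equal = set(first) == set(second)

print("Lists equal:", lists_equal)  # Lists equal: False
print("Sets equal:", sets_equal)    # Sets equal: True
```

Since a set carries no position information at all, any order you see when printing one should be treated as incidental.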

Basic Approach: Converting a List to a Set

Now that we understand the differences between lists and sets, let's explore how to convert a list to a set and the implications of this conversion.

Simple Conversion

The most basic way to convert a list to a set is by using the built-in set() function. Create a new file named basic_conversion.py:

## Basic conversion of list to set
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Convert list to set (removes duplicates but loses order)
unique_fruits = set(fruits)

print("Original list:", fruits)
print("As a set:", unique_fruits)

## Convert back to list (order not preserved)
unique_fruits_list = list(unique_fruits)
print("Back to list:", unique_fruits_list)

Run this file:

python3 basic_conversion.py

You should see output similar to:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
As a set: {'orange', 'banana', 'apple', 'pear'}
Back to list: ['orange', 'banana', 'apple', 'pear']

Notice that the set removed all duplicates, but the order differs from the original list, and converting the set back to a list does not restore it. Because Python randomizes string hashes for each process, you may even see a different order each time you run the script.

The Problem with Order

This simple conversion demonstrates the issue we're trying to solve: when we convert a list to a set, we lose the original order of elements. If the original order is important, this approach isn't suitable.

Let's modify our example to show why this might be a problem. Create a file named order_matters.py:

## Example showing why order matters
steps = ["Preheat oven", "Mix ingredients", "Pour batter", "Bake", "Mix ingredients"]

## Remove duplicates using set
unique_steps = list(set(steps))

print("Original cooking steps:", steps)
print("Unique steps (using set):", unique_steps)
print("Is the order preserved?", unique_steps == ["Preheat oven", "Mix ingredients", "Pour batter", "Bake"])

Run the file:

python3 order_matters.py

The output will look similar to this (the exact order of the unique steps can vary between runs):

Original cooking steps: ['Preheat oven', 'Mix ingredients', 'Pour batter', 'Bake', 'Mix ingredients']
Unique steps (using set): ['Preheat oven', 'Bake', 'Mix ingredients', 'Pour batter']
Is the order preserved? False

In this example, the order of cooking steps is critical. If you bake before mixing ingredients, the result will be disastrous. This illustrates why we need a way to preserve the original order when removing duplicates.
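
As a quick illustration of the result we are after, the sketch below restores the first-occurrence order by sorting the set with list.index as the key. This is only one possible approach and is shown here because it is short. It is quadratic in the length of the list, so it should not be read as a recommended method for large inputs.

```python
steps = ["Preheat oven", "Mix ingredients", "Pour batter", "Bake", "Mix ingredients"]

# list.index() returns the position of the FIRST occurrence of a value,
# so sorting the unique values by that position restores the original order.
# Each index() call scans the list, which makes this quadratic overall.
ordered_unique_steps = sorted(set(steps), key=steps.index)

print(ordered_unique_steps)
# ['Preheat oven', 'Mix ingredients', 'Pour batter', 'Bake']
```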

Preserving Order When Converting a List to a Set

Now that we understand the problem, let's explore methods to convert a list to a set while preserving the original order of elements.

Method 1: Using a Dictionary to Preserve Order

One approach is to use a dictionary to keep track of the order of elements. Since Python 3.7, the language guarantees that dictionaries maintain insertion order (in CPython 3.6 this was already the case, but only as an implementation detail).

Create a new file named dict_approach.py:

## Using a dictionary to preserve order
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Create a dictionary with list elements as keys
## This automatically removes duplicates while preserving order
unique_fruits_dict = dict.fromkeys(fruits)

## Convert dictionary keys back to a list
unique_fruits = list(unique_fruits_dict)

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits)

Run the file:

python3 dict_approach.py

You should see:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']

Notice that the order of the first occurrence of each element is preserved.
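
If you still need set-like behavior after deduplicating, the dictionary's keys view may be enough. The following sketch (variable names are illustrative) shows what dict.fromkeys() actually builds and that the keys view supports membership tests and set operators.

```python
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

# dict.fromkeys() maps every element to None; duplicates collapse onto one key
unique_fruits_dict = dict.fromkeys(fruits)
print(unique_fruits_dict)
# {'apple': None, 'banana': None, 'orange': None, 'pear': None}

# The keys view is set-like: fast membership tests and set operators work on it
keys = unique_fruits_dict.keys()
has_pear = "pear" in keys
common = keys & {"apple", "kiwi"}

print(has_pear)  # True
print(common)    # {'apple'}
```

Note that the result of a set operator such as & is an ordinary set, so order is lost again at that point.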

Method 2: Using OrderedDict

For users of Python versions earlier than 3.7, or to make the intent more explicit, we can use OrderedDict from the collections module.

Create a new file named ordered_dict_approach.py:

## Using OrderedDict to preserve order
from collections import OrderedDict

fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Create an OrderedDict with list elements as keys
## This automatically removes duplicates while preserving order
unique_fruits_ordered = list(OrderedDict.fromkeys(fruits))

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits_ordered)

Run the file:

python3 ordered_dict_approach.py

The output should be:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']
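
One behavioral difference is worth knowing when choosing between the two dictionary types: two OrderedDict objects are equal only when their keys are in the same order, whereas plain dictionaries ignore order when compared. A small sketch with made-up data:

```python
from collections import OrderedDict

a = OrderedDict.fromkeys(["apple", "banana"])
b = OrderedDict.fromkeys(["banana", "apple"])

# OrderedDict equality takes the order of the keys into account
ordered_equal = a == b

# Plain dicts with the same keys and values are equal regardless of order
plain_equal = dict(a) == dict(b)

print(ordered_equal)  # False
print(plain_equal)    # True
```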

Method 3: Using a Loop and a Set for Checking

Another approach is to use a loop and a set for checking if we've seen an element before.

Create a new file named loop_approach.py:

## Using a loop and a set to preserve order
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

unique_fruits = []
seen = set()

for fruit in fruits:
    if fruit not in seen:
        seen.add(fruit)
        unique_fruits.append(fruit)

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits)

Run the file:

python3 loop_approach.py

The output should be:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']

All three methods achieve the same result: removing duplicates while preserving the order of the first occurrence of each element.
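
All three methods also share a limitation: every element must be hashable, because dictionary keys and set members are looked up by hash. The sketch below uses a hypothetical list of lists to show the failure and one possible workaround, which is to use a tuple copy of each row as the lookup key. It assumes the inner lists contain only hashable values.

```python
rows = [[1, 2], [3, 4], [1, 2], [5, 6]]

# Lists are unhashable, so they cannot be used as dictionary keys
try:
    dict.fromkeys(rows)
    fromkeys_failed = False
except TypeError as exc:
    fromkeys_failed = True
    print("dict.fromkeys() failed:", exc)

# Workaround: track a hashable stand-in (a tuple) for each row
unique_rows = []
seen = set()
for row in rows:
    key = tuple(row)
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```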

Practical Example: Analyzing Text Data

Let's apply what we've learned to a real-world example: analyzing word frequency in a text while preserving the order of first appearance.

Creating a Text Analysis Tool

Create a new file named text_analyzer.py:

def analyze_text(text):
    """
    Analyze text to find unique words in order of first appearance
    and their frequencies.
    """
    ## Split text into words and convert to lowercase
    words = text.lower().split()

    ## Remove punctuation from words
    clean_words = [word.strip('.,!?:;()[]{}""\'') for word in words]

    ## Count frequency while preserving order
    word_counts = {}
    unique_words_in_order = []

    for word in clean_words:
        if not word:  ## Skip tokens that consisted only of punctuation
            continue
        if word not in word_counts:
            unique_words_in_order.append(word)
        word_counts[word] = word_counts.get(word, 0) + 1

    return unique_words_in_order, word_counts

## Sample text
sample_text = """
Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!
"""

## Analyze the text
unique_words, word_frequencies = analyze_text(sample_text)

## Print results
print("Text sample:")
print(sample_text)
print("\nUnique words in order of first appearance:")
print(unique_words)
print("\nWord frequencies:")
for word in unique_words:
    if word:  ## Skip empty strings
        print(f"'{word}': {word_frequencies[word]} times")

Run the file:

python3 text_analyzer.py

The output will show the unique words in the order they first appeared in the text, along with their frequencies:

Text sample:

Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!

Unique words in order of first appearance:
['python', 'is', 'amazing', 'also', 'easy', 'to', 'learn', 'with', 'you', 'can', 'create', 'web', 'applications', 'data', 'analysis', 'tools', 'machine', 'learning', 'models', 'and', 'much', 'more', 'has', 'many', 'libraries', 'that', 'make', 'development', 'faster', 'versatile']

Word frequencies:
'python': 5 times
'is': 3 times
'amazing': 1 times
'also': 1 times
...
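
The same counting can also be sketched with collections.Counter from the standard library. Counter is a dict subclass, so on Python 3.7+ its keys keep the order of first appearance. The shortened sample sentence and the variable names below are illustrative only.

```python
from collections import Counter

text = "Python is amazing. Python is also easy to learn."

# Lowercase, split on whitespace, and strip surrounding punctuation
words = [w.strip('.,!?:;()[]{}"\'') for w in text.lower().split()]

# Counter inserts each key the first time it is seen, so key order
# matches the order of first appearance
counts = Counter(w for w in words if w)
unique_words = list(counts)

print(unique_words)
# ['python', 'is', 'amazing', 'also', 'easy', 'to', 'learn']
print(counts["python"])  # 2
```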

Improving the Tool

Let's enhance our text analyzer to handle more complex scenarios. Create a file named improved_analyzer.py:

from collections import OrderedDict

def analyze_text_improved(text):
    """
    An improved version of text analyzer that handles more complex scenarios
    and provides more statistics.
    """
    ## Split text into words and convert to lowercase
    words = text.lower().split()

    ## Remove punctuation from words
    clean_words = [word.strip('.,!?:;()[]{}""\'') for word in words]

    ## Use OrderedDict to preserve order and count frequency
    word_counts = OrderedDict()

    for word in clean_words:
        if word:  ## Skip empty strings
            word_counts[word] = word_counts.get(word, 0) + 1

    ## Get statistics
    total_words = sum(word_counts.values())
    unique_words_count = len(word_counts)

    return list(word_counts.keys()), word_counts, total_words, unique_words_count

## Sample text
sample_text = """
Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!
"""

## Analyze the text
unique_words, word_frequencies, total_count, unique_count = analyze_text_improved(sample_text)

## Print results
print("Text sample:")
print(sample_text)
print("\nStatistics:")
print(f"Total words: {total_count}")
print(f"Unique words: {unique_count}")
print(f"Uniqueness ratio: {unique_count/total_count:.2%}")

print("\nTop 5 most frequent words:")
sorted_words = sorted(word_frequencies.items(), key=lambda x: x[1], reverse=True)
for word, count in sorted_words[:5]:
    print(f"'{word}': {count} times")

Run the file:

python3 improved_analyzer.py

You should see output with additional statistics:

Text sample:

Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!

Statistics:
Total words: 36
Unique words: 30
Uniqueness ratio: 83.33%

Top 5 most frequent words:
'python': 5 times
'is': 3 times
'amazing': 1 times
'also': 1 times
'easy': 1 times

This practical example demonstrates how preserving the order of elements when removing duplicates can be useful in real-world applications like text analysis.
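
The order of tied entries in the "top 5" list follows from Python's sort being stable, even with reverse=True: words with the same count keep their order of first appearance. A minimal sketch with hypothetical counts:

```python
# Hypothetical counts, listed in order of first appearance
word_counts = {"to": 1, "python": 2, "learn": 1, "is": 2}

# sorted() is stable: items with equal counts keep their original relative order
ranked = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)

print(ranked)
# [('python', 2), ('is', 2), ('to', 1), ('learn', 1)]
```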

Performance Comparison and Best Practices

Now that we've explored several methods to convert a list to a set while preserving order, let's compare their performance and establish some best practices.

Creating a Performance Test

Create a new file named performance_test.py:

import time
from collections import OrderedDict

def method1_dict(data):
    """Using dict.fromkeys()"""
    return list(dict.fromkeys(data))

def method2_ordereddict(data):
    """Using OrderedDict.fromkeys()"""
    return list(OrderedDict.fromkeys(data))

def method3_loop(data):
    """Using a loop and a set"""
    result = []
    seen = set()
    for item in data:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

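def time_function(func, data, runs=100):
    """Measure the average execution time of a function over several runs"""
    ## perf_counter() is a monotonic, high-resolution clock intended for timing
    start_time = time.perf_counter()
    for _ in range(runs):
        func(data)
    end_time = time.perf_counter()
    return (end_time - start_time) / runs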

## Test data
small_list = list(range(100)) + list(range(50))  ## 150 items, 50 duplicates
medium_list = list(range(1000)) + list(range(500))  ## 1500 items, 500 duplicates
large_list = list(range(10000)) + list(range(5000))  ## 15000 items, 5000 duplicates

## Test results
print("Performance comparison (average time in seconds over 100 runs):\n")

print("Small list (150 items, 50 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, small_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, small_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, small_list):.8f}")

print("\nMedium list (1,500 items, 500 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, medium_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, medium_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, medium_list):.8f}")

print("\nLarge list (15,000 items, 5,000 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, large_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, large_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, large_list):.8f}")

Run the performance test:

python3 performance_test.py

The output will show the performance of each method with different list sizes:

Performance comparison (average time in seconds over 100 runs):

Small list (150 items, 50 duplicates):
dict.fromkeys():       0.00000334
OrderedDict.fromkeys(): 0.00000453
Loop and set:          0.00000721

Medium list (1,500 items, 500 duplicates):
dict.fromkeys():       0.00003142
OrderedDict.fromkeys(): 0.00004123
Loop and set:          0.00007621

Large list (15,000 items, 5,000 duplicates):
dict.fromkeys():       0.00035210
OrderedDict.fromkeys(): 0.00044567
Loop and set:          0.00081245

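The actual numbers will vary with your system and Python version, but the relative pattern in this sample run is consistent: dict.fromkeys() is fastest, OrderedDict.fromkeys() is somewhat slower, and the explicit loop takes roughly two to two and a half times as long as dict.fromkeys(). All three scale approximately linearly with the size of the list.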
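
For more robust measurements, the standard library's timeit module can be used in place of a hand-written timing loop. The following is a minimal sketch, not a full benchmark. The function name dedupe_dict and the run counts are arbitrary choices for illustration.

```python
import timeit

data = list(range(1000)) + list(range(500))  # 1500 items, 500 duplicates

def dedupe_dict(items):
    """Remove duplicates with dict.fromkeys(), keeping first-occurrence order."""
    return list(dict.fromkeys(items))

# repeat() returns one total time per repetition; taking the minimum
# reduces the influence of other processes on the measurement
calls_per_repeat = 200
timings = timeit.repeat(lambda: dedupe_dict(data), number=calls_per_repeat, repeat=5)
best_per_call = min(timings) / calls_per_repeat

print(f"dict.fromkeys(): {best_per_call:.8f} seconds per call")
```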

Best Practices

Based on our experiments, let's establish some best practices. Create a file named best_practices.py:

"""
Best Practices for Converting a List to a Set While Preserving Order
"""

## Example 1: For Python 3.7+, use dict.fromkeys() for best performance
def preserve_order_modern(lst):
    """Best method for Python 3.7+ - using dict.fromkeys()"""
    return list(dict.fromkeys(lst))

## Example 2: For compatibility with older Python versions, use OrderedDict
from collections import OrderedDict

def preserve_order_compatible(lst):
    """Compatible method for older Python versions (OrderedDict exists since 2.7 and 3.1)"""
    return list(OrderedDict.fromkeys(lst))

## Example 3: When you need to process elements while preserving order
def preserve_order_with_processing(lst):
    """Process elements while preserving order"""
    result = []
    seen = set()

    for item in lst:
        ## Option to process the item here
        processed_item = str(item).lower()  ## Example processing

        if processed_item not in seen:
            seen.add(processed_item)
            result.append(item)  ## Keep original item in the result

    return result

## Demo
data = ["Apple", "banana", "Orange", "apple", "Pear", "BANANA"]

print("Original list:", data)
print("Method 1 (Python 3.7+):", preserve_order_modern(data))
print("Method 2 (Compatible):", preserve_order_compatible(data))
print("Method 3 (With processing):", preserve_order_with_processing(data))

Run the file:

python3 best_practices.py

The output shows how each method handles the data:

Original list: ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
Method 1 (Python 3.7+): ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
Method 2 (Compatible): ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
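Method 3 (With processing): ['Apple', 'banana', 'Orange', 'Pear']

Notice that Method 3 treats "Apple" and "apple", and likewise "banana" and "BANANA", as the same item because it compares their lowercase forms, while keeping the first spelling it encountered.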

Recommendations

Based on our experiments, here are some recommendations:

  1. For Python 3.7 and later, use dict.fromkeys() for the best performance.
  2. For compatibility with Python versions before 3.7, use OrderedDict.fromkeys().
  3. When you need to perform custom processing while checking for duplicates, use the loop and set approach.
  4. Consider case-sensitivity and other transformations based on your specific requirements.
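
Recommendations 3 and 4 can be combined into one reusable helper. The sketch below is one possible design, not an established API: unique_by and its key parameter are names invented for this example. It is written as a generator so that it also works on iterables that are consumed lazily.

```python
def unique_by(items, key=None):
    """Yield items in first-seen order; items with equal keys count as duplicates."""
    seen = set()
    for item in items:
        # Without a key function, the item itself is used for the duplicate check
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            yield item

data = ["Apple", "banana", "Orange", "apple", "Pear", "BANANA"]

case_sensitive = list(unique_by(data))
case_insensitive = list(unique_by(data, key=str.casefold))

print(case_sensitive)
# ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
print(case_insensitive)
# ['Apple', 'banana', 'Orange', 'Pear']
```

Here str.casefold is used instead of str.lower because it is designed for caseless comparison. For plain ASCII text the two behave the same.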

Summary

In this tutorial, you have learned:

  1. The fundamental differences between Python lists and sets

  2. Why converting a list to a set normally causes the order to be lost

  3. Multiple methods to convert a list to a set while preserving the original order:

    • Using dict.fromkeys() in Python 3.7+
    • Using OrderedDict.fromkeys() for compatibility with older Python versions
    • Using a loop with a set for more complex processing

  4. How to apply these techniques to real-world problems like text analysis

  5. Performance considerations and best practices for different scenarios

These techniques are valuable for data cleaning, removing duplicates from user input, processing configuration options, and many other common programming tasks. By choosing the right approach based on your specific requirements, you can write cleaner, more efficient Python code.