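How to Convert a Python List to a Set While Preserving Order - Removing Duplicates and Keeping Order

Introduction

Python's built-in data structures provide flexible ways to manage and manipulate data. In this tutorial, we look at how to convert a Python list to a set while keeping the original order of its elements. This technique is particularly useful when you need to remove duplicates from a list but keep each unique element in the position where it first occurred.

By the end of this tutorial, you will understand the differences between lists and sets in Python and know several techniques for converting a list to a set while preserving the original order of the elements.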

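Understanding Lists and Sets in Python

Before converting lists to sets, let's review the basic properties of these two data structures.

Python Lists

A list in Python is an ordered collection that can store elements of different data types. It allows duplicate values and maintains the insertion order of its elements.

To demonstrate lists, let's create a simple Python file. Open your code editor and create a new file named list_demo.py in the /home/labex/project directory.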

## Lists in Python
my_list = [1, 2, 3, 2, 4, 5, 3]

print("Original list:", my_list)
print("Length of list:", len(my_list))
print("First element:", my_list[0])
print("Last element:", my_list[-1])
print("First 3 elements:", my_list[:3])
print("Does list contain duplicates?", len(my_list) != len(set(my_list)))

Now run the file in the terminal:

python3 list_demo.py

You should see output similar to this:

Original list: [1, 2, 3, 2, 4, 5, 3]
Length of list: 7
First element: 1
Last element: 3
First 3 elements: [1, 2, 3]
Does list contain duplicates? True

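Python Sets

A set is an unordered collection of unique elements. Converting a list to a set removes duplicate elements automatically, but the original order is not guaranteed to survive.

To explore sets, let's create another file named set_demo.py.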

## Sets in Python
my_list = [1, 2, 3, 2, 4, 5, 3]
my_set = set(my_list)

print("Original list:", my_list)
print("Converted to set:", my_set)
print("Length of list:", len(my_list))
print("Length of set:", len(my_set))
print("Does set maintain order?", list(my_set) == [1, 2, 3, 4, 5])

Run the file:

python3 set_demo.py

The output will look like this:

Original list: [1, 2, 3, 2, 4, 5, 3]
Converted to set: {1, 2, 3, 4, 5}
Length of list: 7
Length of set: 5
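Does set maintain order? True

The set removed all duplicates. In this run the elements happen to come back as [1, 2, 3, 4, 5], which matches their first-occurrence order, but that is a side effect of how CPython hashes small integers and not a guarantee. Sets in Python are inherently unordered, so the iteration order may differ from the original list.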
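
To make the "unordered" property more concrete, here is a small sketch (the variable names are illustrative and not part of the tutorial files). It shows that set equality ignores element order and that a set cannot be indexed by position.

```python
# Two sets built from differently ordered lists compare equal,
# because sets have no notion of position.
a = set([1, 2, 3])
b = set([3, 1, 2])
print("Equal regardless of order:", a == b)

# Indexing a set raises TypeError, since there is no "first" element.
try:
    a[0]
    indexing_supported = True
except TypeError:
    indexing_supported = False
print("Supports indexing:", indexing_supported)
```

Because a set exposes no positions, any order you observe when printing one should be treated as incidental.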

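Basic Approach: Converting a List to a Set

Now that we understand the difference between lists and sets, let's look at how to convert a list to a set and what that conversion implies.

Simple Conversion

The most basic way to convert a list to a set is the built-in set() function. Create a new file named basic_conversion.py.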

## Basic conversion of list to set
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Convert list to set (removes duplicates but loses order)
unique_fruits = set(fruits)

print("Original list:", fruits)
print("As a set:", unique_fruits)

## Convert back to list (order not preserved)
unique_fruits_list = list(unique_fruits)
print("Back to list:", unique_fruits_list)

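Run the file:

python3 basic_conversion.py

You should see output similar to this (the order of the set can vary between runs, because string hashing is randomized by default):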

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
As a set: {'orange', 'banana', 'apple', 'pear'}
Back to list: ['orange', 'banana', 'apple', 'pear']

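Notice that the set removed all duplicates, but its order differs from the original list. Converting the set back to a list does not restore the original order either.

The Problem with Order

This simple conversion demonstrates the problem we are trying to solve. When you convert a list to a set, you lose the original order of the elements. If that order matters, this approach is not suitable.

To show why this can be a problem, let's modify the example. Create a file named order_matters.py.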

## Example showing why order matters
steps = ["Preheat oven", "Mix ingredients", "Pour batter", "Bake", "Mix ingredients"]

## Remove duplicates using set
unique_steps = list(set(steps))

print("Original cooking steps:", steps)
print("Unique steps (using set):", unique_steps)
print("Is the order preserved?", unique_steps == ["Preheat oven", "Mix ingredients", "Pour batter", "Bake"])

Run the file:

python3 order_matters.py

The output will look similar to this (the exact order of the unique steps can vary between runs):

Original cooking steps: ['Preheat oven', 'Mix ingredients', 'Pour batter', 'Bake', 'Mix ingredients']
Unique steps (using set): ['Preheat oven', 'Bake', 'Mix ingredients', 'Pour batter']
Is the order preserved? False

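In this example, the order of the cooking steps matters. If you bake before mixing the ingredients, the result will be terrible. This shows why the original order needs to be preserved when removing duplicates.

Preserving Order When Converting a List to a Set

Now that we understand the problem, let's look at ways to convert a list to a set while preserving the original order of the elements.

Method 1: Using a Dictionary to Preserve Order

One approach is to use a dictionary to keep track of element order. Since Python 3.7, dictionaries are guaranteed to preserve insertion order.

Create a new file named dict_approach.py.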

## Using a dictionary to preserve order
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Create a dictionary with list elements as keys
## This automatically removes duplicates while preserving order
unique_fruits_dict = dict.fromkeys(fruits)

## Convert dictionary keys back to a list
unique_fruits = list(unique_fruits_dict)

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits)

Run the file:

python3 dict_approach.py

You should see the following output:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']

Notice that each element keeps the position of its first occurrence.
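
The following minimal sketch (with an illustrative list that is not part of dict_approach.py) looks inside the intermediate dictionary to show why this works: every element becomes a key mapped to None, and a repeated key reuses the existing entry instead of creating a new one at the end.

```python
letters = ["b", "a", "b", "c", "a"]

# dict.fromkeys() maps every key to None. A repeated key only overwrites
# the existing entry, so the key keeps its original position.
as_dict = dict.fromkeys(letters)
print(as_dict)  # {'b': None, 'a': None, 'c': None}

# Iterating over a dict yields its keys in insertion order (Python 3.7+).
unique_letters = list(as_dict)
print(unique_letters)  # ['b', 'a', 'c']
```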

Method 2: Using OrderedDict

If you are on a Python version earlier than 3.7, or you want to express the intent more explicitly, you can use OrderedDict from the collections module.

Create a new file named ordered_dict_approach.py.

## Using OrderedDict to preserve order
from collections import OrderedDict

fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

## Create an OrderedDict with list elements as keys
## This automatically removes duplicates while preserving order
unique_fruits_ordered = list(OrderedDict.fromkeys(fruits))

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits_ordered)

Run the file:

python3 ordered_dict_approach.py

The output should look like this:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']
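
One practical difference between the two dictionary types is how they compare for equality, which may be part of what "expressing intent" means here. The sketch below uses illustrative variable names and assumes Python 3.7 or later for the plain dict part.

```python
from collections import OrderedDict

plain_1 = dict.fromkeys(["a", "b"])
plain_2 = dict.fromkeys(["b", "a"])
ordered_1 = OrderedDict.fromkeys(["a", "b"])
ordered_2 = OrderedDict.fromkeys(["b", "a"])

# Plain dicts compare equal when they hold the same items, in any order.
print("Plain dicts equal:", plain_1 == plain_2)

# Two OrderedDicts compare equal only if their items are in the same order.
print("OrderedDicts equal:", ordered_1 == ordered_2)
```

If the order itself is part of the data you are comparing, OrderedDict makes that explicit; for simple deduplication both types give the same list of keys.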

Method 3: Using a Loop with a Set for Lookups

Another approach is to loop over the list and use a set to check whether each element has been seen before.

Create a new file named loop_approach.py.

## Using a loop and a set to preserve order
fruits = ["apple", "banana", "orange", "apple", "pear", "banana"]

unique_fruits = []
seen = set()

for fruit in fruits:
    if fruit not in seen:
        seen.add(fruit)
        unique_fruits.append(fruit)

print("Original list:", fruits)
print("Unique elements (order preserved):", unique_fruits)

Run the file:

python3 loop_approach.py

The output should look like this:

Original list: ['apple', 'banana', 'orange', 'apple', 'pear', 'banana']
Unique elements (order preserved): ['apple', 'banana', 'orange', 'pear']

All three methods produce the same result: they remove duplicates while keeping each element at the position of its first occurrence.
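
One caveat applies to all three methods: the elements must be hashable, because they are used as dictionary keys or set members. The sketch below is a hypothetical helper (dedupe_rows is not part of the tutorial files) that handles a list of lists by using a tuple copy of each row as the lookup key. It assumes the rows contain only hashable values.

```python
def dedupe_rows(rows):
    """Remove duplicate rows (lists) while keeping first-occurrence order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # hashable stand-in for the unhashable list
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# Lists cannot be dictionary keys, so dict.fromkeys() raises TypeError here.
try:
    dict.fromkeys(rows)
    fromkeys_worked = True
except TypeError:
    fromkeys_worked = False

print("dict.fromkeys() works on a list of lists:", fromkeys_worked)
print("Deduplicated rows:", dedupe_rows(rows))
```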

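Practical Example: Analyzing Text Data

Now let's apply what we have learned to a practical example: analyzing word frequencies in a text while keeping the words in order of first appearance.

Building a Text Analysis Tool

Create a new file named text_analyzer.py.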

def analyze_text(text):
    """
    Analyze text to find unique words in order of first appearance
    and their frequencies.
    """
    ## Split text into words and convert to lowercase
    words = text.lower().split()

    ## Remove punctuation from words
    clean_words = [word.strip('.,!?:;()[]{}""\'') for word in words]

    ## Count frequency while preserving order
    word_counts = {}
    unique_words_in_order = []

    for word in clean_words:
        if not word:  ## Skip tokens that consisted only of punctuation
            continue
        if word not in word_counts:
            unique_words_in_order.append(word)
        word_counts[word] = word_counts.get(word, 0) + 1

    return unique_words_in_order, word_counts

## Sample text
sample_text = """
Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!
"""

## Analyze the text
unique_words, word_frequencies = analyze_text(sample_text)

## Print results
print("Text sample:")
print(sample_text)
print("\nUnique words in order of first appearance:")
print(unique_words)
print("\nWord frequencies:")
for word in unique_words:
    if word:  ## Skip empty strings
        print(f"'{word}': {word_frequencies[word]} times")

Run the file:

python3 text_analyzer.py

The output lists the unique words in the order they first appear in the text, along with their frequencies:

Text sample:

Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!

Unique words in order of first appearance:
['python', 'is', 'amazing', 'also', 'easy', 'to', 'learn', 'with', 'you', 'can', 'create', 'web', 'applications', 'data', 'analysis', 'tools', 'machine', 'learning', 'models', 'and', 'much', 'more', 'has', 'many', 'libraries', 'that', 'make', 'development', 'faster', 'versatile']

Word frequencies:
'python': 5 times
'is': 3 times
'amazing': 1 times
'also': 1 times
...
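
As a possible alternative to the manual counting loop, collections.Counter from the standard library can do the same bookkeeping. This is a sketch with an illustrative word list, not a replacement for text_analyzer.py. It relies on Counter being a dict subclass, so on Python 3.7+ it remembers the order in which keys were first inserted.

```python
from collections import Counter

words = ["python", "is", "amazing", "python", "is", "also"]

# Counter tallies each word; keys are stored in first-appearance order.
counts = Counter(words)

unique_in_order = list(counts)
print(unique_in_order)   # ['python', 'is', 'amazing', 'also']
print(counts["python"])  # 2
```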

Improving the Tool

Let's improve the text analyzer to handle more complex scenarios. Create a file named improved_analyzer.py.

from collections import OrderedDict

def analyze_text_improved(text):
    """
    An improved version of text analyzer that handles more complex scenarios
    and provides more statistics.
    """
    ## Split text into words and convert to lowercase
    words = text.lower().split()

    ## Remove punctuation from words
    clean_words = [word.strip('.,!?:;()[]{}""\'') for word in words]

    ## Use OrderedDict to preserve order and count frequency
    word_counts = OrderedDict()

    for word in clean_words:
        if word:  ## Skip empty strings
            word_counts[word] = word_counts.get(word, 0) + 1

    ## Get statistics
    total_words = sum(word_counts.values())
    unique_words_count = len(word_counts)

    return list(word_counts.keys()), word_counts, total_words, unique_words_count

## Sample text
sample_text = """
Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!
"""

## Analyze the text
unique_words, word_frequencies, total_count, unique_count = analyze_text_improved(sample_text)

## Print results
print("Text sample:")
print(sample_text)
print("\nStatistics:")
print(f"Total words: {total_count}")
print(f"Unique words: {unique_count}")
print(f"Uniqueness ratio: {unique_count/total_count:.2%}")

print("\nTop 5 most frequent words:")
sorted_words = sorted(word_frequencies.items(), key=lambda x: x[1], reverse=True)
for word, count in sorted_words[:5]:
    print(f"'{word}': {count} times")

Run the file:

python3 improved_analyzer.py

You should see output that includes the additional statistics:

Text sample:

Python is amazing. Python is also easy to learn.
With Python, you can create web applications, data analysis tools,
machine learning models, and much more. Python has many libraries
that make development faster. Python is versatile!

Statistics:
Total words: 36
Unique words: 30
Uniqueness ratio: 83.33%

Top 5 most frequent words:
'python': 5 times
'is': 3 times
'amazing': 1 times
'also': 1 times
'easy': 1 times

This practical example shows how preserving element order while removing duplicates is useful in real applications such as text analysis.
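
The order of the tied words in a "top N" listing follows from the fact that Python's sorted() is stable: items with equal counts keep their original relative order, even with reverse=True. A minimal sketch with made-up counts:

```python
# Insertion order of this dict stands in for first-appearance order.
word_counts = {"also": 1, "python": 5, "amazing": 1, "is": 3}

# Sort by count, highest first. Ties ("also" and "amazing") stay in the
# order in which they were inserted, because the sort is stable.
by_count = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
print(by_count)
# [('python', 5), ('is', 3), ('also', 1), ('amazing', 1)]
```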

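Performance Comparison and Best Practices

Now that we have looked at several ways to convert a list to a set while preserving order, let's compare their performance and establish some best practices.

Creating a Performance Test

Create a new file named performance_test.py.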

import time
from collections import OrderedDict

def method1_dict(data):
    """Using dict.fromkeys()"""
    return list(dict.fromkeys(data))

def method2_ordereddict(data):
    """Using OrderedDict.fromkeys()"""
    return list(OrderedDict.fromkeys(data))

def method3_loop(data):
    """Using a loop and a set"""
    result = []
    seen = set()
    for item in data:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

def time_function(func, data, runs=100):
    """Measure execution time of a function"""
    start_time = time.perf_counter()  ## High-resolution timer intended for benchmarking
    for _ in range(runs):
        func(data)
    end_time = time.perf_counter()
    return (end_time - start_time) / runs

## Test data
small_list = list(range(100)) + list(range(50))  ## 150 items, 50 duplicates
medium_list = list(range(1000)) + list(range(500))  ## 1500 items, 500 duplicates
large_list = list(range(10000)) + list(range(5000))  ## 15000 items, 5000 duplicates

## Test results
print("Performance comparison (average time in seconds over 100 runs):\n")

print("Small list (150 items, 50 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, small_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, small_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, small_list):.8f}")

print("\nMedium list (1,500 items, 500 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, medium_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, medium_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, medium_list):.8f}")

print("\nLarge list (15,000 items, 5,000 duplicates):")
print(f"dict.fromkeys():       {time_function(method1_dict, large_list):.8f}")
print(f"OrderedDict.fromkeys(): {time_function(method2_ordereddict, large_list):.8f}")
print(f"Loop and set:          {time_function(method3_loop, large_list):.8f}")

Run the performance test:

python3 performance_test.py

The output shows how each method performs at different list sizes:

Performance comparison (average time in seconds over 100 runs):

Small list (150 items, 50 duplicates):
dict.fromkeys():       0.00000334
OrderedDict.fromkeys(): 0.00000453
Loop and set:          0.00000721

Medium list (1,500 items, 500 duplicates):
dict.fromkeys():       0.00003142
OrderedDict.fromkeys(): 0.00004123
Loop and set:          0.00007621

Large list (15,000 items, 5,000 duplicates):
dict.fromkeys():       0.00035210
OrderedDict.fromkeys(): 0.00044567
Loop and set:          0.00081245

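The actual numbers will vary from system to system, but a few patterns stand out in this sample run: dict.fromkeys() is the fastest at every size, OrderedDict.fromkeys() is somewhat slower, and the explicit loop with a set takes a little over twice as long as dict.fromkeys(). All three scale roughly linearly with the length of the list.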
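
For more repeatable measurements, the standard library's timeit module is a common choice. The sketch below is one way to set that up. The function names, list size, and repetition counts are arbitrary choices for illustration, and no timings are shown because they depend on the machine.

```python
import timeit

def dedupe_fromkeys(data):
    """Deduplicate with dict.fromkeys()."""
    return list(dict.fromkeys(data))

def dedupe_loop(data):
    """Deduplicate with an explicit loop and a set of seen items."""
    seen = set()
    result = []
    for item in data:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

data = list(range(1000)) + list(range(500))  # 1500 items, 500 duplicates
calls = 100

for func in (dedupe_fromkeys, dedupe_loop):
    # repeat() returns one total time per repetition; the minimum is
    # usually the least affected by other activity on the system.
    totals = timeit.repeat(lambda: func(data), number=calls, repeat=3)
    print(f"{func.__name__}: {min(totals) / calls:.8f} s per call")
```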

Best Practices

Based on these experiments, let's establish some best practices. Create a file named best_practices.py.

"""
Best Practices for Converting a List to a Set While Preserving Order
"""

## Example 1: For Python 3.7+, use dict.fromkeys() for best performance
def preserve_order_modern(lst):
    """Best method for Python 3.7+ - using dict.fromkeys()"""
    return list(dict.fromkeys(lst))

## Example 2: For compatibility with older Python versions, use OrderedDict
from collections import OrderedDict

def preserve_order_compatible(lst):
    """Compatible method for older Python versions (2.7 and 3.1+) - using OrderedDict"""
    return list(OrderedDict.fromkeys(lst))

## Example 3: When you need to process elements while preserving order
def preserve_order_with_processing(lst):
    """Process elements while preserving order"""
    result = []
    seen = set()

    for item in lst:
        ## Option to process the item here
        processed_item = str(item).lower()  ## Example processing

        if processed_item not in seen:
            seen.add(processed_item)
            result.append(item)  ## Keep original item in the result

    return result

## Demo
data = ["Apple", "banana", "Orange", "apple", "Pear", "BANANA"]

print("Original list:", data)
print("Method 1 (Python 3.7+):", preserve_order_modern(data))
print("Method 2 (Compatible):", preserve_order_compatible(data))
print("Method 3 (With processing):", preserve_order_with_processing(data))

Run the file:

python3 best_practices.py

The output shows how each method handles the data:

Original list: ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
Method 1 (Python 3.7+): ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
Method 2 (Compatible): ['Apple', 'banana', 'Orange', 'apple', 'Pear', 'BANANA']
Method 3 (With processing): ['Apple', 'banana', 'Orange', 'Pear']

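Method 3 lowercases each item before checking for duplicates, so it treats "Apple" and "apple" (and likewise "banana" and "BANANA") as the same item and keeps only the first occurrence of each.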
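
The processing step in Method 3 can be generalized by passing the normalization in as a function. The helper below, unique_by, is a hypothetical name used only for this sketch. It uses str.casefold rather than str.lower as the example key, which is a slightly more aggressive form of case normalization.

```python
def unique_by(items, key=None):
    """Keep the first item for each distinct key, preserving order.

    If key is None, the items themselves are compared.
    """
    seen = set()
    result = []
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)  # Keep the original, unprocessed item
    return result

data = ["Apple", "banana", "Orange", "apple", "Pear", "BANANA"]

print(unique_by(data))                    # Case-sensitive: nothing removed
print(unique_by(data, key=str.casefold))  # ['Apple', 'banana', 'Orange', 'Pear']
```

Keeping the key function separate from the loop means the same helper can deduplicate by stripped whitespace, by a record's ID field, or by any other hashable value.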

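Recommendations

Based on these experiments, here are some recommendations.

On Python 3.7 and later, use dict.fromkeys(); it was the fastest method in the test above.
For compatibility with Python versions earlier than 3.7, use OrderedDict.fromkeys().
When you need custom processing while checking for duplicates, use the loop-and-set approach.
Depending on your requirements, consider case sensitivity and other normalization steps.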

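Summary

In this tutorial, you learned:

The fundamental differences between Python lists and sets
Why converting a list to a set generally loses the order
Several ways to convert a list to a set while preserving the original order:
- Using dict.fromkeys() on Python 3.7+
- Using OrderedDict.fromkeys() for compatibility with earlier Python versions
- Using a loop with a set for more complex processing
How to apply these techniques to practical problems such as text analysis
Performance considerations and best practices for different scenarios

These techniques are useful for data cleaning, removing duplicates from user input, processing configuration options, and many other common programming tasks. Choosing the right approach for your specific requirements helps you write cleaner and more efficient Python code.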