Python | 蛇形命名法转换 | 字符串操作

简介

在 Python 中，蛇形命名法（snake case）是一种命名约定，单词之间用下划线分隔。这种命名风格常用于 Python 中的变量、函数和其他标识符。例如，calculate_total_price 和 user_authentication 就是蛇形命名法格式。

有时，你可能会遇到不同格式的字符串，如驼峰命名法（camelCase）、帕斯卡命名法（PascalCase），甚至是带有空格和连字符的字符串。在这种情况下，你需要将这些字符串转换为蛇形命名法，以确保代码的一致性。

在这个实验中，你将学习如何编写一个 Python 函数，将各种格式的字符串转换为蛇形命名法。

理解问题

在编写蛇形命名法（snake case）转换函数之前，让我们先明确需要完成的任务：

你需要将任何格式的字符串转换为蛇形命名法。
蛇形命名法指的是所有字母为小写，单词之间用下划线分隔。
你需要处理不同的输入格式：
- 驼峰命名法（camelCase）（例如，camelCase → camel_case）
- 带有空格的字符串（例如，some text → some_text）
- 混合格式的字符串（例如，包含连字符、下划线和大小写混合的情况）

让我们先为蛇形命名法函数创建一个新的 Python 文件。在 WebIDE 中，导航到项目目录并创建一个名为 snake_case.py 的新文件：

## This function will convert a string to snake case
def snake(s):
    ## We'll implement this function step by step
    pass  ## Placeholder for now

## Test with a simple example
if __name__ == "__main__":
    test_string = "helloWorld"
    result = snake(test_string)
    print(f"Original: {test_string}")
    print(f"Snake case: {result}")

保存此文件。下一步，我们将开始实现该函数。

现在，让我们运行这个占位函数，确保文件设置正确。打开终端并运行：

python3 ~/project/snake_case.py

python-prompt

你应该会看到如下输出：

Original: helloWorld
Snake case: None

结果为 None 是因为当前函数只是返回 Python 的默认 None 值。下一步，我们将添加实际的转换逻辑。

使用正则表达式进行模式匹配

为了将字符串转换为蛇形命名法（snake case），我们将使用正则表达式（regex）来识别单词边界。Python 中的 re 模块提供了强大的模式匹配功能，可用于此任务。

让我们更新函数以处理驼峰命名法（camelCase）的字符串：

首先，你需要识别小写字母后面跟着大写字母的模式（例如“camelCase”）。
然后，在它们之间插入一个空格。
最后，将所有内容转换为小写，并将空格替换为下划线。

使用以下改进后的函数更新你的 snake_case.py 文件：

import re

def snake(s):
    ## Replace pattern of a lowercase letter followed by uppercase with lowercase, space, uppercase
    s1 = re.sub('([a-z])([A-Z])', r'\1 \2', s)

    ## Replace spaces with underscores and convert to lowercase
    return s1.lower().replace(' ', '_')

## Test with a simple example
if __name__ == "__main__":
    test_string = "helloWorld"
    result = snake(test_string)
    print(f"Original: {test_string}")
    print(f"Snake case: {result}")

让我们详细分析这个函数的工作原理：

re.sub('([a-z])([A-Z])', r'\1 \2', s) 查找小写字母 ([a-z]) 后面跟着大写字母 ([A-Z]) 的模式。然后，它将此模式替换为相同的字母，但使用 \1 和 \2 在它们之间添加一个空格，\1 和 \2 分别引用捕获的组。
然后，我们使用 lower() 将所有内容转换为小写，并将空格替换为下划线。

再次运行脚本，看看它是否能处理驼峰命名法的字符串：

python3 ~/project/snake_case.py

现在的输出应该是：

Original: helloWorld
Snake case: hello_world

很棒！我们的函数现在可以处理驼峰命名法的字符串了。下一步，我们将对其进行增强，以处理更复杂的情况。

处理更复杂的模式

我们当前的函数可以处理驼峰命名法（camelCase），但还需要对其进行增强，以处理其他模式，例如：

帕斯卡命名法（PascalCase）（例如，HelloWorld）
带有连字符的字符串（例如，hello-world）
已经包含下划线的字符串（例如，hello_world）

让我们更新函数以处理这些情况：

import re

def snake(s):
    ## Replace hyphens with spaces
    s = s.replace('-', ' ')

    ## Handle PascalCase pattern (sequences of uppercase letters)
    s = re.sub('([A-Z]+)', r' \1', s)

    ## Handle camelCase pattern (lowercase followed by uppercase)
    s = re.sub('([a-z])([A-Z])', r'\1 \2', s)

    ## Split by spaces, join with underscores, and convert to lowercase
    return '_'.join(s.split()).lower()

## Test with multiple examples
if __name__ == "__main__":
    test_strings = [
        "helloWorld",
        "HelloWorld",
        "hello-world",
        "hello_world",
        "some text"
    ]

    for test in test_strings:
        result = snake(test)
        print(f"Original: {test}")
        print(f"Snake case: {result}")
        print("-" * 20)

我们所做的改进如下：

首先，将所有连字符替换为空格。
新的正则表达式 re.sub('([A-Z]+)', r' \1', s) 在任何连续的大写字母序列前添加一个空格，这有助于处理帕斯卡命名法。
保留处理驼峰命名法的正则表达式。
最后，按空格分割字符串，用下划线连接，并转换为小写，这样可以处理剩余的空格并确保输出格式一致。

运行脚本，用各种输入格式进行测试：

python3 ~/project/snake_case.py

你应该会看到如下输出：

Original: helloWorld
Snake case: hello_world
--------------------
Original: HelloWorld
Snake case: hello_world
--------------------
Original: hello-world
Snake case: hello_world
--------------------
Original: hello_world
Snake case: hello_world
--------------------
Original: some text
Snake case: some_text
--------------------

我们的函数现在更加健壮，可以处理各种输入格式。下一步，我们将进行最后的优化，并使用完整的测试套件进行测试。

最终实现与测试

现在，让我们完成实现，以处理所有需要的情况，并验证它能通过所有测试用例。

使用最终实现更新你的 snake_case.py 文件：

import re

def snake(s):
    ## Replace hyphens with spaces
    s = s.replace('-', ' ')

    ## Handle PascalCase pattern
    s = re.sub('([A-Z][a-z]+)', r' \1', s)

    ## Handle sequences of uppercase letters
    s = re.sub('([A-Z]+)', r' \1', s)

    ## Split by whitespace and join with underscores
    return '_'.join(s.split()).lower()

## Test with a complex example
if __name__ == "__main__":
    test_string = "some-mixed_string With spaces_underscores-and-hyphens"
    result = snake(test_string)
    print(f"Original: {test_string}")
    print(f"Snake case: {result}")

这个最终实现：

将连字符替换为空格。
使用 re.sub('([A-Z][a-z]+)', r' \1', s) 在类似“Word”的模式前添加一个空格。
使用 re.sub('([A-Z]+)', r' \1', s) 在连续大写字母序列前添加一个空格。
按空格分割字符串，用下划线连接，并转换为小写。

现在，让我们使用在设置步骤中创建的测试套件来运行我们的函数：

python3 ~/project/test_snake.py

如果你的实现正确，你应该会看到：

All tests passed! Your snake case function works correctly.

恭喜！你已成功实现了一个强大的蛇形命名法（snake case）转换函数，它可以处理各种输入格式。

让我们通过使用原始问题中的示例进行测试，确保我们的函数准确遵循规范：

## Add this to the end of your snake_case.py file:
if __name__ == "__main__":
    examples = [
        'camelCase',
      'some text',
      'some-mixed_string With spaces_underscores-and-hyphens',
        'AllThe-small Things'
    ]

    for ex in examples:
        result = snake(ex)
        print(f"Original: {ex}")
        print(f"Snake case: {result}")
        print("-" * 20)

运行更新后的脚本：

python3 ~/project/snake_case.py

你应该会看到所有示例都被正确转换为蛇形命名法：

Original: some-mixed_string With spaces_underscores-and-hyphens
Snake case: some_mixed_string_with_spaces_underscores_and_hyphens
Original: camelCase
Snake case: camel_case
--------------------
Original: some text
Snake case: some_text
--------------------
Original: some-mixed_string With spaces_underscores-and-hyphens
Snake case: some_mixed_string_with_spaces_underscores_and_hyphens
--------------------
Original: AllThe-small Things
Snake case: all_the_small_things
--------------------

总结

在这个实验中，你学习了如何创建一个 Python 函数，将各种格式的字符串转换为蛇形命名法（snake case）。你完成了以下内容：

你学习了如何使用正则表达式进行模式匹配和字符串转换。
你实现了一个可以处理不同输入格式的函数：
- 驼峰命名法（camelCase）（例如，camelCase → camel_case）
- 帕斯卡命名法（PascalCase）（例如，HelloWorld → hello_world）
- 包含空格的字符串（例如，some text → some_text）
- 包含连字符的字符串（例如，hello-world → hello_world）
- 带有各种分隔符和大小写的混合格式

你使用的关键技术包括：

使用 re.sub() 结合捕获组的正则表达式
字符串方法，如 replace()、lower()、split() 和 join()
识别不同命名约定的模式

这些技能对于数据清理、处理文本输入以及维护一致的编码标准非常有价值。在使用采用不同命名约定的 API 或库时，能够在不同大小写格式之间进行转换尤其有用。

Python 蛇形命名法转换

简介

理解问题

使用正则表达式进行模式匹配

处理更复杂的模式

最终实现与测试

总结