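Creating a Pandas DataFrame from a Dictionary

Introduction

Welcome to data manipulation with Pandas! A DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns), and its columns can hold different data types. It is the primary data structure in Pandas and one of the most widely used in data analysis.

In this lab, you will learn the basic ways to create a Pandas DataFrame. We will start by building a DataFrame from a simple Python dictionary, then explore how to customize its columns and index. You will complete every task in the WebIDE by writing and running Python scripts.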

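Creating a DataFrame from a Dictionary

In this step, you will learn the most common way to create a Pandas DataFrame: from a Python dictionary. The dictionary keys become the column names, and the values (usually lists or arrays of equal length) become the data in those columns.

First, open the main.py file from the file explorer on the left side of the WebIDE.

Now add the following code to main.py. It imports the Pandas library, defines a dictionary of student data, converts the dictionary to a DataFrame with pd.DataFrame(), and prints the result.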

import pandas as pd

## Data in a dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

## Create DataFrame from the dictionary
df = pd.DataFrame(student_data)

## Print the DataFrame
print(df)

To run your script, open a terminal in the WebIDE (Terminal -> New Terminal) and execute the following command. All of your work should be done in the ~/project directory.

python3 main.py

You should see the following output: the dictionary data arranged as a table, with a default row index that starts at 0.

      Name  Score
0    Alice     85
1      Bob     92
2  Charlie     78
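To confirm how the dictionary maps onto the table, you can inspect the DataFrame's column labels and shape. The following is a minimal sketch that reuses the lab's student_data dictionary; the variable names column_names, n_rows, and n_cols are only illustrative.

```python
import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}
df = pd.DataFrame(student_data)

# The dictionary keys become the column labels
column_names = list(df.columns)

# shape is a tuple: (number of rows, number of columns)
n_rows, n_cols = df.shape

print(column_names)    # ['Name', 'Score']
print(n_rows, n_cols)  # 3 2
```

Each list in the dictionary supplies one column, so three names and three scores give three rows and two columns.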

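Specifying Column Names in a DataFrame

In this step, you will learn how to control the order of the columns in a DataFrame. By default, the columns follow the order of the dictionary keys (dictionaries keep insertion order in Python 3.7 and later). To use a different order, pass a list of column names to the columns parameter.

Let's modify main.py to specify the column order. We will swap the 'Name' and 'Score' columns.

Update your main.py file with the following code. Note the columns parameter added to the pd.DataFrame() call.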

import pandas as pd

## Data in a dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

## Create DataFrame and specify column order
df = pd.DataFrame(student_data, columns=['Score', 'Name'])

## Print the DataFrame
print(df)

Now run the script again in the terminal:

python3 main.py

The output now shows the 'Score' column first, as you specified.

   Score     Name
0     85    Alice
1     92      Bob
2     78  Charlie
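When the data is a dictionary, the columns parameter also selects which keys are used, not only their order. The sketch below illustrates this under two assumptions that are not part of the lab: 'Grade' is a hypothetical column name absent from the dictionary, and df_extra and df_subset are illustrative variable names.

```python
import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

# 'Grade' is a hypothetical name that is not a key in student_data,
# so Pandas creates that column and fills it with missing values (NaN)
df_extra = pd.DataFrame(student_data, columns=['Score', 'Name', 'Grade'])

# Listing only some of the keys keeps just those columns
df_subset = pd.DataFrame(student_data, columns=['Name'])

print(df_extra)
print(df_subset)
```

A misspelled column name therefore does not raise an error; it produces an empty column, which is worth checking for when the result looks unexpected.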

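Adding Index Labels to a DataFrame

In this step, you will learn how to replace the default numeric index (0, 1, 2, ...) with more meaningful labels. You do this with the index parameter, which assigns a custom label to each row. The list of labels must have the same length as the number of rows.

Let's use unique student IDs as the DataFrame's index. Modify your main.py file to include a list of index labels.

Update the code in main.py as follows: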

import pandas as pd

## Data in a dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

## Define custom index labels
index_labels = ['ID1', 'ID2', 'ID3']

## Create DataFrame with custom index
df = pd.DataFrame(student_data, index=index_labels)

## Print the DataFrame
print(df)

Run the script in the terminal:

python3 main.py

The default numeric index is now replaced by your custom 'ID' labels.

        Name  Score
ID1    Alice     85
ID2      Bob     92
ID3  Charlie     78
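Custom index labels are useful because rows can then be looked up by label. The following minimal sketch uses the .loc indexer for this; the variable names row and bob_score are only illustrative.

```python
import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}
df = pd.DataFrame(student_data, index=['ID1', 'ID2', 'ID3'])

# Select a whole row by its index label; the result is a Series
row = df.loc['ID2']

# Select a single value by row label and column name
bob_score = df.loc['ID2', 'Score']

print(row)
print(bob_score)  # 92
```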

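Accessing DataFrame Columns with Dot Notation

In this step, you will learn a convenient way to access a single column of a DataFrame: dot notation. If the column name is a valid Python identifier (no spaces, does not start with a digit, and so on) and does not clash with an existing DataFrame attribute or method, you can access it as an attribute of the DataFrame object.

Let's use dot notation to select and print only the 'Name' column of the DataFrame.

Modify your main.py file to access the Name column and print it.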

import pandas as pd

## Data in a dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

## Create DataFrame
df = pd.DataFrame(student_data)

## Access and print the 'Name' column using dot notation
print(df.Name)

Run the script in the terminal:

python3 main.py

The output is a Pandas Series, the one-dimensional structure that holds a single column of the DataFrame.

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
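Dot notation does not work for every column name, so it helps to know the bracket form as a fallback. The sketch below uses two hypothetical columns that are not part of the lab's data: 'Final Score' (contains a space) and 'count' (shares its name with the DataFrame.count method).

```python
import pandas as pd

# 'Final Score' and 'count' are hypothetical columns chosen for illustration
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Final Score': [85, 92, 78],  # contains a space, so df.Final Score is a syntax error
    'count': [1, 2, 3],           # same name as the DataFrame.count method
}
df = pd.DataFrame(data)

# Bracket notation works for any column name
final_scores = df['Final Score']
counts = df['count']

# With dot notation, df.count refers to the method, not the column
print(callable(df.count))  # True
print(final_scores.tolist())
print(counts.tolist())
```

Bracket notation is the safer default in scripts; dot notation is mainly a convenience for quick exploration.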

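Displaying DataFrame Information with the info Method

In this step, you will learn to use the .info() method. It prints a concise summary of a DataFrame, including each column's data type, the number of non-null values, and the memory usage. It is a good starting point when you explore a new dataset.

Let's apply the .info() method to our student DataFrame.

Modify the main.py file to call this method. Note that .info() prints the summary directly, so you do not need to wrap it in print().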

import pandas as pd

## Data in a dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}

## Create DataFrame
df = pd.DataFrame(student_data)

## Display a summary of the DataFrame
df.info()

Run the script in the terminal:

python3 main.py

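The output gives an overview of the DataFrame's structure: the index range, the columns with their non-null counts and data types, and the memory usage. The exact dtype names and memory figure may differ slightly between Pandas versions and platforms.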

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    3 non-null      object
 1   Score   3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
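Because .info() prints its summary and returns None, wrapping it in print() only adds a stray "None" line. If you need the summary as a string, one option is to pass a text buffer through the method's buf parameter. The sketch below shows this, along with the dtypes attribute as a lighter alternative; the variable names buffer and summary are only illustrative.

```python
import io

import pandas as pd

student_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 92, 78]
}
df = pd.DataFrame(student_data)

# Write the summary into an in-memory text buffer instead of the terminal
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# dtypes returns only the data type of each column, as a Series
print(df.dtypes)
print(summary)
```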

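Summary

Congratulations on completing this lab! You have learned the basic techniques for creating and inspecting a Pandas DataFrame.

In this lab, you practiced how to:

Create a DataFrame from a Python dictionary.
Specify and reorder columns with the columns parameter.
Assign custom row labels with the index parameter.
Access a specific column with dot notation.
Get a concise summary of a DataFrame's structure with the .info() method.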
