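Introduction
In this project, you will learn how to convert JSON data to CSV format using Python. This is a common task in data science and development: JSON is widely used for API responses, while CSV is a common format for storing tabular data.
👀 Preview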
$ head result.csv
"IP","状态","时间","HttpReferer","HttpUserAgent","请求","HttpXForwardedFor","发送的字节数","远程用户","请求长度"
"72.55.30.187","202","[2016-02-23 16:25:10]","http://www.google.cn/search?q=hive","Mozilla/4.0 (兼容; MSIE 7.0; Windows NT 5.1; Trident/4.0;.NET CLR 2.0.50727)","GET /index.html HTTP/1.1","-","-","-","0"
"55.222.156.202","200","[2016-02-23 16:25:10]","-","Mozilla/4.0 (兼容; MSIE6.0; Windows NT 5.0;.NET CLR 1.1.4322)","GET /login.php HTTP/1.1","-","-","-","0"
"190.215.55.29","201","[2016-02-23 16:25:10]","-","Mozilla/4.0 (兼容; MSIE6.0; Windows NT 5.0;.NET CLR 1.1.4322)","GET /view.php HTTP/1.1","-","-","-","0"
"63.132.98.30","200","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /list.php HTTP/1.1","-","-","-","0"
"214.124.190.132","201","[2016-02-23 16:25:10]","-","Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_3 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B511 Safari/9537.53","GET /login.php HTTP/1.1","-","-","-","0"
"98.215.187.30","202","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /upload.php HTTP/1.1","-","-","-","0"
"143.55.168.187","201","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /login.php HTTP/1.1","-","-","-","0"
"98.190.201.29","200","[2016-02-23 16:25:10]","-","Mozilla/5.0 (Linux; Android 4.2.1; Galaxy Nexus Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19","GET /view.php HTTP/1.1","-","-","-","0"
"10.168.55.143","301","[2016-02-23 16:25:10]","http://cn.bing.com/search?q=spark mlib","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /admin/login.php HTTP/1.1","-","-","-","0"
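🎯 Tasks
In this project, you will learn:
- How to read and understand JSON data
- How to convert JSON data to a CSV file
- How to write a CSV file with the correct column names and formatting
🏆 Achievements
After completing this project, you will be able to:
- Convert JSON data obtained from an API into a structured CSV format
- Understand the process of parsing JSON data and writing it to a CSV file
- Apply these skills to a range of data processing and analysis tasks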
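Read and Understand the JSON Data
In this step, you will read the web_access.json file and get familiar with the JSON data it contains.
- Open the web_access.json file in the ~/project directory with a text editor.
- Inspect the file contents. You should see an array of objects, where each object represents one web access log entry. Each entry has several properties, such as "IP", "状态", "时间", "HttpReferer", "HttpUserAgent", "请求", "HttpXForwardedFor", "发送的字节数", "远程用户", and "请求长度".
- Get familiar with the structure and content of the JSON data. This will help in the next step, where you convert the JSON data to CSV format.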
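The same inspection can also be done from Python. The following is a minimal sketch, not part of the original lab: the helper name describe and the two-entry sample are hypothetical, and the sample only mirrors a few of the keys listed above.

```python
import json

# Hypothetical sample that mirrors the structure described above;
# the real web_access.json holds more keys and many more entries.
sample_text = """
[
  {"IP": "72.55.30.187", "状态": "202", "请求": "GET /index.html HTTP/1.1"},
  {"IP": "55.222.156.202", "状态": "200", "请求": "GET /login.php HTTP/1.1"}
]
"""


def describe(entries):
    """Return the number of entries and the keys of the first entry."""
    if not isinstance(entries, list) or not entries:
        raise ValueError("expected a non-empty JSON array of objects")
    return len(entries), list(entries[0].keys())


entries = json.loads(sample_text)
count, keys = describe(entries)
print(f"{count} entries; keys of the first entry: {keys}")
```

To run it against the real file, load the entries with json.load on web_access.json (opened with encoding="utf-8") instead of json.loads on the sample string.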
Convert JSON to CSV
In this step, you will convert the JSON data in the web_access.json file to a CSV file.
- Create a new Python file named convert.py in the ~/project directory.
- In convert.py, import the required libraries:
import csv
import json
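- Read the JSON data from the web_access.json file. The keys contain non-ASCII characters, so the encoding is given explicitly:
with open("web_access.json", "r", encoding="utf-8") as json_file:
    data = json.load(json_file)
- Define the order of the columns in the CSV file: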
fieldnames = [
    "IP",
    "状态",
    "时间",
    "HttpReferer",
    "HttpUserAgent",
    "请求",
    "HttpXForwardedFor",
    "发送的字节数",
    "远程用户",
    "请求长度",
]
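- Open a new CSV file named result.csv in the ~/project directory and create a csv.DictWriter object:
with open("result.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
- Write the header row to the CSV file (this line stays inside the with block):
    writer.writeheader()
- Write the data rows to the CSV file (also inside the with block):
    for row in data:
        writer.writerow(row)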
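The quoting=csv.QUOTE_ALL argument is what puts double quotes around every field, including the header. As a small illustration, not part of the original lab, the sketch below writes the same row to an in-memory buffer with QUOTE_ALL and with the default csv.QUOTE_MINIMAL. The helper name render and the two-column row are hypothetical.

```python
import csv
import io

# Hypothetical two-column row, reusing two of the lab's column names.
fieldnames = ["IP", "状态"]
row = {"IP": "72.55.30.187", "状态": "202"}


def render(quoting):
    """Write the header and one row to an in-memory buffer; return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=quoting)
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


quoted = render(csv.QUOTE_ALL)       # every field wrapped in double quotes
minimal = render(csv.QUOTE_MINIMAL)  # quotes only where a field needs them
print(quoted)
print(minimal)
```

Using io.StringIO keeps the comparison self-contained; nothing is written to disk.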
Your complete convert.py file should look like this:
import csv
import json

# Read the JSON data from the file
with open("web_access.json", "r", encoding="utf-8") as json_file:
    data = json.load(json_file)

# Define the order of the columns
fieldnames = [
    "IP",
    "状态",
    "时间",
    "HttpReferer",
    "HttpUserAgent",
    "请求",
    "HttpXForwardedFor",
    "发送的字节数",
    "远程用户",
    "请求长度",
]

# Write the CSV file
with open("result.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    # Write the header row
    writer.writeheader()
    # Write the data rows
    for row in data:
        writer.writerow(row)
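The script above assumes that every entry has exactly the keys listed in fieldnames. By default, csv.DictWriter raises a ValueError for a key that is not in fieldnames and writes an empty string for a missing one. The following optional variation is a sketch, not the lab's reference solution: the function name json_to_csv, the "-" placeholder for missing values, and the sample data are all assumptions made for illustration.

```python
import csv
import json
import os
import tempfile


def json_to_csv(json_path, csv_path, fieldnames):
    """Convert a JSON array of objects into a fully quoted CSV file.

    Returns the number of data rows written.
    """
    with open(json_path, "r", encoding="utf-8") as json_file:
        data = json.load(json_file)

    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(
            csv_file,
            fieldnames=fieldnames,
            quoting=csv.QUOTE_ALL,
            restval="-",            # assumed placeholder for missing keys
            extrasaction="ignore",  # skip keys that are not in fieldnames
        )
        writer.writeheader()
        writer.writerows(data)
    return len(data)


# Demo with hypothetical sample data in a temporary directory.
sample = [
    {"IP": "72.55.30.187", "状态": "202"},
    {"IP": "55.222.156.202", "extra": "ignored"},  # missing one key, has an extra one
]
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "sample.json")
    csv_path = os.path.join(tmp, "sample.csv")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    written = json_to_csv(json_path, csv_path, ["IP", "状态"])
    with open(csv_path, "r", newline="", encoding="utf-8") as f:
        output = f.read()
print(output)
```

For the lab data, the call would be json_to_csv("web_access.json", "result.csv", fieldnames) with the fieldnames list defined earlier.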
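Run the Python Script
In this step, you will run the convert.py script to generate the result.csv file.
- Open a terminal and navigate to the ~/project directory.
- Run the convert.py script with the following command: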
python convert.py
- After the script finishes, you should see a new file named result.csv in the ~/project directory.
- You can view the first 10 lines of result.csv with the following command:
head result.csv
This should print the header row and the first 9 data rows of the CSV file:
"IP","状态","时间","HttpReferer","HttpUserAgent","请求","HttpXForwardedFor","发送的字节数","远程用户","请求长度"
"72.55.30.187","202","[2016-02-23 16:25:10]","http://www.google.cn/search?q=hive","Mozilla/4.0 (兼容; MSIE 7.0; Windows NT 5.1; Trident/4.0;.NET CLR 2.0.50727)","GET /index.html HTTP/1.1","-","-","-","0"
"55.222.156.202","200","[2016-02-23 16:25:10]","-","Mozilla/4.0 (兼容; MSIE6.0; Windows NT 5.0;.NET CLR 1.1.4322)","GET /login.php HTTP/1.1","-","-","-","0"
"190.215.55.29","201","[2016-02-23 16:25:10]","-","Mozilla/4.0 (兼容; MSIE6.0; Windows NT 5.0;.NET CLR 1.1.4322)","GET /view.php HTTP/1.1","-","-","-","0"
"63.132.98.30","200","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /list.php HTTP/1.1","-","-","-","0"
"214.124.190.132","201","[2016-02-23 16:25:10]","-","Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_3 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B511 Safari/9537.53","GET /login.php HTTP/1.1","-","-","-","0"
"98.215.187.30","202","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /upload.php HTTP/1.1","-","-","-","0"
"143.55.168.187","201","[2016-02-23 16:25:10]","-","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /login.php HTTP/1.1","-","-","-","0"
"98.190.201.29","200","[2016-02-23 16:25:10]","-","Mozilla/5.0 (Linux; Android 4.2.1; Galaxy Nexus Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19","GET /view.php HTTP/1.1","-","-","-","0"
"10.168.55.143","301","[2016-02-23 16:25:10]","http://cn.bing.com/search?q=spark mlib","Mozilla/5.0 (兼容; MSIE 10.0; Windows NT 6.2; Trident/6.0)","GET /admin/login.php HTTP/1.1","-","-","-","0"
Congratulations! You have successfully converted the JSON data in the web_access.json file to a CSV file named result.csv.
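Besides looking at the output of head, the result can be checked by parsing the CSV back with csv.DictReader. The sketch below is one possible check, not part of the original lab: the helper name read_rows is hypothetical, and the in-memory sample only imitates the quoted style of result.csv.

```python
import csv
import io


def read_rows(csv_file):
    """Parse an open CSV text file into (header, list of row dicts)."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    return reader.fieldnames, rows


# Hypothetical sample in the same fully quoted style as result.csv.
sample = io.StringIO('"IP","状态"\r\n"72.55.30.187","202"\r\n')
header, rows = read_rows(sample)
print(header)
print(rows[0]["状态"])
```

For the real file, pass open("result.csv", "r", newline="", encoding="utf-8") to read_rows and compare len(rows) with the number of entries loaded from web_access.json.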
Summary
Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.



