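Introduction
Python's versatility extends to working with JSON, a widely used format for storing and exchanging information. JSON structures range from simple to complex and can contain nested elements, much like Python dictionaries and lists. In this tutorial, you will learn through hands-on exercises how to access and extract data from nested JSON structures with Python.
By the end of this lab, you will be able to navigate JSON objects, access deeply nested keys, and work with nested arrays in your Python applications.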
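JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and for machines to parse. In Python, JSON objects are represented as dictionaries and JSON arrays as lists.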
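To make the type mapping concrete, here is a minimal sketch that parses a small JSON string and prints the Python type of each value. The JSON text and the variable names are invented for illustration only.

```python
import json

# A hypothetical JSON document, used only for illustration.
raw = '{"name": "Ada", "tags": ["a", "b"], "active": true, "score": null, "ratio": 1.5}'

parsed = json.loads(raw)

# Show the Python type that each JSON value was converted to.
for key, value in parsed.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Note that JSON true/false become Python True/False, and JSON null becomes None.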
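Let's start by creating a simple Python script to explore JSON data:
Open the WebIDE and create a new file by clicking "File > New File" in the menu.
Save the file as basic_json.py in the /home/labex/project/json_practice directory.
Add the following code to basic_json.py: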
import json

## A simple JSON object (represented as a Python dictionary)
person = {
    "name": "John Doe",
    "age": 30,
    "email": "john.doe@example.com",
    "is_employed": True,
    "hobbies": ["reading", "hiking", "coding"]
}

## Convert the Python dictionary to a JSON string
json_string = json.dumps(person, indent=2)
print("JSON as string:")
print(json_string)
print("\n" + "-" * 50 + "\n")

## Convert the JSON string back to a Python dictionary
parsed_json = json.loads(json_string)
print("Python dictionary:")
print(parsed_json)
print("\n" + "-" * 50 + "\n")

## Access basic elements
print("Basic access examples:")
print(f"Name: {parsed_json['name']}")
print(f"Age: {parsed_json['age']}")
print(f"First hobby: {parsed_json['hobbies'][0]}")
Run the script from the terminal:
cd /home/labex/project/json_practice
python3 basic_json.py
You should see the following output:
JSON as string:
{
  "name": "John Doe",
  "age": 30,
  "email": "john.doe@example.com",
  "is_employed": true,
  "hobbies": [
    "reading",
    "hiking",
    "coding"
  ]
}

--------------------------------------------------

Python dictionary:
{'name': 'John Doe', 'age': 30, 'email': 'john.doe@example.com', 'is_employed': True, 'hobbies': ['reading', 'hiking', 'coding']}

--------------------------------------------------

Basic access examples:
Name: John Doe
Age: 30
First hobby: reading
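json.dumps() converts a Python object to a JSON-formatted string.
json.loads() parses a JSON string and converts it to a Python object.
object_name['key'] accesses a value in a JSON object (a Python dictionary).
array_name[index] accesses an element of a JSON array (a Python list).
This example demonstrates the basic operations for working with JSON in Python. In the next step, we will look at how to access nested JSON structures.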
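JSON objects often contain nested structures. In Python, you reach nested data by chaining dictionary keys in consecutive square brackets.
Let's create a new script to explore nested JSON objects:
In the WebIDE, create a new file and save it as nested_dict.py in the /home/labex/project/json_practice directory.
Add the following code to nested_dict.py: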
import json

## JSON with nested objects
user_data = {
    "name": "John Doe",
    "age": 30,
    "contact": {
        "email": "john.doe@example.com",
        "phone": "555-1234",
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "state": "CA",
            "zip": "12345"
        }
    },
    "preferences": {
        "theme": "dark",
        "notifications": True
    }
}

## Pretty-print the JSON structure
print("Full JSON structure:")
print(json.dumps(user_data, indent=2))
print("\n" + "-" * 50 + "\n")

## Access nested elements
print("Accessing nested elements:")
print(f"Email: {user_data['contact']['email']}")
print(f"City: {user_data['contact']['address']['city']}")
print(f"Theme: {user_data['preferences']['theme']}")
print("\n" + "-" * 50 + "\n")

## Safe access with the get() method
print("Safe access with get():")
## If the key is missing, get() returns None, or the default value if one is given
phone = user_data.get('contact', {}).get('phone', 'Not available')
country = user_data.get('contact', {}).get('address', {}).get('country', 'Not specified')
print(f"Phone: {phone}")
print(f"Country: {country}")  ## This key does not exist, but no error is raised
Run the script from the terminal:
cd /home/labex/project/json_practice
python3 nested_dict.py
You should see output similar to the following:
Full JSON structure:
{
  "name": "John Doe",
  "age": 30,
  "contact": {
    "email": "john.doe@example.com",
    "phone": "555-1234",
    "address": {
      "street": "123 Main St",
      "city": "Anytown",
      "state": "CA",
      "zip": "12345"
    }
  },
  "preferences": {
    "theme": "dark",
    "notifications": true
  }
}

--------------------------------------------------

Accessing nested elements:
Email: john.doe@example.com
City: Anytown
Theme: dark

--------------------------------------------------

Safe access with get():
Phone: 555-1234
Country: Not specified
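When working with nested JSON structures, you can access nested elements by chaining keys with square brackets:
## Syntax for nested access
value = json_data['key1']['key2']['key3']
However, this approach raises an error (typically a KeyError) if any key in the chain is missing. A safer approach is the get() method, which lets you supply a default value to return when a key is missing:
## Safe access with the get() method
value = json_data.get('key1', {}).get('key2', {}).get('key3', 'default_value')
This safe access pattern is particularly valuable when handling API responses or other external JSON data whose structure may be inconsistent.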
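The chained get() pattern above becomes unwieldy for deep paths, and it still fails if an intermediate value exists but is not a dictionary. The following is a minimal sketch of one possible alternative. get_nested is a hypothetical helper name, not part of the json module, and the sample data is invented for illustration.

```python
def get_nested(data, keys, default=None):
    """Follow `keys` through nested dicts; return `default` if any step fails."""
    current = data
    for key in keys:
        # Stop as soon as we hit a non-dict value or a missing key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical sample data for illustration.
user_data = {"contact": {"address": {"city": "Anytown"}, "phone": "555-1234"}}

city = get_nested(user_data, ["contact", "address", "city"], "Not specified")
country = get_nested(user_data, ["contact", "address", "country"], "Not specified")
# "phone" is a string, not a dict, so stepping into it falls back to the default.
extension = get_nested(user_data, ["contact", "phone", "extension"], "Not specified")

print(city)
print(country)
print(extension)
```

The isinstance check is a design choice: it trades a little strictness for never raising on unexpected shapes, which may or may not suit your data.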
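JSON data often includes arrays (lists in Python), which may themselves contain other objects or arrays. Let's explore how to access elements in nested arrays.
In the WebIDE, create a new file and save it as nested_arrays.py in the /home/labex/project/json_practice directory.
Add the following code to nested_arrays.py:
import json
## JSON with nested arrays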
company_data = {
"name": "Tech Innovations Inc",
"founded": 2010,
"departments": [
{
"name": "Engineering",
"employees": [
{"name": "Alice Johnson", "role": "Software Engineer", "skills": ["Python", "JavaScript", "AWS"]},
{"name": "Bob Smith", "role": "DevOps Engineer", "skills": ["Docker", "Kubernetes", "Linux"]}
]
},
{
"name": "Marketing",
"employees": [
{"name": "Carol Williams", "role": "Marketing Manager", "skills": ["SEO", "Content Strategy"]},
{"name": "Dave Brown", "role": "Social Media Specialist", "skills": ["Facebook Ads", "Instagram"]}
]
}
],
"locations": ["San Francisco", "New York", "London"]
}
## Print the JSON structure
print("Company Data:")
print(json.dumps(company_data, indent=2))
print("\n" + "-" * 50 + "\n")

## Access elements in arrays
print("Accessing array elements:")
print(f"First location: {company_data['locations'][0]}")
print(f"Number of departments: {len(company_data['departments'])}")
print(f"First department name: {company_data['departments'][0]['name']}")
print("\n" + "-" * 50 + "\n")

## Iterate through nested arrays
print("All employees and their skills:")
for department in company_data['departments']:
    dept_name = department['name']
    print(f"\nDepartment: {dept_name}")
    print("-" * 20)
    for employee in department['employees']:
        print(f" {employee['name']} ({employee['role']})")
        print(f" Skills: {', '.join(employee['skills'])}")
        print()

## Find specific data in nested arrays
print("-" * 50 + "\n")
print("Finding employees with Python skills:")
for department in company_data['departments']:
    for employee in department['employees']:
        if "Python" in employee['skills']:
            print(f" {employee['name']} in {department['name']} department")
Run the script from the terminal:
cd /home/labex/project/json_practice
python3 nested_arrays.py
You should see output similar to the following:
Company Data:
{
"name": "Tech Innovations Inc",
"founded": 2010,
"departments": [
{
"name": "Engineering",
"employees": [
{
"name": "Alice Johnson",
"role": "Software Engineer",
"skills": [
"Python",
"JavaScript",
"AWS"
]
},
{
"name": "Bob Smith",
"role": "DevOps Engineer",
"skills": [
"Docker",
"Kubernetes",
"Linux"
]
}
]
},
{
"name": "Marketing",
"employees": [
{
"name": "Carol Williams",
"role": "Marketing Manager",
"skills": [
"SEO",
"Content Strategy"
]
},
{
"name": "Dave Brown",
"role": "Social Media Specialist",
"skills": [
"Facebook Ads",
"Instagram"
]
}
]
}
],
"locations": [
"San Francisco",
"New York",
"London"
]
}
--------------------------------------------------
Accessing array elements:
First location: San Francisco
Number of departments: 2
First department name: Engineering
--------------------------------------------------
All employees and their skills:
Department: Engineering
--------------------
Alice Johnson (Software Engineer)
Skills: Python, JavaScript, AWS
Bob Smith (DevOps Engineer)
Skills: Docker, Kubernetes, Linux
Department: Marketing
--------------------
Carol Williams (Marketing Manager)
Skills: SEO, Content Strategy
Dave Brown (Social Media Specialist)
Skills: Facebook Ads, Instagram
--------------------------------------------------
Finding employees with Python skills:
Alice Johnson in Engineering department
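Working with nested arrays in JSON involves a combination of:
Indexing to reach a specific element (array[0])
Chained access through objects and arrays (data['departments'][0]['employees'])
The most common pattern for processing nested arrays is a pair of nested for loops:
for outer_item in json_data['outer_array']:
    for inner_item in outer_item['inner_array']:
        ## Process inner_item
        print(inner_item['property'])
This approach lets you walk through complex nested structures and extract the specific data you need.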
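When the goal of the nested loops is to collect values rather than print them, the same traversal can be written as a comprehension. The sketch below assumes a trimmed-down version of the company data; the variable names pairs and all_skills are illustrative only.

```python
# A trimmed-down, hypothetical version of the company data.
company_data = {
    "departments": [
        {"name": "Engineering", "employees": [
            {"name": "Alice Johnson", "skills": ["Python", "AWS"]},
            {"name": "Bob Smith", "skills": ["Docker", "Linux"]},
        ]},
        {"name": "Marketing", "employees": [
            {"name": "Carol Williams", "skills": ["SEO"]},
        ]},
    ]
}

# Flatten departments -> employees into (department, employee) name pairs.
pairs = [
    (dept["name"], emp["name"])
    for dept in company_data["departments"]
    for emp in dept["employees"]
]

# Collect every distinct skill across all employees, sorted alphabetically.
all_skills = sorted({
    skill
    for dept in company_data["departments"]
    for emp in dept["employees"]
    for skill in emp["skills"]
})

print(pairs)
print(all_skills)
```

The for clauses in a comprehension run in the same order as the equivalent nested loops, outermost first.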
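When working with complex JSON structures, especially from external sources such as APIs, it is important to handle the errors that can occur when keys are missing. Let's explore techniques for accessing nested JSON data safely.
In the WebIDE, create a new file and save it as error_handling.py in the /home/labex/project/json_practice directory.
Add the following code to error_handling.py:
import json
## JSON with an inconsistent structure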
api_response = {
"status": "success",
"data": {
"users": [
{
"id": 1,
"name": "John Doe",
"contact": {
"email": "john.doe@example.com",
"phone": "555-1234"
},
"roles": ["admin", "user"]
},
{
"id": 2,
"name": "Jane Smith",
## Contact information is missing
"roles": ["user"]
},
{
"id": 3,
"name": "Bob Johnson",
"contact": {
## Email only, no phone
"email": "bob.johnson@example.com"
}
## Roles are missing
}
]
}
}
print("API Response Structure:")
print(json.dumps(api_response, indent=2))
print("\n" + "-" * 50 + "\n")

## Method 1: Using try-except blocks
print("Method 1: Using try-except blocks")
print("-" * 30)
for user in api_response["data"]["users"]:
    print(f"User: {user['name']}")
    ## Get the email
    try:
        email = user['contact']['email']
        print(f" Email: {email}")
    except (KeyError, TypeError):
        print(" Email: Not available")
    ## Get the phone
    try:
        phone = user['contact']['phone']
        print(f" Phone: {phone}")
    except (KeyError, TypeError):
        print(" Phone: Not available")
    ## Get the roles
    try:
        roles = ", ".join(user['roles'])
        print(f" Roles: {roles}")
    except (KeyError, TypeError):
        print(" Roles: None assigned")
    print()

## Method 2: Using the get() method with defaults
print("\n" + "-" * 50 + "\n")
print("Method 2: Using get() method with defaults")
print("-" * 30)
for user in api_response["data"]["users"]:
    print(f"User: {user['name']}")
    ## Get contact information with nested get() calls
    contact = user.get('contact', {})
    email = contact.get('email', 'Not available')
    phone = contact.get('phone', 'Not available')
    print(f" Email: {email}")
    print(f" Phone: {phone}")
    ## Get roles, defaulting to an empty list
    roles = user.get('roles', [])
    roles_str = ", ".join(roles) if roles else "None assigned"
    print(f" Roles: {roles_str}")
    print()
Run the script from the terminal:
cd /home/labex/project/json_practice
python3 error_handling.py
You should see output similar to the following:
API Response Structure:
{
"status": "success",
"data": {
"users": [
{
"id": 1,
"name": "John Doe",
"contact": {
"email": "john.doe@example.com",
"phone": "555-1234"
},
"roles": [
"admin",
"user"
]
},
{
"id": 2,
"name": "Jane Smith",
"roles": [
"user"
]
},
{
"id": 3,
"name": "Bob Johnson",
"contact": {
"email": "bob.johnson@example.com"
}
}
]
}
}
--------------------------------------------------
Method 1: Using try-except blocks
------------------------------
User: John Doe
Email: john.doe@example.com
Phone: 555-1234
Roles: admin, user
User: Jane Smith
Email: Not available
Phone: Not available
Roles: user
User: Bob Johnson
Email: bob.johnson@example.com
Phone: Not available
Roles: None assigned
--------------------------------------------------
Method 2: Using get() method with defaults
------------------------------
User: John Doe
Email: john.doe@example.com
Phone: 555-1234
Roles: admin, user
User: Jane Smith
Email: Not available
Phone: Not available
Roles: user
User: Bob Johnson
Email: bob.johnson@example.com
Phone: Not available
Roles: None assigned
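This example demonstrates two main approaches to accessing nested JSON data safely:
Try-except blocks
Chained get() calls
For nested JSON structures, the get() method is often preferred for its readability and brevity. It lets you supply a default value at each level of nesting.
## Safe nested access pattern
value = data.get('level1', {}).get('level2', {}).get('level3', 'default_value')
Using these error handling techniques makes your code more robust when it processes JSON data from a variety of sources.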
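One case the default argument of get() does not cover is a key that is present but set to null in the JSON (None in Python): the default is only used when the key is absent. The sketch below shows one common workaround, using `or {}`. The user record is hypothetical.

```python
# Hypothetical record: "contact" is present but explicitly null in the JSON.
user = {"id": 4, "name": "Eve Adams", "contact": None}

# user.get("contact", {}) returns None here because the key exists,
# so calling .get() on the result would raise AttributeError.
# Falling back with `or {}` replaces None (or any falsy value) with an empty dict.
contact = user.get("contact") or {}
email = contact.get("email", "Not available")

print(email)
```

Because `or` treats every falsy value the same way, this is only appropriate where an empty value and a missing value should be handled identically.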
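Let's put your knowledge into practice by writing a program that extracts specific information from a complex JSON structure. This mirrors a realistic scenario in which you receive JSON data from an API and need to process it.
In the WebIDE, create a new file and save it as json_extractor.py in the /home/labex/project/json_practice directory.
Add the following code to json_extractor.py:
import json
## A complex nested JSON structure (for example, from a weather API)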
weather_data = {
"location": {
"name": "New York",
"region": "New York",
"country": "United States of America",
"lat": 40.71,
"lon": -74.01,
"timezone": "America/New_York"
},
"current": {
"temp_c": 22.0,
"temp_f": 71.6,
"condition": {
"text": "Partly cloudy",
"icon": "//cdn.weatherapi.com/weather/64x64/day/116.png",
"code": 1003
},
"wind_mph": 6.9,
"wind_kph": 11.2,
"wind_dir": "ENE",
"humidity": 65,
"cloud": 75,
"feelslike_c": 22.0,
"feelslike_f": 71.6
},
"forecast": {
"forecastday": [
{
"date": "2023-09-20",
"day": {
"maxtemp_c": 24.3,
"maxtemp_f": 75.7,
"mintemp_c": 18.6,
"mintemp_f": 65.5,
"condition": {
"text": "Patchy rain possible",
"icon": "//cdn.weatherapi.com/weather/64x64/day/176.png",
"code": 1063
},
"daily_chance_of_rain": 85
},
"astro": {
"sunrise": "06:41 AM",
"sunset": "07:01 PM",
"moonrise": "10:15 AM",
"moonset": "08:52 PM"
},
"hour": [
{
"time": "2023-09-20 00:00",
"temp_c": 20.1,
"condition": {
"text": "Clear",
"icon": "//cdn.weatherapi.com/weather/64x64/night/113.png",
"code": 1000
},
"chance_of_rain": 0
},
{
"time": "2023-09-20 12:00",
"temp_c": 23.9,
"condition": {
"text": "Overcast",
"icon": "//cdn.weatherapi.com/weather/64x64/day/122.png",
"code": 1009
},
"chance_of_rain": 20
}
]
},
{
"date": "2023-09-21",
"day": {
"maxtemp_c": 21.2,
"maxtemp_f": 70.2,
"mintemp_c": 16.7,
"mintemp_f": 62.1,
"condition": {
"text": "Heavy rain",
"icon": "//cdn.weatherapi.com/weather/64x64/day/308.png",
"code": 1195
},
"daily_chance_of_rain": 92
},
"astro": {
"sunrise": "06:42 AM",
"sunset": "06:59 PM",
"moonrise": "11:30 AM",
"moonset": "09:15 PM"
}
}
]
}
}
def extract_weather_summary(data):
    """
    Extract and format a weather summary from the provided data.
    """
    try:
        ## Location information
        location = data.get("location", {})
        location_name = location.get("name", "Unknown")
        country = location.get("country", "Unknown")

        ## Current weather
        current = data.get("current", {})
        temp_c = current.get("temp_c", "N/A")
        temp_f = current.get("temp_f", "N/A")
        condition = current.get("condition", {}).get("text", "Unknown")
        humidity = current.get("humidity", "N/A")

        ## Forecast
        forecast_days = data.get("forecast", {}).get("forecastday", [])

        ## Build the summary string
        summary = f"Weather Summary for {location_name}, {country}\n"
        summary += "==================================================\n\n"
        summary += "Current Conditions:\n"
        summary += f" Temperature: {temp_c}°C ({temp_f}°F)\n"
        summary += f" Condition: {condition}\n"
        summary += f" Humidity: {humidity}%\n\n"

        if forecast_days:
            summary += "Forecast:\n"
            for day_data in forecast_days:
                date = day_data.get("date", "Unknown date")
                day = day_data.get("day", {})
                max_temp = day.get("maxtemp_c", "N/A")
                min_temp = day.get("mintemp_c", "N/A")
                condition = day.get("condition", {}).get("text", "Unknown")
                rain_chance = day.get("daily_chance_of_rain", "N/A")
                summary += f" {date}:\n"
                summary += f" High: {max_temp}°C, Low: {min_temp}°C\n"
                summary += f" Condition: {condition}\n"
                summary += f" Chance of Rain: {rain_chance}%\n"
                ## Get sunrise and sunset times, if available
                astro = day_data.get("astro", {})
                if astro:
                    sunrise = astro.get("sunrise", "N/A")
                    sunset = astro.get("sunset", "N/A")
                    summary += f" Sunrise: {sunrise}, Sunset: {sunset}\n"
                summary += "\n"
        return summary
    except Exception as e:
        return f"Error extracting weather data: {str(e)}"

## Print the full JSON data
print("Original Weather Data:")
print(json.dumps(weather_data, indent=2))
print("\n" + "-" * 60 + "\n")

## Extract and print the weather summary
weather_summary = extract_weather_summary(weather_data)
print(weather_summary)

## Save the summary to a file (UTF-8, because the text contains the ° symbol)
with open("weather_summary.txt", "w", encoding="utf-8") as file:
    file.write(weather_summary)
print("\nWeather summary has been saved to 'weather_summary.txt'")
Run the script and view the saved summary:
cd /home/labex/project/json_practice
python3 json_extractor.py
cat weather_summary.txt
You will see a formatted weather summary extracted from the complex JSON structure.
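This exercise demonstrates several important concepts for working with nested JSON data:
Safe data extraction: using the get() method to handle missing keys, and chaining get() calls to reach nested data
Data transformation
Defensive programming
This pattern of extracting, transforming, and presenting JSON data is common in many real-world applications.
By following these patterns, you can process even deeply nested JSON data reliably.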
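The weather data also contains "hour" lists that the summary function does not use, and the second forecast day has none at all. As a further illustration of the same safe extraction pattern, here is a minimal sketch that collects hourly temperatures. hourly_temps is a hypothetical helper, and sample is a cut-down stand-in for weather_data.

```python
def hourly_temps(data):
    """Return (time, temp_c) pairs for every hourly entry that has both fields."""
    result = []
    for day in data.get("forecast", {}).get("forecastday", []):
        # Some days have no "hour" list, so default to an empty list.
        for hour in day.get("hour", []):
            time, temp = hour.get("time"), hour.get("temp_c")
            if time is not None and temp is not None:
                result.append((time, temp))
    return result


# Cut-down sample with the same shape as the forecast section of weather_data.
sample = {
    "forecast": {"forecastday": [
        {"date": "2023-09-20", "hour": [
            {"time": "2023-09-20 00:00", "temp_c": 20.1},
            {"time": "2023-09-20 12:00", "temp_c": 23.9},
        ]},
        {"date": "2023-09-21"},
    ]}
}

print(hourly_temps(sample))
```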
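In this lab, you learned how to work with nested JSON structures in Python:
Basic JSON handling - converting between Python objects and JSON strings with json.dumps() and json.loads().
Accessing nested dictionary keys - using chained square-bracket notation to reach deeply nested values in JSON objects.
Working with nested arrays - using indexing and iteration to navigate arrays inside JSON structures and extract data from them.
Error handling techniques - implementing safe access patterns with try-except blocks and the get() method to handle missing keys.
Practical data extraction - building a complete program that extracts, transforms, and presents data from a complex JSON structure.
These skills are essential for handling data from APIs, configuration files, and other data exchange scenarios. You now have a foundation for working with complex, deeply nested JSON data in your Python applications.
For further learning, consider exploring:
Using libraries such as pandas to analyze JSON data
Implementing JSON validation with jsonschema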