How to Parse Response Content from a Python requests Call


Introduction

The Python requests library is a powerful tool for interacting with web services and APIs. In this tutorial, you will learn how to send HTTP requests and parse response data using Python. By the end of this lab, you will be able to extract valuable information from different types of API responses, enabling you to build data-driven applications and automate web interactions.


Skills Graph

This lab draws on the following Python skills: dictionaries, using packages, catching exceptions, reading and writing files, file operations, and HTTP requests.

Installing the Requests Library and Making a Basic Request

In this first step, we will install the Python requests library and make our first HTTP request to retrieve data from a public API.

Installing Requests

The requests library is a third-party package that needs to be installed using pip, Python's package installer. Let's start by installing it:

pip install requests

You should see output confirming that requests was successfully installed.
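
If you want to double-check the installation (optional), you can print the installed version from the command line:

python -c "import requests; print(requests.__version__)"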

Making Your First HTTP Request

Now, let's create a Python file to make a simple HTTP request. In the WebIDE, create a new file called basic_request.py in the /home/labex/project directory.

Add the following code to the file:

import requests

## Make a GET request to a public API
response = requests.get("https://jsonplaceholder.typicode.com/todos/1")

## Print the status code
print(f"Status code: {response.status_code}")

## Print the raw response content
print("\nRaw response content:")
print(response.text)

## Print the response headers
print("\nResponse headers:")
for header, value in response.headers.items():
    print(f"{header}: {value}")

This code makes a GET request to a sample API endpoint and prints information about the response.

Understanding the Response Object

Let's run the code to see what information we get back. In the terminal, run:

python basic_request.py

You should see output similar to this:

Status code: 200

Raw response content:
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}

Response headers:
Date: Sun, 01 Jan 2023 12:00:00 GMT
Content-Type: application/json; charset=utf-8
...

The response object contains several important attributes:

  • status_code: HTTP status code (200 means success)
  • text: The response content as a string
  • headers: A dictionary of response headers

When working with web requests, these attributes help you understand the server's response and handle it appropriately.
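
Beyond these three, the response object exposes a few other attributes you may find handy. The short sketch below is just a quick illustration, not one of the lab files:

import requests

response = requests.get("https://jsonplaceholder.typicode.com/todos/1")

## The final URL after any redirects
print(response.url)

## The raw body as bytes (useful for binary data)
print(type(response.content))

## The text encoding requests detected or assumed
print(response.encoding)

## How long the request took
print(response.elapsed)

## True for status codes below 400
print(response.ok)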

HTTP Status Codes

HTTP status codes indicate whether a request succeeded or failed:

  • 2xx (like 200): Success
  • 3xx (like 301): Redirection
  • 4xx (like 404): Client errors
  • 5xx (like 500): Server errors

Let's modify our code to check for a successful response. Create a new file called check_status.py with this content:

import requests

try:
    ## Make a GET request to a valid URL
    response = requests.get("https://jsonplaceholder.typicode.com/todos/1")

    ## Check if the request was successful
    if response.status_code == 200:
        print("Request successful!")
    else:
        print(f"Request failed with status code: {response.status_code}")

    ## Try an invalid URL
    invalid_response = requests.get("https://jsonplaceholder.typicode.com/invalid")
    print(f"Invalid URL status code: {invalid_response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Run this code to see how different URLs return different status codes:

python check_status.py

You should see that the valid URL returns status code 200, while the invalid URL returns a 404 status code.
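
As an alternative to comparing status_code by hand, requests provides response.raise_for_status(), which raises an HTTPError for 4xx and 5xx responses (we will use it again later in this lab). A minimal sketch using the same invalid endpoint:

import requests

try:
    response = requests.get("https://jsonplaceholder.typicode.com/invalid")
    ## Raises requests.exceptions.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    print("Request successful!")
except requests.exceptions.HTTPError as e:
    print(f"Request failed: {e}")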

Parsing JSON Response Data

Many modern APIs return data in JSON (JavaScript Object Notation) format. In this step, you'll learn how to parse JSON responses and work with the data in Python.

Understanding JSON

JSON is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It's based on key-value pairs, similar to Python dictionaries.

Here's an example of a JSON object:

{
  "name": "John Doe",
  "age": 30,
  "email": "[email protected]",
  "is_active": true,
  "hobbies": ["reading", "swimming", "cycling"]
}
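
To see the connection with Python dictionaries, you can parse a JSON string like this one with the standard library json module. This is a quick illustration, not one of the lab files:

import json

raw = '{"name": "John Doe", "age": 30, "is_active": true, "hobbies": ["reading", "swimming"]}'

## json.loads turns a JSON string into a Python dictionary
person = json.loads(raw)
print(person["name"])        ## John Doe
print(person["is_active"])   ## True (JSON true becomes Python True)
print(person["hobbies"][0])  ## reading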

Parsing JSON Responses

The requests library makes it easy to parse JSON responses using the .json() method. Let's create a new file called parse_json.py and add the following code:

import requests

## Make a request to a GitHub API endpoint that returns JSON data
response = requests.get("https://api.github.com/users/python")

## Check if the request was successful
if response.status_code == 200:
    ## Parse the JSON response
    data = response.json()

    ## Print the parsed data
    print("Parsed JSON data:")
    print(f"Username: {data['login']}")
    print(f"Name: {data.get('name', 'Not provided')}")
    print(f"Followers: {data['followers']}")
    print(f"Public repositories: {data['public_repos']}")

    ## Print the type to verify it's a Python dictionary
    print(f"\nType of parsed data: {type(data)}")

    ## Access nested data
    print("\nAccessing specific elements:")
    print(f"Avatar URL: {data['avatar_url']}")
else:
    print(f"Request failed with status code: {response.status_code}")

Run this script to see how the JSON data is parsed into a Python dictionary:

python parse_json.py

You should see output that displays information about the GitHub user, including their username, followers count, and repository count.
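
Because the parsed response is an ordinary Python dictionary, you can also persist it with the standard json module. Here is a small optional extension of parse_json.py; the output filename is just an example:

import json
import requests

response = requests.get("https://api.github.com/users/python")
if response.status_code == 200:
    data = response.json()

    ## Write the parsed dictionary back out as formatted JSON
    with open("/home/labex/project/github_user.json", "w") as f:
        json.dump(data, f, indent=2)
    print("Saved profile to github_user.json")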

Working with Lists of Data

Many APIs return lists of objects. Let's see how to handle this kind of response. Create a file called json_list.py with this content:

import requests

## Make a request to an API that returns a list of posts
response = requests.get("https://jsonplaceholder.typicode.com/posts")

## Check if the request was successful
if response.status_code == 200:
    ## Parse the JSON response (this will be a list of posts)
    posts = response.json()

    ## Print the total number of posts
    print(f"Total posts: {len(posts)}")

    ## Print details of the first 3 posts
    print("\nFirst 3 posts:")
    for i, post in enumerate(posts[:3], 1):
        print(f"\nPost #{i}")
        print(f"User ID: {post['userId']}")
        print(f"Post ID: {post['id']}")
        print(f"Title: {post['title']}")
        print(f"Body: {post['body'][:50]}...")  ## Print just the beginning of the body
else:
    print(f"Request failed with status code: {response.status_code}")

Run this script to see how to process a list of JSON objects:

python json_list.py

You should see information about the first three posts, including their titles and the beginning of their content.
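
Many APIs also let you filter list endpoints with query parameters. With requests you pass them through the params argument instead of building the URL by hand. A small sketch against the same placeholder API, which supports filtering posts by userId:

import requests

## Ask the API for posts belonging to a single user
response = requests.get(
    "https://jsonplaceholder.typicode.com/posts",
    params={"userId": 1}
)

if response.status_code == 200:
    posts = response.json()
    print(f"Posts for userId=1: {len(posts)}")
    ## requests built the query string for us
    print(f"Request URL: {response.url}")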

Error Handling with JSON Parsing

Sometimes, a response might not contain valid JSON data. Let's see how to handle this gracefully. Create a file called json_error.py with this code:

import requests
import json

def get_and_parse_json(url):
    try:
        ## Make the request
        response = requests.get(url)

        ## Check if the request was successful
        response.raise_for_status()

        ## Try to parse the JSON
        try:
            data = response.json()
            return data
        except json.JSONDecodeError:
            print(f"Response from {url} is not valid JSON")
            print(f"Raw response: {response.text[:100]}...")  ## Print part of the raw response
            return None

    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e}")
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")

    return None

## Test with a valid JSON endpoint
json_data = get_and_parse_json("https://jsonplaceholder.typicode.com/posts/1")
if json_data:
    print("\nValid JSON response:")
    print(f"Title: {json_data['title']}")

## Test with a non-JSON endpoint
html_data = get_and_parse_json("https://www.example.com")
if html_data:
    print("\nThis should not print as example.com returns HTML, not JSON")
else:
    print("\nAs expected, could not parse HTML as JSON")

Run this script to see how to handle different types of responses:

python json_error.py

You should see that the code successfully handles both valid JSON responses and non-JSON responses.
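
Another common, lighter-weight guard is to inspect the Content-Type header before calling .json(). This is a sketch of the idea rather than a replacement for the error handling above:

import requests

response = requests.get("https://www.example.com")

content_type = response.headers.get("Content-Type", "")
if "application/json" in content_type:
    data = response.json()
    print("Parsed JSON:", data)
else:
    print(f"Not JSON (Content-Type: {content_type}); skipping .json()")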

Parsing HTML Content with BeautifulSoup

When working with web data, you'll often encounter HTML responses. For parsing HTML, Python's BeautifulSoup library is an excellent tool. In this step, we'll learn how to extract information from HTML responses.

Installing BeautifulSoup

First, let's install BeautifulSoup (the html.parser we will use with it ships with Python, so no separate parser package is needed):

pip install beautifulsoup4

Basic HTML Parsing

Let's create a file called parse_html.py to fetch and parse a webpage:

import requests
from bs4 import BeautifulSoup

## Make a request to a webpage
url = "https://www.example.com"
response = requests.get(url)

## Check if the request was successful
if response.status_code == 200:
    ## Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    ## Extract the page title
    title = soup.title.text
    print(f"Page title: {title}")

    ## Extract all paragraphs
    paragraphs = soup.find_all('p')
    print(f"\nNumber of paragraphs: {len(paragraphs)}")

    ## Print the text of the first paragraph
    if paragraphs:
        print(f"\nFirst paragraph text: {paragraphs[0].text.strip()}")

    ## Extract all links
    links = soup.find_all('a')
    print(f"\nNumber of links: {len(links)}")

    ## Print the href attribute of the first link
    if links:
        print(f"First link href: {links[0].get('href')}")

else:
    print(f"Request failed with status code: {response.status_code}")

Run this script to see how to extract basic information from an HTML page:

python parse_html.py

You should see output showing the page title, number of paragraphs, the text of the first paragraph, number of links, and the URL of the first link.

Finding Specific Elements

Now let's look at how to find specific elements using CSS selectors. Create a file called html_selectors.py:

import requests
from bs4 import BeautifulSoup

## Make a request to a webpage with more complex structure
url = "https://quotes.toscrape.com/"
response = requests.get(url)

## Check if the request was successful
if response.status_code == 200:
    ## Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    ## Find all quote elements
    quote_elements = soup.select('.quote')
    print(f"Number of quotes found: {len(quote_elements)}")

    ## Process the first 3 quotes
    print("\nFirst 3 quotes:")
    for i, quote_element in enumerate(quote_elements[:3], 1):
        ## Extract the quote text
        text = quote_element.select_one('.text').text

        ## Extract the author
        author = quote_element.select_one('.author').text

        ## Extract the tags
        tags = [tag.text for tag in quote_element.select('.tag')]

        ## Print the information
        print(f"\nQuote #{i}")
        print(f"Text: {text}")
        print(f"Author: {author}")
        print(f"Tags: {', '.join(tags)}")

else:
    print(f"Request failed with status code: {response.status_code}")

Run this script to see how to use CSS selectors to extract specific elements:

python html_selectors.py

You should see output showing information about the first three quotes, including the quote text, author, and tags.

Building a Simple Web Scraper

Let's put everything together to build a simple web scraper that extracts structured data from a webpage. Create a file called quotes_scraper.py:

import requests
from bs4 import BeautifulSoup
import json
import os

def scrape_quotes_page(url):
    ## Make a request to the webpage
    response = requests.get(url)

    ## Check if the request was successful
    if response.status_code != 200:
        print(f"Request failed with status code: {response.status_code}")
        return None

    ## Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    ## Extract all quotes
    quotes = []
    for quote_element in soup.select('.quote'):
        ## Extract the quote text
        text = quote_element.select_one('.text').text.strip('"')

        ## Extract the author
        author = quote_element.select_one('.author').text

        ## Extract the tags
        tags = [tag.text for tag in quote_element.select('.tag')]

        ## Add the quote to our list
        quotes.append({
            'text': text,
            'author': author,
            'tags': tags
        })

    ## Check if there's a next page
    next_page = soup.select_one('.next a')
    next_page_url = None
    if next_page:
        next_page_url = 'https://quotes.toscrape.com' + next_page['href']

    return {
        'quotes': quotes,
        'next_page': next_page_url
    }

## Scrape the first page
result = scrape_quotes_page('https://quotes.toscrape.com/')

if result:
    ## Print information about the quotes found
    quotes = result['quotes']
    print(f"Found {len(quotes)} quotes on the first page")

    ## Print the first 2 quotes
    print("\nFirst 2 quotes:")
    for i, quote in enumerate(quotes[:2], 1):
        print(f"\nQuote #{i}")
        print(f"Text: {quote['text']}")
        print(f"Author: {quote['author']}")
        print(f"Tags: {', '.join(quote['tags'])}")

    ## Save the quotes to a JSON file
    output_dir = '/home/labex/project'
    with open(os.path.join(output_dir, 'quotes.json'), 'w') as f:
        json.dump(quotes, f, indent=2)

    print(f"\nSaved {len(quotes)} quotes to {output_dir}/quotes.json")

    ## Print information about the next page
    if result['next_page']:
        print(f"\nNext page URL: {result['next_page']}")
    else:
        print("\nNo next page available")

Run this script to scrape quotes from a website:

python quotes_scraper.py

You should see output showing information about the quotes found on the first page, and the quotes will be saved to a JSON file called quotes.json.

Check the JSON file to see the structured data:

cat quotes.json

The file should contain a JSON array of quote objects, each with text, author, and tags properties.
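
The scraper already returns next_page, so a natural extension is to follow it in a loop and collect quotes from several pages. A hedged sketch, assuming it is appended to quotes_scraper.py so that scrape_quotes_page is in scope; the page cap is just there to keep the example quick:

## Follow the next_page links returned by scrape_quotes_page
all_quotes = []
url = 'https://quotes.toscrape.com/'
max_pages = 3  ## safety cap so the example stays quick

for _ in range(max_pages):
    page = scrape_quotes_page(url)
    if not page:
        break
    all_quotes.extend(page['quotes'])
    if not page['next_page']:
        break
    url = page['next_page']

print(f"Collected {len(all_quotes)} quotes from up to {max_pages} pages")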

Working with Binary Response Content

So far, we have focused on text-based responses like JSON and HTML. However, the requests library can also handle binary content such as images, PDFs, and other files. In this step, we'll learn how to download and process binary content.

Downloading an Image

Let's start by downloading an image. Create a file called download_image.py:

import requests
import os

## URL of an image to download
image_url = "https://httpbin.org/image/jpeg"

## Make a request to get the image
response = requests.get(image_url)

## Check if the request was successful
if response.status_code == 200:
    ## Get the content type
    content_type = response.headers.get('Content-Type', '')
    print(f"Content-Type: {content_type}")

    ## Check if the content is an image
    if 'image' in content_type:
        ## Create a directory to save the image if it doesn't exist
        output_dir = '/home/labex/project/downloads'
        os.makedirs(output_dir, exist_ok=True)

        ## Save the image to a file
        image_path = os.path.join(output_dir, 'sample_image.jpg')
        with open(image_path, 'wb') as f:
            f.write(response.content)

        ## Print information about the saved image
        print(f"Image saved to: {image_path}")
        print(f"Image size: {len(response.content)} bytes")
    else:
        print("The response does not contain an image")
else:
    print(f"Request failed with status code: {response.status_code}")

Run this script to download an image:

python download_image.py

You should see output confirming that the image was downloaded and saved to /home/labex/project/downloads/sample_image.jpg.
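
If you want an extra sanity check beyond the Content-Type header, you can look at the first bytes of response.content: JPEG files start with the marker bytes 0xFF 0xD8. A small optional snippet:

import requests

response = requests.get("https://httpbin.org/image/jpeg")

## JPEG files begin with the signature bytes FF D8
if response.content[:2] == b"\xff\xd8":
    print("Looks like a JPEG file")
else:
    print("Unexpected file signature:", response.content[:2])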

Downloading a File with Progress

When downloading large files, it can be useful to display a progress indicator. Let's create a script that shows download progress. Create a file called download_with_progress.py:

import requests
import os
import sys

def download_file(url, filename):
    ## Make a request to get the file
    ## Stream the response to handle large files efficiently
    response = requests.get(url, stream=True)

    ## Check if the request was successful
    if response.status_code != 200:
        print(f"Request failed with status code: {response.status_code}")
        return False

    ## Get the total file size if available
    total_size = int(response.headers.get('Content-Length', 0))
    if total_size:
        print(f"Total file size: {total_size/1024:.2f} KB")
    else:
        print("Content-Length header not found. Unable to determine file size.")

    ## Create a directory to save the file if it doesn't exist
    os.makedirs(os.path.dirname(filename), exist_ok=True)

    ## Download the file in chunks and show progress
    print(f"Downloading {url} to {filename}...")

    ## Initialize variables for progress tracking
    downloaded = 0
    chunk_size = 8192  ## 8 KB chunks

    ## Open the file for writing
    with open(filename, 'wb') as f:
        ## Iterate through the response chunks
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  ## Filter out keep-alive chunks
                f.write(chunk)
                downloaded += len(chunk)

                ## Calculate and display progress
                if total_size:
                    percent = downloaded * 100 / total_size
                    sys.stdout.write(f"\rProgress: {percent:.1f}% ({downloaded/1024:.1f} KB)")
                    sys.stdout.flush()
                else:
                    sys.stdout.write(f"\rDownloaded: {downloaded/1024:.1f} KB")
                    sys.stdout.flush()

    ## Print a newline to ensure the next output starts on a new line
    print()

    return True

## URL of a file to download
file_url = "https://speed.hetzner.de/100MB.bin"

## Path where the file will be saved
output_path = '/home/labex/project/downloads/test_file.bin'

## Download the file
success = download_file(file_url, output_path)

if success:
    ## Get file stats
    file_size = os.path.getsize(output_path)
    print(f"\nDownload complete!")
    print(f"File saved to: {output_path}")
    print(f"File size: {file_size/1024/1024:.2f} MB")
else:
    print("\nDownload failed.")

Run this script to download a file with progress tracking:

python download_with_progress.py

You'll see the progress percentage update in place as the file downloads. Note that this downloads a 100 MB file, which might take some time depending on your connection speed.

To cancel the download, you can press Ctrl+C.
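
On slow or unreliable connections it is also worth passing a timeout so the request does not hang indefinitely. requests accepts a timeout argument in seconds; a hedged tweak to the call inside download_file might look like this:

import requests

try:
    ## timeout applies to connecting and to each read of the socket,
    ## not to the total duration of the whole download
    response = requests.get("https://speed.hetzner.de/100MB.bin",
                            stream=True, timeout=30)
except requests.exceptions.Timeout:
    print("The server took too long to respond")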

Working with Response Headers and Metadata

When downloading files, the response headers often contain useful metadata. Let's create a script that examines response headers in detail. Create a file called response_headers.py:

import requests

def check_url(url):
    print(f"\nChecking URL: {url}")

    try:
        ## Make a HEAD request first to get headers without downloading the full content
        head_response = requests.head(url)

        print(f"HEAD request status code: {head_response.status_code}")

        if head_response.status_code == 200:
            ## Print all headers
            print("\nResponse headers:")
            for header, value in head_response.headers.items():
                print(f"  {header}: {value}")

            ## Extract content type and size
            content_type = head_response.headers.get('Content-Type', 'Unknown')
            content_length = head_response.headers.get('Content-Length', 'Unknown')

            print(f"\nContent Type: {content_type}")

            if content_length != 'Unknown':
                size_kb = int(content_length) / 1024
                size_mb = size_kb / 1024

                if size_mb >= 1:
                    print(f"Content Size: {size_mb:.2f} MB")
                else:
                    print(f"Content Size: {size_kb:.2f} KB")
            else:
                print("Content Size: Unknown")

            ## Check if the server supports range requests
            accept_ranges = head_response.headers.get('Accept-Ranges', 'none')
            print(f"Supports range requests: {'Yes' if accept_ranges != 'none' else 'No'}")

        else:
            print(f"HEAD request failed with status code: {head_response.status_code}")

    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")

## Check a few different URLs
check_url("https://httpbin.org/image/jpeg")
check_url("https://speed.hetzner.de/100MB.bin")
check_url("https://example.com")

Run this script to see detailed information about response headers:

python response_headers.py

You'll see output showing the headers for different types of content, including images, binary files, and HTML pages.

Understanding response headers is crucial for many web development tasks, such as:

  • Determining file types and sizes before downloading
  • Implementing resumable downloads with range requests (see the sketch after this list)
  • Checking caching policies and expiration dates
  • Handling redirects and authentication
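
For example, if a server advertises Accept-Ranges: bytes, you can ask for only part of a resource by sending a Range header. A minimal sketch, reusing the image endpoint from earlier:

import requests

## Ask for only the first 1024 bytes of the resource
headers = {"Range": "bytes=0-1023"}
response = requests.get("https://httpbin.org/image/jpeg", headers=headers)

## 206 Partial Content means the server honoured the range;
## 200 means it ignored the header and sent the whole body
print(f"Status code: {response.status_code}")
print(f"Bytes received: {len(response.content)}")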

Summary

In this lab, you have learned how to work with the Python requests library to interact with web services and APIs. You now have the skills to:

  1. Make HTTP requests and handle response status codes and errors
  2. Parse JSON data from API responses
  3. Extract information from HTML content using BeautifulSoup
  4. Download and process binary content like images and files
  5. Work with response headers and metadata

These skills form the foundation for many Python applications, including web scraping, API integration, data collection, and automation. You can now build applications that interact with web services, extract useful information from websites, and process various types of web content.

To continue learning, you might want to explore:

  • Authentication methods for accessing protected APIs
  • Working with more complex APIs that require specific headers or request formats
  • Building a complete web scraping project that collects and analyzes data
  • Creating a Python application that integrates with multiple APIs

Remember that when scraping websites or using APIs, it's important to check the terms of service and respect rate limits to avoid being blocked.