Download Files from the Internet


Introduction

In this lab, you will learn how to download files from the internet using two common command-line tools: curl and wget. These tools are essential for retrieving files and data from web servers, making them valuable skills for any Linux user or developer.

curl is a versatile tool that supports a wide range of protocols and can perform many kinds of HTTP requests. wget is a simpler tool designed primarily for downloading files from web servers. By the end of this lab, you'll be comfortable using both tools to download files and make basic HTTP requests.



Downloading a File with curl

Let's start by using curl to download a simple HTML file from a website.

  1. Open your terminal.

  2. Navigate to the project directory. In the terminal, type:

cd /home/labex/project

This command changes your current directory to /home/labex/project. The cd command stands for "change directory".

  3. Now, let's use curl to download a web page. Type the following command:
curl http://example.com -o example.html

Let's break down this command:

  • curl is the name of the program we're using
  • http://example.com is the URL of the web page we're downloading
  • -o example.html tells curl to save the downloaded content to a file named example.html. The -o option stands for "output".
  4. After running the command, curl will download the content and save it as example.html in your current directory. To verify that the file was downloaded, we can list the contents of the directory:
ls -l example.html

The ls command lists files and directories. The -l option gives us a detailed (long) listing. You should see example.html in the output, along with information about its size and when it was last modified.
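In a script, it also helps to check whether a download actually succeeded rather than assuming it did. Here's a minimal sketch, assuming network access to example.com: curl's -f (--fail) option makes it exit with a non-zero status on HTTP errors instead of saving the error page, and -sS hides the progress meter while still showing error messages.

```shell
# -f: fail (non-zero exit status) on HTTP errors instead of saving the error page
# -sS: hide the progress meter, but still print error messages
if curl -fsS -o example.html http://example.com; then
  echo "download ok"
else
  echo "download failed"
fi
```

The same pattern works for any command: the if statement branches on the command's exit status, so your script can retry or abort instead of silently continuing with a missing file.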

Examining the Downloaded File

Now that we've downloaded the file, let's take a look at its contents.

  1. To display the contents of the file, we'll use the cat command. Type:
cat example.html

cat stands for "concatenate", but it's commonly used to display the contents of files. You should see the HTML content of the example.com homepage. This might look like a jumble of text if you're not familiar with HTML, but don't worry - it's the raw code that web browsers use to display web pages.

  2. Sometimes, files can be very large and we might only want to see the beginning. For this, we can use the head command:
head -n 10 example.html

This command shows you the first 10 lines of the file. The -n 10 option tells head to show 10 lines. You can change this number to see more or fewer lines.

  3. To see the end of the file, you can use the tail command:
tail -n 10 example.html

This shows the last 10 lines of the file.

These commands are useful for quickly inspecting files without opening them entirely, especially when dealing with large files.
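If you'd like to see exactly how head and tail behave without depending on a downloaded file, you can try them on a generated one (here seq stands in for a real download):

```shell
# Generate a 100-line file, one number per line, then look at both ends of it.
seq 1 100 > sample.txt
head -n 3 sample.txt   # prints lines 1, 2, 3
tail -n 3 sample.txt   # prints lines 98, 99, 100
```

Because each line is just its own line number, it's easy to confirm that head and tail really do show the first and last n lines.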

Downloading Multiple Files with curl

curl can download multiple files in a single command. Let's try downloading two files at once.

  1. First, let's try to download both the index and about pages from example.com and display their content:
curl http://example.com/index.html http://example.com/about.html

This command will output the content of both pages to your terminal. You'll see two HTML documents printed one after the other. This can be useful for quick checks, but it's not ideal if you want to save the files.

  2. To save these files instead of displaying them, we'll use the -O option. The capital -O tells curl to use the filename from the URL:
curl -O http://example.com/index.html -O http://example.com/about.html

This command downloads both files and saves them with their original names (index.html and about.html) in your current directory. You won't see the content printed to the terminal this time.

  3. To verify that the files were downloaded, we can list the contents of the directory:
ls -l index.html about.html

You should see both files listed, along with their sizes and last modified times.

  4. If you want to download multiple files but give them custom names, you can use multiple -o options:
curl -o custom_index.html http://example.com/index.html -o custom_about.html http://example.com/about.html

This will save the files as custom_index.html and custom_about.html.
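When you have many URLs, a shell loop can derive each output name the same way curl -O does: basename strips everything up to the last slash, leaving just the filename. This sketch only prints what it would do, using the two URLs from the steps above:

```shell
# basename keeps only the part after the last '/', which is
# the name curl -O would save each download under.
for url in http://example.com/index.html http://example.com/about.html; do
  name=$(basename "$url")
  echo "would save $url as $name"
done
# → would save http://example.com/index.html as index.html
# → would save http://example.com/about.html as about.html
```

Replacing the echo line with `curl -sS -o "$name" "$url"` would turn the sketch into a real batch download.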

Using wget to Download Files

Now let's explore wget, another popular tool for downloading files. wget is often preferred for its simplicity and its ability to handle large downloads or unstable connections.

  1. Let's start by using wget to download a file from example.com:
wget http://example.com/index.html

wget will display a progress bar as it downloads the file. This is particularly useful for larger files as you can see how much of the file has been downloaded and how long it might take to complete.

  2. By default, wget saves the file with its original name. To specify a different name, use the -O option (note that it's a capital O: in wget, -O names the output file, whereas curl uses a lowercase -o for that and reserves the capital -O for keeping the remote filename):
wget -O custom_name.html http://example.com/index.html

This will save the file as custom_name.html. The progress bar will still show, but the file will be saved with your specified name.
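One wget feature worth knowing, given its reputation for handling unstable connections: the -c (--continue) option resumes a partial download instead of starting over. This is a sketch, and it assumes the server supports range requests:

```shell
# If index.html was partially downloaded before, -c picks up where it
# left off; if the file is already complete, wget stops without
# re-downloading it.
wget -c http://example.com/index.html
```

This is especially handy for large files on flaky networks, where restarting from zero after every interruption would waste time and bandwidth.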

Downloading Files to a Specific Directory

Often, you'll want to download files to a specific directory rather than your current working directory. Both curl and wget allow you to do this, but they use different methods.

  1. First, let's create a new directory to download our files into:
mkdir downloads

This creates a new directory named downloads in your current location.

  2. Now, let's use curl to download a file to this directory:
curl -o downloads/curl_file.html http://example.com

The -o option in curl allows us to specify the output file, including its path. This command downloads the content from example.com and saves it as curl_file.html in the downloads directory.

  3. Next, let's use wget to download a file to the same directory:
wget -P downloads http://example.com/index.html

The -P option in wget stands for "prefix" and allows us to specify the directory where we want to save the file. This command downloads index.html from example.com and saves it in the downloads directory.

  4. We can verify that both files were downloaded to the downloads directory:
ls -l downloads

You should see both curl_file.html and index.html in the output.

  5. To see the contents of these files without changing our current directory, we can use cat with the full path:
cat downloads/curl_file.html
cat downloads/index.html

This allows us to verify the content of the downloaded files.
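In scripts, these steps are often combined into one pattern: mkdir's -p option creates the directory only if it's missing and succeeds quietly if it already exists, so the download line can always assume the directory is there. A sketch, assuming network access:

```shell
# -p: create the directory if missing, succeed quietly if it already exists
mkdir -p downloads
curl -sS -o downloads/page.html http://example.com || echo "download failed"
```

Without -p, a second run of the script would fail at the mkdir step with "File exists" before ever reaching the download.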

Summary

Congratulations! You've successfully completed the lab on downloading files from the internet using curl and wget. Let's recap what you've learned:

  1. You used curl to download individual files and save them with custom names.
  2. You explored how to download multiple files with a single curl command.
  3. You learned how to use wget to download files, both with default and custom names.
  4. You practiced inspecting downloaded files with cat, head, and tail.
  5. You learned how to download files to a specific directory using both curl and wget.

These skills are fundamental for many tasks in Linux, from retrieving web content to downloading software packages. As you continue your journey in Linux, you'll find these tools invaluable for various scripting and automation tasks.

Remember, both curl and wget have many more options and capabilities that you can explore. Feel free to check their man pages (man curl and man wget) to learn more about their advanced features.

Keep practicing and exploring these tools to become more proficient in working with files and web content in Linux!
