Extracting Information From Text

Introduction

In this project, you will learn how to extract image URLs from Markdown files using a Bash script. This is a common task when working with technical documentation, as it allows you to quickly identify and retrieve the images used in a document.

👀 Preview

$ ./getimage.sh labex_lab1.md
https://doc.shiyanlou.com/document-uid13labid292timestamp14677222211211.png
https://doc.shiyanlou.com/document-uid13labid292timestamp14672311234511.png
https://doc.shiyanlou.com/document-uid13labid292timestamp14677029556772.png

🎯 Tasks

In this project, you will learn:

  • How to create a Bash script to extract image URLs from a Markdown file
  • How to make the script executable and run it from the command line
  • How to customize the script to save the extracted URLs to a file

🏆 Achievements

After completing this project, you will be able to:

  • Automate the process of extracting image URLs from Markdown files
  • Incorporate this script into your workflow when working with technical documentation
  • Customize the script to suit your specific needs and requirements

Extract Image URLs From Markdown File

In this step, you will learn how to extract all image URLs from a Markdown file using a Bash script.

  1. Open a text editor and create a new file named getimage.sh.
  2. Add the following code to the file:
#!/bin/bash

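# Extract image URLs: grep -o prints each ![alt](url) match on its own line,
# and sed keeps only the URL between the parentheses.
image_urls=$(grep -oE '!\[[^]]*\]\([^)]*\)' "$1" | sed -E 's/^!\[[^]]*\]\(([^)]*)\)$/\1/')

# Print the image URLs, one per line
echo "$image_urls"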

This script uses grep with the -o option to print only the parts of each line that match the Markdown image syntax ![alt text](URL), and then uses sed to strip everything except the URL from each match. The file to search is passed as the first argument ($1).
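
To make the two stages concrete, the sketch below runs them one at a time on a small sample file. The file contents and the example.com URLs are invented for illustration, and the patterns are one possible way to match the ![alt](url) syntax with extended regular expressions (-E). They assume simple image links without titles or nested parentheses.

```shell
#!/bin/bash
# Hypothetical sample file, created only for this demonstration.
workdir=$(mktemp -d)
sample="$workdir/sample.md"
cat > "$sample" <<'EOF'
# Sample lab
Some text before ![first](https://example.com/a.png) and after.
![second](https://example.com/b.png)
A plain [link](https://example.com/page.html) is not an image.
EOF

# Stage 1: grep -o prints only the matching part of each line, one match per line.
stage1=$(grep -oE '!\[[^]]*\]\([^)]*\)' "$sample")
echo "$stage1"

# Stage 2: sed keeps only the text captured between the parentheses.
stage2=$(echo "$stage1" | sed -E 's/^!\[[^]]*\]\(([^)]*)\)$/\1/')
echo "$stage2"

# Remove the scratch directory.
rm -r "$workdir"
```

The first two lines of output are the raw matches from grep -o; the last two are the URLs that remain after sed. The ordinary link is skipped because it lacks the leading exclamation mark.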

Run the Script

Now that you have created the getimage.sh script, you can run it to extract the image URLs from a Markdown file.

  1. Open a terminal and navigate to the directory where you saved the getimage.sh script.
  2. Make the script executable:
chmod +x getimage.sh
  3. Run the script with the path to the Markdown file as an argument:
./getimage.sh /home/labex/project/labex_lab1.md

This will output all the image URLs found in the labex_lab1.md file, one per line.

For example, the output might look like this:

https://doc.shiyanlou.com/document-uid13labid292timestamp14677222211211.png
https://doc.shiyanlou.com/document-uid13labid292timestamp14672311234511.png
https://doc.shiyanlou.com/document-uid13labid292timestamp14677029556772.png
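
The commands above print the URLs to the terminal. One way to save them instead is ordinary shell output redirection. The sketch below assumes a hypothetical output name, urls.txt, and works in a scratch directory with stand-in copies of the script and a two-image Markdown file, so it does not touch your project files.

```shell
#!/bin/bash
# Scratch directory so the demonstration does not overwrite real files.
workdir=$(mktemp -d)

# Stand-in for getimage.sh (one possible form of the extraction pipeline).
cat > "$workdir/getimage.sh" <<'EOF'
#!/bin/bash
grep -oE '!\[[^]]*\]\([^)]*\)' "$1" | sed -E 's/^!\[[^]]*\]\(([^)]*)\)$/\1/'
EOF
chmod +x "$workdir/getimage.sh"

# Hypothetical Markdown input with two images.
cat > "$workdir/demo.md" <<'EOF'
![one](https://example.com/1.png)
Text with ![two](https://example.com/2.png) inline.
EOF

# ">" sends standard output to urls.txt (hypothetical name), replacing any old content.
"$workdir/getimage.sh" "$workdir/demo.md" > "$workdir/urls.txt"

# Read the file back to confirm what was saved.
saved_urls=$(cat "$workdir/urls.txt")
echo "$saved_urls"

# Remove the scratch directory.
rm -r "$workdir"
```

In the lab directory, the equivalent command would be ./getimage.sh labex_lab1.md > urls.txt. Using >> instead of > appends to the file rather than replacing it.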

Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.
