Introduction
In this project, you will learn how to extract link information from Markdown documents using a Bash script. This is a common task in software development, where developers need to process and extract specific information from text-based documents.
👀 Preview
$ ./getlink.sh labex_lab1.md
course https://labex.io/courses/
🎯 Tasks
In this project, you will learn:
- How to create a Bash script to extract link text and URLs from a Markdown document
- How to use regular expressions and command-line tools like grep and paste to process text data
- How to make a script executable and run it with command-line arguments
🏆 Achievements
After completing this project, you will be able to:
- Develop a Bash script that can extract link information from Markdown documents
- Understand the logic and implementation of the script, including the use of regular expressions and common command-line tools
- Apply the skills learned in this project to other text processing tasks in your software development work
Create the getlink.sh Script
In this step, you will create the getlink.sh script that can extract all the links from a Markdown document.
- Open a text editor and create a new file named getlink.sh.
- Add the following code to the file:
#!/bin/bash
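## Stop with a usage message if no Markdown file is given
if [ $# -ne 1 ]; then
  echo "Usage: $0 <markdown-file>" >&2
  exit 1
fi
## Extract link text (lines that contain image links are skipped)
grep -E "\[.*\]\(.+\)" "$1" | grep -vP '\!\[' | grep -oP '\[\K[^\]]+(?=\]\([^\)]+\))' > "links.txt"
## Extract link URLs using the same line filters
grep -E "\[.*\]\(.+\)" "$1" | grep -vP '\!\[' | grep -oP '\]\(\K[^\)]+(?=\))' > "urls.txt"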
## Merge links and URLs
paste -d ' ' links.txt urls.txt
## Clean up temporary files
rm links.txt urls.txt
- Save the file.
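The script writes links.txt and urls.txt into the current directory, so any existing files with those names would be overwritten. The sketch below shows one alternative. It assumes Bash, because it relies on process substitution, and it feeds both grep pipelines straight into paste, so no temporary files are needed. The function name extract_links is hypothetical and is not part of the lab.

```shell
#!/bin/bash
# Hypothetical variant of getlink.sh: the same grep pipelines, but each
# pipeline's output reaches paste through process substitution, so nothing
# is written to the current directory.
extract_links() {
  local md_file="$1"
  paste -d ' ' \
    <(grep -E "\[.*\]\(.+\)" "$md_file" | grep -vP '\!\[' | grep -oP '\[\K[^\]]+(?=\]\([^\)]+\))') \
    <(grep -E "\[.*\]\(.+\)" "$md_file" | grep -vP '\!\[' | grep -oP '\]\(\K[^\)]+(?=\))')
}

# Run only when a file argument is given, e.g. ./getlink.sh labex_lab1.md
if [ $# -ge 1 ]; then
  extract_links "$1"
fi
```

As in the original script, each pipeline reads the Markdown file separately. The only change is where the intermediate results live.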
Test the getlink.sh Script
In this step, you will test the getlink.sh script by running it with a Markdown file as an argument.
- In the same directory as the getlink.sh script, there is a Markdown file named labex_lab1.md. This file contains the following:
Use the course categories and tags on the [course](https://labex.io/courses/) page to filter and search for courses
- Make the getlink.sh script executable, then run it with the labex_lab1.md file as an argument:
chmod +x getlink.sh
./getlink.sh labex_lab1.md
- The script should output the following:
course https://labex.io/courses/
This output shows that the script extracted the link text (course) and its URL from the Markdown file.
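The lab's sample file contains only one link. A slightly richer input is a useful sanity check. The sketch below runs the same two pipelines against a hypothetical file, sample.md, which exists only for this illustration. It shows two behaviours. First, a line with several links produces one output line per link, because grep -o prints each match separately. Second, grep -vP '\!\[' filters whole lines, so a regular link that shares a line with an image is dropped too.

```shell
#!/bin/bash
# Hypothetical test input; sample.md is created here only for illustration.
cat > sample.md <<'EOF'
See [docs](https://example.com/docs) and [blog](https://example.com/blog).
![logo](logo.png) next to [home](https://example.com/)
Plain text without links.
EOF

# Same pipelines as getlink.sh, run against sample.md
grep -E "\[.*\]\(.+\)" sample.md | grep -vP '\!\[' | grep -oP '\[\K[^\]]+(?=\]\([^\)]+\))' > links.txt
grep -E "\[.*\]\(.+\)" sample.md | grep -vP '\!\[' | grep -oP '\]\(\K[^\)]+(?=\))' > urls.txt
result=$(paste -d ' ' links.txt urls.txt)
echo "$result"
# Prints:
# docs https://example.com/docs
# blog https://example.com/blog
# The "home" link is missing because its line also contains an image.

# Remove the illustration files
rm links.txt urls.txt sample.md
```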
Understand the getlink.sh Script
In this step, you will walk through the code in the getlink.sh script.
The script performs the following tasks:
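- Extract link text: The first grep pipeline extracts the link text from the Markdown file and saves it to a temporary file named links.txt. The grep -E "\[.*\]\(.+\)" command keeps only lines that contain the Markdown link format [text](url). The grep -vP '\!\[' command then drops lines that contain image links (![alt](src)). Because this filter works on whole lines, a regular link on the same line as an image is skipped as well. Finally, grep -oP '\[\K[^\]]+(?=\]\([^\)]+\))' prints only the text between the square brackets. Here \K discards the opening [ from the reported match, and the lookahead (?=...) requires a (url) part to follow without printing it. Each match is printed on its own line, so a line with several links produces several lines of output.
- Extract link URLs: The second grep pipeline applies the same two filters and saves the link URLs to a temporary file named urls.txt. Its last command, grep -oP '\]\(\K[^\)]+(?=\))', captures the URL between the ]( and the closing ).
- Merge link text and URLs: The paste -d ' ' links.txt urls.txt command joins line N of links.txt with line N of urls.txt, separating them with a space. This relies on both files listing the same links in the same order.
- Clean up temporary files: The rm links.txt urls.txt command removes the temporary files created during the script's execution.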
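To see what each stage contributes, the pipeline can be run one step at a time on the sample line from labex_lab1.md. In the sketch below, the variable names line, text, and url are illustrative only.

```shell
#!/bin/bash
line='Use the course categories and tags on the [course](https://labex.io/courses/) page to filter and search for courses'

# Stage 1: keep lines that look like [text](url); this line passes unchanged
printf '%s\n' "$line" | grep -E "\[.*\]\(.+\)"

# Stage 2: drop lines containing "![" (an image link); none here, so it passes
printf '%s\n' "$line" | grep -vP '\!\['

# Stage 3a: \K drops the "[" from the reported match, and the lookahead
# (?=\]\([^\)]+\)) requires "](url)" to follow without printing it
text=$(printf '%s\n' "$line" | grep -oP '\[\K[^\]]+(?=\]\([^\)]+\))')
echo "$text"    # course

# Stage 3b: the same idea for the URL between "](" and ")"
url=$(printf '%s\n' "$line" | grep -oP '\]\(\K[^\)]+(?=\))')
echo "$url"     # https://labex.io/courses/

# Stage 4: paste joins line N of each input, separated by one space
paste -d ' ' <(echo "$text") <(echo "$url")   # course https://labex.io/courses/
```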
By understanding the script's logic, you can modify or extend it to suit your specific needs, such as handling different types of links or performing additional processing on the extracted information.
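As one illustration of such an extension, image links, which getlink.sh deliberately skips, could be listed with a similar pair of patterns that anchor on ![ instead of [. The function name extract_images is hypothetical, and this is a sketch rather than part of the lab.

```shell
#!/bin/bash
# Hypothetical extension: print each image's alt text and source, one pair per line.
# Usage: extract_images FILE
extract_images() {
  local md_file="$1"
  paste -d ' ' \
    <(grep -oP '!\[\K[^\]]+(?=\]\([^\)]+\))' "$md_file") \
    <(grep -oP '!\[[^\]]+\]\(\K[^\)]+(?=\))' "$md_file")
}

if [ $# -ge 1 ]; then
  extract_images "$1"
fi
```

Like getlink.sh, this sketch pairs results by line position. For that reason, both patterns are written to match exactly the same images: a non-empty alt text followed by (src). An image with empty alt text, such as ![](pic.png), is skipped by both patterns, so it cannot shift the pairing out of line.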
Summary
Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.



