Bash Regex Matching

Introduction

This tutorial introduces regular expressions (regex) in Bash. A regex describes a pattern to look for in text, and Bash can test a string against one directly with the [[ ... =~ ... ]] operator. With regex you can validate input, extract data, and automate text processing in your shell scripts. The tutorial is designed for beginners, so no prior regex experience is needed. We'll start with the basics and gradually build up your knowledge.


Understanding Basic Regex and Matching

Let's start with the fundamental concepts of regular expressions. A regular expression is a sequence of characters that defines a search pattern. Think of it as a very powerful way to search for text.

Here are the basic building blocks:

  • Literal Characters: Most characters simply match themselves. For example, the regex abc matches the character sequence "abc" wherever it appears in the text.
  • Metacharacters: These are special characters that have a specific meaning in regex. Let's look at a few key ones:
    • . (dot): Matches any single character (except a newline). So, a.c would match "abc", "axc", "a1c", and so on.
    • * (asterisk): Matches the preceding character zero or more times. ab*c would match "ac", "abc", "abbc", "abbbc", etc.
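    • ^ (caret): Matches the beginning of a line (in a Bash [[ ... =~ ... ]] test, the beginning of the string being tested). ^hello would match text that starts with "hello".
    • $ (dollar sign): Matches the end of a line (in a Bash [[ ... =~ ... ]] test, the end of the string being tested). world$ would match text that ends with "world".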
    • [] (square brackets): Defines a character class. It matches any one of the characters inside the brackets. [abc] would match "a", "b", or "c". [0-9] matches any single digit.
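
The building blocks above can be tried out one at a time. The following is a minimal sketch, not part of the original lab: check is a hypothetical helper name introduced here for illustration, and the sample strings are made up.

```shell
#!/bin/bash

# Hypothetical helper: prints "match" or "no match" for a string and a pattern.
check() {
  local text="$1" pattern="$2"
  # $pattern is left unquoted so Bash treats it as a regex, not as literal text.
  if [[ "$text" =~ $pattern ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

check "axc" 'a.c'            # dot stands for any single character
check "ac" 'ab*c'            # zero 'b' characters is allowed
check "hello there" '^hello' # "hello" is at the start
check "say hello" '^hello'   # "hello" is not at the start
check "hello world" 'world$' # "world" is at the end
check "b" '[abc]'            # one character from the set
check "x" '[0-9]'            # "x" is not a digit
```

Running it should print match, match, match, no match, match, match, no match, one per line.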

Now, let's create a Bash script to test our understanding. Create a file named regex_test.sh using the touch command:

cd ~/project
touch regex_test.sh

Next, open regex_test.sh with a text editor (like nano or vim) and add the following code:

#!/bin/bash

string="Hello World"
if [[ "$string" =~ ^Hello ]]; then
  echo "The string starts with Hello"
else
  echo "The string does not start with Hello"
fi

Save the file and make it executable:

chmod +x regex_test.sh

Finally, run the script:

./regex_test.sh

The script should print "The string starts with Hello", because ^Hello only matches when "Hello" sits at the very start of the string.
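
One detail worth knowing when writing patterns like the one above: on the right-hand side of =~, quoting changes the meaning. An unquoted pattern is treated as a regex, while a quoted one is matched as literal text. The sketch below illustrates this under the assumption of a reasonably recent Bash (3.2 or later with default settings); the variable names pattern, regex_result, and literal_result are introduced here only for illustration.

```shell
#!/bin/bash

string="Hello World"
pattern='^Hello'

# Unquoted: $pattern is interpreted as a regex.
if [[ "$string" =~ $pattern ]]; then
  regex_result="regex match"
else
  regex_result="no regex match"
fi

# Quoted: "$pattern" is matched as the literal text ^Hello, caret included.
if [[ "$string" =~ "$pattern" ]]; then
  literal_result="literal match"
else
  literal_result="no literal match"
fi

echo "$regex_result"
echo "$literal_result"
```

Storing the regex in a variable and expanding it unquoted, as in the first test, is a common way to keep longer patterns readable.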

Working with Character Sets in a Script

Character sets, defined using square brackets [], allow you to match one character from a specific group. This is very useful for creating more flexible patterns.

  • Character Ranges: Inside [], you can use a hyphen (-) to specify a range. [a-z] matches any lowercase letter, [A-Z] matches any uppercase letter, and [0-9] matches any digit. You can combine them: [a-zA-Z0-9] matches any alphanumeric character.
  • Negation: If you put a ^ as the first character inside [], it negates the class. [^0-9] matches any character that is not a digit.
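
Ranges and negation can be combined into a small classifier. This is only a sketch: classify is a hypothetical function name, and LC_ALL=C is set as a precaution because the exact contents of ranges such as [a-z] can depend on the active locale.

```shell
#!/bin/bash

# Assumption: pin the locale so ranges like [a-z] cover only ASCII letters.
LC_ALL=C

# Hypothetical helper: reports what kind of character a one-character string holds.
classify() {
  local ch="$1"
  if [[ "$ch" =~ ^[a-z]$ ]]; then
    echo "lowercase"
  elif [[ "$ch" =~ ^[A-Z]$ ]]; then
    echo "uppercase"
  elif [[ "$ch" =~ ^[0-9]$ ]]; then
    echo "digit"
  elif [[ "$ch" =~ ^[^a-zA-Z0-9]$ ]]; then
    # Negated set: exactly one character that is not alphanumeric.
    echo "other"
  else
    echo "not a single character"
  fi
}

classify "b"
classify "Q"
classify "7"
classify "#"
classify "ab"
```

The ^ and $ around each set make sure the whole argument is exactly one character from that set.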

Let's modify our regex_test.sh script to use character sets. Open regex_test.sh with a text editor and replace its contents with the following:

#!/bin/bash

string="cat"
if [[ "$string" =~ c[a-z]t ]]; then
  echo "Match found!"
else
  echo "No match."
fi

Save the file and run it:

./regex_test.sh

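The script should print "Match found!". The pattern c[a-z]t matches a 'c', followed by one lowercase letter, followed by a 't'. Because the pattern is not anchored with ^ and $, it also matches longer strings that contain such a sequence, for example "concatenate". To accept only three-letter strings, use ^c[a-z]t$.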
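
To see how anchoring changes the result, the following sketch runs the same character-set pattern with and without ^ and $. The test word and the variable names unanchored and anchored are chosen here purely for illustration.

```shell
#!/bin/bash

word="concatenate"

# Unanchored: succeeds because "cat" appears inside the longer word.
if [[ "$word" =~ c[a-z]t ]]; then
  unanchored="match"
else
  unanchored="no match"
fi

# Anchored: the whole string must be exactly c, one lowercase letter, t.
if [[ "$word" =~ ^c[a-z]t$ ]]; then
  anchored="match"
else
  anchored="no match"
fi

echo "unanchored: $unanchored"
echo "anchored: $anchored"
```

This should print "unanchored: match" followed by "anchored: no match".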

Using Quantifiers to Repeat Patterns in a Script

Quantifiers control how many times a character or group should be repeated. This adds significant power to your regex patterns.

  • + (plus): Matches the preceding character one or more times. ab+c matches "abc", "abbc", "abbbc", etc., but not "ac".
  • ? (question mark): Matches the preceding character zero or one time (i.e., it makes the preceding character optional). ab?c matches "ac" and "abc", but not "abbc".
  • * (asterisk): Matches the preceding character zero or more times. We saw this earlier.
  • {n}: Matches the preceding character exactly n times. a{3} matches "aaa".
  • {n,}: Matches the preceding character n or more times. a{2,} matches "aa", "aaa", "aaaa", etc.
  • {n,m}: Matches the preceding character between n and m times (inclusive). a{1,3} matches "a", "aa", or "aaa".
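
The quantifiers listed above can be checked against the examples given for each one. In this sketch, full_match is a hypothetical helper that wraps the pattern in ^( ... )$ so that the entire string has to match; it is an illustration, not something Bash provides.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only when the whole string matches the pattern.
full_match() {
  local text="$1" pattern="$2"
  local re="^($pattern)\$"
  [[ "$text" =~ $re ]]
}

# ? makes the 'b' optional, so "ac" and "abc" pass but "abbc" does not.
for candidate in ac abc abbc; do
  if full_match "$candidate" 'ab?c'; then
    echo "$candidate matches ab?c"
  else
    echo "$candidate does not match ab?c"
  fi
done

full_match "aaa" 'a{3}' && echo "aaa matches a{3}"
full_match "aaaa" 'a{1,3}' || echo "aaaa does not match a{1,3}"
full_match "aa" 'a{2,}' && echo "aa matches a{2,}"
```

Anchoring matters here: without ^ and $, a{1,3} would still find a match inside "aaaa", since three of the four characters are enough.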

Let's modify our regex_test.sh script to use quantifiers. Open regex_test.sh with a text editor and replace its contents with the following:

#!/bin/bash

string="abbbc"
if [[ "$string" =~ ab+c ]]; then
  echo "Match found!"
else
  echo "No match."
fi

Save the file and run it:

./regex_test.sh

The script should print "Match found!". The pattern ab+c matches an 'a', followed by one or more 'b's, followed by a 'c'. Like the earlier patterns it is unanchored, so the sequence may appear anywhere in the string.

Extracting Data with Capturing Groups in a Script

Parentheses () are used for grouping parts of a regex. This is useful for applying quantifiers to multiple characters and for capturing matched text.

When you use parentheses, Bash stores the text matched by that part of the regex in a special array called BASH_REMATCH. BASH_REMATCH[0] contains the entire matched string, BASH_REMATCH[1] contains the text matched by the first group, BASH_REMATCH[2] the second, and so on.
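
As a quick illustration of how the BASH_REMATCH indices line up with the groups, here is a sketch that splits a date into its parts. The date value, the pattern, and the variable names are examples made up for this illustration.

```shell
#!/bin/bash

date_string="2024-03-15"
# Three groups: four digits, two digits, two digits, separated by hyphens.
date_pattern='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

if [[ "$date_string" =~ $date_pattern ]]; then
  whole="${BASH_REMATCH[0]}"  # the entire matched text
  year="${BASH_REMATCH[1]}"   # first group
  month="${BASH_REMATCH[2]}"  # second group
  day="${BASH_REMATCH[3]}"    # third group
  echo "Whole match: $whole"
  echo "Year: $year, Month: $month, Day: $day"
else
  echo "Not a date in YYYY-MM-DD form."
fi
```

Copying the values into named variables right away is a sensible habit, because the next =~ test overwrites BASH_REMATCH.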

Let's modify our regex_test.sh script to extract data using capturing groups. Open regex_test.sh with a text editor and replace its contents with the following:

#!/bin/bash

string="apple123"
# Group 1 captures the leading letters, group 2 the trailing digits.
if [[ "$string" =~ ^([a-z]+)([0-9]+)$ ]]; then
  fruit="${BASH_REMATCH[1]}"
  number="${BASH_REMATCH[2]}"
  echo "Fruit: $fruit"
  echo "Number: $number"
else
  echo "No match."
fi

Save the file and run it:

./regex_test.sh

The output should be "Fruit: apple" followed by "Number: 123". The script uses two capturing groups to split the string into its letter part and its digit part.
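
Parentheses are also what lets a quantifier apply to more than one character, as mentioned at the start of this section. A minimal sketch, with repeats_ab as a hypothetical function name:

```shell
#!/bin/bash

# Hypothetical helper: succeeds when the string is "ab" repeated one or more times.
repeats_ab() {
  local re='^(ab)+$'
  [[ "$1" =~ $re ]]
}

for candidate in ab abab aba; do
  if repeats_ab "$candidate"; then
    echo "$candidate: match"
  else
    echo "$candidate: no match"
  fi
done
```

Here + applies to the whole group (ab), so "ab" and "abab" match while "aba" does not. Without the parentheses, ab+ would mean an 'a' followed by one or more 'b's.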

Replacing Text with sed in a Script

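sed (the stream editor) reads text line by line and applies editing commands to it. Its substitute command, s/pattern/replacement/flags, replaces text that matches a regex. Let's create a new script called sed_test.sh to practice using sed.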

cd ~/project
touch sed_test.sh
chmod +x sed_test.sh

Open sed_test.sh with a text editor and add the following:

#!/bin/bash

string="apple123"
echo "$string" | sed 's/[0-9]/X/g'

Save the file and run it:

./sed_test.sh

The output should be: appleXXX. The pattern [0-9] matches a single digit, and the g flag at the end tells sed to replace every match on the line rather than only the first one, so each digit becomes an "X".
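
Two variations on the command above may be useful. By default sed interprets patterns as basic regular expressions, in which + and unescaped parentheses are ordinary characters; the -E option (available in GNU and BSD sed) switches to the extended syntax used elsewhere in this tutorial. The sketch below assumes such a sed is installed, and the variable names first_only and swapped are illustrative.

```shell
#!/bin/bash

string="apple123"

# Without the g flag, only the first digit on the line is replaced.
first_only=$(echo "$string" | sed 's/[0-9]/X/')
echo "$first_only"

# -E enables extended regex, so + and () work without backslashes.
# \1 and \2 in the replacement refer to the first and second captured groups.
swapped=$(echo "$string" | sed -E 's/^([a-z]+)([0-9]+)$/\2-\1/')
echo "$swapped"
```

This should print appleX23 and then 123-apple. The backreferences \1 and \2 play the same role in sed that BASH_REMATCH[1] and BASH_REMATCH[2] play in a [[ ... =~ ... ]] test.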

Summary

This tutorial introduced you to regular expressions (regex) in Bash. You learned about basic regex concepts, character classes, quantifiers, grouping, capturing, and how to use regex with sed. By writing and executing Bash scripts, you've gained hands-on experience with these powerful tools. Remember to practice and experiment with different regex patterns to solidify your understanding.