Simple Text Processing

Introduction

In this section, we introduce tr, col, join, and paste, and keep practicing with pipelines to get familiar with these commands.

tr

The tr command translates, squeezes, or deletes characters (not words). It reads only from standard input and writes the result to standard output.

Format

tr [option]... SET1 [SET2]

Parameters (Options) of tr

Option Description
-d Delete the characters in SET1 from the input.
-s Squeeze each run of a repeated character listed in the last specified SET (SET1, or SET2 if given) into a single occurrence.

Examples

Delete all of the 'o', 'l' and 'h' characters in "hello labex":

echo 'hello labex' | tr -d 'olh'

Squeeze the repeated 'l' characters in 'hello' into one:

echo 'hello' | tr -s 'l'

Convert the input text to uppercase or lowercase. For example, each of the following commands converts the contents of /etc/passwd to uppercase:

cat /etc/passwd | tr '[:lower:]' '[:upper:]'
# or: cat /etc/passwd | tr '[a-z]' '[A-Z]'

Output

labex:project/ $ echo 'hello labex' | tr -d 'olh'
e abex
labex:project/ $ echo 'hello' | tr -s 'l'
helo
labex:project/ $ cat /etc/passwd | tr '[:lower:]' '[:upper:]'
...
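The examples above delete, squeeze, and change case; the sketch below shows plain translation, where each character in SET1 is replaced by the character at the same position in SET2, plus the lowercase direction and a combined translate-and-squeeze. The sample strings are made up for illustration, and the `\n` escape and `[:upper:]`/`[:lower:]` classes assume GNU tr, as shipped with most Linux distributions.

```shell
# Translate by position: e -> i and l -> p
echo 'hello labex' | tr 'el' 'ip'

# Lowercase: the reverse of the uppercase example above
echo 'HELLO LabEx' | tr '[:upper:]' '[:lower:]'

# Squeeze runs of spaces into a single space
echo 'a    b  c' | tr -s ' '

# Translate each comma to a newline, then squeeze repeated newlines (SET2),
# which splits a comma-separated list into one item per line
echo 'one,,two,,,three' | tr -s ',' '\n'
```

These print hippo pabix, hello labex, a b c, and finally one, two, and three on separate lines. Note that with both SET1 and SET2, -s squeezes characters from SET2, after translation.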

For more uses of tr, run tr --help or man tr.

col

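col filters reverse line feeds from its input. It can also replace Tab characters with multiple spaces, or output Tabs in place of multiple spaces. Like tr, col reads only from standard input.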

Format

col [option]

Parameters (Options) of col

Option Description
-x Output multiple spaces instead of Tabs.
-h Output Tabs instead of multiple spaces (the default).

Example

Use cat -A to display the invisible characters in /etc/protocols. The output contains many ^I markers: each one is a Tab character shown in visible form, and each $ marks the end of a line.

cat -A /etc/protocols

Pipe /etc/protocols through col -x to convert its Tabs to spaces, then view the result with cat -A. The ^I markers are gone.

cat /etc/protocols | col -x | cat -A

Output

labex:project/ $ cat /etc/protocols | col -x | cat -A
# Internet (IP) protocols$
#$
# Updated from http://www.iana.org/assignments/protocol-numbers and other$
# sources.$
# New protocols will be added on request if they have been officially$
# assigned by IANA and are not historical.$
...
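Because the contents of /etc/protocols vary between systems, here is a self-contained sketch that builds its own input with printf. It assumes col is installed (on some minimal systems it must be installed separately) and relies on col's tab stops every 8 columns; cat -A is a GNU coreutils option.

```shell
# A line with a Tab between 'a' and 'b'; cat -A shows the Tab as ^I
printf 'a\tb\n' | cat -A

# col -x replaces the Tab with spaces up to the next tab stop (column 8),
# so 'a' is followed by 7 spaces and no ^I remains
printf 'a\tb\n' | col -x | cat -A
```

The first command prints a^Ib$ and the second prints a, seven spaces, b, and then $.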

join

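The join command performs an "equality join" on two files: for each pair of lines whose join fields are identical, it writes one combined line to standard output. The "join field" is the field in each file by which the files are compared. Both files should be sorted on their join fields; otherwise, matches may be missed.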

Format

join [option]... file1 file2

Parameters (Options) of join

Option Description
-t CHAR Use CHAR as the input and output field separator. By default, fields are separated by blanks (one or more spaces or Tabs).
-i Ignore differences in case when comparing fields.
-1 FIELD Join on this field of the first file. The default is the first field.
-2 FIELD Join on this field of the second file. The default is the first field.

Example

# Create two files
echo '1 hello' > file1
echo '1 labex' > file2
join file1 file2

Output

labex:project/ $ echo '1 hello' > file1
echo '1 labex' > file2
join file1 file2
1 hello labex
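The sketch below combines -t, -1, and -2 from the table above. It joins a hypothetical users file (ID in field 1) with a hypothetical courses file (user ID in field 2), using ':' as the delimiter; the users input is sorted first because join expects sorted input. File names and contents are made up for illustration.

```shell
# Work in a temporary directory that is removed when the shell exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# users: ID:name, sorted on field 1 before joining
printf '3:carol\n1:alice\n2:bob\n' | sort -t ':' -k 1,1 > "$tmp/users"

# courses: course:userID, already sorted on field 2
printf 'math:1\nart:3\n' > "$tmp/courses"

# Join field 1 of users with field 2 of courses.
# Each output line: the join field, the other fields of users, then those of courses.
join -t ':' -1 1 -2 2 "$tmp/users" "$tmp/courses"
```

This prints 1:alice:math and 3:carol:art. The line 2:bob has no match in courses, so join omits it.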

paste

paste is similar to join, but it simply merges the corresponding lines of multiple files side by side, without comparing any fields.

Format

paste [option] file...

Parameters (Options) of paste

Option Description
-d LIST Use the characters in LIST as delimiters instead of the default Tab.
-s Paste one file at a time instead of in parallel: all lines of each file are joined into a single output line.

Example

echo hello > file1
echo labex > file2
echo labex.io > file3

paste -d ':' file1 file2 file3
echo
paste -s file1 file2 file3

Output

hello:labex:labex.io

hello
labex
labex.io
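With one-line files, as above, the default mode and -s are hard to tell apart. The sketch below uses two hypothetical multi-line files, letters and numbers, to show the parallel merge, a custom delimiter, and serial mode. The last command reviews the pipeline: the - argument tells paste to read standard input.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'a\nb\nc\n' > "$tmp/letters"
printf '1\n2\n3\n' > "$tmp/numbers"

# Merge line 1 with line 1, line 2 with line 2, and so on, separated by a Tab
paste "$tmp/letters" "$tmp/numbers"

# The same merge, separated by a comma
paste -d ',' "$tmp/letters" "$tmp/numbers"

# Serial mode: all lines of letters on one line
paste -s -d ',' "$tmp/letters"

# In a pipeline: put each word on its own line with tr, then rejoin with ':'
echo 'hello labex io' | tr ' ' '\n' | paste -s -d ':' -
```

The serial-mode command prints a,b,c and the pipeline prints hello:labex:io.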

Time for Fun

Use the following commands to install and start Space Invaders (ninvaders):

sudo apt-get install ninvaders -y
ninvaders

Summary

In the "File Packing and Compression" section, we mentioned that some special characters differ between Windows/DOS and Linux/UNIX text files. The line break is one example: it is CR+LF (\r\n) on Windows but LF (\n) on Linux/UNIX. Run cat -A on a text file to see the invisible special characters it contains; with Windows line endings, each line ends in ^M$.
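Since tr reads only standard input, converting a Windows text file to Linux line endings can be sketched as deleting every CR (\r) with tr -d and redirecting the result to a new file. The example creates a small hypothetical CRLF file, dos.txt, in a temporary directory, and assumes GNU tr and cat (for the \r escape and cat -A).

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A two-line file with Windows (CR+LF) line endings
printf 'line1\r\nline2\r\n' > "$tmp/dos.txt"
cat -A "$tmp/dos.txt"     # each CR shows up as ^M before the $

# Delete every CR; tr reads stdin, so redirect the files in and out
tr -d '\r' < "$tmp/dos.txt" > "$tmp/unix.txt"
cat -A "$tmp/unix.txt"    # only the $ line-end markers remain
```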
