Tanushree Aggarwal

Posted on Sep 22, 2023

Linux: 3 ways to search patterns in files

#linux #beginners #bitesizelearning #tutorial

Introduction:

This is the first post in my bite-sized learning series. In this series we will take simple everyday use cases and walk through multiple ways to achieve them. These posts are perfect for anyone who does not have a lot of experience working on Linux but wants to up their game without spending hours at a stretch learning.

Task:

Often while troubleshooting, we need to check whether certain keywords are present in regular files. These can be words like: error, warning, timeout, success, etc.
In today's post we will discuss various ways to search for simple patterns in regular files on Linux.
Note: this is the first post of the series and is targeted at basic pattern search. Searching with regular expressions is a more advanced topic and will be covered in a later post.

Sample file we will be running our commands on:

1. grep

grep searches input files for lines that match the given pattern. It goes through the file line by line and, whenever a line contains a match, copies that line to standard output.
Since it has to check every line, a search can take noticeable time on extremely large files (say, several GB in size).

1.1 Matching the exact pattern

Syntax: `grep "<pattern>" <filename>`

Example: grep "test" testfile

Compare the output to the actual input file testfile.
You will notice that grep only displays those lines from the file which contain the pattern test. It has not included the lines with the words Test or TEST.
This is because grep pattern searches are case-sensitive by default, i.e. a line matches only if it contains the pattern in exactly the same case.
Let's try this again with the other two spellings:

grep "Test" testfile

grep "TEST" testfile
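The three searches above can be tried out with a small sketch like the one below. The sample file contents here are my own stand-in (an assumption, not the testfile used in this post), written to a temporary file so the snippet is self-contained.

```shell
# Hypothetical stand-in for testfile; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'nothing to see here' > "$sample"

# Each search matches only the line that has the pattern in that exact case.
lower=$(grep "test" "$sample")
title=$(grep "Test" "$sample")
upper=$(grep "TEST" "$sample")

printf '%s\n' "$lower" "$title" "$upper"

# Clean up the temporary file.
rm -f "$sample"
```

Each variable ends up holding exactly one line, which shows that the three spellings are treated as three different patterns.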

So far we have enclosed the pattern in double quotation marks (" ").
grep will still work if we do not enclose a simple pattern in quotes, for example: grep test testfile

However, this becomes an issue if the pattern contains blank spaces or special characters, because the shell splits or interprets them before grep ever sees the pattern.

The same pattern works when it is placed in quotation marks.

It is good practice to always place the search pattern in quotation marks.
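Here is a minimal sketch of what goes wrong without quotes. The file contents and the pattern "test line" are assumptions chosen for illustration, and the snippet assumes there is no file called line in the current directory.

```shell
# Hypothetical sample file; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' 'this is a test line' 'this is a test' > "$sample"

# Unquoted: the shell splits the pattern into two words, so grep searches for
# "test" and treats "line" as an extra file name. grep complains about the
# missing file (hidden here) and exits non-zero, hence the "|| true".
unquoted=$(grep test line "$sample" 2>/dev/null) || true

# Quoted: the whole phrase "test line" reaches grep as a single pattern.
quoted=$(grep "test line" "$sample")

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

# Clean up the temporary file.
rm -f "$sample"
```

The unquoted search quietly returns results for a different pattern than the one we meant, which is arguably worse than a plain error.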

1.2 Matching pattern while ignoring its case

We now understand how to match the exact pattern. But what about situations where we are unsure about the case in which the pattern exists, or where we want all occurrences of the word irrespective of their case?

This can be achieved by using the grep -i flag.
Here -i indicates that we wish to ignore the case of the pattern.

Syntax: `grep -i "<pattern>" <filename>`

Example: grep -i "test" testfile
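A quick sketch of the -i flag in action, again on a stand-in sample file whose contents are my assumption. It also uses -c, which makes grep print the number of matching lines instead of the lines themselves.

```shell
# Hypothetical stand-in for testfile; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'nothing to see here' > "$sample"

# -i ignores case, so all three spellings of the word match.
any_case=$(grep -i "test" "$sample")

# -c prints the number of matching lines rather than the lines themselves.
count=$(grep -i -c "test" "$sample")

printf '%s\n' "$any_case"
printf 'matching lines: %s\n' "$count"

# Clean up the temporary file.
rm -f "$sample"
```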

2. AWK

AWK is a powerful data extraction and processing language. Yes, you read that correctly: AWK is a programming language! It lets programmers write powerful functionality in just a few lines of code. It reads its input (one or more files) one record at a time, a record being a line by default, and performs the requested action on every record that matches the pattern. Because it never needs to hold the whole file in memory, it copes well with large files.

2.1 Matching the exact pattern

Syntax: `awk '/pattern/{ print $0 }' <filename>`

Example: awk '/test/{ print $0 }' testfile

where:

{ print $0 } : prints the current line ($0 holds the whole line that awk is currently processing).
Since we are incorporating the search pattern "test" in the command, it outputs only those lines from the file testfile which contain the pattern test.
By default, the match is case-sensitive.
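The awk search can be sketched as follows, using a stand-in sample file (its contents are an assumption). The second command leaves out the action entirely, relying on the fact that awk prints the matching line when a pattern has no action attached.

```shell
# Hypothetical stand-in for testfile; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'nothing to see here' > "$sample"

# Pattern plus explicit action: print every line that contains "test".
explicit=$(awk '/test/{ print $0 }' "$sample")

# Pattern only: printing the matching line is awk's default action.
implicit=$(awk '/test/' "$sample")

printf '%s\n' "$explicit"

# Clean up the temporary file.
rm -f "$sample"
```

Both forms give the same result, so the shorter one is handy at the command line while the longer one makes the intent obvious.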

2.2 Matching pattern while ignoring its case

Syntax: `awk 'BEGIN{IGNORECASE=1} /pattern/{print $0}' <filename>`

awk 'BEGIN{IGNORECASE=1} /test/{print $0}' testfile

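BEGIN is a special pattern used to label actions that awk performs before reading any input records. It executes only once, i.e. before all other rules. It is commonly used for initialization and setup tasks in an awk program. Note that IGNORECASE is specific to GNU awk (gawk); other awk implementations treat it as an ordinary variable, so the match stays case-sensitive there.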
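If gawk is not available, one alternative (not the approach shown above, just a sketch of a different technique) is to lowercase each line with awk's standard tolower() function before comparing it against a lowercase pattern. The sample file contents are an assumption.

```shell
# Hypothetical stand-in for testfile; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'nothing to see here' > "$sample"

# tolower($0) returns a lowercase copy of the line; the original line in $0
# is left untouched, so print $0 still shows it in its original case.
any_case=$(awk 'tolower($0) ~ /test/ { print $0 }' "$sample")

printf '%s\n' "$any_case"

# Clean up the temporary file.
rm -f "$sample"
```

The pattern itself has to be written in lowercase for this trick to work, since it is compared against the lowercased copy of the line.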

3. SED

sed, short for Stream Editor, is used for performing basic text transformations on an input. It processes the input (a file named testfile in our case) line by line in a single pass, making it a very efficient way to transform data in large files.

Note: since it's a stream editor, it is not commonly used for basic pattern searching.

3.1 Matching the exact pattern

Syntax: `sed -n '/pattern/p' <filename>`

sed -n '/test/p' testfile
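To see what the two pieces of this command do, here is a small sketch on a stand-in sample file (contents are an assumption). By default sed prints every line it reads; -n turns that off, and the p command prints only the lines that match the pattern.

```shell
# Hypothetical stand-in for testfile; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'nothing to see here' > "$sample"

# With -n: automatic printing is off, so only the matching line is shown.
with_n=$(sed -n '/test/p' "$sample")

# Without -n: every line is printed once automatically, and the matching
# line is printed a second time by the p command.
without_n=$(sed '/test/p' "$sample")

printf 'with -n:\n%s\n' "$with_n"
printf 'without -n:\n%s\n' "$without_n"

# Clean up the temporary file.
rm -f "$sample"
```

Forgetting -n is a common slip: the output then looks like the whole file with a few duplicated lines.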

3.2 Matching pattern while ignoring its case

sed -n '/[Tt][Ee][Ss][Tt]/p' testfile

Whoa! What is that?!? What's with these square brackets []? What exactly are we using as the search pattern?

Relax! That is the use of a regular expression.
I know! I know! This was supposed to be a basic tutorial and not contain scary looking expressions!
We need it here because sed does not have a portable, straightforward way to ignore case for basic pattern searches. (GNU sed does accept an I modifier, as in sed -n '/test/Ip' testfile, but that is a GNU extension.) When it comes to features like find and replace, it's a whole different story!
So if you do not understand regular expressions, ignore this one for now. We will discuss this in a later blog.
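For the curious, here is a rough sketch of how the square brackets behave. Each [..] matches exactly one character out of the set inside it. The sample file contents, including the mixed-case line, are an assumption for illustration.

```shell
# Hypothetical sample file; the contents are an assumption.
sample=$(mktemp)
printf '%s\n' \
  'this is a test line' \
  'This is a Test line' \
  'THIS IS A TEST LINE' \
  'mixed tEsT line' \
  'nothing to see here' > "$sample"

# One bracket pair per letter: every upper/lower case combination matches.
any_case=$(sed -n '/[Tt][Ee][Ss][Tt]/p' "$sample")

# Brackets on the first letter only: matches "test" and "Test", nothing else.
first_letter=$(sed -n '/[Tt]est/p' "$sample")

printf 'any case:\n%s\n' "$any_case"
printf 'first letter only:\n%s\n' "$first_letter"

# Clean up the temporary file.
rm -f "$sample"
```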

Conclusion:

This was a basic tutorial on pattern searching in Linux. We learnt three different ways to search for patterns in files, with or without matching the case.
I hope this was useful and did not take too much time to grasp. That would defeat the purpose of a bite-sized learning series!

Thank you for stopping by!
Drop a comment if you would like to see a specific Linux command covered in this bite-sized learning series! Suggestions or improvements are always welcome!
