DEV Community

Vineet Kalghatgi

You NEED to know awk

In Brief

awk is a powerful command-line tool included in virtually all modern GNU/Linux distributions. It is often left out of beginner-level Linux tutorials, which rightly give priority to more fundamental commands like cd, pwd and ls. You need a good grasp of the basic command-line tools to harness the full power of awk.

In essence, awk is a pattern scanning and processing language, with the emphasis on language: it offers functionality akin to most mainstream programming languages.

awk loves tabulated data. To begin with, you'll need a piece of text laid out as columns delimited by some common character. You see this pattern often in shells; for instance, the ls -l command prints file information in columns separated by spaces:

ls -l
total 24
lrwxrwxrwx   1 root   root       7 Sep  7 07:45 bin -> usr/bin
drwxr-xr-x   1 root   root       0 Apr 15  2020 boot
drwxr-xr-x   9 root   root     480 Sep  8 13:34 dev
drwxr-xr-x   1 root   root    1902 Sep  7 20:29 etc

However, for the sake of demonstration, let us use the following test.txt to illustrate awk's capabilities.

emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017

awk is like any other UNIX command, i.e. it takes options and arguments. Its power, however, lies in a script argument that can either be written inline or kept in an external file (conventionally with a .awk extension) and passed with the -f option. An awk script has the following basic structure:

BEGIN {commands}
/pattern1/ {commands}
/pattern2/ {commands}
...
/patternN/ {commands}
END {commands}

In essence, awk loops through each row of the input file and executes commands based on conditions. The patterns are typically regular expressions that select which rows a block applies to, although any expression, such as a comparison, can serve as a pattern. A block with no pattern runs for every row.

Commands following the BEGIN keyword are executed once before the loop, and the ones following the END keyword once after it.
Note: the numbering of the patterns has no correlation whatsoever with the row number; every row is tested against every pattern, in order.
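
To make the structure concrete, here is a minimal sketch that combines BEGIN, two regex patterns and END in one script. It recreates the article's sample data in a temporary file; the variable names data and structure_out and the printed labels are illustrative choices, not part of awk.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

structure_out=$(awk '
BEGIN    { print "start" }            # runs once, before any row is read
/female/ { print "F: " $0 }           # runs for rows containing "female"
/2014/   { print "2014: " $0 }        # runs for rows containing "2014"
END      { print "rows read: " NR }   # runs once, after the last row
' "$data")
printf '%s\n' "$structure_out"
rm -f "$data"
```

Each row is checked against both patterns in the order they are written, so the output follows the order of the rows, not the order of the patterns.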

Alright! Let's process some text.

1. Print only the name and salary of all employees

awk 'BEGIN {FS = ";"}
{print $2 " " $4}' test.txt

emp_name emp_salary
john 3000
sarah 2500
lily 5000
jack 3000
mark 2500

The string argument, delimited by single quotes, is the awk script mentioned before. Here's a step-by-step rundown of how it works:

  • Set the delimiter to ";" using the predefined FS (Field Separator) variable before processing any rows.
  • For every row, print the 2nd and 4th columns separated by a space.
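
As a possible shorthand, the field separator can also be given on the command line with the -F option instead of a BEGIN block, and a comma in print inserts the output field separator, which is a single space by default. The sketch below assumes the same sample data, recreated in a temporary file; data and names_out are illustrative names.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

# -F';' sets FS before any row is read; "print $2, $4" joins the
# two fields with the default output separator (one space).
names_out=$(awk -F';' '{ print $2, $4 }' "$data")
printf '%s\n' "$names_out"
rm -f "$data"
```

This should produce the same listing as the BEGIN { FS = ";" } version above.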

2. Print all the male names.

awk ' BEGIN { FS = ";" }
{if($3 == "male") print $2;}' test.txt

john
jack
mark

This script instructs awk to:

  • Set the delimiter to ";" before processing any rows.
  • For every row, check whether the third column ($3) is "male"; if so, print the second column ($2), i.e. the name.

Note: A command without a pattern applies to every row
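
The same filter can also be written with the condition as the pattern rather than as an if inside the block. This is a sketch of that alternative form, not a correction of the command above; it recreates the sample data in a temporary file, and data and males_out are illustrative names.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

# The comparison is the pattern: the block runs only for rows
# whose third field is exactly "male".
males_out=$(awk -F';' '$3 == "male" { print $2 }' "$data")
printf '%s\n' "$males_out"
rm -f "$data"
```

Because the comparison is an exact string match on the third field, the header row and the "female" rows are skipped without any extra checks.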

3. Derive the average salary.

awk ' BEGIN { FS = ";"; sum = 0; }
{if (NR > 1) sum += $4;}
END{
    total = NR - 1;
    avg = sum / total;
    print avg;
}' test.txt

3200

FS is one of many predefined variables. In this example, however, we will also define a couple of custom variables. Here is the rundown for this command:

  • Before processing any rows, set the delimiter to ";" and a custom variable sum to 0.
  • For every row, if the row number is greater than one (which skips the header row), add the salary ($4) to the sum variable. Note: NR is a predefined variable that holds the number of the current row.
  • At the end, after all rows have been processed, NR holds the total number of rows read, so set a variable total to NR - 1 to exclude the header, calculate the average as sum / total and print it.
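
A variation on the steps above, offered as a sketch: the header check moves into the pattern (NR > 1) and the script keeps its own row counter instead of deriving it from NR. awk treats unset variables as 0 in arithmetic, so the counters need no explicit initialisation. The names data, avg_out, sum and n are illustrative.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

avg_out=$(awk -F';' '
NR > 1 { sum += $4; n++ }           # skip the header, add up salaries
END    { if (n > 0) print sum / n } # guard against an empty input
' "$data")
printf '%s\n' "$avg_out"
rm -f "$data"
```

With the sample data this sums 16000 over 5 rows and prints 3200, matching the result above.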

As you can see, these three examples show how versatile awk is. It supports variables, if-else blocks and arithmetic evaluation, as seen above, all of which you would find in full-fledged programming languages.

But this is just the tip of the iceberg: awk also supports arrays, for loops, processing multiple files with a single script, and much more. awk is therefore more than a convenient command-line tool; it is a powerful language in and of itself, and hopefully this has inspired you to learn more about it.
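
As a small taste of arrays and for loops, the sketch below computes the average salary per sex from the same sample data. It is an illustration rather than part of the original examples: total and count are associative arrays keyed by the third column, and data and by_sex_out are illustrative names. awk does not guarantee the order in which for (key in array) visits keys, so the output is piped through sort.

```shell
# Recreate the sample data in a temporary file (the path is arbitrary).
data=$(mktemp)
cat > "$data" <<'EOF'
emp_id;emp_name;emp_sex;emp_salary;emp_yoj;
1;john;male;3000;2014
2;sarah;female;2500;2018
3;lily;female;5000;2012
4;jack;male;3000;2014
5;mark;male;2500;2017
EOF

by_sex_out=$(awk -F';' '
NR > 1 { total[$3] += $4; count[$3]++ }   # accumulate per-sex totals
END {
    for (s in total)                      # loop over the array keys
        printf "%s %.2f\n", s, total[s] / count[s]
}' "$data" | sort)
printf '%s\n' "$by_sex_out"
rm -f "$data"
```

For the sample data this prints "female 3750.00" and "male 2833.33".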

Further reading

If you're interested, you can check out the GNU Awk User's Guide.

Top comments (1)

Tal Rofe

This is a highly recommended tool when executing shell commands. Thanks