DEV Community

Cover image for Awkwardly Awesome: Unlocking the Power of awk
Jimmy McBride
Jimmy McBride Subscriber

Posted on • Edited on • Originally published at jimmymcbride.dev

Awkwardly Awesome: Unlocking the Power of awk

Welcome to the third post in the Textual Healing series! In this article, we’re diving deep into the world of awk—a tool that’s awkwardly powerful (pun intended) when it comes to processing and analyzing text.

awk is an incredibly useful tool for text processing, especially for working with data in columns. Think of awk as a mini-programming language built into your terminal that helps you extract, manipulate, and transform text with specific patterns.

To help you follow along, we’ll use this sample file.txt:

Name    Age    Salary
Alice   25     50000
Bob     30     55000
Charlie 35     60000
Dave    40     65000
Enter fullscreen mode Exit fullscreen mode

1. The Basics of awk

At its simplest, awk operates on fields, which are like columns of data. Let’s use awk to print specific columns from file.txt. To print the first column (which is the Name):

awk '{print $1}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Name
Alice
Bob
Charlie
Dave
Enter fullscreen mode Exit fullscreen mode
  • $1 represents the first column, $2 would be the second column, and so on.
  • $0 represents the entire line.

If you want to print the second and third columns (for Age and Salary):

awk '{print $2, $3}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Age Salary
25  50000
30  55000
35  60000
40  65000
Enter fullscreen mode Exit fullscreen mode

2. Custom Field Separators

By default, awk uses spaces or tabs as the field separator. But what if your data is separated by commas, like in a CSV file? You can specify the field separator using the -F option. Here’s an example:

Let’s say we had a comma-separated version of file.txt:

Name,Age,Salary
Alice,25,50000
Bob,30,55000
Charlie,35,60000
Dave,40,65000
Enter fullscreen mode Exit fullscreen mode

Now, if you want to print the Name and Salary columns, you’d do this:

awk -F ',' '{print $1, $3}' file.csv
Enter fullscreen mode Exit fullscreen mode

Output:

Name Salary
Alice 50000
Bob 55000
Charlie 60000
Dave 65000
Enter fullscreen mode Exit fullscreen mode

3. Pattern Matching with awk

You can also use awk to search for patterns in the data. For instance, if you want to print the Name of anyone who has a Salary over 55,000, you can use a pattern match like this:

awk '$3 > 55000 {print $1}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Charlie
Dave
Enter fullscreen mode Exit fullscreen mode

This command checks if the third column (Salary) is greater than 55,000 and prints the Name column for matching rows.


4. Conditionals and Calculations

awk can perform conditional logic and arithmetic on the data. Suppose you want to give everyone a 5% raise and print the new salary:

awk '{new_salary = $3 * 1.05; print $1, new_salary}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Name    52500
Alice   52500
Bob     57750
Charlie 63000
Dave    68250
Enter fullscreen mode Exit fullscreen mode

Here, we multiply the third column (Salary) by 1.05 and print the new value along with the person’s name.


5. Output Formatting

Want to format your output neatly? Use awk’s printf function to add custom formatting. For example, if you want to print each person’s Name and Salary in a structured format:

awk '{printf "Name: %s, Salary: $%.2f\n", $1, $3}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Name: Alice, Salary: $50000.00
Name: Bob, Salary: $55000.00
Name: Charlie, Salary: $60000.00
Name: Dave, Salary: $65000.00
Enter fullscreen mode Exit fullscreen mode

This example uses printf to format the Salary with two decimal places.


6. Summarizing Data with awk

awk is incredibly handy for summarizing data. Let’s calculate the total salary and average salary of all employees.

  • Sum of salaries:
awk '{sum += $3} END {print "Total Salary:", sum}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Total Salary: 230000
Enter fullscreen mode Exit fullscreen mode
  • Average salary:
awk '{sum += $3; count++} END {print "Average Salary:", sum/count}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

Average Salary: 57500
Enter fullscreen mode Exit fullscreen mode

7. Using awk with Variables

awk becomes especially useful when handling complex regular expressions that might be tricky in sed. When things get complicated with forward or back references, rewriting in awk can make your code easier to read and debug. You can even add print statements to check intermediate results while you’re debugging.

If you need to pass external variables into an awk program, you can use the -v option. This allows you to assign values to variables outside the program and pass them in:

awk -v varname=value 'awk program' file.txt
Enter fullscreen mode Exit fullscreen mode

You can also pass multiple variables by using multiple -v options.


8. Real-World Use Cases for awk

Here are some practical ways to use awk with a file like file.txt:

  • Find all employees over 30 years old:
  awk '$2 > 30 {print $1, $2}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

  Charlie 35
  Dave    40
Enter fullscreen mode Exit fullscreen mode
  • Give all employees a 7% bonus and print the new salary:
  awk '{bonus = $3 * 0.07; new_salary = $3 + bonus; print $1, new_salary}' file.txt
Enter fullscreen mode Exit fullscreen mode

Output:

  Alice   53500
  Bob     58850
  Charlie 64200
  Dave    69550
Enter fullscreen mode Exit fullscreen mode

Wrapping Up

awk is a powerhouse for working with structured data in text files. From extracting and summarizing data to performing complex calculations and pattern matching, awk offers flexibility and power that makes text processing much easier.

Next time you need to process data from logs, CSVs, or any column-based file, remember that awk is here to help. With just a little practice, you’ll be using awk like a pro!


Want to hang out with other Linux lovers and coding enthusiasts? Come join our community on Discord! We’re a group of friendly folks who love to code, share tips, and help each other grow. Click here to join the conversation!

Top comments (3)

Collapse
 
josephj11 profile image
Joe

I use simple awk programs a lot. Whenever a sed regex starts to get complicated especially with forward or back references, I rewrite it in awk. It's much easier to code and you can add print statements for intermediate results while debugging.

Since the shell will try to expand awk program statements, inline awk programs are usually surrounded by single quotes to protect them. However, there is often a need to pass some information/parameters into them. To do that, you can use the awk -v option.

`awk -v varname=value 'awk program' file

assigns value to varname and passes that into the awk program where it can be used.

Awk accepts multiple -v arguments if you need to pass more than one parameter.

Collapse
 
jimmymcbride profile image
Jimmy McBride

Another awesome comment worthy of updating my blog! Thanks for helping me make my blogs a better resource for people and myself!

I have an idea for an article about less vs more, and I think I'm going to show off that info command. That's if there's enough content in the comparison anyways! I'll probably be in blog writing mode on Thursday. Hopefully I can get enough content so I don't have to worry about being gone this weekend!

Collapse
 
josephj11 profile image
Joe

Throw in most if you need more content. :)