
awk Tutorial -- 2 -- Columns, FS, OFS and NF

This post is part of an awk tutorial; you'll find the full list of sections already covered here.
As mentioned previously, awk automatically splits each line into columns, each column being automatically assigned to a variable:
  • $1 for the first column
  • $2 for the second column
  • etc.
  • $0 for the whole line
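To see these variables on a fixed input, here is a minimal sketch. The three-word line is a made-up sample fed through printf rather than a real file:

```shell
#!/bin/sh
# Hypothetical sample line with three space-separated columns
line='alpha beta gamma'

# $1 and $2 are the first two columns, $0 is the whole line
first=$(printf '%s\n' "$line" | awk '{print $1}')
second=$(printf '%s\n' "$line" | awk '{print $2}')
whole=$(printf '%s\n' "$line" | awk '{print $0}')

echo "$first"    # alpha
echo "$second"   # beta
echo "$whole"    # alpha beta gamma
```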
Let's use the ls -ltr output below as an example:
$ ls -ltr file*
-rwxrwxrwx 1 fred fred  4 Feb 11 10:23 file1
-rwxrwxrwx 1 fred fred 12 Feb 11 10:23 file2
$
awk will split this output into the columns shown below, FS being the separator between two columns:
  $1=-rwxrwxrwx  $2=1  $3=fred  $4=fred  $5=4  $6=Feb  $7=11  $8=10:23  $9=file1
Based on the above, to show the name of each file followed by its size, we print the 9th column ($9) and then the 5th column ($5):
$ ls -ltr file* | awk '{print $9, $5}'
file1 4
file2 12
$
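ls output depends on the system (locale, date format, user names). This reproducible sketch replays the two ls -ltr lines shown above with printf instead of listing real files:

```shell
#!/bin/sh
# The two ls -ltr lines from above, replayed with printf so the result
# does not depend on the local files, locale or ls implementation
listing=$(printf '%s\n' \
  '-rwxrwxrwx 1 fred fred  4 Feb 11 10:23 file1' \
  '-rwxrwxrwx 1 fred fred 12 Feb 11 10:23 file2')

# $9 is the file name, $5 is the size in bytes
result=$(printf '%s\n' "$listing" | awk '{print $9, $5}')
printf '%s\n' "$result"
```

awk simply counts columns. If the date format on your system uses a different number of columns, $9 may not be the file name.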
A note about FS: FS stands for "Field Separator"; it is what awk uses to split a line into columns. The default FS is a space, so in this example the columns are split without any extra setting. Also note that the comma "," between $9 and $5 in the print statement above tells awk to insert OFS, the "Output Field Separator", between the two values. OFS defaults to a single space, whatever FS is set to. We can change the output separator either by setting OFS in the BEGIN section or by dropping the comma and hardcoding the separator:
$ ls -ltr file* | awk 'BEGIN{OFS=":"}{print $9, $5}'
file1:4
file2:12
$ ls -ltr file* | awk '{print $9":"$5}'
file1:4
file2:12
$
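This sketch uses a made-up colon-separated line. It checks that OFS keeps its default single space when only FS is changed, and that OFS is only inserted where print has a comma:

```shell
#!/bin/sh
# Hypothetical colon-separated input
line='a:b:c'

# FS is ":" but OFS keeps its default of a single space
default_ofs=$(printf '%s\n' "$line" | awk -F ":" '{print $1, $3}')

# Setting OFS explicitly changes the separator placed between print arguments
custom_ofs=$(printf '%s\n' "$line" | awk -F ":" 'BEGIN{OFS="-"}{print $1, $3}')

# Without the comma, the values are simply concatenated and OFS is not used
no_comma=$(printf '%s\n' "$line" | awk -F ":" 'BEGIN{OFS="-"}{print $1 $3}')

echo "$default_ofs"   # a c
echo "$custom_ofs"    # a-c
echo "$no_comma"      # ac
```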
Back to FS: the default is not just a single space. It matches one or more spaces, tabs or newlines, much like the regular expression [ \t\n]+, and leading and trailing blanks are ignored. This is easy to test with a file such as the one below (cat -A shows non-printable characters: ^I is a tab and $ is the end of line):
# The example file below is built as follows:
# column1 TAB column2 TAB*2 column3 SPACE column4 SPACE*8 column5
$ cat test_FS
1       2               3 4        5
$ cat -A test_FS
1^I2^I^I3 4        5$
$ cat test_FS | awk '{print $1, $2, $3, $4, $5}'
1 2 3 4 5
$
We can see that the default FS handles any "blank" space, whether it is SPACE(s) or TAB(s).
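This small sketch compares the default FS with an explicit FS of "[ \t]+" set in BEGIN. The sample line is hypothetical, and NF (the number of columns, covered at the end of this post) counts the columns. The default ignores leading blanks, while the regular expression treats them as a separator and produces an empty first column:

```shell
#!/bin/sh
# Hypothetical line: 2 leading spaces, "a", 2 tabs, "b", 3 spaces, "c"
line=$(printf '  a\t\tb   c')

# Default FS: leading blanks are skipped, so there are 3 columns
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')
default_first=$(printf '%s\n' "$line" | awk '{print $1}')

# Regex FS set in BEGIN (awk turns "\t" into a tab in the string literal):
# the leading spaces now separate an empty $1 from "a"
regex_nf=$(printf '%s\n' "$line" | awk 'BEGIN{FS="[ \t]+"}{print NF}')
regex_second=$(printf '%s\n' "$line" | awk 'BEGIN{FS="[ \t]+"}{print $2}')

echo "default: NF=$default_nf, \$1=$default_first"
echo "regex:   NF=$regex_nf, \$2=$regex_second"
```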

The default FS can be modified in the BEGIN section or by using the -F option:
$ cat passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
$ cat passwd | awk 'BEGIN{FS=":"}{print $1}'
root
daemon
bin
$ cat passwd | awk -F ":" '{print $1}'
root
daemon
bin
$ cat passwd | awk -F: '{print $1}'
root
daemon
bin
$
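This self-contained check recreates the sample passwd file in a temporary file. It assumes mktemp is available, as it is on most Linux and BSD systems. It then confirms that the three forms give the same first column. The file name is passed directly to awk, which reads it without the extra cat:

```shell
#!/bin/sh
# Recreate the sample passwd file from above in a temporary file
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
EOF

# Three ways of setting FS to ":"; awk reads the file itself, no cat needed
via_begin=$(awk 'BEGIN{FS=":"}{print $1}' "$tmp")
via_quoted=$(awk -F ":" '{print $1}' "$tmp")
via_bare=$(awk -F: '{print $1}' "$tmp")
rm -f "$tmp"

printf '%s\n' "$via_begin"
```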
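I do not really recommend the 3rd form (-F:). It works with separators such as , or : but not with characters the shell treats specially, such as |, ; or ", which you then have to escape. As I am a creature of habit, I prefer to stick with the quoted syntax (-F ":"), which works in 100% of cases. For example, the pipe has to be escaped with a backslash when unquoted (passwd contains no pipe, so each whole line ends up in $1):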
$ cat passwd | awk -F\| '{print $1}'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
$
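The passwd file contains no pipe, so the example above does not really show the pipe at work. This sketch uses made-up pipe-separated data to compare the escaped form with the quoted one:

```shell
#!/bin/sh
# Hypothetical pipe-separated data (unlike passwd, it really contains "|")
data=$(printf '%s\n' 'root|0|/bin/bash' 'bin|2|/usr/sbin/nologin')

# Escaped with a backslash: the shell passes a plain | to awk
escaped=$(printf '%s\n' "$data" | awk -F\| '{print $1}')

# Quoted: same result, and the habit works for any separator
quoted=$(printf '%s\n' "$data" | awk -F "|" '{print $1}')

printf '%s\n' "$escaped"
```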
A rarer case, but one worth knowing: FS can also be a regular expression, which is handy when a file or an output uses several different column separators. Let's look at a file using ":", "," and SPACE as column separators:
$ cat test_FS_regexp
1:2,3 4
$
If we do not modify the default FS, we know what will happen:
  • $1 will be "1:2,3"
  • $2 will be "4"
See below:
$ cat test_FS_regexp | awk '{print "1=>"$1, "2=>"$2, "3=>"$3, "4=>"$4}'
1=>1:2,3 2=>4 3=> 4=>
$
The solution here is to use a regular expression for FS:
$ cat test_FS_regexp | awk -F "[:, ]" '{print "1=>"$1, "2=>"$2, "3=>"$3, "4=>"$4}'
1=>1 2=>2 3=>3 4=>4
$
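A single-character class like "[:, ]" matches exactly one separator character, so doubled separators create empty columns. Adding + makes a run of separators count as one. This minimal sketch uses a hypothetical line and counts the columns with NF (the number of columns, described further down):

```shell
#!/bin/sh
# Hypothetical line where separators are sometimes doubled: ",," and two spaces
line='1:2,,3  4'

# One separator character per split: doubled separators create empty fields
nf_single=$(printf '%s\n' "$line" | awk -F "[:, ]" '{print NF}')

# With "+", a run of separators counts as a single one
nf_plus=$(printf '%s\n' "$line" | awk -F "[:, ]+" '{print NF}')
fields_plus=$(printf '%s\n' "$line" | awk -F "[:, ]+" '{print $1, $2, $3, $4}')

echo "$nf_single $nf_plus"   # 6 4
echo "$fields_plus"          # 1 2 3 4
```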
To make this output more readable, we could print each column on a separate line; this is easily done by setting OFS to "\n", the newline character:
$ cat test_FS_regexp | awk -F "[:, ]" 'BEGIN{OFS="\n"}{print "1=>"$1, "2=>"$2, "3=>"$3, "4=>"$4}'
1=>1
2=>2
3=>3
4=>4
$
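OFS is only used when awk joins several values itself; printing an unmodified line outputs $0 exactly as it was read. Assigning a field, for example with the common $1=$1 idiom, makes awk rebuild $0 with OFS. That gives a handy way to convert a colon-separated line to a tab-separated one. Here is a sketch on a made-up line:

```shell
#!/bin/sh
# Hypothetical colon-separated line
line='root:x:0:0'

# OFS is set, but $0 is printed untouched because no field was modified
untouched=$(printf '%s\n' "$line" | awk 'BEGIN{FS=":"; OFS="\t"}{print}')

# Assigning $1 to itself forces awk to rebuild $0 using OFS
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN{FS=":"; OFS="\t"}{$1=$1; print}')

printf '%s\n' "$untouched" "$rebuilt"
```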
Another thing we may be interested in when dealing with columnar text files is the number of columns in a line. This is what the special variable NF is for: it contains the "Number of Fields" of the current line:
$ cat passwd 
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
$ cat passwd | awk -F ":" '{print NF}'
7
7
7
$
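Combined with a for loop, NF generalizes the earlier one-column-per-line output to lines with any number of columns. Here is a minimal sketch on the first passwd line:

```shell
#!/bin/sh
# First line of the sample passwd file
line='root:x:0:0:root:/root:/bin/bash'

# Print every column with its number, however many columns the line has
numbered=$(printf '%s\n' "$line" | awk -F ":" '{for (i = 1; i <= NF; i++) print i "=>" $i}')
printf '%s\n' "$numbered"
```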
It shows 7, the number of columns on each line (with ":" as the separator). A nice trick to know with NF: $NF points to the last column of each line, whatever the number of columns. Even if each line has a different number of columns, $NF always refers to the last one:
$ cat passwd | awk -F ":" '{print $NF}'
/bin/bash
/usr/sbin/nologin
/usr/sbin/nologin
$
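$NF can also be used in arithmetic: $(NF-1) is the column before the last one. The parentheses matter, since $NF-1 would subtract 1 from the value of the last column. The sketch below uses hypothetical lines of 3, 1 and 4 columns. The NF > 1 pattern skips lines with a single column, where $(NF-1) would be $0:

```shell
#!/bin/sh
# Hypothetical lines with 3, 1 and 4 columns
data=$(printf '%s\n' 'a b c' 'd' 'e f g h')

# $NF is always the last column, whatever the line length
last=$(printf '%s\n' "$data" | awk '{print $NF}')

# $(NF-1) is the column before the last one, skipped for one-column lines
before_last=$(printf '%s\n' "$data" | awk 'NF > 1 {print $(NF-1)}')

printf '%s\n' "$last" "$before_last"
```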

That's all for this one; you now know about awk columns, FS, OFS and NF!
