awk < file '{ print $2 }'
This means "on every line, print the second field".
To print the second and third columns, you might use
awk < file '{ print $2, $3 }'
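The -F option sets the field separator. For example, to print every
user's home directory, you might use
awk < /etc/passwd -F: '{ print $6 }'
since the password file has fields delimited by colons and the home
directory is the 6th field.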
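To try the -F example without touching the real password file, you can feed awk a couple of passwd-style records by hand. This is only a sketch: the two records are made up, and home_dirs is a hypothetical wrapper name, not anything awk provides.

```shell
# home_dirs: print field 6 of colon-delimited records read from standard input.
home_dirs() {
    awk -F: '{ print $6 }'
}

# Two invented passwd-style records (not real accounts).
printf '%s\n' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/sh' \
    'bob:x:1001:1001:Bob:/home/bob:/bin/sh' | home_dirs
```

This should print /home/alice and /home/bob, one per line.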
awk can also do arithmetic on fields. For example,
awk '{ print ($1-32)*(5/9) }'
will convert Fahrenheit temperatures provided on standard input
to Celsius until it gets an end-of-file.
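The same conversion can be run non-interactively by piping values in. The sketch below makes two assumptions that are not in the text above: it uses printf with a "%.1f" format to round to one decimal place, and it wraps the command in a hypothetical function called f_to_c.

```shell
# f_to_c: convert the Fahrenheit value in column 1 of each line to Celsius,
# rounded to one decimal place.
f_to_c() {
    awk '{ printf "%.1f\n", ($1 - 32) * (5 / 9) }'
}

# Freezing point, boiling point, and body temperature.
printf '32\n212\n98.6\n' | f_to_c
```

The three inputs should come out as 0.0, 100.0, and 37.0.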
The selection of operators is basically the same as in C, although some of C's wilder constructs do not work. String concatenation is accomplished simply by writing two string expressions next to each other. '+' is always addition. Thus
echo 5 4 | awk '{ print $1 + $2 }'
prints 9, while
echo 5 4 | awk '{ print $1 $2 }'
prints 54. Note that
echo 5 4 | awk '{ print $1, $2 }'
prints "5 4".
You can make your own variables, with whatever names you like (except for reserved words in the awk language) just by using them. You do not have to declare variables. Variables that haven't been explicitly set to anything have the value "" as strings and 0 as numbers.
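A quick way to see the default values is to use a variable that is never assigned, once as a string and once as a number. In this sketch x is just an arbitrary unset variable and unset_demo is a hypothetical function name; the BEGIN block lets awk run without reading any input.

```shell
# unset_demo: show how an unassigned variable behaves in both contexts.
unset_demo() {
    # x is never set: it concatenates as "" and adds as 0.
    awk 'BEGIN { print "[" x "]"; print x + 5 }'
}

unset_demo
```

The first line of output should be [] and the second 5.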
For example, the following code prints the average of all the numbers on each line:
awk '{ tot=0; for (i=1; i<=NF; i++) tot += $i; print tot/NF; }'
Note the use of $i to retrieve the i'th field, and the for loop,
which works like in C. The reason tot is explicitly
initialized at the beginning is that this code is run for every input
line, and when starting work on the second line, tot would otherwise
still hold the total from the first line.
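One caveat with the per-line average: on an empty line NF is 0, and most awk implementations stop with a division-by-zero error when they reach tot/NF. The sketch below guards the division with an if statement; row_avg is a hypothetical name and the sample input is made up.

```shell
# row_avg: print the average of the numbers on each non-empty input line.
row_avg() {
    awk '{
        tot = 0                              # reset for every input line
        for (i = 1; i <= NF; i++) tot += $i  # $i is the i-th field
        if (NF > 0) print tot / NF           # skip lines that have no fields
    }'
}

# The empty second line produces no output instead of an error.
printf '1 2 3\n\n10 20\n' | row_avg
```

This should print 2 and 15.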
To print the average of the first column over all lines of input
instead, you might use
awk '{ tot += $1; n += 1; } END { print tot/n; }'
Note the use of two different block statements. The second one has
END in front of it; this means to run the block once after
all input has been processed. More generally, you can put a
condition in front of a block, and the block will run only for lines
where the condition is true. That is, you can say
awk ' $1==0 { print $2 }'
which will print the second column for lines of input where the
first column is 0. You can also supply a regular expression; the block
then runs for lines that contain a match:
awk ' /^test/ { print $2 }'
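If you put no expression, the block is run on every line of input. If
multiple blocks have conditions that are true, they are
all run, in the order they appear. There is no particularly clean way I know of
to get it to run exactly one of a bunch of possible blocks of code; the
closest is to end each block with the next statement, which skips the
remaining blocks for that line.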
The block conditions BEGIN and END are special: a BEGIN block runs once before any input is processed, and an END block runs once after all input has been processed.
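The pieces described above (a BEGIN block, a regular expression, a comparison, a block with no condition, and an END block) can be put together in one program. This is a sketch under stated assumptions: classify is a hypothetical function name, the input is made up, and next is the standard awk statement that abandons the current line, used here so that at most one block handles each line.

```shell
# classify: label each input line by the first condition it satisfies.
classify() {
    awk '
        BEGIN    { print "start" }                    # before any input is read
        /^test/  { print "test:", $2; next }          # line starts with "test"
        $1 == 0  { print "zero:", $2; next }          # first column is 0
                 { other++ }                          # only lines not handled above
        END      { print "other lines:", other + 0 }  # after all input
    '
}

printf 'test a\n0 b\nfoo c\ntest2 d\n' | classify
```

Without the two next statements, a line such as "0 b" would also reach the unconditional block and be counted as "other".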
Here's how you strip the first column off:
awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n"; }'
Note the use of NF to iterate over all the fields and the use of
printf to place newlines explicitly.
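The one-liner above leaves a trailing space at the end of every output line. If that matters, one possible variant prints the separator only between fields. This is a sketch, with strip_first as a hypothetical name; it relies on awk's C-style ?: operator.

```shell
# strip_first: print every field except the first, separated by single spaces.
strip_first() {
    awk '{
        for (i = 2; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "")  # space only between fields
        printf "\n"
    }'
}

printf 'a b c\nx y\n' | strip_first
```

This should print "b c" and "y", with no trailing spaces.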
In combination with sed, sort, and some other useful shell utilities like paste, awk is quite satisfactory for a lot of numerical data processing, as well as the maintenance of simple databases kept in column-tabular formats.
awk is also extremely useful in a certain style of makefile writing for generating pieces of makefile to include.
Do not script in csh. Use sh, or if you must, ksh.
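The only thing seriously lacking in awk that I've yet run into is that there's no portable way to tell awk not to do buffering on its input and output, which makes it awkward for assorted interactive or network-oriented applications it would otherwise be ideally suited for. For output at least, many current implementations (gawk and mawk among them) provide an fflush() function that can be called after each print.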