Shell Scripting
Awk (part 1)
Awk Programming Language
a standard unix language geared toward text processing and creating formatted reports
it is very valuable to seismologists because it does floating point math, unlike integer-only bash, and is designed to work with columnar data
syntax similar to C and bash
one of the most useful unix tools at your command
treats text files as fields (columns) and records (lines)
performs floating point & integer arithmetic and string operations
uses loops and conditionals
lets you define your own functions (subroutines)
can execute unix commands within the scripts and process the results
versions
awk: original awk
nawk: new awk, dates to 1987
gawk: GNU awk, has more powerful string functionality
the CERI unix system has all three. You want to use nawk. I suggest adding this line to your .cshrc file:
alias awk 'nawk'
in OS X, awk is already nawk, so no changes are necessary
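Command line functionality
you can call awk from the command line in two ways:
awk [options] '{ commands }' variables infile(s)
awk -f scriptfile variables infile(s)
or you can create an executable awk script
% cat << 'EOF' > test.awk
#!/usr/bin/nawk -f
some set of commands
EOF
% chmod 755 test.awk
% ./test.awk infile(s)
the -f on the #! line tells nawk to read its program from the script file, and quoting 'EOF' keeps the shell from expanding $1, $2, ... inside the here-document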
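The `awk -f scriptfile` form can be tried without worrying about where nawk is installed. This is a minimal sketch: the temporary script file and the one-line input are made up for illustration.

```shell
# hypothetical script file, created in a temporary location
script=$(mktemp)
# quoting 'EOF' keeps the shell from expanding $1 and $3 inside the here-document
cat << 'EOF' > "$script"
{ print $1, $3 }
EOF
# run the program with -f on one made-up input line; prints "a c"
printf 'a b c\n' | awk -f "$script"
rm -f "$script"
```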
How it treats text
awk commands are applied to every record (line) of a file
awk splits the data in each line into fields
essentially, each field becomes a member of an array, so that the first field is $1, the second field is $2, and so on
$0 refers to the entire record
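A quick way to see fields and records in action is to pipe a single line into awk. The input text here is made up.

```shell
# print the second field, then the whole record
printf 'alpha beta gamma\n' | awk '{ print $2; print $0 }'
```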
Field Separator
the default field separator is one or more white spaces (blanks or tabs)
$1 $2   $3 $4 $5 $6 $7    $8     $9     $10  $11
1  1918 9  22 9  54 49.29 -1.698 98.298 15.0 ehb
Notice that the fields may be integers, floating point numbers (with a decimal point), or strings. Nawk is generally smart enough to figure out how to use them.
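To see that awk handles each kind of field appropriately, the sample record from the table above can be piped in directly, so the sketch does not need a data file.

```shell
# the sample record from the table above
line='1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 ehb'
# $11 is a string, $2 an integer, $8 a float; awk converts each as needed
printf '%s\n' "$line" | awk '{ print $11, $2 + 1, $8 * 2 }'
```

This should print `ehb 1919 -3.396`.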
Field Separator
the field separator may be changed by resetting the FS built-in variable, or with the -F option
look at the passwd file
% head -n1 /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
the separator is ":", so reset it
% awk -F":" '{ print $1, $3}' /etc/passwd
root 0
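The -F option is shorthand for setting FS. The same thing can be done inside the program in a BEGIN block, which runs before the first record is read. In this sketch the sample passwd line above is piped in, so the result does not depend on your own /etc/passwd.

```shell
# set FS in BEGIN instead of using -F
printf 'root:x:0:1:Super-User:/:/sbin/sh\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }'
```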
print
one of the most common commands used in awk scripts is print
white space in the command does not produce white space in the output: items separated only by spaces are concatenated
% awk -F":" '{ print $1 $3}' /etc/passwd
root0
two solutions to this
% awk -F":" '{ print $1 " " $3}' /etc/passwd
% awk -F":" '{ print $1, $3}' /etc/passwd
root 0
the comma inserts the output field separator, which is a single space by default
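All three print forms can be compared side by side on one made-up passwd-style line:

```shell
# concatenation, explicit " ", and comma-separated print items
printf 'root:x:0\n' | awk -F":" '{ print $1 $3; print $1 " " $3; print $1, $3 }'
```

The first line is `root0`. The other two are both `root 0`.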
any literal text can be printed explicitly by enclosing it in double quotes ""
assume a starting file like so:
1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x
% awk '{print "latitude:",$9,"longitude:",$10,"depth:",$11}' SUMA.loc
latitude: -1.698 longitude: 98.298 depth: 15.0
latitude: 9.599 longitude: 92.802 depth: 30.0
latitude: 4.003 longitude: 94.545 depth: 20.0
Unlike the shell, awk does not evaluate variables within strings.
For example, to print fields 8 and 3 separated by a tab, you could not write:
{print "$8\t$3" }
because it would print the literal text $8, a tab, then $3. Write {print $8 "\t" $3} instead.
Inside quotes, the dollar sign is not a special character. Outside quotes, it refers to a field.
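The difference is easy to see on a made-up eight-field line:

```shell
# inside quotes $8 is literal text; outside quotes it is field 8
printf 'a b c d e f g h\n' | awk '{ print "$8\t$3"; print $8 "\t" $3 }'
```

The first line printed is literally `$8`, a tab, `$3`. The second is `h`, a tab, `c`.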
1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x
you can specify a newline in two ways: a second print statement, or an explicit "\n"
% awk '{print "latitude:",$9; print "longitude:",$10}' SUMA.loc
% awk '{print "latitude:",$9 "\n" "longitude:",$10}' SUMA.loc
latitude: -1.698
longitude: 98.298
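Both newline methods can be checked against the sample record above without needing SUMA.loc. In the second form, "\n" is concatenated to $9 rather than separated by a comma. A comma there would put the output field separator (a space) at the start of the second line.

```shell
line='1 1 1918 9 22 9 54 49.29 -1.698 98.298 15.0 0.0 0.0 ehb FEQ x'
# method 1: two print statements
printf '%s\n' "$line" | awk '{print "latitude:",$9; print "longitude:",$10}'
# method 2: one print with an embedded "\n"
printf '%s\n' "$line" | awk '{print "latitude:",$9 "\n" "longitude:",$10}'
```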
a trick
if a field is composed of a number followed by letters, you can multiply the field by 1 to remove the letters (awk converts only the leading numeric part)
% head test.tmp
1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4
1.8 2008/09/08 23:11:39 36.420N 89.510W 7.1
1.7 2008/09/08 19:44:29 36.360N 89.520W 8.2
% awk '{print $4,$4*1}' test.tmp
36.440N 36.44
36.420N 36.42
36.360N 36.36
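One caveat with this trick is that the hemisphere letter is simply dropped, so 89.560W becomes +89.56. Here is a minimal sketch, using the first line of test.tmp from above, that restores the sign by checking the suffix with a regular expression (regular expressions are covered in the next slides).

```shell
printf '1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4\n' |
awk '{
  lat = $4 * 1                 # "36.440N" -> 36.44
  lon = $5 * 1                 # "89.560W" -> 89.56 (the W is dropped, not turned into a sign)
  if ($5 ~ /W$/) lon = -lon    # flip the sign by hand for west longitudes
  if ($4 ~ /S$/) lat = -lat    # and for south latitudes
  print lat, lon
}'
```

This prints `36.44 -89.56`.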
Selective execution
awk recognizes regular expressions and conditionals, which can be used to selectively execute awk procedures on certain records
% awk -F":" '/root/ { print $1, $3}' /etc/passwd     # regular expression
root 0
or within our example script (FS is set in a BEGIN block, since there is no -F on the #! line)
#!/usr/bin/nawk -f
BEGIN { FS = ":" }
/root/ { print $1 }
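This sketch uses three made-up passwd-style lines in place of /etc/passwd. Only the record that matches the regular expression triggers the action.

```shell
# only records containing "root" run the { print ... } action
printf 'root:x:0:1\nbin:x:2:2\ndaemon:x:1:1\n' | awk -F":" '/root/ { print $1, $3 }'
```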
if statements are also very useful
% awk -F":" '{ if ($1=="root") print $1, $3}' /etc/passwd
root 0
or within our example script (again setting FS in BEGIN)
BEGIN { FS = ":" }
{ if ($1=="root") { print $1 } }
note, this particular if syntax is a bit different from your reading, which suggested
% awk -F":" '$1=="root" {print $1, $3}' /etc/passwd
the syntax I use is more explicit and more like C or perl, so I essentially have to remember less syntax
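The two styles select the same records. This sketch runs each one on the same made-up input to show that the output is identical.

```shell
# explicit if inside the action
printf 'root:x:0:1\nbin:x:2:2\n' | awk -F":" '{ if ($1 == "root") print $1, $3 }'
# pattern form from the reading: the comparison itself selects the records
printf 'root:x:0:1\nbin:x:2:2\n' | awk -F":" '$1 == "root" { print $1, $3 }'
```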
Floating Point Arithmetic
awk does floating point math!!!!!
it stores all variables as strings, but when math operators are applied, it converts the strings to floating point numbers if they look like numbers
the reading calls these "stringy" variables
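The contrast with bash shows up with a single division. The second awk item also shows a numeric string being converted on the fly.

```shell
# bash arithmetic is integer-only: 7/2 truncates to 3
echo $((7 / 2))
# awk does the same division in floating point; "2.5" is converted to a number
awk 'BEGIN { print 7 / 2, "2.5" * 2 }'
```

This prints `3` for bash, then `3.5 5` for awk.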
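Arithmetic Operators
the binary arithmetic operators are left to right associative, except ^, which is right to left (2^3^2 means 2^9)
+ : addition
- : subtraction
* : multiplication
/ : division
% : remainder or modulus
^ : exponent
awk also has the other standard C operators: assignment, increment/decrement, relational and logical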
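A one-line check of the operators and their associativity:

```shell
# remainder, power, right-associative ^, float division, left-associative -
awk 'BEGIN { print 7 % 3, 2 ^ 10, 2 ^ 3 ^ 2, 10 / 4, 10 - 4 - 3 }'
```

This prints `1 1024 512 2.5 3`. Note that `2 ^ 3 ^ 2` is 512, not 64, and `10 - 4 - 3` is 3.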
Assignment Operators
= : set variable equal to value on right
+= : set variable equal to itself plus the value on right
-= : set variable equal to itself minus the value on right
*= : set variable equal to itself times the value on right
/= : set variable equal to itself divided by value on right
%= : set variable equal to the remainder of itself divided by the value on the right
^= : set variable equal to itself raised to the power of the value on the right
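Each assignment operator can be applied in turn to one variable. The comments show the value printed at each step.

```shell
awk 'BEGIN {
  x = 10
  x += 5;  print x   # 15
  x -= 3;  print x   # 12
  x *= 2;  print x   # 24
  x /= 4;  print x   # 6
  x %= 4;  print x   # 2
  x ^= 3;  print x   # 8
}'
```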
Unary Operations
a unary expression contains one operand and one operator
++ : increment the operand by 1
if ++ occurs after the operand, x++, the original value of the operand is used in the expression and then the operand is incremented
if ++ occurs before the operand, ++x, the incremented value of the operand is used in the expression
-- : decrement the operand by 1 (same before/after rules as ++)
+ : unary plus maintains the value of the operand, +x = x
- : unary minus negates the value of the operand, -x = -1*x
! : logical negation returns 1 if the operand is false (0 or empty) and 0 if it is true
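Post- and pre-increment differ only in which value the surrounding expression sees:

```shell
awk 'BEGIN {
  x = 5
  y = x++; print x, y    # post-increment: y gets the old value -> 6 5
  z = ++x; print x, z    # pre-increment: z gets the new value -> 7 7
  print -x, +x, !x, !0   # -> -7 7 0 1
}'
```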
Relational Operators
return 1 if true and 0 if false
!!! opposite of the bash test command, whose exit status is 0 for true
relational operators are not associative: write (a < b) && (b < c), not a < b < c
< : test for less than
<= : test for less than or equal to
> : test for greater than
>= : test for greater than or equal to
== : test for equal to
!= : test for not equal
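The "opposite of bash test" point shows up when the same comparison is made both ways. The results are stored in variables before printing, because a bare < or > inside a print statement can be parsed as output redirection.

```shell
# awk relational expressions evaluate to 1 (true) or 0 (false)
awk 'BEGIN { lt = (3 < 5); eq = (3 == 5); print lt, eq }'
# bash test reports through its exit status, where 0 means true
test 3 -lt 5; echo $?
```

This prints `1 0` for awk, then `0` for test.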
Boolean (Logical) Operators
Boolean operators return 1 for true and 0 for false
&& : logical AND; tests that both expressions are true
left to right associative
|| : logical OR; tests that one or both of the expressions are true
left to right associative
! : logical negation; tests that the expression is false
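The logical operators on a true (1) and a false (0) value:

```shell
awk 'BEGIN { a = 1; b = 0; both = (a && b); either = (a || b); print both, either, !(a && b), !b }'
```

This prints `0 1 1 1`.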
Unlike bash, the comparison and relational operators don't have different syntax for strings and numbers.
i.e., awk uses == for both, rather than = (strings) or -eq (integers) as with test
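Since the syntax is the same, awk decides from the values themselves whether to compare numerically or as strings. Fields that look like numbers are compared as numbers. Quoted constants are compared as strings, character by character. This sketch shows both cases.

```shell
# the same == and < work on numbers and strings
awk 'BEGIN { n = (2 == 2.0); s = ("ehb" == "ehb"); t = ("abc" < "abd"); print n, s, t }'
# fields that look numeric compare as numbers: 10 > 9
echo '10 9' | awk '{ gt = ($1 > $2); print gt }'
# quoted constants compare as strings: "10" sorts before "9"
awk 'BEGIN { lt = ("10" < "9"); print lt }'
```

All three commands print true results: `1 1 1`, `1`, and `1`.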
Comparison Operators
~ : pattern match
!~ : pattern does not match
&& : logical AND
|| : logical OR
== : equals (numeric or string)
!= : does not equal (numeric or string)
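Pattern matching and logical operators can be combined in a single selection. This sketch uses the three test.tmp lines shown earlier and keeps only events whose longitude ends in W and whose last column (depth, judging by the layout) exceeds 8.

```shell
printf '%s\n' \
  '1.5 2008/09/09 03:32:10 36.440N 89.560W 9.4' \
  '1.8 2008/09/08 23:11:39 36.420N 89.510W 7.1' \
  '1.7 2008/09/08 19:44:29 36.360N 89.520W 8.2' |
awk '$5 ~ /W$/ && $6 > 8 { print $1, $6 }'
```

This prints `1.5 9.4` and `1.7 8.2`.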
Built-In Variables
FS: field separator
NR: record number, another useful built-in awk variable
it takes on the current line number, starting from 1
% awk -F":" '{ if (NR==1) print $1, $3}' /etc/passwd
root 0
this is useful when headers are present in a file
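A common use of NR is skipping a header line. The input here is a made-up two-column file with one header row.

```shell
# NR > 1 skips the header line
printf 'mag lat\n1.5 36.44\n1.8 36.42\n' | awk '{ if (NR > 1) print $2 }'
```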
RS: record separator, specifies where the current record ends and the next begins
default is "\n", or newline
a useful option is "", which makes blank lines separate records, so one record can span several lines
OFS: output field separator
default is " ", a single space
ORS: output record separator
default is "\n", or newline
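This sketch sets the separators in BEGIN, using made-up input. With RS = "", newlines also act as field separators, which is why the first two-line record has four fields.

```shell
# OFS replaces the commas in print; ORS ends each output record
printf 'a b c\n' | awk 'BEGIN { OFS = ","; ORS = ";\n" } { print $1, $2, $3 }'
# RS = "": blank lines separate records
printf 'a 1\nb 2\n\nc 3\n' | awk 'BEGIN { RS = "" } { print NR ": " $1, NF }'
```

This prints `a,b,c;`, then `1: a 4` and `2: c 2`.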
NF: number of fields in the current record
think of this as awk looking ahead to the next RS to count the number of fields in advance; $NF is the last field
FILENAME: stores the current filename
OFMT: output format for numbers printed with print
example: OFMT="%.6f" would make non-integer numbers print with six decimal places (integer values still print as integers)
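Here is a small check of NF, $NF and OFMT. The OFMT value "%.2f" is an arbitrary choice for illustration.

```shell
# NF is the field count and $NF the last field
printf 'a b c\n' | awk '{ print NF, $NF }'
# OFMT only affects non-integer values printed with print
awk 'BEGIN { OFMT = "%.2f"; x = 3.14159; y = 10; print x, y }'
```

This prints `3 c`, then `3.14 10`.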
Accessing shell variables in nawk
3 methods to access shell variables inside a nawk script ...
1. Assign the shell variables to awk variables after the body of the script, but before you specify the input file
VAR1=3
VAR2="Hi"
awk '{print v1, v2}' v1=$VAR1 v2=$VAR2 input_file
3 Hi
(printed once for every line of input_file)
Note that I am sneaking in the concept of awk variables here (v1, v2)
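Method 1 can be tried without an input file by reading standard input. This sketch passes "-" as the file operand so that awk reads the piped line after processing the assignments.

```shell
VAR1=3
VAR2="Hi"
# assignments are processed before "-" (standard input) is read
echo 'any line' | awk '{print v1, v2}' v1="$VAR1" v2="$VAR2" -
```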
There are a couple of constraints with this method
shell variables assigned using this method are not available in the BEGIN section
if variables are assigned after a filename, they will not be available when processing that filename
awk '{print v1, v2}' v1=$VAR1 file1 v2=$VAR2 file2
in this case, v2 is not available to awk when processing file1.
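Both constraints can be observed directly. This sketch uses two throwaway files made with mktemp (a common utility, though not part of POSIX). The brackets make empty values visible.

```shell
tmp1=$(mktemp); tmp2=$(mktemp)
echo file1-line > "$tmp1"
echo file2-line > "$tmp2"
awk 'BEGIN { print "BEGIN: v1=[" v1 "]" }
     { print $1 ": v1=[" v1 "] v2=[" v2 "]" }' v1=3 "$tmp1" v2=Hi "$tmp2"
rm -f "$tmp1" "$tmp2"
```

This should print `BEGIN: v1=[]`, then `file1-line: v1=[3] v2=[]`, then `file2-line: v1=[3] v2=[Hi]`.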
Also note: awk variables are referred to by just their name (no $ in front)
awk '{print v1, v2, NF, NR}' v1=$VAR1 file1 v2=$VAR2 file2
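In awk, $ is the field operator, so putting it in front of a variable means "the field whose number is this variable". Here, n is a hypothetical variable set on the command line.

```shell
# n is an awk variable holding 2; $n is field number n, i.e. $2
echo 'a b c' | awk '{ print n, $n }' n=2 -
```

This prints `2 b`.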
2. Use the -v switch to assign the shell variables to awk variables.
This works with nawk, but not with all flavours of awk.
nawk -v v1=$VAR1 -v v2=$VAR2 '{print v1, v2}' input_file
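One practical difference from method 1 is that -v assignments are made before the BEGIN block runs, so the values are already available there. This sketch assumes an awk that accepts -v, as POSIX awk, nawk and gawk do.

```shell
VAR1=3
VAR2="Hi"
# -v assignments happen before BEGIN, unlike method 1
awk -v v1="$VAR1" -v v2="$VAR2" 'BEGIN { print v1, v2 }'
```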
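3. Protect the shell variables from awk by enclosing them with "'" on each side (i.e. double quote - single quote - double quote).
awk '{print "'"$VAR1"'", "'"$VAR2"'"}' input_file
the single quotes temporarily end the awk program so the shell can expand "$VAR1" and "$VAR2"; the outer double quotes then make each value an awk string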
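To see what awk actually receives with method 3, the shell splices the values into the program text before awk runs. Because of this, values that contain a double quote or a backslash would break or alter the awk program. Method 2 avoids that problem.

```shell
VAR1=3
VAR2="Hi"
# after shell expansion, awk runs the program: {print "3", "Hi"}
echo 'any line' | awk '{print "'"$VAR1"'", "'"$VAR2"'"}'
```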