[1.0] A Guided Tour Of Awk

* This chapter provides an overview of Awk and a quick tour of its use.

[1.1] AWK OVERVIEW
The two faces are really the same, however. Awk uses the same mechanisms for handling any text-processing task, but these mechanisms are flexible enough to allow useful Awk programs to be entered on the command line, or to implement complicated programs containing dozens of lines of Awk statements.

Awk statements comprise a programming language. In fact, Awk is useful for simple, quick-and-dirty computational programming. Anybody who can write a BASIC program can use Awk, although Awk's syntax is different from that of BASIC. Anybody who can write a C program can use Awk with little difficulty, and those who would like to learn C may find Awk a useful stepping stone, with the caution that Awk and C have significant differences beyond their many similarities.

There are, however, things that Awk is not. It is not really well suited for extremely large, complicated tasks. It is also an "interpreted" language -- that is, an Awk program cannot run on its own; it must be executed by the Awk utility itself. That means that it is relatively slow, though it is efficient as interpreted languages go, and that the program can only be used on systems that have Awk. There are translators available that can convert Awk programs into C code for compilation as stand-alone programs, but such translators have to be purchased separately.

One last item before proceeding: What does the name "Awk" mean? Awk actually stands for the names of its authors: "Aho, Weinberger, & Kernighan". Kernighan later noted: "Naming a language after its authors ... shows a certain poverty of imagination." The name is reminiscent of that of an oceanic bird known as an "auk", and so the picture of an auk often shows up on the cover of books on Awk.
   metal
   weight in ounces
   date minted
   country of origin
   description

The file has the contents:

I could then invoke Awk to list all the gold pieces as follows:

This tells Awk to search through the file for lines of text that contain the string "gold", and print them out. The result is:
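As a minimal sketch of this kind of search, the commands below build a small stand-in file and run the pattern against it. The file name "coins_sample.txt" and its five rows are made up for illustration and are not the collection file described here; only the field layout (metal, weight, date, country, description) follows the text.

```shell
# Build a made-up sample file (hypothetical data, not the original coins.txt).
cat > coins_sample.txt <<'EOF'
gold     1    1986  USA      American Eagle
gold     1    1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
gold     0.5  1979  RSA      Krugerrand
silver   1    1986  USA      Liberty dollar
EOF

# With no action given, Awk prints every line that matches the pattern.
awk '/gold/' coins_sample.txt
```

With this sample data the command prints the three lines that contain "gold" and skips the two silver entries.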
* That's very nice, you say, but any "grep" or "find" utility can do the same thing. True, but Awk is capable of doing much more. For example, suppose I only want to print the description field, and leave all the other text out. I could then change my invocation of Awk to:

This yields:

This example demonstrates the simplest general form of an Awk program:
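In outline, that form is a search pattern followed by an action in braces. Here is a minimal sketch, run against a made-up stand-in file ("coins_sample.txt" and its rows are hypothetical, not the collection file used in this chapter):

```shell
# Build a made-up sample file (hypothetical data, not the original coins.txt).
cat > coins_sample.txt <<'EOF'
gold     1    1986  USA      American Eagle
gold     1    1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
gold     0.5  1979  RSA      Krugerrand
silver   1    1986  USA      Liberty dollar
EOF

# Pattern: /gold/    Action: print fields 5 through 8 (the description).
awk '/gold/ {print $5,$6,$7,$8}' coins_sample.txt
```

Only the description words of the three gold lines come out. A short description such as "Krugerrand" still prints, because the unused fields are simply empty.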
Awk searches through the input file for each line that contains the search pattern. For each of these lines found, Awk then performs the specified actions. In this example, the action is specified as:

The purpose of the "print" statement is obvious. The "$5", "$6", "$7", and "$8" are "fields", or "field variables", which store the words in each line of text by their numeric sequence. "$1", for example, stores the first word in the line, "$2" has the second, and so on. By default, a "word" is defined as any string of printing characters separated by spaces.

   metal
   weight in ounces
   date minted
   country of origin
   description

-- then the field variables are matched to each line of text in the file
as follows:

The program action in this example prints the fields that contain the description. The description field in the file may actually include from one to four fields, but that's not a problem, since "print" simply ignores any undefined fields. The astute reader will notice that the "coins.txt" file is neatly organized so that the only piece of information that contains multiple fields is at the end of the line. This is a little contrived, but that's the way examples are.

   awk '/gold/'

-- is the same as:

Note that Awk recognizes the field variable $0 as representing the entire line, so this could also be written as:

This is redundant, but it does have the virtue of making the action more
obvious.

   awk '{if ($3 < 1980) print $3, "    ",$5,$6,$7,$8}' coins.txt

This yields:

This new example adds a few new concepts:

   No search pattern is specified. Without a search pattern, Awk will match all lines in the input file, and perform the actions on each one.

   I can add text of my own to the "print" statement (in this case, four spaces) simply by enclosing the text in quotes and adding it to the parameter list.

   An "if" statement is used to check for a date field earlier than 1980, and the "print" statement is executed only if that condition is true.

Awk, on the other hand, makes no strong distinction between strings and numbers. In computer-science terms, it isn't a "strongly-typed" language. All the fields are regarded as strings, but if a string also happens to represent a number, numeric operations can be performed on it. So we can perform an arithmetic comparison on the date field.

* The next example prints out how many coins are in the collection:

   awk 'END {print NR,"coins"}' coins.txt

This yields:

The first new item in this example is the END statement. To explain this,
I have to extend the general form of an Awk program to:

The BEGIN clause performs any initializations required before Awk starts scanning the input file. The subsequent body of the Awk program consists of a series of search patterns, each with its own program action. Awk scans each line of the input file for each search pattern, and performs the appropriate actions for each string found. Once the file has been scanned, an END clause can be used to perform any final actions required.

NR stands for "number of records". NR is one of Awk's "pre-defined" variables. There are others, for example the variable NF gives the number of fields in a line, but a detailed explanation will have to wait for later.

* Suppose the current price of gold is $425, and I want to figure out the approximate total value of the gold pieces in the coin collection. I invoke Awk as follows:

   awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt

This yields:

In this example, "ounces" is a variable I defined myself, or a "user defined" variable. You can use almost any string of characters as a variable name in Awk, as long as the name doesn't conflict with some string that has a specific meaning to Awk, such as "print" or "NR" or "END". There is no need to declare the variable, or to initialize it. A variable handled as a string variable is initialized to the "null string", meaning that if you try to print it, nothing will be there. A variable handled as a numeric variable will be initialized to zero.

   {ounces += $2}

-- sums the weight of the piece on each matched line
into the variable "ounces". Those who program in C should be familiar with the "+=" operator. Those who don't can be assured that this is just a shorthand way of saying:

The final action is to compute and print the value of the gold:

The only thing here of interest is that the two print parameters, the literal '"value = $"' and the expression "425*ounces", are separated by a space, not a comma. This concatenates the two parameters together on output, without any intervening spaces.
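To see the difference between the space and the comma, here is a sketch run against a made-up stand-in file ("coins_sample.txt" and its rows are hypothetical; the $425 price is the one assumed above). The first command spells out "+=" in long form and concatenates the two print parameters; the second separates them with a comma instead.

```shell
# Build a made-up sample file (hypothetical data, not the original coins.txt).
cat > coins_sample.txt <<'EOF'
gold     1    1986  USA      American Eagle
gold     1    1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
gold     0.5  1979  RSA      Krugerrand
silver   1    1986  USA      Liberty dollar
EOF

# Long form of "ounces += $2"; a space between the print parameters concatenates them.
awk '/gold/ {ounces = ounces + $2} END {print "value = $" 425*ounces}' coins_sample.txt

# Same sum, but a comma makes "print" put a space between the two parameters.
awk '/gold/ {ounces += $2} END {print "value = $", 425*ounces}' coins_sample.txt
```

With the sample rows (2.5 ounces of gold in all), the first command prints "value = $1062.5" and the second prints "value = $ 1062.5".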
The immediate objection to this idea is that it would be impractical to enter a lot of Awk statements on the command line, but that's easy to fix. The commands can be written into a file, and then Awk can be told to execute the commands from that file as follows:

   awk -f <awk program file name>

Given an ability to write an Awk program in this way, then what should a "master" "coins.txt" analysis program do? Here's one possible output:

   Silver pieces: nn
   Total number of pieces: nn
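A minimal sketch of such a program follows. It is not the original listing: the file names "summary_sketch.awk" and "coins_sample.txt", the sample rows, and the weight variables are all assumptions made for illustration, and the weight lines go beyond the output outlined above.

```shell
# Build a made-up sample file (hypothetical data, not the original coins.txt).
cat > coins_sample.txt <<'EOF'
gold     1    1986  USA      American Eagle
gold     1    1908  Austria  Franz Josef 100 Korona
silver  10    1981  USA      ingot
gold     0.5  1979  RSA      Krugerrand
silver   1    1986  USA      Liberty dollar
EOF

# Write the Awk program to its own file (hypothetical name).
cat > summary_sketch.awk <<'EOF'
# Hypothetical summary program: count the pieces and total their weight by metal.
/gold/   { num_gold++;   wt_gold   += $2 }    # two statements on one line
/silver/ { num_silver++; wt_silver += $2 }
END {
    printf("Gold pieces:            %2d\n", num_gold)
    printf("Gold weight (oz):       %7.2f\n", wt_gold)
    printf("Silver pieces:          %2d\n", num_silver)
    printf("Silver weight (oz):     %7.2f\n", wt_silver)
    printf("Total number of pieces: %2d\n", NR)
}
EOF

# Run the program file against the sample data.
awk -f summary_sketch.awk coins_sample.txt
```

With the sample rows this reports 3 gold pieces weighing 2.50 ounces, 2 silver pieces weighing 11.00 ounces, and 5 pieces in total.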
This program has a few interesting features:

   Comments can be inserted in the program by preceding them with a "#".

   Note the statements "num_gold++" and "num_silver++". C programmers should understand the "++" operator. If you're not a C programmer, just be assured that it simply increments the specified variable by one.

   Multiple statements can be written on the same line by separating them with a semicolon (";").

   Note the use of the "printf" statement, which offers more flexible printing capabilities than the "print" statement.

"Printf" has the general syntax:

There is one format code for each of the parameters in the list. Each format code determines how its corresponding parameter will be printed. For example, the format code "%2d" tells Awk to print an integer in a field at least two characters wide, and the format code "%7.2f" tells Awk to print a floating-point number in a field at least seven characters wide, with two digits to the right of the decimal point.

Note also that, in this example, each string printed by "printf" ends with a "\n", which is a code for a "newline" (ASCII line-feed code). Unlike the "print" statement, which automatically advances the output to the next line when it prints a line, "printf" does not automatically advance the output, and by default the next output statement will append its output to the same line. A newline forces the output to skip to the next line.

* I stored this program in a file named "summary.awk", and invoked it as follows:

   awk -f summary.awk coins.txt

The output was:

   Gold pieces: 9
   Silver pieces: 4
   Total number of pieces: 13

* This information should give you enough background to make good use of Awk. The next chapter provides a much more complete description of the language.