Linux – Can awk patterns match multiple lines


I have some complex log files and need to write some tools to process them. I have been playing with awk, but I am not sure whether awk is the right tool for this.

My log files are printouts of OSPF protocol decodes: a text log of the various protocol pkts and their contents, with each protocol field identified along with its value. I want to process these files and print only the lines of the log that pertain to specific pkts. Each pkt's log entry can span a varying number of lines.

awk seems to be able to process a single line that matches a pattern. I can locate the desired pkt but then I need to match patterns in the lines that follow in order to determine if it is a pkt I want to print out.

Another way to look at this: I want to isolate several lines in the log file and print those that make up the details of a particular pkt, based on pattern matches across several lines.

Since awk seems to be line-based, I am not sure if that would be the best tool to use.

If awk can do this, how is it done? If not, any suggestions on which tool to use?

Best Answer

Awk can easily detect multi-line combinations of patterns, but you need to create what is called a state machine in your code to recognize the sequence.

Consider this input:

second half #1
first half
second half #2
second half #3

As you have seen, it's easy to recognize a single pattern. Now, we can write an awk program that recognizes second half only when it is directly preceded by a first half line. (With a more sophisticated state machine you could detect an arbitrary sequence of patterns.)

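/second half/ {
  if (lastLine == "first half") {
    print    # print only when the previous line was exactly "first half"
  }
}

{ lastLine = $0 }    # runs for every line: remember it for the next comparison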

If you run this you will see:

second half #2
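One way to try this from a shell is sketched below. It assumes a POSIX sh with printf and awk available; the function name run_example is just a label chosen here, not something from the answer.

```sh
#!/bin/sh
# Feed the four sample lines to the awk program and show what it prints.
# run_example is a hypothetical helper name used only for this sketch.
run_example() {
  printf '%s\n' 'second half #1' 'first half' 'second half #2' 'second half #3' |
  awk '
    /second half/ {
      if (lastLine == "first half") {
        print
      }
    }
    { lastLine = $0 }
  '
}

run_example
```

The first rule fires on every "second half" line but prints only when the remembered previous line is exactly "first half". The second rule has no pattern, so it runs for every line and updates lastLine after the check has been made.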

Now, this example is absurdly simple and only barely a state machine. The interesting state lasts only for the duration of the if statement, and the preceding state is implicit: it depends on the value of lastLine. In a more canonical state machine you would keep an explicit state variable and transition from state to state depending on both the existing state and the current input. But you may not need that much control mechanism.
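For comparison, here is a minimal sketch of the same recognizer written with an explicit state variable. The state names IDLE and ARMED and the function name match_sequence are illustrative choices, not part of the original example, and a POSIX sh and awk are assumed.

```sh
#!/bin/sh
# Explicit two-state machine (state names are hypothetical):
#   IDLE  - nothing interesting seen yet
#   ARMED - the previous line was exactly "first half"
match_sequence() {
  awk '
    BEGIN { state = "IDLE" }

    # Act on the current state and the current input line.
    state == "ARMED" && /second half/ { print }

    # Transition: only a "first half" line arms the machine for the next line.
    { state = ($0 == "first half") ? "ARMED" : "IDLE" }
  '
}

printf '%s\n' 'second half #1' 'first half' 'second half #2' 'second half #3' |
  match_sequence
```

On the sample input this prints the same single line as the lastLine version. The difference is that the transition rule is now the one place that decides what happens next, so recognizing a longer sequence means adding states and transitions rather than remembering more previous lines.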