Grep to find lines starting at pattern A until pattern B is matched

awk, filtering, grep, log-files

I have a log that contains bits like this:

[2012-04-16 15:16:43,827: DEBUG/PoolWorker-2] {'feed': {}, 'bozo': 1, 'bozo_exception': URLError(error(110, 'Connection timed out'),), 'entries': []}
[2012-04-16 15:16:43,827: ERROR/PoolWorker-2] get_entries
Traceback (most recent call last):
  File "/opt/myapp/app.py", line 491, in get_entries
    logging.getLogger(__name__).debug("Title: %s" % doc.title)
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 423, in __getattr__
    raise AttributeError, "object has no attribute '%s'" % key
AttributeError: object has no attribute 'title'
[2012-04-16 15:16:43,828: INFO/MainProcess] Task myapp.do_task[4fe968ff-e069-4cfe-9a81-aece0d97c289] succeeded in 21.0481028557s: None

I would like to extract sections from it as follows:

  1. When a line contains "ERROR" or "WARN", start printing (and include this line).
  2. When the next line starting with "[" is found, stop printing (and don't include this line).

I'm pretty sure this is too much for grep, so how can I do it?

(Ok, instead of being lazy, I've figured it out – will post my solution.)

Best Answer

This worked for me - not exactly as described above, but close enough:

awk '/ERROR|WARN/,/DEBUG|INFO/ { if ($0 !~ /(DEBUG|INFO)/) { print } }' < logfile

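It is very convenient that awk supports range patterns of the form /startpattern/,/stoppattern/ { }. Unfortunately, if the stop pattern matches on the same line as the start pattern, the range ends immediately and only that one line is printed. Every ERROR or WARN line in this log also starts with "[", so "[" could not be used as the stop pattern, hence the need for a different one (DEBUG|INFO here).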
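For the exact behaviour described in the question (stop at the next line starting with "[", whatever its level, and don't print it), one possible alternative is to track the state in a flag variable instead of using a range pattern. The following is only a sketch: the function name extract_errors and the variable name printing are made up for illustration, and it assumes every log record begins with "[" as in the sample above.

```shell
# extract_errors is a hypothetical helper; pass the log file as its first argument.
extract_errors() {
    awk '
        /^\[/        { printing = 0 }   # a new log record ends the current section
        /ERROR|WARN/ { printing = 1 }   # a line containing ERROR or WARN starts one
        printing                        # print the line while inside a section
    ' "$1"
}
```

Usage: extract_errors logfile. Because the flag is cleared before the ERROR|WARN check runs, a line that starts with "[" and also contains ERROR or WARN closes the previous section and opens a new one, so consecutive error records are all kept. The match is case-sensitive, so traceback lines such as "AttributeError: ..." do not start a section by themselves.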
