Grep to find lines starting at pattern A until pattern B is matched

awk, filtering, grep, log-files

I have a log that contains bits like this:

[2012-04-16 15:16:43,827: DEBUG/PoolWorker-2] {'feed': {}, 'bozo': 1, 'bozo_exception': URLError(error(110, 'Connection timed out'),), 'entries': []}
[2012-04-16 15:16:43,827: ERROR/PoolWorker-2] get_entries
Traceback (most recent call last):
  File "/opt/myapp/app.py", line 491, in get_entries
    logging.getLogger(__name__).debug("Title: %s" % doc.title)
  File "build/bdist.linux-x86_64/egg/feedparser.py", line 423, in __getattr__
    raise AttributeError, "object has no attribute '%s'" % key
AttributeError: object has no attribute 'title'
[2012-04-16 15:16:43,828: INFO/MainProcess] Task myapp.do_task[4fe968ff-e069-4cfe-9a81-aece0d97c289] succeeded in 21.0481028557s: None

I would like to extract from it sections as follows:

When a line contains "ERROR" or "WARN" start filtering (and include this line)
When the next line starting with "[" is found, stop filtering (and don't include this line).

I'm pretty sure this is too much for grep, so how can I do it?

(Ok, instead of being lazy, I've figured it out – will post my solution.)

Best Answer

This worked for me - not exactly as described above, but close enough:

awk '/ERROR|WARN/,/DEBUG|INFO/ { if ($0 !~ /(DEBUG|INFO)/) { print } }' < logfile

Conveniently, awk supports range patterns: /startpattern/,/stoppattern/ { }. Unfortunately, if the stop pattern matches on the same line as the start pattern, the range ends immediately and only that line is printed. The ERROR and WARN lines themselves start with "[", hence the need for a different stop pattern.
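For comparison, the behaviour described in the question (start at a line containing ERROR or WARN, stop at the next line starting with "[") can be approximated with a flag instead of a range pattern. This is only a sketch: the function name extract_errors is made up for illustration, and it assumes every log record begins with "[" as in the sample above.

```shell
# extract_errors: hypothetical helper name, not part of the original answer.
# Reads the named files, or standard input when no file is given.
extract_errors() {
    awk '
        /^\[/        { printing = 0 }   # a new record header ends the current section
        /ERROR|WARN/ { printing = 1 }   # a line containing ERROR or WARN starts one
        printing                        # print the line while the flag is set
    ' "$@"
}

# Usage: extract_errors logfile
```

Because the flag is reset and then set again on the same line, an ERROR record that directly follows another ERROR or WARN record is still printed, which the range-pattern version cannot do with "[" as its stop pattern.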

Related Solutions

Comment all lines matching some pattern

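You are attempting to edit standard input in place (the -i option). That cannot work: -i only applies to files named on the command line, so when sed reads from a pipe the option is useless and should be removed.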

Note:

You can speed the command up considerably by dropping the second grep and excluding the unnecessary directories in the first one.

Try either

grep -lIR --exclude-dir=.svn "dlclose" . | xargs sed -i bak 's/.*dlclose.*/\/\/&/g'

or

for f in $(grep -lIR --exclude-dir=.svn "dlclose" .)
do
   sed -i bak 's/.*dlclose.*/\/\/&/g' "$f"
done
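Both forms above split file names on whitespace, and the detached "-i bak" spelling is the BSD one. A variant that may be more portable is sketched below. It assumes a grep that supports --null and --exclude-dir and an xargs that supports -0 (true of the GNU and BSD tools as far as I know). The function name comment_out_dlclose is invented for this example.

```shell
# comment_out_dlclose: hypothetical helper, not from the original answer.
# $1 is the directory to search.
comment_out_dlclose() {
    # -l: list matching file names, -I: skip binary files, -R: recurse,
    # --null: terminate each file name with a NUL byte for xargs -0.
    grep -lIR --null --exclude-dir=.svn "dlclose" "$1" |
        # -i.bak (suffix attached) edits in place and keeps a .bak backup.
        xargs -0 sed -i.bak 's/.*dlclose.*/\/\/&/'
}

# Usage: comment_out_dlclose .
```

If grep finds no files, some xargs implementations still run sed once with no file arguments, which makes sed complain. With GNU xargs, adding -r avoids that.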

Grep lines after match until the end

With GNU grep (tested with version 2.6.3):

git status | grep -Pzo '.*Untracked files(.*\n)*'

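This uses -P for Perl regular expressions, -z to treat the input as NUL-separated (so the whole output of git status is read as a single record and \n can match a newline), and -o to print only what matches the pattern.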

The regex explained:

First we match any character (.) zero or more times (*) until an occurrence of the string Untracked files. Next, the part inside the parentheses (.*\n) matches any character except a newline (.) zero or more times (*), followed by a newline (\n). That whole group can occur zero or more times; that is the meaning of the last *. It therefore matches all remaining lines after the first occurrence of Untracked files.
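Where grep -P is not available, the same "print from the first match to the end" idea can be sketched with awk. This is an alternative technique, not the method of the answer above. The function name print_from_match is hypothetical, and the pattern is interpreted as an awk regular expression rather than a Perl one.

```shell
# print_from_match: hypothetical helper, not from the original answer.
# $1 is an awk regular expression; input is read from standard input.
print_from_match() {
    # Set the flag on the first matching line, then print every line
    # (including the matching one) while the flag is set.
    awk -v pat="$1" '$0 ~ pat { found = 1 } found'
}

# Usage: git status | print_from_match 'Untracked files'
```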
