Bash substring extraction between specified start and end characters

awk bash sed

For example, I have a log file containing this entry:

[Wed Aug 08 11:39:41 2012] [error] [client 155.94.70.224] ModSecurity: [file "/etc/httpd/modsecurity.d/rules/base_rules/modsecurity_crs_20_protocol_violations.conf"] [line "271"] [id "960020"] [rev "2.2.5"] [msg "Pragma Header requires Cache-Control Header for HTTP/1.1 requests."] [severity "NOTICE"] [tag "RULE_MATURITY/5"] [tag "RULE_ACCURACY/7"] [tag "https://www.owasp.org/index.php/ModSecurity_CRS_RuleID-960020"] [tag "PROTOCOL_VIOLATION/INVALID_HREQ"] [tag "http://www.bad-behavior.ioerror.us/documentation/how-it-works/"] Warning. String match "HTTP/1.1" at REQUEST_PROTOCOL. [hostname "webmail.white-art.co.uk"] [uri "/horde/themes/graphics/tree/plusonly.png"] [unique_id "UCJB7VveCGYAAG@BHJgAAAAQ"]

I want to extract every string that starts with the character [ and ends with ]. I can use cut or awk to split on a single delimiter, but here I need the text between an opening [ and its closing ].
How can I accomplish this?

For example, I need to extract:

"[tag "RULE_ACCURACY/7"]"

and

"[severity "NOTICE"]"

from the log.


I found one solution: first split the log by adding a newline after every ], then use grep to search for the required string. Is there a better way to do it?

Best Answer

I think this will split the line the way you want:

sed -e 's/\]/\]\n/g' log | sed -e 's/^ *//g' | awk '/^\[/ {print}'

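First put a newline after each ], then remove any leading spaces, and finally print only the lines beginning with [. (The \n in the replacement is interpreted as a newline by GNU sed.) Because of that last filter, a group that follows other text on its line, such as [file ...] after "ModSecurity:" or [hostname ...] after the "Warning." sentence, is not printed.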

Your input line becomes:

[Wed Aug 08 11:39:41 2012]
[error]
[client 155.94.70.224]
[line "271"]
[id "960020"]
[rev "2.2.5"]
[msg "Pragma Header requires Cache-Control Header for HTTP/1.1 requests."]
[severity "NOTICE"]
[tag "RULE_MATURITY/5"]
[tag "RULE_ACCURACY/7"]
[tag "https://www.owasp.org/index.php/ModSecurity_CRS_RuleID-960020"]
[tag "PROTOCOL_VIOLATION/INVALID_HREQ"]
[tag "http://www.bad-behavior.ioerror.us/documentation/how-it-works/"]
[uri "/horde/themes/graphics/tree/plusonly.png"]
[unique_id "UCJB7VveCGYAAG@BHJgAAAAQ"]
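
A possible alternative, offered only as a sketch: grep -o prints each match on its own line, so the bracketed groups can be pulled out without splitting the line first. This assumes a grep that supports -o (GNU and BSD grep do; it is not part of POSIX). The sample line below is a shortened stand-in for the real log entry, and the variable names are illustrative only.

```shell
# Shortened sample line (a stand-in for the real log entry).
line='[Wed Aug 08 11:39:41 2012] [error] [client 155.94.70.224] ModSecurity: [line "271"] [severity "NOTICE"] [tag "RULE_ACCURACY/7"] Warning. String match "HTTP/1.1" at REQUEST_PROTOCOL. [hostname "webmail.white-art.co.uk"]'

# Match "[", then any run of characters other than "]", then "]".
# -o prints only the matched parts, one per line.
all_pairs=$(printf '%s\n' "$line" | grep -o '\[[^]]*]')
printf '%s\n' "$all_pairs"

# Keep only the groups of interest by filtering on the key name.
wanted=$(printf '%s\n' "$all_pairs" | grep -E '^\[(severity|tag) ')
printf '%s\n' "$wanted"
```

Because the match does not depend on where a group sits in the line, groups that follow other text (such as the [hostname ...] entry) are kept as well. Against the real file, the first command would be grep -o '\[[^]]*]' log.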