Logstash Grok Pattern

grok logstash

First of all, I apologize for this: I am pretty bad at regular expressions. I tried to write a custom pattern (I could not find anything suitable among the existing grok patterns, or maybe I am missing something) for parsing SVN logs, which are in the following format:

r24|prashant|2015-02-26 12:38:04 -0800 (Thu, 26 Feb 2015)|33|Log: ABC-123 / Initial version||A   test/log_testing1 A   test/log_testing2 A   test/log_testing3 A   test/log_testing4 A   test/log_testing5 \n

So it's in the format of

$REVISION:$USER ID:$DATE:$CHECKED IN MESSAGE:$FILE CHECKED IN 

So I wrote some custom patterns:

SVN [r0-9]
SVN_TIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?  (%{DAY}, %{MONTHDAY} %{MONTH} %{YEAR})

and the filter section of my Logstash configuration looks like this:

filter {
  grok {
  match => { "message" => "%{SVN:revision}|%{USERNAME:username}|%{SVN_TIMESTAMP:svntimestamp}|%{GREEDYDATA:syslog_message}||%{GREEDYDATA:syslog_message" }
}

}

I am not sure it's correct, and as usual it's not working. Any help is really appreciated.

Best Answer

Here is a simpler version of a pattern that might help you to get started:

(?<SVN>[0-9]+)\|%{USERNAME:username}\|(?<SVN_TIMESTAMP>[^\|]+)\|%{GREEDYDATA:syslog_message}

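For simplicity's sake I used inline named captures instead of separate custom pattern definitions, and the timestamp capture is deliberately loose (it accepts anything up to the next pipe), but that should be easy to tighten later.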
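To see what that pattern captures without running Logstash, here is a rough sketch in Python. It is an approximation, not grok itself: Python's `re` needs `(?P<name>...)` for named groups, and I am assuming `USERNAME` expands to `[a-zA-Z0-9._-]+` and `GREEDYDATA` to `.*`. The group names are my own choice for illustration, and the sample line is shortened from the one in the question.

```python
import re

# Shortened sample line from the question (literal trailing "\n" left out).
line = (
    "r24|prashant|2015-02-26 12:38:04 -0800 (Thu, 26 Feb 2015)|33|"
    "Log: ABC-123 / Initial version||A   test/log_testing1"
)

# Assumed Python equivalents of the grok building blocks:
#   USERNAME   -> [a-zA-Z0-9._-]+
#   GREEDYDATA -> .*
pattern = re.compile(
    r"(?P<revision>[0-9]+)\|"           # one or more digits, then a literal pipe
    r"(?P<username>[a-zA-Z0-9._-]+)\|"  # user id, then a literal pipe
    r"(?P<svn_timestamp>[^|]+)\|"       # everything up to the next pipe
    r"(?P<syslog_message>.*)"           # the rest of the line
)

# search() is unanchored, like grok, so the leading "r" is simply skipped.
match = pattern.search(line)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name}: {value}")
```

Note that the last capture swallows everything after the timestamp, including the `33` field and the file list, so splitting those out would need further pipe-separated captures.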

Important things to note:

  • the pipe character is a logical OR in these expressions, so it needs to be escaped to match a literal pipe
  • as @tigran pointed out: you need the plus symbol for "one or more" digits on the SVN revision
  • your SVN_TIMESTAMP pattern is very complex, but doesn't seem quite right. At a minimum you need to escape the parentheses to match them literally.
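The three points above can be checked in isolation. This is a small sketch using Python's `re` module rather than grok's regex engine; alternation, quantifiers, and escaping work the same way for these cases, but the variable names and the simplified sub-patterns are mine, not part of the original configuration.

```python
import re

line = "r24|prashant|2015-02-26 12:38:04 -0800 (Thu, 26 Feb 2015)|33"

# 1. An unescaped "|" means OR: this reads as "r[0-9]+" OR "[a-z]+",
#    so the match stops after the first alternative.
unescaped_pipe = re.match(r"r[0-9]+|[a-z]+", line)
#    Escaped, "\|" matches the literal separator and both fields are consumed.
escaped_pipe = re.match(r"r[0-9]+\|[a-z]+", line)

# 2. "[r0-9]" is a character class that matches exactly one character;
#    a "+" quantifier is needed to match all digits of the revision.
single_char = re.match(r"[r0-9]", line)
whole_revision = re.match(r"r[0-9]+", line)

# 3. Unescaped parentheses create a group and do not match "(" or ")".
date_part = "(Thu, 26 Feb 2015)"
unescaped_parens = re.fullmatch(r"(\w+, \d+ \w+ \d+)", date_part)
escaped_parens = re.fullmatch(r"\(\w+, \d+ \w+ \d+\)", date_part)

print(unescaped_pipe.group(0))   # r24
print(escaped_pipe.group(0))     # r24|prashant
print(single_char.group(0))      # r
print(whole_revision.group(0))   # r24
print(unescaped_parens)          # None
print(escaped_parens.group(0))   # (Thu, 26 Feb 2015)
```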

I recommend that you take your input and my pattern and put them into https://grokdebug.herokuapp.com/. That will allow you to gradually refine the pattern into what you really need.