Java – Logstash multiline codec for Java stacktraces

grok, java, logstash

The logstash documentation indicates that you can collapse the multiple indented lines in a Java stacktrace log entry into a single event using the multiline codec:

https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html

input {
  syslog {
    type => "syslog"
    port => 8514
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

This is based on logstash finding an indent at the start of the line and combining that with the previous line.
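To make the merging rule concrete, here is a minimal Python sketch of what pattern => "^\s" with what => "previous" is meant to do. It is only a simulation of the idea, not Logstash's implementation; the function name collapse_multiline and the sample log lines are made up for illustration.

```python
import re

def collapse_multiline(lines, pattern=r"^\s"):
    """Append every line matching `pattern` to the event before it."""
    matcher = re.compile(pattern)
    events = []
    for line in lines:
        if events and matcher.search(line):
            # Indented continuation line: glue it onto the previous event.
            events[-1] += "\n" + line
        else:
            # Any non-indented line starts a new event.
            events.append(line)
    return events

# Hypothetical log lines; stack frames are indented with a tab.
sample = [
    'Exception in thread "main" java.lang.IllegalStateException: boom',
    "\tat com.example.Foo.bar(Foo.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "INFO next request handled",
]
events = collapse_multiline(sample)
for event in events:
    print(repr(event))
```

Note that only lines that themselves start with whitespace are merged, so any unindented line in the middle of a trace would start a new event under this rule.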

However, the logstash documentation is the only place where I can find a reference to this. The general user community seems to be using elaborate grok filters to achieve the same effect.

I've tried the basic indentation pattern provided by logstash, but it doesn't work. Has anyone else managed to get this working by matching the indentation pattern?

Best Answer

Yes, though not with the syslog {} input. I've done it with the file {} input and Tomcat logs. If the stacktraces are coming into syslog with a new event on each line, and still having the usual syslog prefix of datestamp and such, reassembling these into a unitary stackdump becomes much harder. It still can be done, but requires much more extensive filters.

Do not put the multiline codec on the input; when every line arrives as its own event, the codec can't reassemble them.
Use a grok filter to split the syslog message into parts, putting the SYSLOGMESSAGE part into its own field.
Use the multiline {} filter on the SYSLOGMESSAGE field to reassemble your stackdump.
Use one and only one filter worker (-w flag); it's the only way to be sure the entire stacktrace is gathered.
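The steps above can be sketched in Python to show why the syslog prefix has to be split off before the indentation test. This is a rough simulation under assumptions: SYSLOG_LINE is a simplified stand-in for a grok syslog pattern, the sample lines are invented, and it assumes the leading whitespace of each stack frame survives the trip through syslog. The single sequential loop plays the role of the single filter worker.

```python
import re

# Simplified stand-in for a grok syslog pattern (an assumption, not grok's definition).
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<program>[^\s\[:]+)(?:\[\d+\])?: (?P<message>.*)$"
)

def reassemble(raw_lines):
    """Split off the syslog prefix, then merge indented messages into the previous event."""
    events = []
    for raw in raw_lines:
        match = SYSLOG_LINE.match(raw)
        if match is None:
            continue  # not syslog-shaped; a real pipeline would tag it instead
        fields = match.groupdict()
        if events and re.match(r"\s", fields["message"]):
            # Continuation: keep the first event's prefix, extend its message.
            events[-1]["message"] += "\n" + fields["message"]
        else:
            events.append(fields)
    return events

# Hypothetical event-per-line input, each line carrying its own syslog prefix.
raw_lines = [
    "Jan 12 10:15:01 web01 tomcat[411]: java.lang.IllegalStateException: boom",
    "Jan 12 10:15:01 web01 tomcat[411]: \tat com.example.Foo.bar(Foo.java:42)",
    "Jan 12 10:15:01 web01 tomcat[411]: \tat com.example.Main.main(Main.java:7)",
    "Jan 12 10:15:02 web01 tomcat[411]: Request served",
]
events = reassemble(raw_lines)
```

Because every raw line starts with a date, matching "^\s" against the whole line would never fire; the test only works on the extracted message field.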

If at all possible, it's best to use the file {} input on the file the stacktraces are emitted into, and use the indentation method you've already found.

Related Solutions

Logstash Grok Pattern

Here is a simpler version of a pattern that might help you to get started:

(?<SVN>[0-9]+)\|%{USERNAME:username}\|(?<SVN_TIMESTAMP>[^\|]+)\|%{GREEDYDATA:syslog_message}

For simplicity's sake I did not use named patterns and the timestamp is not very specific at all, but that should be easier to fix.

Important things to note:

The pipe character is a logical OR in these expressions, so it needs to be escaped to match a literal pipe.
As @tigran pointed out, you need the plus symbol for "one or more" digits on the SVN revision.
Your SVN_TIMESTAMP pattern is very complex but doesn't seem quite right. At a minimum, you need to escape the parentheses to match them literally.
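The escaping points can be checked outside Logstash with an ordinary regex engine. The sketch below translates the pattern into Python's re syntax: (?<name>...) becomes (?P<name>...), and USERNAME and GREEDYDATA are expanded by hand to [a-zA-Z0-9._-]+ and .* respectively. The sample log line is invented for illustration, since the original input isn't shown here.

```python
import re

# Hand translation of the grok pattern into Python regex syntax.
GROK_AS_REGEX = re.compile(
    r"(?P<SVN>[0-9]+)\|"
    r"(?P<username>[a-zA-Z0-9._-]+)\|"
    r"(?P<SVN_TIMESTAMP>[^|]+)\|"
    r"(?P<syslog_message>.*)"
)

# Hypothetical pipe-delimited SVN log line.
sample_line = "1234|jdoe|2015-03-02 10:15:01 +0000 (Mon, 02 Mar 2015)|Fix NPE in parser"
fields = GROK_AS_REGEX.fullmatch(sample_line).groupdict()

# An unescaped pipe means "either side", so it cannot match the literal delimiter.
unescaped_pipe = re.fullmatch(r"[0-9]+|[a-z]+", "1234|jdoe")
escaped_pipe = re.fullmatch(r"[0-9]+\|[a-z]+", "1234|jdoe")

# Unescaped parentheses form a group; they do not match "(" and ")" in the text.
unescaped_parens = re.fullmatch(r"(Mon)", "(Mon)")
escaped_parens = re.fullmatch(r"\(Mon\)", "(Mon)")
```

Here unescaped_pipe and unescaped_parens come back as None, while the escaped versions match, which is the same behaviour the grok debugger would show for the corresponding expressions.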

I recommend you take your input and my pattern and put them into https://grokdebug.herokuapp.com/; that will allow you to gradually enhance the pattern into what you really need.

Logstash multiline log for a mysql query

Multiline in your filter should be placed before the match part. Try configuring it like this:

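filter {
  if [type] == "mysql-proxy" {
    # Join every line that does not start with "[" onto the previous event.
    multiline {
      pattern => "^\["
      what    => "previous"
      negate  => true
    }
    # The field names "timestamp" and "query" are assumed: the date filter below
    # reads "timestamp", and the original capture-group name was lost.
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] USER:%{WORD:user} IP:%{IP:ip}:%{INT} DB:%{DATA:db} Query: (?<query>(.|\r|\n)*)" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}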

This works for me with logstash v1.4.2.
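The negate => true setting flips the test: instead of merging lines that match the pattern, it merges lines that do not. A small Python sketch of that rule follows; it is a simulation only, and merge_events plus the sample log lines are invented to resemble the format the grok expression expects.

```python
import re

def merge_events(lines, pattern, negate=False):
    """Merge a line into the previous event when it matches `pattern`
    (or, with negate=True, when it does NOT match)."""
    matcher = re.compile(pattern)
    events = []
    for line in lines:
        matched = bool(matcher.search(line))
        belongs_to_previous = matched != negate  # negate inverts the decision
        if events and belongs_to_previous:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

# Hypothetical mysql-proxy log: a query that spans three lines, then a new entry.
lines = [
    "[2014-10-01 12:00:00] USER:app IP:10.0.0.5:51234 DB:shop Query: SELECT id, total",
    "FROM orders",
    "WHERE id = 1",
    "[2014-10-01 12:00:01] USER:app IP:10.0.0.5:51234 DB:shop Query: SELECT 1",
]
events = merge_events(lines, r"^\[", negate=True)
```

With this rule, every line beginning with "[" opens a new event and everything else is treated as a continuation of the query, so the grok match afterwards sees the whole statement in one message.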
