I am parsing multiline log4j logs with logstash and I've got the most curious problem.
Here's an example of some related log entries:
2016-01-05 13:55:26,275 DEBUG [class] [thread] - bla 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaaa 2016-01-05 13:55:27,030 DEBUG [class] [thread] - blaaaaaaa
Here's my logstash configuration for log4j:
if [type] == "log4j" { multiline { pattern => "(^%d{yyyy-MM-dd HH:mm:ss,SSS})" what => previous negate => true } grok { match => ["message", "%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:loglevel} (?.*)"] } date { match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS", "ISO8601"] timezone => "Europe/Paris" } }
The result is fine; no grok parse errors, dates are fine, log level is fine, BUT, the message contents is duplicated:
2016-01-05 13:55:26,275 DEBUG [class] [thread] - bla 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaaa 2016-01-05 13:55:27,030 DEBUG [class] [thread] - blaaaaaaa [class] [thread] - bla 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaa 2016-01-05 13:55:26,275 DEBUG [class] [thread] - blaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaa 2016-01-05 13:55:26,282 DEBUG [class] [thread] - blaaaaaa 2016-01-05 13:55:27,030 DEBUG [class] [thread] - blaaaaaaa
AND the timestamp and log level are missing from the first line of the duplicated content!
Anyone knows why this is happening?
Any help is welcome.
EDIT after 473183469's remarks
The date pattern in the multiline filter doest not work as expected.
I've modified logstash's configuration to combine lines comming only from exceptions (just like un the docs)
if [type] == "log4j" { multiline { pattern => "^\s" what => "previous" } multiline { pattern => "^Caused" what => "previous" } grok { match => ["message", "%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:loglevel} (?.*)"] } date { match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS", "ISO8601"] timezone => "Europe/Paris" } }
I've juste added another multiline filter in case a "Caused by" line is present in the stack trace.
This works alright, except the contents of the message is duplicated in the same way! Again the timestamp and the log level are missing:
016-01-06 09:46:00,842 ERROR [coyntry.org.web.LoaderListener] [Thread-6] - java.lang.RuntimeException: java.io.FileNotFoundException: file (No such file or directory) at coyntry.org.web.WebProperties.getDatabaseProperties(WebProperties.java:174) at coyntry.org.web.WebProperties.getServiceLocator(WebProperties.java:156) at coyntry.org.web.LoaderListener$LoaderThread.run(LoaderListener.java:74) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.FileNotFoundException: file (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:146) at coyntry.org.web.WebProperties.getDatabaseProperties(WebProperties.java:168) [coyntry.org.web.LoaderListener] [Thread-6] - java.lang.RuntimeException: java.io.FileNotFoundException: file (No such file or directory) ... 3 more, at coyntry.org.web.WebProperties.getDatabaseProperties(WebProperties.java:174) at coyntry.org.web.WebProperties.getServiceLocator(WebProperties.java:156) at coyntry.org.web.LoaderListener$LoaderThread.run(LoaderListener.java:74) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.FileNotFoundException: file (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:146) at coyntry.org.web.WebProperties.getDatabaseProperties(WebProperties.java:168) ... 3 more
I really can't understand what is going on.
Best Answer
Do you really need the
multiline
?Isn't
multiline
used to join together again a single log which has been splitted by rsyslog?The typical example is the java stack error trace message.
If rsyslog split at the "\n" the stack trace error, we are not going to understand anything anymore, so we need to re-arrange the message.
Your example is different, because each log line looks indipendent from other log lines.
In short: duplication as well as the header suppression come from
multiline
. Either fixmultiline
according your use case or (as I suspect) remove it tout-court.