Logstash grok / multiline confusion


My real patterns are more complex, but I have tried to boil the problem down to the core issue, which is something I don't understand.
Please try this out on http://grokconstructor.appspot.com/do/match

I'm trying to match the following lines:

Start-Date: 2017-08-07  06:48:12
End-Date: 2017-08-07  06:48:12

Start-Date: 2017-08-07  12:55:16
End-Date: 2017-08-07  12:56:01

Using the additional patterns:

DATE_EU2 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[\s]+?%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE_COMB %{DATE_EU2}?%{DATE_EU}?%{DATE_US}?

And the following main pattern:

Start-Date: %{DATE_COMB:starttime}\nEnd-Date: %{DATE_COMB:endtime}

With the multiline filter:

^\n (negated)

Run that and you should (hopefully!) get:

Start-Date: 2017-08-07 06:48:12 End-Date: 2017-08-07 06:48:12 Start-Date: 2017-08-07 12:55:16 End-Date: 2017-08-07 12:56:01
MATCHED
starttime   2017-08-07··06:48:12
endtime 2017-08-07··06:48:12
after match:    Start-Date: 2017-08-07 12:55:16 End-Date: 2017-08-07 12:56:01

So it matched the first record but not the second.
If I add a '\z' to the end of the main pattern, it matches the second record but not the first. So it's clearly treating the whole thing as one line. But why? My multiline filter states that if a line does not start with a newline, it's part of the previous record, right? That should leave a blank line in the middle, which clearly does start with a newline and should therefore form a separate event, no?
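The symptom can be reproduced outside the Grok Constructor. Below is a minimal Python sketch. It assumes the multiline filter has already merged both records into a single event, and it uses a simplified regex as a stand-in for DATE_EU2, not the exact grok expansion. Named groups take the place of `%{...:starttime}` and `%{...:endtime}`. An unanchored search stops at the first record, while an end-of-string anchor can only be satisfied by the last one. Note that Python's `\Z` behaves like `\z` in Java and Ruby.

```python
import re

# Simplified, hypothetical stand-in for the custom DATE_EU2 grok pattern:
# YYYY-MM-DD, one or more spaces, HH:MM with optional :SS.
DATE = r"\d{4}-\d{2}-\d{2}\s+?\d{2}:?\d{2}(?::?\d{2})?"

# The main pattern from the question, with named groups instead of %{...:name}.
MAIN = rf"Start-Date: (?P<starttime>{DATE})\nEnd-Date: (?P<endtime>{DATE})"

# Assumed input to grok: both records merged into ONE event.
event = (
    "Start-Date: 2017-08-07  06:48:12\n"
    "End-Date: 2017-08-07  06:48:12\n"
    "\n"
    "Start-Date: 2017-08-07  12:55:16\n"
    "End-Date: 2017-08-07  12:56:01"
)

# Unanchored: the search succeeds on the first record and stops there.
first = re.search(MAIN, event)
print(first.group("starttime"), "|", first.group("endtime"))

# Anchored at the very end of the string: only the last record fits.
last = re.search(MAIN + r"\Z", event)
print(last.group("starttime"), "|", last.group("endtime"))
```

Either way, a single event only ever yields one set of `starttime`/`endtime` fields, which points at the event boundaries rather than the grok pattern.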

Any pointers gratefully accepted.

Best Answer

Input

Start-Date: 2017-08-07  06:48:12
End-Date: 2017-08-07  06:48:12

Start-Date: 2017-08-07  12:55:16
End-Date: 2017-08-07  12:56:01

Multiline filter = ^\n (negated)

The multiline filter will look at each line in turn to see what should be merged.

First line starts with `^Start-Date` (merged)
Second line starts with `^End-Date` (merged)
Third line is blank (merged, unless Logstash skips blank lines)
Fourth line starts with `^Start-Date` (merged)
Fifth line starts with `^End-Date` (merged)
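The walk-through above can be sketched in Python. `multiline_merge` is a hypothetical, simplified model of a multiline filter with "merge into previous" behaviour, not Logstash's actual code. The sketch also assumes each line reaches the filter with its trailing newline already stripped, which `splitlines()` imitates.

```python
import re

def multiline_merge(lines, pattern, negate=False):
    """Hypothetical model of a multiline filter that merges into the previous event.

    A line that matches `pattern` (or does NOT match it, when negate=True)
    is appended to the previous event; any other line starts a new event.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        belongs_to_previous = bool(regex.search(line)) != negate
        if belongs_to_previous and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

raw = (
    "Start-Date: 2017-08-07  06:48:12\n"
    "End-Date: 2017-08-07  06:48:12\n"
    "\n"
    "Start-Date: 2017-08-07  12:55:16\n"
    "End-Date: 2017-08-07  12:56:01\n"
)

# splitlines() drops each terminator, so the blank third line becomes "".
lines = raw.splitlines()

# ^\n never matches a stripped line; negated, every line joins the previous one.
events = multiline_merge(lines, r"^\n", negate=True)
print(len(events))  # 1
```

Because no line ever matches `^\n`, the negation turns every line after the first into a continuation, and both records end up in one event.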

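Trying to match a \n, especially at the start of a line, makes no sense. Each line appears to reach the filter with its newline already stripped, so `^\n` never matches, and because the pattern is negated, every line is treated as a continuation of the previous one.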

You're better off matching `^End-Date:` and merging that line with the previous one. (Or, if an event has more lines and always starts with `Start-Date:`, match `^Start-Date:` and negate.)
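Both suggestions can be compared with the same kind of sketch. `multiline_merge` is again a hypothetical, simplified model of merge-into-previous behaviour, and the input lines are assumed to arrive without their newlines.

```python
import re

def multiline_merge(lines, pattern, negate=False):
    # Hypothetical merge-into-previous model: matching lines (non-matching
    # lines when negate=True) are appended to the previous event.
    regex = re.compile(pattern)
    events = []
    for line in lines:
        if (bool(regex.search(line)) != negate) and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

lines = [
    "Start-Date: 2017-08-07  06:48:12",
    "End-Date: 2017-08-07  06:48:12",
    "",
    "Start-Date: 2017-08-07  12:55:16",
    "End-Date: 2017-08-07  12:56:01",
]

# Option 1: End-Date lines join the previous event.
by_end = multiline_merge(lines, r"^End-Date:")

# Option 2: any line NOT starting with Start-Date: joins the previous event.
by_start = multiline_merge(lines, r"^Start-Date:", negate=True)

for ev in by_end:
    print(repr(ev))
for ev in by_start:
    print(repr(ev))
```

In this model, option 1 leaves the blank line as its own empty event, which you would probably drop. Option 2 attaches the blank line to the end of the preceding record, so every event starts with `Start-Date:`. In Logstash's multiline codec, these roughly correspond to `pattern => "^End-Date:"` with `what => "previous"`, and to `pattern => "^Start-Date:"` with `negate => true` and `what => "previous"`.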

Edit, based on comments and testing with the Grok Constructor:

If it makes more sense to use the blank line as the record separator, `^\z` or `^\Z` in place of `^\n` appears to work. `\Z` also matches just before a final line terminator, whereas `\z` matches only at the very end of the string. Since `\z` also worked in my tests, this appears to confirm that a blank line, when passed into the filter, is a completely empty string (no newline or any other termination characters).
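The empty-string reasoning can be checked with a small Python sketch. The blank line is assumed to be `""`, as the tests above suggest. Python's `\Z` matches only at the absolute end of the string, so it plays the role of `\z` here.

```python
import re

# A blank line as the filter appears to see it: an empty string, no terminator.
blank = ""
record_line = "Start-Date: 2017-08-07  12:55:16"

matches_newline = bool(re.search(r"^\n", blank))         # nothing to match
matches_empty = bool(re.search(r"^\Z", blank))           # start == end of string
record_is_empty = bool(re.search(r"^\Z", record_line))   # non-empty line

print(matches_newline, matches_empty, record_is_empty)  # False True False
```

Only the blank line satisfies `^\Z`, which is why an end-of-string anchor can act as the record separator where `^\n` cannot.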