How to parse a human-readable byte count in Logstash

Tags: kibana, logstash

I'm dealing with log files containing parts such as:

538,486K of 1,048,576K

These represent memory use (Java heap space) rendered in a human-readable format. I would like to track those numbers in charts in Kibana. To do this I would like to somehow use Logstash's grok filter to parse these numbers, but I don't know how to handle (i.e. ignore) the thousands separator.

Ideally I would have something that can also handle the "K" and multiply by one thousand. At this point in time I am not aware that any system logs in a unit other than kilobyte, but I'd prefer not to make that assumption.

Best Answer

The mutate filter allows text replacement with the gsub option.

gsub takes an array, where every triplet of values indicates:

  • Target field name
  • Search pattern
  • Replace pattern

The search pattern is actually a regular expression, but we don't need that power in this case.

First, we strip commas. Simple enough.

Second, we multiply. Should K multiply by 1000? If so, it seems to me that we can simply replace K with 000.

Putting those together:

filter {
    mutate {
        gsub => [
            "some_field", ",", "",
            "some_field", "K", "000"
        ]
    }
}

You can add other replacement options as needed.
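One caveat: gsub only rewrites the text, so the field is still a string afterwards, and Kibana won't chart it as a number. The mutate filter's convert option can cast it to an integer. A minimal sketch (some_field is a placeholder name, as above):

filter {
    mutate {
        # cast the cleaned-up string to an integer so Kibana can aggregate it
        convert => { "some_field" => "integer" }
    }
}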

Depending on your circumstances, K might instead mean 1024, which is a bit more complicated: gsub can only do text substitution, not arithmetic. There is no out-of-the-box option for that, but you can use the ruby filter to do the math yourself.
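For example, the following sketch strips the commas and multiplies by 1024 in a ruby filter. It assumes the Logstash 5+ event API (event.get / event.set), and some_field is again a placeholder:

filter {
    ruby {
        code => '
            raw = event.get("some_field")
            if raw.is_a?(String) && raw.end_with?("K")
              # drop thousands separators and the trailing K, then scale to bytes
              event.set("some_field", raw.delete(",").chomp("K").to_i * 1024)
            end
        '
    }
}

Because the ruby filter stores an integer directly, no separate convert step is needed afterwards.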