I've got an Elasticsearch/Logstash/Kibana instance running, which I'm merrily stuffing with syslogs from a variety of hosts.
Having built it to scale – with multiple logstash syslogd listeners, and multiple ES nodes – it's doing quite nicely for collating logging across a large portfolio of servers.
There's just one problem I'm having at the moment – grouping hosts. I can get datasets for host groupings based on a variety of criteria from my config database – physical location, 'service', 'customer' etc.
I'd really like to be able to add these as filter criteria in my Elasticsearch database, ideally so I can use them in Kibana without much modification.
Currently I'm thinking in terms of either:
- a custom logstash filter that looks up the hostname in a data dump and adds tags (service/customer/location is really all I need).
- Trying to add a parent/child relationship for a 'host' document.
- using 'percolator' to cross reference (somehow?)
- a 'script' field?
- Some sort of dirty hack involving a cron job to update records with metadata post-ingest.
But I'm wondering if anyone's already tackled this, and is able to suggest a sensible approach?
Best Answer
Having done a bit of digging, the solution I finally decided upon was the logstash plugin 'filter-translate'.
This takes a YAML file of key-value pairs, and lets you rewrite your incoming log entry based on it.
So:
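A minimal sketch of the translate filter configuration – the field names, target field, and dictionary path here are illustrative, and the exact option names vary between plugin versions:

```
filter {
  translate {
    # look up the value of the syslog host field...
    field           => "logsource"
    # ...and write the matched value into a new field
    destination     => "service"
    dictionary_path => "/etc/logstash/host_service.yaml"
    # leave events untouched if the host isn't in the dictionary
    fallback        => "unknown"
  }
}
```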
This is a rather simple list:
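For example, a host-to-service mapping could be as simple as (hostnames and values here are hypothetical):

```
"webserver01": "frontend"
"webserver02": "frontend"
"dbserver01":  "database"
"mailrelay01": "mail"
```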
At the moment, it's static-ish – rebuilt and fetched via cron. I'm intending to push towards etcd and confd for a more adaptive solution.

This means that events are already 'tagged' as they enter Elasticsearch. And because my logstash engines are distributed and autonomous, running off a 'cached' list is desirable anyway. My host lists don't change fast enough for this to be a problem.