ELK: Logstash to read log files from remote Samba-mapped network drives

Tags: elasticsearch, elk, logstash

I'm new to ELK, and I would like to set up a solution to index Microsoft IIS and .NET application logs with ES.

I'm aware of different approaches:

1) [app servers: log files ➔ Logstash] ➔ [collecting server: Redis ➔ Logstash] ➔ [ES cluster: ES ➔ Kibana]

The con of this method is having to install, configure and maintain a Logstash instance on each Windows server that produces logs.

2) [app servers: log files ➔ Filebeat] ➔ [collecting server: Logstash ➔ Redis ➔ Logstash] ➔ [ES cluster: ES ➔ Kibana]

The con of this method is that Filebeat currently does not support multiline log entries, and my .NET apps produce multi-line exceptions. I'm also not sure how the intermediate Logstash + Redis + Logstash chain should be configured to handle this.

So I thought: given that Logstash is able to collect log data by itself, without Filebeat or any other forwarder (please correct me if I'm wrong), I might try the following:

[app servers: log files] ➔ [collecting server: Samba-mapped network drives ➔ Logstash ➔ Redis ➔ Logstash] ➔ [ES cluster: ES ➔ Kibana]

Under that hypothesis, I wouldn't need to install a Logstash instance on each app server. The central Logstash instance (or multiple instances) would fetch the files over the Samba-mapped network drives and apply the multiline codec before pushing log entries to Redis.
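To make that concrete, the central instance's configuration could look something like this (the paths, the pattern and the Redis key are placeholders I made up, not a tested setup):

    input {
      file {
        # Log directories exposed through the Samba-mapped network drives
        path => "/mnt/appserver*/logs/*.log"
        type => "dotnet"
        # Join indented stack-trace lines onto the preceding event
        codec => multiline {
          pattern => "^\s"
          what => "previous"
        }
      }
    }
    output {
      redis {
        host => "127.0.0.1"
        data_type => "list"
        key => "logstash"
      }
    }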

Is that technically feasible? Is it a sound architectural choice?

Best Answer

While running Logstash with a file input against your logs on a CIFS share will work, I don't think it will work very well. I haven't used Logstash directly like that, but in my experience using Logstash-Forwarder to watch log files over an SSHFS mount, it doesn't deal well with file rotations or reboots of either end.

As for not being sure how to deal with your multi-line exceptions in Filebeat, I don't think you need to worry about it. Filebeat just takes lines from the files you want to ship and fires them across the network. It adds a few fields, but they don't change the overall picture of Filebeat being a very basic log shipper.

This means you can just run your multi-line filter in Logstash on the collecting server, just as you would if you ran Logstash on the app servers directly.
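In its simplest form, that would be a Beats input feeding a multiline filter on the collecting server; a minimal sketch, assuming the conventional Beats port and stack traces that start with whitespace (both assumptions on my part):

    input {
      beats {
        port => 5044
      }
    }
    filter {
      # Lines starting with whitespace belong to the previous event,
      # which is typical for .NET (and Java) stack traces
      multiline {
        pattern => "^\s"
        what => "previous"
      }
    }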

Now, depending on your log volume, you might find that you need to increase the number of workers for LS to handle grokking your data effectively.
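For instance, a grok stanza in the Parser might look like this (the pattern is purely illustrative; match it against your actual log format):

    filter {
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{GREEDYDATA:msg}" }
      }
    }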

What I do to handle such things is very similar to your option 2, but instead of just two LS instances (a "Broker" and a "Parser"), I have three.

                            +-------------------+
                           +-------------------+|
                          +-------------------+||
                          |    App Servers    |||
                          |    +----------+   ||+
                          |    | FileBeat |   |+
                          +----+----------+---+
                               /
                             /       
                           /        
        +----------------/----------------------------------------+
        |              /      Collecting Server                   |
        | +----------/-+  +---------------------+  +------------+ |
        | |  Logstash  |  |      Logstash       |  |  Logstash  | |
        | |   Broker   |  |Multi-line Pre-Parser|  |   Parser   | |
        | +------------+  +---^-----------------+  +-----^---V--+ |
        |     |               |             |            |   |    |
        |     |               |    Redis    |            |   |    |
        |     V       +---------------------V------+     |   |    |
        |     +------->     DB0      |      DB1    + --->+   |    |
        |             +----------------------------+        /     |
        +-------------------------------------------------/-------+
                                                        /
                                                      /
                                                    /
                           +-------------------+  /
                          +-------------------+|/
                         +-------------------+||
                         |   ElasticSearch   ||+
                         |      Cluster      |+
                         +-------------------+
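Wired together, the three instances might look roughly like this (hosts, ports, keys and DB numbers are placeholders inferred from the diagram, not my exact configs):

    # broker.conf: accepts events from FileBeat, queues raw lines in DB0
    input  { beats { port => 5044 } }
    output { redis { host => "127.0.0.1" db => 0 data_type => "list" key => "raw" } }

    # preparser.conf: joins multi-line events, queues them in DB1
    input  { redis { host => "127.0.0.1" db => 0 data_type => "list" key => "raw" } }
    filter { multiline { pattern => "^\s" what => "previous" } }
    output { redis { host => "127.0.0.1" db => 1 data_type => "list" key => "joined" } }

    # parser.conf: grok/filter as usual and ship to ElasticSearch
    input  { redis { host => "127.0.0.1" db => 1 data_type => "list" key => "joined" } }
    output { elasticsearch { hosts => ["es-node:9200"] } }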

All the Pre-Parser instance does is transform multi-line log entries into a single event so that the Parser can do its job properly. And even then, I check type and tags to see if there's even a possibility that the lines will be multi-line, so the overhead is minimal.
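So in the Pre-Parser the multiline filter is actually wrapped in a guard, something like this (the type and tag values are assumptions; use whatever your shippers attach to events):

    filter {
      # Only .NET application logs can contain multi-line exceptions,
      # so everything else bypasses the multiline filter entirely
      if [type] == "dotnet" or "multiline" in [tags] {
        multiline {
          pattern => "^\s"
          what => "previous"
        }
      }
    }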

I'm easily able to push 1000 events a second through it (barely hitting 20% CPU). Further, that system is an ELK stack-in-a-box, so with dedicated nodes for LS and ES it should be able to handle considerably more.

Why not just crank up the workers on the Parser instance? Well, this stems from the fact that the multiline filter in LS doesn't support multiple workers.

From the multiline filter documentation:
This filter will collapse multiline messages from a single source into one Logstash event.

The original goal of this filter was to allow joining of multi-line messages from files into a single event. For example - joining java exception and stacktrace messages into a single event.

Note: This filter will not work with multiple worker threads (-w 2 on the Logstash command line).
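Which is why, in this layout, the instance running multiline stays pinned to a single worker while the Parser is free to scale. Hypothetical invocations (paths and worker counts are placeholders):

    # The Pre-Parser must stay single-threaded because of the multiline filter
    /opt/logstash/bin/logstash -f /etc/logstash/preparser.conf -w 1

    # The Parser has no such restriction, so give grok more workers
    /opt/logstash/bin/logstash -f /etc/logstash/parser.conf -w 4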