Failing forwarding rsyslog

logstash rsyslog syslog

I have a centralised rsyslog server A that receives logs over TCP from servers X, Y and Z. It stores them on disk and also forwards them to a logstash server B on a different machine. For relaying to logstash server B I use TCP as follows:

$template logstash_json,"{\"@timestamp\":\"%timestamp:::date-rfc3339,jsonf:@timestamp%\",\"@source_host\":\"%source:::jsonf:@source_host%\",\"@source\":\"syslog://%fromhost-ip:::json%\",\"@message\":\"%timestamp% %app-name%:%msg:::json%\",\"@fields\":{\"facility\":\"%syslogfacility-text:::jsonf:facility%\",\"severity\":\"%syslogseverity-text:::jsonf:severity%\",\"program\":\"%app-name:::jsonf:program%\",\"pid\":\"%procid:::jsonf:processid%\"}}"

$WorkDirectory /var/cache/rsyslog # default location for work (spool) files - make sure it's created
$ActionQueueType LinkedList   # use asynchronous processing
$ActionQueueFileName srvrfwd  # set file name, also enables disk mode
$ActionResumeRetryCount -1    # infinite retries on insert failure
$ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down

# Ship logs over TCP to logstash
*.*                                             @@server_B:2514;logstash_json

Whenever the logstash server is down or unreachable, the centralised rsyslog on server A becomes unresponsive after a while. As a consequence, servers X, Y and Z start freezing or showing unexpectedly high load while their services try to write to syslog.

How do I configure rsyslog (server side and/or client side) to be more robust against this point of failure?

Side question: whenever the logstash server is restarted, rsyslog doesn't replay the logs stored in /var/cache/rsyslog. Does anyone know how to configure this?

Best Answer

You need to configure queuing in rsyslog. A relevant discussion is here:

http://help.papertrailapp.com/discussions/problems/1983-ubuntu-network-hang-up-on-failures-to-reach-papertrail

This is the relevant part of the answer. I assume you will look up the directives in the actual documentation first:

Adding the following just before *.* @@logs.papertrailapp.com and restarting rsyslog should do the job:

$ActionResumeInterval 10
$ActionQueueSize 100000
$ActionQueueDiscardMark 97500
$ActionQueueHighWaterMark 80000
$ActionQueueType LinkedList
$ActionQueueFileName papertrailqueue
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 2g
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueTimeoutEnqueue 10
$ActionQueueDiscardSeverity 0
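
Applied to the configuration in the question, this could look roughly like the sketch below. The work directory, queue file name srvrfwd, target server_B:2514 and template logstash_json are taken from the question; the numeric limits are copied unchanged from the Papertrail example and are not tuned for this setup, so treat them as starting values rather than recommendations.

```
# Spool directory for the disk-assisted queue (must exist and be writable)
$WorkDirectory /var/cache/rsyslog

# Seconds to wait between retries while the action is suspended
$ActionResumeInterval 10
# Retry forever instead of giving up on the action
$ActionResumeRetryCount -1

# In-memory queue that spills to disk once a file name is set
$ActionQueueType LinkedList
$ActionQueueFileName srvrfwd
# Maximum number of messages held in the queue
$ActionQueueSize 100000
# Start writing to disk when this many messages are queued in memory
$ActionQueueHighWaterMark 80000
# Beyond this fill level, discard messages selected by the discard severity
$ActionQueueDiscardMark 97500
# 0 means messages of every severity may be discarded at the discard mark
$ActionQueueDiscardSeverity 0
# Cap on disk space used by the spool files
$ActionQueueMaxDiskSpace 2g
# Persist queue state every 100 messages
$ActionQueueCheckpointInterval 100
# Write remaining in-memory messages to disk on shutdown
$ActionQueueSaveOnShutdown on
# Wait at most 10 ms for room in a full queue, then drop the message
$ActionQueueTimeoutEnqueue 10

# Ship logs over TCP to logstash
*.* @@server_B:2514;logstash_json
```

These legacy $ActionQueue directives apply only to the next action, so they have to sit directly before the forwarding line. The short enqueue timeout together with the discard mark is what keeps rsyslog from blocking when server B is unreachable: once the queue is full, messages are dropped instead of stalling the senders, which is a trade-off to decide on deliberately.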
