Linux – How to debug the Rsyslog error “NO state file (/path/to/statefile) exists for /path/to/log”

linuxrsyslog

A brief description of my set-up: I have 11 Docker containers in a Docker Compose single-server configuration. Some of these containers produce logs, which I write to (mostly individual) on-host volumes. This results in eight log files, which are shared with a Rsyslog container, again via Docker volumes. Finally this container forwards the logs to Papertrail, the cloud-based log aggregator app.

This is all working: I can see Mongo logs, Apache logs, and various app-level logs in near realtime. However, I am conscious that when I restart the logger container, Rsyslog will lose any record of what logs it has pushed, and so will push the whole lot again. Papertrail probably does its own de-duplication, but nevertheless it would be good to remove this duplication.

I am minded to add another on-host volume, so that state is permanent even if the logger container is destroyed and recreated. However, upon checking a staging instance just now, the state directory /var/spool/rsyslog appears to be empty. Thus, I want to ensure Rsyslog really is writing state files before I do this work.

Am I right in thinking that this directory is set by the $WorkDirectory directive?

Here is a lightly-edited snippet of my config file:

# Load Modules
module(load="imfile")

# If we don't use this, we'll get the hostname of the Logger container
$LocalHostName missive-test

# See https://help.papertrailapp.com/kb/configuration/advanced-unix-logging-tips#rsyslog-2
$ActionResumeInterval 10
$ActionQueueSize 100000
$ActionQueueDiscardMark 97500
$ActionQueueHighWaterMark 80000
$ActionQueueType LinkedList
$ActionQueueFileName papertrailqueue
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 2g
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueTimeoutEnqueue 2
$ActionQueueDiscardSeverity 0

# My own addition
$WorkDirectory /var/spool/rsyslog

# See https://help.papertrailapp.com/kb/configuration/encrypting-remote-syslog-with-tls-ssl/#download-root-certificates
$DefaultNetstreamDriverCAFile /etc/papertrail-bundle.pem # trust these CAs
$ActionSendStreamDriver gtls # use gtls netstream driver
$ActionSendStreamDriverMode 1 # require TLS
$ActionSendStreamDriverAuthMode x509/name # authenticate by hostname
$ActionSendStreamDriverPermittedPeer *.papertrailapp.com

# Slightly edited to obfuscate exact URL
*.* @@logs0.papertrailapp.com:00000

# Use http://www.rsyslog.com/rsyslog-configuration-builder/ to generate new
# log file watchers.

input(type="imfile"
     File="/var/log/missive/controller/socket-listener.log"
     Tag="socket"
     Facility="local0")

The last input directive is an example, there's eight of these, all pretty similar.

Could I be wrong about the location of state data? I've done a few restarts recently, and on each occasion Papertrail did indeed get a 25K new records. So, either it is writing state where I can't see it, or not writing it at all. Am I missing a directive?

I have tried this on Rsyslog 8.26.0 and 8.31.0, and get the same with both versions.

Update 1

What might help is seeing some logs from Rsyslog itself. However, I've checked in /var/log, and there's not a sausage (other than the on-host volumes of the logs I am pushing). Does this need to be turned on explicitly?

After posting this, I've found a command to verify the configuration file, and it looks fine – nothing that looks like it would stop the state files working.

/var # rsyslogd -N1
rsyslogd: version 8.26.0, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.

Update 2

I've now added -dn (debug and foreground) switches to Rsyslog, and am now producing (very verbose) log files. Amongst 300K other lines, I found this:

2227.959643591:main thread    : action 0 queue: starting queue
2227.959648334:main thread    : action 0 queue: is disk-assisted, disk will be used on demand
2227.959656291:main thread    : action 0 queue: params: type 1, enq-only 0, disk assisted 1, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 100000, lqsize 0, pqsize 0, child 0, full delay 40000, light delay 70000, deq batch si
ze 16, high wtrmrk 80000, low wtrmrk 70000, discardmrk 97500, max wrkr 1, min msgs f. wrkr 100000
2227.959661911:main thread    : action 0 queue:Reg: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959666678:main thread    : action 0 queue:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959671472:main thread    : action 0 queue:DAwpool: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959675830:main thread    : action 0 queue:DAwpool/w0: finalizing construction of worker instance data (for 1 actions)
2227.959682905:main thread    : action 0 queue[DA]: starting queue
2227.959687826:main thread    : action 0 queue[DA]: .qi file name is '/var/spool/rsyslog/papertrailqueue.qi', len 37
2227.959691356:main thread    : action 0 queue[DA]: I am a child
2227.959702371:main thread    : action 0 queue[DA]: clean startup, no .qi file found
2227.959706512:main thread    : action 0 queue[DA]: state -2040 reading .qi file - can not read persisted info (if any)
2227.959711817:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959719817:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959724086:main thread    : file stream N/A params: flush interval 0, async write 0
2227.959734668:main thread    : action 0 queue[DA]: params: type 2, enq-only 0, disk assisted 0, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 0, lqsize 0, pqsize 0, child 1, full delay -1, light delay -1, deq batch size 8, h
igh wtrmrk 0, low wtrmrk 1, discardmrk 0, max wrkr 1, min msgs f. wrkr 0
2227.959739181:main thread    : action 0 queue[DA]:Reg: finalizing construction of worker thread pool (numworkerThreads 1)
2227.959743488:main thread    : action 0 queue[DA]:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959747045:main thread    : action 0 queue[DA]: queue finished initialization
2227.959753324:main thread    : action 0 queue: DA queue initialized, disk queue 0x5630e6079ea0
2227.959756804:main thread    : action 0 queue: queue finished initialization
2227.959762404:main thread    : Action builtin:omfwd[0x5630e605cd60]: queue 0x5630e605d1e0 started
2227.959766212:main thread    : Activating Ruleset Queue[0] for Ruleset RSYSLOG_DefaultRuleset
2227.959769728:main thread    : activateMainQueue: mainq cnf obj ptr is 0
2227.959773846:main thread    : main Q: starting queue
2227.959781585:main thread    : main Q: is NOT disk-assisted
2227.959789199:main thread    : main Q: params: type 0, enq-only 0, disk assisted 0, spoolDir '/var/spool/rsyslog', maxFileSz 1048576, maxQSize 100000, lqsize 0, pqsize 0, child 0, full delay 97000, light delay 70000, deq batch size 256, 
high wtrmrk 80000, low wtrmrk 20000, discardmrk 98000, max wrkr 2, min msgs f. wrkr 40000
2227.959793605:main thread    : main Q:Reg: finalizing construction of worker thread pool (numworkerThreads 2)
2227.959797797:main thread    : main Q:Reg/w0: finalizing construction of worker instance data (for 1 actions)
2227.959807534:main thread    : main Q:Reg/w1: finalizing construction of worker instance data (for 1 actions)
2227.959813976:main thread    : main Q: queue finished initialization
2227.959819101:main thread    : Main processing queue is initialized and running
2227.959823416:main thread    : running module imfile with config 0x5630e605bda0, term mode: cooperative/SIGTTIN
2227.959847971:main thread    : configuration 0x5630e6047ae0 activated
2227.959891721:main thread    : rsyslog/glbl: using '127.0.0.1' as localhost IP
2227.959897527:main thread    : signaling new internal message via SIGTTOU
2227.959907993:main thread    : rsyslogd: writing pidfile '/var/run/rsyslogd.pid.tmp'.
2227.960012480:main thread    : rsyslogd: initialization completed, transitioning to regular run mode

This mentions /var/spool/rsyslog a couple of times, so it looks like that configuration is indeed being set correctly. However I am still not really finding anything that would explain why statefiles are not being left after a push to Papertrail.

I will continue to look through the logs to see if anything else jumps out. Comments on this, in the meantime, are welcome.

Update 3

Aha, I have found these lines:

/ # grep -C 1 "NO state file" /var/log/rsyslog.log
2227.960253654:imfile.c       : imfile: trying to open state for '/var/log/missive/controller/socket-listener.log', state file 'imfile-state:-var-log-missive-controller-socket-listener.log'
2227.960264463:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-controller-socket-listener.log) exists for '/var/log/missive/controller/socket-listener.log'
2227.960268423:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/controller/socket-listener.log'
--
2227.971282226:imfile.c       : imfile: trying to open state for '/var/log/missive/transmitter/queue.log', state file 'imfile-state:-var-log-missive-transmitter-queue.log'
2227.971298328:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-transmitter-queue.log) exists for '/var/log/missive/transmitter/queue.log'
2227.971302589:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/transmitter/queue.log'
--
2228.510460963:imfile.c       : imfile: trying to open state for '/var/log/missive/transmitter/worker-manager.log', state file 'imfile-state:-var-log-missive-transmitter-worker-manager.log'
2228.510480060:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-transmitter-worker-manager.log) exists for '/var/log/missive/transmitter/worker-manager.log'
2228.510484619:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/transmitter/worker-manager.log'
--
2228.774990028:imfile.c       : imfile: trying to open state for '/var/log/missive/storage/storage-server.log', state file 'imfile-state:-var-log-missive-storage-storage-server.log'
2228.775007074:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-storage-storage-server.log) exists for '/var/log/missive/storage/storage-server.log'
2228.775011149:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/storage/storage-server.log'
--
2228.797432870:imfile.c       : imfile: trying to open state for '/var/log/missive/outtray/outtray-server.log', state file 'imfile-state:-var-log-missive-outtray-outtray-server.log'
2228.797447810:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-outtray-outtray-server.log) exists for '/var/log/missive/outtray/outtray-server.log'
2228.797452333:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/outtray/outtray-server.log'
--
2228.798755146:imfile.c       : imfile: trying to open state for '/var/log/missive/interface/access.log', state file 'imfile-state:-var-log-missive-interface-access.log'
2228.798767248:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-interface-access.log) exists for '/var/log/missive/interface/access.log'
2228.798771170:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/interface/access.log'
--
2228.802581105:imfile.c       : imfile: trying to open state for '/var/log/missive/interface/error.log', state file 'imfile-state:-var-log-missive-interface-error.log'
2228.802592195:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-interface-error.log) exists for '/var/log/missive/interface/error.log'
2228.802595916:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/interface/error.log'
--
2228.818228008:imfile.c       : imfile: trying to open state for '/var/log/missive/mongo/mongodb.log', state file 'imfile-state:-var-log-missive-mongo-mongodb.log'
2228.818241043:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-mongo-mongodb.log) exists for '/var/log/missive/mongo/mongodb.log'
2228.818244939:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/mongo/mongodb.log'
--
2228.826921335:imfile.c       : imfile: trying to open state for '/var/log/missive/traffic/traefik.log', state file 'imfile-state:-var-log-missive-traffic-traefik.log'
2228.826933834:imfile.c       : imfile: NO state file (/var/spool/rsyslog/imfile-state:-var-log-missive-traffic-traefik.log) exists for '/var/log/missive/traffic/traefik.log'
2228.826937674:imfile.c       : imfile: clean startup withOUT state file for '/var/log/missive/traffic/traefik.log'

These would be expected for a first-time run, but of course it would be expected that the state files would be left behind so that the logs are not pushed again. I will search for these error strings in case that helps, and I'll tail the logs to see if there is anything towards the end that prevents the final state from being recorded.

Update 4

I've tried to add this after my input() definitions, also to no avail:

$InputFilePersistStateInterval 100
$InputRunFileMonitor

This is because the manual says, of the second parameter:

This activates the current monitor. It has no parameters. If you forget this directive, no file monitoring will take place.

However, I think this is referring to a legacy-style log which is declared by $ commands, and not the new-style input() syntax.

Update 5

It occurred to me that my configuration above, which overwrites the default Rsyslog config, might be removing something critical. I have therefore modified the relevant COPY command so that the default rsyslogd.conf persists, and my config becomes additional to that.

The new directives (that are not commented out) are thus:

# rsyslog v5: load input modules
# If you do not load inputs, nothing happens!
# You may need to set the module load path if modules are not found.

$ModLoad immark.so # provides --MARK-- message capability
$ModLoad imuxsock.so # provides support for local system logging (e.g. via logger command)
$ModLoad imklog.so # kernel logging (formerly provided by rklogd)

# default permissions for all log files.
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022

# Include configuration files from directory
$IncludeConfig /etc/rsyslog.d/*

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                -/var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog

# Log cron stuff
cron.*                                                  -/var/log/cron

# Everybody gets emergency messages
*.emerg                                                 :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          -/var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log

This is still pushing my logs to Papertrail, and still there are no spool files created.

I was tempted to think this could be a permission issue, but Rsyslog runs as root in Alpine, and I believe queues as implemented as threads in the main process. Thus, they will run as root too, and there should be no problem writing to /var/spool/rsyslog/*.

Update 6

I have asked the readers of the Rsyslog bug tracker whether this might be a bug, perhaps because I'm using Docker.

Best Answer

Some very helpful engineers who patrol the Rsyslog issues list have come up with a good-enough fix. It seems that, by default, state files are only written per-log when Rsyslog exits, which is why they do not appear straight away.

Furthermore, it has been suggested that the Docker stop/kill process is happening before Rsyslog gets a chance to write the state files. One contributor says that Rsyslog may need up to 90 seconds after SIGTERM to write state files, and my understanding is that Docker won't normally wait that long (by default, it waits for 10 sec before sending SIGKILL).

Oddly, in my case, Rsyslog is only waiting ~2 seconds (significantly under the 10 second default limit), which suggests to me that it is exiting cleanly, despite not having written any state files. I presently do not know why this happens.

The fix is as follows. In each log declaration, the directive PersistStateInterval may be added to specify how often the state file should be written (in log lines processed). This defaults to 0, which gives the "write on exit" behaviour.

For example:

input(type="imfile"
     File="/var/log/missive/controller/socket-listener.log"
     Tag="socket"
     PersistStateInterval="10"
     Facility="local0")

I will use the strategy of using a low value (e.g. 20) on slow log files, and higher ones (e.g. 100) on fast-moving files. I think I am happier with fairly frequent writes in this fashion, since it will guard against power failure or a container crash.

I am of the opinion this fix is not ideal, since a container shutdown will still fail to update the exact logs-sent position. However, the result of this is that Rsyslog will send some duplicate logs, which should be de-duplicated by the aggregator target (an interval of X will result in a maximum duplication of X-1 log lines, which is not terrible).