I recently lost data because bgsave/save stopped working: it hung, and every attempt returned the error "ERR Background save already in progress".
This is the Server section of the output of the redis info command:
# Server
redis_version:2.8.19
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:9968db13395be4aa
redis_mode:standalone
os:Windows
arch_bits:64
multiplexing_api:winsock_IOCP
gcc_version:0.0.0
process_id:5968
run_id:3cf27bdbead6bc8d37d9eb8e0de5eb7898b72ede
tcp_port:6379
uptime_in_seconds:883
uptime_in_days:0
hz:10
lru_clock:11936623
config_file:C:\Program Files\Redis\redis_store.conf
These are my snapshotting settings:
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename store.rdb
dir ./
The server also acts as a master. (I don't know whether this is relevant, but replication seems to have stopped at the same point at which the bgsave hung.)
I'm running Redis as a service. The problem seems to have started when the service recently crashed for a reason unknown to me.
I have the automatic recovery feature enabled (it restarts the service automatically after a crash).
Since then, Redis has stopped snapshotting (I can see this from the timestamps of the backup files).
My questions are:
- Has anyone experienced Redis crashes on Windows?
- If so, what could be the reason (besides hardware limitations, which I have already checked)?
- What can I do to prevent a dead bgsave that blocks all further snapshotting? Does the configuration setting "stop-writes-on-bgsave-error no" help?
- Are there any other options for persisting the data if bgsave/save is not working?
Sadly, I have no information about the hung state, because I had to restart the service after a failed recovery attempt (I tried to migrate the keys into a new Redis database via a Lua script, but that locked up my service).
Best Answer
Answering my own question:
It seems that the crash was caused by a misconfiguration of the server: the system paging file was not large enough. I therefore lowered the value of the maxmemory parameter, and the problem now seems to be gone.
See: https://github.com/MSOpenTech/redis/issues/289
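
To notice a stalled snapshot sooner next time, one option is to watch the Persistence section of the info output. The following is only a minimal sketch, not something taken from the linked issue: the function names parse_info and snapshot_problems and the one-hour threshold are my own assumptions, and it reads the standard fields rdb_changes_since_last_save, rdb_bgsave_in_progress, rdb_last_save_time, rdb_last_bgsave_status and rdb_current_bgsave_time_sec from the text that info returns.

```python
import time


def parse_info(text):
    """Parse the 'key:value' lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values like C:\... stay intact.
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def snapshot_problems(info, now=None, max_age_seconds=3600):
    """Return a list of warnings about RDB snapshotting (hypothetical helper).

    max_age_seconds is an assumed threshold; pick one that matches
    the longest interval in your own "save" rules.
    """
    now = time.time() if now is None else now
    problems = []

    # The last background save reported an error.
    if info.get("rdb_last_bgsave_status", "ok") != "ok":
        problems.append("last bgsave failed")

    # There are unsaved changes and the last successful save is too old.
    age = now - int(info.get("rdb_last_save_time", 0))
    changes = int(info.get("rdb_changes_since_last_save", 0))
    if changes > 0 and age > max_age_seconds:
        problems.append("unsaved changes older than %d s" % max_age_seconds)

    # A background save is flagged as running for suspiciously long.
    # rdb_current_bgsave_time_sec is -1 when no save is in progress.
    running = int(info.get("rdb_current_bgsave_time_sec", -1))
    if info.get("rdb_bgsave_in_progress") == "1" and running > max_age_seconds:
        problems.append("bgsave running for %d s" % running)

    return problems
```

You could feed it the output of `redis-cli info persistence` from a scheduled task and raise an alert whenever the returned list is not empty. It only reports the symptom; it does not fix the underlying cause (in my case the paging file and maxmemory).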