Debian – Troubleshooting Random Freezes on Debian 10

debiandebian-busterfreezehard drivekvm-virtualization

I am experiencing random freezes on my Debian 10 system, which force me to shut it down using the power button to take any action. These freezes have happened repeatedly over the past weeks.

Output of uname -a:

Linux debian 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux

Hardware:
Threadripper 2970 WX, 32GB G Skill F4-3200C1 RAM, MSI MEG X399 Creation Motherboard

Disks:
1TB Samsung SSD (Home disk, managed by LVM);
4TB WD RED (mounted via UUID);
3x 8TB Seagate Ironwolf (managed by LVM)

Special software running on the system:
KVM

What I already tried:

  • Updated kernel to 5.10 using backports
  • Memtest86 (currently running for 6 hours already, no errors so far)
  • Inspect logfiles (has not helped me so far)
  • Install kdump-tools (not triggered at the freeze)
  • CPU stress test at 100% for one hour (no freeze. Note: during the freezes, CPU is running at 5% only most of the time, and there is also plenty of RAM free).

syslog:

Dec  4 11:54:12 debian systemd[1]: bacula-director.service: Service RestartSec=1min expired, scheduling restart.
Dec  4 11:54:12 debian systemd[1]: bacula-director.service: Scheduled restart job, restart counter is at 1783.
Dec  4 11:54:12 debian systemd[1]: Stopped Bacula Director Daemon service.
Dec  4 11:54:12 debian systemd[1]: Starting Bacula Director Daemon service...
Dec  4 11:54:42 debian bacula-dir[124998]: bacula-dir: dird.c:1229-0 Could not open Catalog "MyCatalog", database "XXX_DBNAME_XXX".
Dec  4 11:54:42 debian bacula-dir[124998]: bacula-dir: dird.c:1234-0 postgresql.c:332 Unable to connect to PostgreSQL server. Database=XXX_DBNAME_XXX User=XXX_DBUSER_XXX
Dec  4 11:54:42 debian bacula-dir[124998]: Possible causes: SQL server not running; password incorrect; max_connections exceeded.
Dec  4 11:54:42 debian bacula-dir[124998]: 04-Dec 11:54 bacula-dir ERROR TERMINATION
Dec  4 11:54:42 debian bacula-dir[124998]: Please correct configuration file: /etc/bacula/bacula-dir.conf
Dec  4 11:54:42 debian systemd[1]: bacula-director.service: Control process exited, code=exited, status=1/FAILURE
Dec  4 11:54:42 debian systemd[1]: bacula-director.service: Failed with result 'exit-code'.
Dec  4 11:54:42 debian systemd[1]: Failed to start Bacula Director Daemon service.
Dec  4 11:55:01 debian CRON[125097]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec  4 11:55:23 debian NetworkManager[1494]: <info>  [1638636923.5313] device (wlp5s0): set-hw-addr: set MAC address to 76:22:F8:22:9D:00 (scanning)
Dec  4 11:55:23 debian kernel: [161004.635913] IPv6: ADDRCONF(NETDEV_UP): wlp5s0: link is not ready
Dec  4 11:55:23 debian NetworkManager[1494]: <info>  [1638636923.6007] device (wlp5s0): supplicant interface state: inactive -> disconnected
Dec  4 11:55:23 debian NetworkManager[1494]: <info>  [1638636923.6058] device (wlp5s0): supplicant interface state: disconnected -> inactive
Dec  4 11:55:23 debian wpa_supplicant[1493]: wlp5s0: Reject scan trigger since one is already pending
Dec  4 11:55:42 debian systemd[1]: bacula-director.service: Service RestartSec=1min expired, scheduling restart.
Dec  4 11:55:42 debian systemd[1]: bacula-director.service: Scheduled restart job, restart counter is at 1784.
Dec  4 11:55:42 debian systemd[1]: Stopped Bacula Director Daemon service.
Dec  4 11:55:42 debian systemd[1]: Starting Bacula Director Daemon service...
Dec  4 11:56:00 debian libvirtd[1750]: internal error: End of file from qemu monitor
Dec  4 11:56:01 debian kernel: [161042.914669] audit: type=1400 audit(1638636961.809:55): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-4c626220-3780-4f2f-b2f1-0da779a85f8f" pid=125291 comm="apparmor_parser"
Dec  4 11:56:01 debian avahi-daemon[1487]: Interface macvtap2.IPv6 no longer relevant for mDNS.
Dec  4 11:56:01 debian avahi-daemon[1487]: Leaving mDNS multicast group on interface macvtap2.IPv6 with address fe80::5054:ff:fe7e:8739.
Dec  4 11:56:01 debian avahi-daemon[1487]: Withdrawing address record for fe80::5054:ff:fe7e:8739 on macvtap2.
Dec  4 11:56:08 debian kernel: [161050.046357] audit: type=1400 audit(1638636968.942:56): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125301 comm="apparmor_parser"
Dec  4 11:56:09 debian kernel: [161050.203334] audit: type=1400 audit(1638636969.098:57): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125304 comm="apparmor_parser"
Dec  4 11:56:09 debian kernel: [161050.336065] audit: type=1400 audit(1638636969.230:58): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125307 comm="apparmor_parser"
Dec  4 11:56:09 debian kernel: [161050.485278] audit: type=1400 audit(1638636969.378:59): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125310 comm="apparmor_parser"
Dec  4 11:56:09 debian kernel: [161050.492080] virbr1: port 2(vnet0) entered blocking state
Dec  4 11:56:09 debian kernel: [161050.492081] virbr1: port 2(vnet0) entered disabled state
Dec  4 11:56:09 debian kernel: [161050.492140] device vnet0 entered promiscuous mode
Dec  4 11:56:09 debian kernel: [161050.492304] virbr1: port 2(vnet0) entered blocking state
Dec  4 11:56:09 debian kernel: [161050.492306] virbr1: port 2(vnet0) entered listening state
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.3920] manager: (vnet0): new Tun device (/org/freedesktop/NetworkManager/Devices/20)
Dec  4 11:56:09 debian systemd-udevd[125312]: Using default interface naming scheme 'v240'.
Dec  4 11:56:09 debian systemd-udevd[125312]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4246] device (vnet0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4286] keyfile: add connection /run/NetworkManager/system-connections/vnet0.nmconnection (db2c5b63-ff10-4b2d-8690-c6d5b484cb6d,"vnet0")
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4309] device (vnet0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4318] device (vnet0): Activation: starting connection 'vnet0' (db2c5b63-ff10-4b2d-8690-c6d5b484cb6d)
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4319] device (vnet0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4323] device (vnet0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4325] device (vnet0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4327] device (virbr1): bridge port vnet0 was attached
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4327] device (vnet0): Activation: connection 'vnet0' enslaved, continuing activation
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4329] device (vnet0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4396] device (vnet0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4400] device (vnet0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.4505] device (vnet0): Activation: successful, device activated.
Dec  4 11:56:09 debian dbus-daemon[1491]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=1494 comm="/usr/sbin/NetworkManager --no-daemon ")
Dec  4 11:56:09 debian systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec  4 11:56:09 debian dbus-daemon[1491]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec  4 11:56:09 debian systemd[1]: Started Network Manager Script Dispatcher Service.
Dec  4 11:56:09 debian nm-dispatcher: req:1 'up' [vnet0]: new request (1 scripts)
Dec  4 11:56:09 debian nm-dispatcher: req:1 'up' [vnet0]: start running ordered scripts...
Dec  4 11:56:09 debian kernel: [161050.633003] audit: type=1400 audit(1638636969.526:60): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125320 comm="apparmor_parser"
Dec  4 11:56:09 debian systemd-udevd[125321]: Using default interface naming scheme 'v240'.
Dec  4 11:56:09 debian kernel: [161050.635853] virbr2: port 2(vnet1) entered blocking state
Dec  4 11:56:09 debian kernel: [161050.635856] virbr2: port 2(vnet1) entered disabled state
Dec  4 11:56:09 debian kernel: [161050.635976] device vnet1 entered promiscuous mode
Dec  4 11:56:09 debian kernel: [161050.636217] virbr2: port 2(vnet1) entered blocking state
Dec  4 11:56:09 debian kernel: [161050.636219] virbr2: port 2(vnet1) entered listening state
Dec  4 11:56:09 debian systemd-udevd[125321]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5352] manager: (vnet1): new Tun device (/org/freedesktop/NetworkManager/Devices/21)
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5510] device (vnet1): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5534] keyfile: add connection /run/NetworkManager/system-connections/vnet1.nmconnection (468ebb82-a253-4d81-a54c-911564d3f4d0,"vnet1")
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5541] device (vnet1): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5549] device (vnet1): Activation: starting connection 'vnet1' (468ebb82-a253-4d81-a54c-911564d3f4d0)
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5550] device (vnet1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5555] device (vnet1): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5558] device (vnet1): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5559] device (virbr2): bridge port vnet1 was attached
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5560] device (vnet1): Activation: connection 'vnet1' enslaved, continuing activation
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5561] device (vnet1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5567] device (vnet1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5570] device (vnet1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:09 debian NetworkManager[1494]: <info>  [1638636969.5648] device (vnet1): Activation: successful, device activated.
Dec  4 11:56:09 debian nm-dispatcher: req:2 'up' [vnet1]: new request (1 scripts)
Dec  4 11:56:09 debian nm-dispatcher: req:2 'up' [vnet1]: start running ordered scripts...
Dec  4 11:56:09 debian kernel: [161050.792545] audit: type=1400 audit(1638636969.686:61): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125367 comm="apparmor_parser"
Dec  4 11:56:09 debian libvirtd[1750]: Domain id=6 name='Whonix-Gateway' uuid=e9b93fae-7ee0-4096-b496-208aa0be517a is tainted: host-cpu
Dec  4 11:56:10 debian avahi-daemon[1487]: Joining mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe05:b4b9.
Dec  4 11:56:10 debian avahi-daemon[1487]: New relevant interface vnet0.IPv6 for mDNS.
Dec  4 11:56:10 debian avahi-daemon[1487]: Registering new address record for fe80::fc54:ff:fe05:b4b9 on vnet0.*.
Dec  4 11:56:10 debian avahi-daemon[1487]: Joining mDNS multicast group on interface vnet1.IPv6 with address fe80::fc54:ff:fe06:4a00.
Dec  4 11:56:10 debian avahi-daemon[1487]: New relevant interface vnet1.IPv6 for mDNS.
Dec  4 11:56:10 debian avahi-daemon[1487]: Registering new address record for fe80::fc54:ff:fe06:4a00 on vnet1.*.
Dec  4 11:56:11 debian kernel: [161052.517316] virbr1: port 2(vnet0) entered learning state
Dec  4 11:56:11 debian kernel: [161052.645328] virbr2: port 2(vnet1) entered learning state
Dec  4 11:56:12 debian kernel: [161053.267983] audit: type=1400 audit(1638636972.162:62): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-a9008a46-7469-47fc-8bcc-4449ae8f2ee8" pid=125417 comm="apparmor_parser"
Dec  4 11:56:12 debian kernel: [161053.446910] audit: type=1400 audit(1638636972.342:63): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-a9008a46-7469-47fc-8bcc-4449ae8f2ee8" pid=125420 comm="apparmor_parser"
Dec  4 11:56:12 debian bacula-dir[125155]: bacula-dir: dird.c:1229-0 Could not open Catalog "MyCatalog", database "XXX_DBNAME_XXX".
Dec  4 11:56:12 debian bacula-dir[125155]: bacula-dir: dird.c:1234-0 postgresql.c:332 Unable to connect to PostgreSQL server. Database=XXX_DBNAME_XXX User=XXX_DBUSER_XXX
Dec  4 11:56:12 debian bacula-dir[125155]: Possible causes: SQL server not running; password incorrect; max_connections exceeded.
Dec  4 11:56:12 debian bacula-dir[125155]: 04-Dec 11:56 bacula-dir ERROR TERMINATION
Dec  4 11:56:12 debian bacula-dir[125155]: Please correct configuration file: /etc/bacula/bacula-dir.conf
Dec  4 11:56:12 debian systemd[1]: bacula-director.service: Control process exited, code=exited, status=1/FAILURE
Dec  4 11:56:12 debian systemd[1]: bacula-director.service: Failed with result 'exit-code'.
Dec  4 11:56:12 debian systemd[1]: Failed to start Bacula Director Daemon service.
Dec  4 11:56:12 debian kernel: [161053.582800] audit: type=1400 audit(1638636972.478:64): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-a9008a46-7469-47fc-8bcc-4449ae8f2ee8" pid=125423 comm="apparmor_parser"
Dec  4 11:56:12 debian kernel: [161053.721423] audit: type=1400 audit(1638636972.618:65): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-a9008a46-7469-47fc-8bcc-4449ae8f2ee8" pid=125426 comm="apparmor_parser"
Dec  4 11:56:12 debian kernel: [161053.729968] virbr2: port 3(vnet2) entered blocking state
Dec  4 11:56:12 debian kernel: [161053.729971] virbr2: port 3(vnet2) entered disabled state
Dec  4 11:56:12 debian kernel: [161053.730078] device vnet2 entered promiscuous mode
Dec  4 11:56:12 debian kernel: [161053.730368] virbr2: port 3(vnet2) entered blocking state
Dec  4 11:56:12 debian kernel: [161053.730370] virbr2: port 3(vnet2) entered listening state
Dec  4 11:56:12 debian systemd-udevd[125321]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6296] manager: (vnet2): new Tun device (/org/freedesktop/NetworkManager/Devices/22)
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6448] device (vnet2): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6475] keyfile: add connection /run/NetworkManager/system-connections/vnet2.nmconnection (efc91c6d-6f2c-46db-a731-12e0e3dd38b6,"vnet2")
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6484] device (vnet2): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6493] device (vnet2): Activation: starting connection 'vnet2' (efc91c6d-6f2c-46db-a731-12e0e3dd38b6)
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6514] device (vnet2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6519] device (vnet2): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6522] device (vnet2): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6524] device (virbr2): bridge port vnet2 was attached
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6524] device (vnet2): Activation: connection 'vnet2' enslaved, continuing activation
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6526] device (vnet2): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6531] device (vnet2): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6533] device (vnet2): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Dec  4 11:56:12 debian NetworkManager[1494]: <info>  [1638636972.6636] device (vnet2): Activation: successful, device activated.
Dec  4 11:56:12 debian nm-dispatcher: req:3 'up' [vnet2]: new request (1 scripts)
Dec  4 11:56:12 debian nm-dispatcher: req:3 'up' [vnet2]: start running ordered scripts...
Dec  4 11:56:12 debian libvirtd[1750]: Domain id=7 name='kaliwhonix' uuid=a9008a46-7469-47fc-8bcc-4449ae8f2ee8 is tainted: host-cpu
Dec  4 11:56:13 debian NetworkManager[1494]: <info>  [1638636973.4320] device (virbr1): carrier: link connected
Dec  4 11:56:13 debian kernel: [161054.533233] virbr1: port 2(vnet0) entered forwarding state
Dec  4 11:56:13 debian kernel: [161054.533235] virbr1: topology change detected, propagating
Dec  4 11:56:13 debian NetworkManager[1494]: <info>  [1638636973.5601] device (virbr2): carrier: link connected
Dec  4 11:56:13 debian kernel: [161054.661230] virbr2: port 2(vnet1) entered forwarding state
Dec  4 11:56:13 debian kernel: [161054.661233] virbr2: topology change detected, propagating
Dec  4 11:56:14 debian avahi-daemon[1487]: Joining mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe55:218.
Dec  4 11:56:14 debian avahi-daemon[1487]: New relevant interface vnet2.IPv6 for mDNS.
Dec  4 11:56:14 debian avahi-daemon[1487]: Registering new address record for fe80::fc54:ff:fe55:218 on vnet2.*.
Dec  4 11:56:14 debian kernel: [161055.749233] virbr2: port 3(vnet2) entered learning state
Dec  4 11:56:16 debian kernel: [161057.765144] virbr2: port 3(vnet2) entered forwarding state
Dec  4 11:56:16 debian kernel: [161057.765146] virbr2: topology change detected, propagating
Dec  4 11:56:22 debian systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Dec  4 11:57:12 debian systemd[1]: bacula-director.service: Service RestartSec=1min expired, scheduling restart.
Dec  4 11:57:12 debian systemd[1]: bacula-director.service: Scheduled restart job, restart counter is at 1785.
Dec  4 11:57:12 debian systemd[1]: Stopped Bacula Director Daemon service.
Dec  4 11:57:12 debian systemd[1]: Starting Bacula Director Daemon service...
Dec  4 11:57:42 debian bacula-dir[125609]: bacula-dir: dird.c:1229-0 Could not open Catalog "MyCatalog", database "XXX_DBNAME_XXX".
Dec  4 11:57:42 debian bacula-dir[125609]: bacula-dir: dird.c:1234-0 postgresql.c:332 Unable to connect to PostgreSQL server. Database=XXX_DBNAME_XXX User=XXX_DBUSER_XXX
Dec  4 11:57:42 debian bacula-dir[125609]: Possible causes: SQL server not running; password incorrect; max_connections exceeded.
Dec  4 11:57:42 debian bacula-dir[125609]: 04-Dec 11:57 bacula-dir ERROR TERMINATION
Dec  4 11:57:42 debian bacula-dir[125609]: Please correct configuration file: /etc/bacula/bacula-dir.conf
Dec  4 11:57:42 debian systemd[1]: bacula-director.service: Control process exited, code=exited, status=1/FAILURE
Dec  4 11:57:42 debian systemd[1]: bacula-director.service: Failed with result 'exit-code'.
Dec  4 11:57:42 debian systemd[1]: Failed to start Bacula Director Daemon service.
Dec  4 11:58:28 debian avahi-daemon[1487]: Interface vnet2.IPv6 no longer relevant for mDNS.
Dec  4 11:58:28 debian avahi-daemon[1487]: Leaving mDNS multicast group on interface vnet2.IPv6 with address fe80::fc54:ff:fe55:218.
Dec  4 11:58:28 debian kernel: [161189.530902] virbr2: port 3(vnet2) entered disabled state
Dec  4 11:58:28 debian kernel: [161189.532432] device vnet2 left promiscuous mode
Dec  4 11:58:28 debian kernel: [161189.532438] virbr2: port 3(vnet2) entered disabled state
Dec  4 11:58:28 debian avahi-daemon[1487]: Withdrawing address record for fe80::fc54:ff:fe55:218 on vnet2.
Dec  4 11:58:28 debian NetworkManager[1494]: <info>  [1638637108.4738] device (vnet2): state change: activated -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec  4 11:58:28 debian NetworkManager[1494]: <info>  [1638637108.4739] device (virbr2): bridge port vnet2 was detached
Dec  4 11:58:28 debian NetworkManager[1494]: <info>  [1638637108.4740] device (vnet2): released from master device virbr2
Dec  4 11:58:28 debian dbus-daemon[1491]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.7' (uid=0 pid=1494 comm="/usr/sbin/NetworkManager --no-daemon ")
Dec  4 11:58:28 debian systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec  4 11:58:28 debian dbus-daemon[1491]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec  4 11:58:28 debian systemd[1]: Started Network Manager Script Dispatcher Service.
Dec  4 11:58:28 debian nm-dispatcher: req:1 'down' [vnet2]: new request (1 scripts)
Dec  4 11:58:28 debian nm-dispatcher: req:1 'down' [vnet2]: start running ordered scripts...
Dec  4 11:58:28 debian libvirtd[1750]: internal error: End of file from qemu monitor
Dec  4 11:58:28 debian kernel: [161189.955670] kauditd_printk_skb: 1 callbacks suppressed
Dec  4 11:58:28 debian kernel: [161189.955672] audit: type=1400 audit(1638637108.854:67): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-a9008a46-7469-47fc-8bcc-4449ae8f2ee8" pid=125761 comm="apparmor_parser"
Dec  4 11:58:33 debian avahi-daemon[1487]: Interface vnet0.IPv6 no longer relevant for mDNS.
Dec  4 11:58:33 debian avahi-daemon[1487]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe05:b4b9.
Dec  4 11:58:33 debian kernel: [161194.626365] virbr1: port 2(vnet0) entered disabled state
Dec  4 11:58:33 debian kernel: [161194.627405] device vnet0 left promiscuous mode
Dec  4 11:58:33 debian kernel: [161194.627410] virbr1: port 2(vnet0) entered disabled state
Dec  4 11:58:33 debian avahi-daemon[1487]: Withdrawing address record for fe80::fc54:ff:fe05:b4b9 on vnet0.
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.5705] device (vnet0): state change: activated -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.5706] device (virbr1): bridge port vnet0 was detached
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.5706] device (vnet0): released from master device virbr1
Dec  4 11:58:33 debian nm-dispatcher: req:2 'down' [vnet0]: new request (1 scripts)
Dec  4 11:58:33 debian nm-dispatcher: req:2 'down' [vnet0]: start running ordered scripts...
Dec  4 11:58:33 debian avahi-daemon[1487]: Interface vnet1.IPv6 no longer relevant for mDNS.
Dec  4 11:58:33 debian avahi-daemon[1487]: Leaving mDNS multicast group on interface vnet1.IPv6 with address fe80::fc54:ff:fe06:4a00.
Dec  4 11:58:33 debian kernel: [161194.709691] virbr2: port 2(vnet1) entered disabled state
Dec  4 11:58:33 debian kernel: [161194.711157] device vnet1 left promiscuous mode
Dec  4 11:58:33 debian kernel: [161194.711160] virbr2: port 2(vnet1) entered disabled state
Dec  4 11:58:33 debian avahi-daemon[1487]: Withdrawing address record for fe80::fc54:ff:fe06:4a00 on vnet1.
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.6504] device (vnet1): state change: activated -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.6505] device (virbr2): bridge port vnet1 was detached
Dec  4 11:58:33 debian NetworkManager[1494]: <info>  [1638637113.6506] device (vnet1): released from master device virbr2
Dec  4 11:58:33 debian nm-dispatcher: req:3 'down' [vnet1]: new request (1 scripts)
Dec  4 11:58:33 debian nm-dispatcher: req:3 'down' [vnet1]: start running ordered scripts...
Dec  4 11:58:33 debian libvirtd[1750]: internal error: End of file from qemu monitor
Dec  4 11:58:34 debian kernel: [161195.162429] audit: type=1400 audit(1638637114.062:68): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-e9b93fae-7ee0-4096-b496-208aa0be517a" pid=125798 comm="apparmor_parser"
Dec  4 11:58:38 debian kernel: [161200.026312] audit: type=1400 audit(1638637118.926:69): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-bc32a202-4a7e-45ca-a2f3-e55e78ef8998" pid=125805 comm="apparmor_parser"
Dec  4 11:58:39 debian kernel: [161200.177640] audit: type=1400 audit(1638637119.078:70): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-bc32a202-4a7e-45ca-a2f3-e55e78ef8998" pid=125808 comm="apparmor_parser"
Dec  4 11:58:39 debian kernel: [161200.327978] audit: type=1400 audit(1638637119.226:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-bc32a202-4a7e-45ca-a2f3-e55e78ef8998" pid=125811 comm="apparmor_parser"
Dec  4 11:58:39 debian kernel: [161200.475333] audit: type=1400 audit(1638637119.374:72): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-bc32a202-4a7e-45ca-a2f3-e55e78ef8998" pid=125814 comm="apparmor_parser"
Dec  4 11:58:39 debian NetworkManager[1494]: <info>  [1638637119.3855] manager: (macvtap2): new Macvlan device (/org/freedesktop/NetworkManager/Devices/23)
Dec  4 11:58:39 debian systemd-udevd[125816]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Dec  4 11:58:39 debian systemd-udevd[125816]: Using default interface naming scheme 'v240'.
Dec  4 11:58:39 debian kernel: [161200.621739] audit: type=1400 audit(1638637119.522:73): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-bc32a202-4a7e-45ca-a2f3-e55e78ef8998" pid=125826 comm="apparmor_parser"
Dec  4 11:58:39 debian NetworkManager[1494]: <info>  [1638637119.6625] device (macvtap2): carrier: link connected
Dec  4 11:58:41 debian avahi-daemon[1487]: Joining mDNS multicast group on interface macvtap2.IPv6 with address fe80::5054:ff:fe7c:a067.
Dec  4 11:58:41 debian avahi-daemon[1487]: New relevant interface macvtap2.IPv6 for mDNS.
Dec  4 11:58:41 debian avahi-daemon[1487]: Registering new address record for fe80::5054:ff:fe7c:a067 on macvtap2.*.
Dec  4 11:58:42 debian systemd[1]: bacula-director.service: Service RestartSec=1min expired, scheduling restart.
Dec  4 11:58:42 debian systemd[1]: bacula-director.service: Scheduled restart job, restart counter is at 1786.
Dec  4 11:58:42 debian systemd[1]: Stopped Bacula Director Daemon service.
Dec  4 11:58:42 debian systemd[1]: Starting Bacula Director Daemon service...
Dec  4 11:58:43 debian systemd[1]: NetworkManager-dispatcher.service: Succeeded.

What I also found:

https://superuser.com/questions/954262/why-do-damaged-hard-drives-freeze-the-entire-system

Maybe it is caused by the disks? The freezes happen during extensive disk IO (the service I am developing is writing lots of data). I ordered some PCIe SATA controllers to try plug the external disks in a different controller from the home disk. Is there anything else I can do to to get this fixed?

I ran smartctl -a on the disks and they didn't indicate any dead drives. I rather assume these errors are just glitches among all of these IO requests, which would not be so much of a problem if it didn't freeze the whole system.

I cannot just unplug hardware one by one and see what happens (sometimes these freezes happen after an hour, sometimes a day, sometimes a week. The service requires the external disks to run properly, so I would have to shut everything down for weeks of debugging. I hope there is another way to find out what is wrong).

Any help much appreciated.

Best Answer

I plugged all the HDDs into a second SATA controller, which resolved the issue. No more freezes for a whole month. Apparently my theory was true; I assumed some timout during the high load HDD activity would cause the whole controller to freeze, thus blocking access to swap. When swap was disabled, there were also no freezes.

Related Topic