SSH – Sessions Terminate Abruptly with Corrupted MAC on Input Error

centos6 ssh

We have a Dell PowerEdge 840 server running 64-bit CentOS 6.0 with 2GB of ECC memory. Whenever a user logs in over SSH, the session is terminated after some time with this output:

Corrupted MAC on input. Packet Corrupt

This happens every time. The SSH session may work for a while, but it eventually fails. I noticed that it fails more often when I am using X forwarding.

After Googling for answers, I suspect a hardware issue with the server, possibly the memory.
Output from lshw (memory section):

          capabilities: internal write-back unified
 *-memory
      description: System Memory
      physical id: 1000
      slot: System board or motherboard
      size: 2GiB
    *-bank:0
         description: DIMM DDR2 Synchronous 667 MHz (1.5 ns)
         product: 72T64000HU3SB
         vendor: 7F7F7F7F7F510000
         physical id: 0
         serial: 09022F17
         slot: DIMM1_A
         size: 512MiB
         width: 64 bits
         clock: 667MHz (1.5ns)
    *-bank:1
         description: DIMM DDR2 Synchronous 667 MHz (1.5 ns)
         product: 72T64000HU3SB
         vendor: 7F7F7F7F7F510000
         physical id: 1
         serial: 09022E13
         slot: DIMM1_B
         size: 512MiB
         width: 64 bits
         clock: 667MHz (1.5ns)
    *-bank:2
         description: DIMM DDR2 Synchronous 667 MHz (1.5 ns)
         product: 72T64000HU3SB
         vendor: 7F7F7F7F7F510000
         physical id: 2
         serial: 09030910
         slot: DIMM2_A
         size: 512MiB
         width: 64 bits
         clock: 667MHz (1.5ns)
    *-bank:3
         description: DIMM DDR2 Synchronous 667 MHz (1.5 ns)
         product: 72T64000HU3SB
         vendor: 7F7F7F7F7F510000
         physical id: 3
         serial: 09030B13
         slot: DIMM2_B
         size: 512MiB
         width: 64 bits
         clock: 667MHz (1.5ns)

I ran memtest86+ and it reported no errors. I also reseated the memory, moved the modules into different slots, and even increased swap space to 4GB. This is a test server that runs Apache (compiled from source and pre-configured) on different ports. Since each developer has their own httpd.conf and test environment, more than one Apache server could be running at a time.

I also checked syslog for error messages but could not find anything interesting. Even after asking everybody to stop using the server, so that memory use was minimal, my SSH session still terminates with the error message quoted above.

What should my next troubleshooting steps be?

Best Answer

Thanks sendmoreinfo,

It appears that TCP checksum offloading on the network card is the culprit. I disabled it for both transmit and receive:

ethtool -K eth0 tx off rx off 

and it started working again.
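
To confirm that the change took effect, the current offload state can be read back with `ethtool -k` (lowercase `k`). The helper below is only a rough sketch: `checksum_offload_state` is a hypothetical function name, and `eth0` is assumed to be the affected interface, as in the command above. It filters the `rx-checksumming` and `tx-checksumming` lines out of the `ethtool -k` output passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of ethtool): reads `ethtool -k` output on
# stdin and prints only the rx/tx checksum offload state as name=value.
checksum_offload_state() {
    awk -F': *' '
        $1 ~ /^[ \t]*(rx|tx)-checksumming$/ {
            gsub(/^[ \t]+/, "", $1)   # drop any leading indentation
            split($2, state, " ")     # keep "on"/"off", drop suffixes such as "[fixed]"
            print $1 "=" state[1]
        }
    '
}

# Typical use on the server (assumes the interface is eth0):
#   ethtool -k eth0 | checksum_offload_state
```

After the fix, both lines should read `off`. Note that a setting made with `ethtool -K` does not survive a reboot, so it has to be reapplied at boot time.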