TCP Dup ACK, segments lost, retransmission during SMTP conversation


A customer is trying to send emails with attachments (both small and large) to one of our Exchange servers, but the connection is reset once a timeout is reached. To me, it looks as if the sending server does not receive our ACKs and therefore retransmits, resulting in duplicate ACKs from our side.
We use a Cisco ASA, but no SMTP/ESMTP inspection policies (a.k.a. fixup smtp) are applied to any of the interfaces involved (SMTP inspection is only used for a completely different VLAN, where no Exchange server resides).

1.1.1.1: receiving SMTP server
2.2.2.2: sending SMTP server

Everything works fine up to and including "S: 354 Start mail input; end with <CRLF>.<CRLF>". The problem only appears once the message data is sent.

Wireshark dump

No.  Time        Source    Destination  Protocol  Length  Info
20   504.923698  1.1.1.1   2.2.2.2      SMTP      78      S: 250 2.1.5 Recipient OK
21   505.304394  2.2.2.2   1.1.1.1      SMTP      60      C: DATA
22   505.304713  1.1.1.1   2.2.2.2      SMTP      100     S: 354 Start mail input; end with <CRLF>.<CRLF>
23   505.599857  2.2.2.2   1.1.1.1      SMTP      1434    C: DATA fragment, 1380 bytes
24   505.620808  2.2.2.2   1.1.1.1      SMTP      1434    C: DATA fragment, 1380 bytes
25   505.620823  1.1.1.1   2.2.2.2      TCP       54      smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
26   505.919899  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
27   505.919912  1.1.1.1   2.2.2.2      TCP       54      [TCP Dup ACK 25#1] smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
28   505.940785  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
29   505.940797  1.1.1.1   2.2.2.2      TCP       54      [TCP Dup ACK 25#2] smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
30   505.961793  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes
31   505.982494  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes
32   505.982508  1.1.1.1   2.2.2.2      TCP       54      smtp > 55346 [ACK] Seq=450 Ack=4284 Win=64860 Len=0
33   506.302829  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
34   506.302846  1.1.1.1   2.2.2.2      TCP       54      [TCP Dup ACK 32#1] smtp > 55346 [ACK] Seq=450 Ack=4284 Win=64860 Len=0
35   506.323446  2.2.2.2   1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes

...and so on, until the timeout is reached.
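The duplicate ACKs in a trace like this follow from TCP's cumulative acknowledgement: the receiver can only acknowledge up to the first missing byte, so every segment that arrives beyond a gap repeats the same Ack number. A minimal Python sketch of that receiver behaviour; the function name is hypothetical, and the sequence numbers after 2904 are illustrative only, since the capture does not show the Seq values of the data segments.

```python
def receiver_acks(next_expected, arrivals):
    """Return the cumulative ACK number sent after each arriving segment.

    next_expected -- first (relative) byte the receiver has not yet received
    arrivals      -- (seq, length) tuples in the order they reach the receiver
    Out-of-order data is buffered, but the ACK number only advances once the
    gap in front of it is filled. Partially overlapping segments are ignored.
    """
    buffered = {}
    acks = []
    for seq, length in arrivals:
        if seq >= next_expected:
            buffered[seq] = length  # hold new (possibly out-of-order) data
        # Slide the ACK point over any buffered data that is now contiguous.
        while next_expected in buffered:
            next_expected += buffered.pop(next_expected)
        acks.append(next_expected)
    return acks


MSS = 1380  # payload size of each "DATA fragment" in the capture
# Hypothetical: the segments starting at 2904 and 4284 never arrive,
# then the sender retransmits the one at 2904.
arrivals = [(5664, MSS), (7044, MSS), (2904, MSS)]
print(receiver_acks(2904, arrivals))  # [2904, 2904, 4284]
```

Each later arrival repeats Ack=2904 (a duplicate ACK), and the retransmission only moves the ACK to 4284 because the next gap is still open, which is the same shape as the Ack=2904 / Ack=4284 pattern in the capture.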

We run other Exchange servers, and the sender can deliver the very same email to them. All of our Exchange servers sit behind the same firewalls, routers and switches; probably only the patch cabling differs.
Oh, and sending 15 MB attachments from Gmail to this server works.

Normal continuous ping:

Ping statistics for 2.2.2.2:
    Packets: Sent = 249, Received = 249, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 82ms, Maximum = 546ms, Average = 138ms
^C

# 992-byte payload with Don't Fragment set works
C:\Users\someadmin>ping -f -l 992 2.2.2.2

Pinging 2.2.2.2 with 992 bytes of data:
Reply from 2.2.2.2: bytes=992 time=100ms TTL=48
Reply from 2.2.2.2: bytes=992 time=101ms TTL=48
Reply from 2.2.2.2: bytes=992 time=101ms TTL=48
Reply from 2.2.2.2: bytes=992 time=100ms TTL=48

Ping statistics for 2.2.2.2:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 100ms, Maximum = 101ms, Average = 100ms


# 993-byte payload with Don't Fragment set fails
C:\Users\someadmin>ping -f -l 993 2.2.2.2

Pinging 2.2.2.2 with 993 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 2.2.2.2:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

However, I can ping Google's DNS servers with large packets:

ping -f -l 1472 8.8.8.8

Pinging 8.8.8.8 with 1472 bytes of data:
Reply from 8.8.8.8: bytes=64 (sent 1472) time=31ms TTL=51

ping -f -l 1472 8.8.4.4

Pinging 8.8.4.4 with 1472 bytes of data:
Reply from 8.8.4.4: bytes=64 (sent 1472) time=30ms TTL=51

Cisco ASA policies

class-map inspection_default
 match default-inspection-traffic
!
!             
policy-map type inspect dns preset_dns_map
 parameters
  message-length maximum client auto
  message-length maximum 3096
  no dns-guard
  no protocol-enforcement
  no nat-rewrite
policy-map global_policy
 class inspection_default
  inspect dns preset_dns_map 
  inspect ftp 
  inspect h323 h225 
  inspect h323 ras 
  inspect rsh 
  inspect rtsp 
  inspect sqlnet 
  inspect skinny  
  inspect sunrpc 
  inspect xdmcp 
  inspect sip  
  inspect netbios 
  inspect tftp 
  inspect ip-options 
  inspect icmp 
policy-map shape_policy
 class class-default
  police input 10000000 5000
  police output 10000000 5000
!             

Where should I start looking? Should I start by asking the sender to capture the same Wireshark/tcpdump trace on their side?

Best Answer

It's hard to know for sure, but in my opinion you have a path MTU issue here. Run a path MTU discovery and reduce the MTU of your gateway (or server NIC) accordingly. If that solves the problem, you have your proof that some node in the path isn't handling MTU correctly (either dropping the ICMP type 3, code 4 "Fragmentation Needed" packets or simply not sending them back).
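The manual probing from the question (992 works, 993 fails) can be automated as a binary search over DF ping sizes. A minimal Python sketch, assuming the Windows ping shown in the question (flags -f for Don't Fragment, -l for payload size, -n for count, -w for timeout in ms) and assuming that a reply line containing "TTL=" means success; `largest_passing_payload` and `windows_df_ping` are hypothetical helper names.

```python
import subprocess
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest payload size in [low, high] accepted by probe().

    Assumes probe(low) succeeds and that results are monotonic: if a size
    gets through, every smaller size does too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid       # mid fits; the answer is mid or larger
        else:
            high = mid - 1  # mid is too big; the answer is smaller
    return low


def windows_df_ping(host: str, timeout_ms: int = 2000) -> Callable[[int], bool]:
    """Build a probe that sends one Windows ping with Don't Fragment set.

    Success is detected by "TTL=" appearing in the output, which is an
    assumption about the ping output format on the machine running this.
    """
    def probe(payload: int) -> bool:
        result = subprocess.run(
            ["ping", "-f", "-l", str(payload), "-n", "1", "-w", str(timeout_ms), host],
            capture_output=True,
            text=True,
        )
        return "TTL=" in result.stdout

    return probe
```

On the Windows host from the question, `largest_passing_payload(windows_df_ping("2.2.2.2"), 0, 1472)` would be expected to return 992 if the behaviour shown above is stable; adding the 28 bytes of ICMP and IPv4 headers gives the MTU to configure. Binary search needs roughly 11 pings for the 0 to 1472 range instead of stepping one byte at a time.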
