Internal Site to Site VPN over MPLS speed issues

How do I troubleshoot slow performance of a site-to-site VPN tunnel over an MPLS circuit? What are the relevant reports and statistics I should be looking at?

Background:
I support one end of a site-to-site VPN that connects the two ends of a Process Control Network (PCN). The PCN is separated from the business/corporate network by Juniper SRX/SSG firewalls, which also serve as the VPN endpoints.

Originally, the business network between the two sites was connected by an AT&T GigaMAN circuit, which as I understand it is a brand name for a Metro Ethernet service. One site was a sub-site of the main site (mine), and any traffic from the sub-site bound for another company site passed through the main site before being routed onward.

Due in part to cost and in part to a desire for additional reliability, the Metro Ethernet link was replaced by a T3 circuit that ties the sub-site into the company MPLS network. The main site was already on MPLS with the rest of the company.
One of the uses for the VPN is scheduled file transfers between the sites, and since the sub-site switched to MPLS the transfers have been timing out intermittently.

I don’t control the company LAN or WAN, just the PCN, so I have to work through another group to find the root cause, but I don’t know the right questions to ask.

Best Answer

Things I would recommend looking at:

  • Pull event logs from the Juniper boxes, looking especially for tunnel drops.
  • Enable debug logging on the Juniper boxes, especially if the issue occurs consistently enough that you can capture it without worrying about log rollover or the performance cost of debugging.
  • Get any MPLS reports that show loss of connectivity, bandwidth utilization, etc., at as fine a time granularity as possible.
  • Run some baseline tests. Vary the endpoints, file sizes, and MTU sizes, run QCheck tests, and so on, at different times of the day. If you can run these while the intermittent issue is occurring, even better.
  • If the issue can be reproduced, even only once a day, capture traffic at the endpoints with Wireshark and then analyze the captures.
  • Try different file transfer protocols to see whether the issue is the protocol itself. SMB tends to perform poorly over a VPN tunnel. Try FTP instead. Test and gather results.
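
For the MTU part of those tests, a rough sketch along the following lines could help narrow down the largest packet that crosses the tunnel without fragmentation. It is only an illustration, not something from the original setup: `ping_df` assumes a Linux host with the iputils `ping` (`-M do` sets Don't Fragment), the function names are made up for this example, and the 500 to 1472 byte search range assumes IPv4 over a standard 1500-byte Ethernet MTU.

```python
import subprocess
from typing import Callable


def ping_df(host: str, payload: int) -> bool:
    """Send one ICMP echo with Don't Fragment set (Linux iputils syntax).

    Returns True if a reply came back. `host` is whatever far-side
    address you are allowed to ping; it is a placeholder here.
    """
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_unfragmented_payload(
    probe: Callable[[int], bool], low: int = 500, high: int = 1472
) -> int:
    """Binary-search the largest payload size for which probe() succeeds.

    Returns 0 if even the smallest size fails. Assumes that if a size
    passes, every smaller size passes too, so a single lost packet can
    skew the result; repeat the search a few times before trusting it.
    """
    if not probe(low):
        return 0
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid        # mid fits, look for something larger
        else:
            high = mid - 1   # mid is too big
    return low
```

A possible use would be `largest_unfragmented_payload(lambda n: ping_df("192.0.2.10", n)) + 28`, where `192.0.2.10` stands in for the remote endpoint and 28 bytes accounts for the IPv4 and ICMP headers. If the result through the tunnel is noticeably lower than on the plain LAN, that is a data point worth passing to the group that runs the WAN.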

Overall, the more data points and logs you can gather from different angles, the easier it will be to put the puzzle together.
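
As one way of collecting such data points on a schedule, here is a minimal sketch of a timed TCP transfer probe that records each attempt, including failures and timeouts, to a CSV file. Everything in it is an assumption for illustration: the function names, the CSV columns, and the idea of running a simple "sink" listener on a test machine at the far end. It is not a substitute for QCheck or the firewall logs, just a way to get timestamps to line up against the MPLS reports.

```python
import csv
import socket
import time
from datetime import datetime


def serve_sink_once(listener: socket.socket) -> None:
    """Far-end helper: accept one connection, discard all data, then close."""
    conn, _ = listener.accept()
    with conn:
        while conn.recv(65536):
            pass


def timed_send(host: str, port: int, size: int, timeout: float = 30.0) -> dict:
    """Send `size` bytes over TCP and report the duration or the failure."""
    row = {"when": datetime.now().isoformat(timespec="seconds"),
           "bytes": size, "seconds": None, "mbit_per_s": None, "error": ""}
    chunk = b"\0" * 65536
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            remaining = size
            while remaining > 0:
                n = min(remaining, len(chunk))
                sock.sendall(chunk[:n])
                remaining -= n
            sock.shutdown(socket.SHUT_WR)
            # Wait for the far end to close so the timing covers delivery,
            # not just filling the local send buffer.
            while sock.recv(4096):
                pass
    except OSError as exc:  # includes timeouts, resets, and refusals
        row["error"] = type(exc).__name__
        return row
    elapsed = time.monotonic() - start
    row["seconds"] = round(elapsed, 3)
    if elapsed > 0:
        row["mbit_per_s"] = round(size * 8 / elapsed / 1e6, 2)
    return row


def append_rows(path: str, rows: list) -> None:
    """Append result rows to a CSV file, writing the header if it is new."""
    fields = ["when", "bytes", "seconds", "mbit_per_s", "error"]
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        if fh.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
```

Run from a scheduler with a few different sizes, for example `append_rows("vpn_probe.csv", [timed_send(host, port, s) for s in (10_000, 1_000_000, 50_000_000)])`, this would build up a log showing whether the failures cluster at certain times of day or only above certain transfer sizes.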
