Routing – What causes packet loss when using full bandwidth from ISP

adsl, isp, latency, packet-loss, routing

We use a small, local ISP that is having some networking issues, and I'm trying to understand the cause so I can communicate better with their network guys. I work with a number of companies, each with a different network/router configuration, that use their DSL service.

We have a range of issues, but the most repeatable one is that packets are dropped when using a large amount of bandwidth. For example, when I run a speed test with a couple of pings running at the same time, I see packet loss go to 50 percent and latency go to 10 times what it was. This happens regardless of the customer's configuration.

I've seen the same thing happen when someone is, for example, uploading an album to Facebook. All other WAN traffic becomes unstable, with high latency and lots of dropped packets. Even when we are not using much bandwidth, however, we still see intermittent packet loss and often unexpectedly high latencies for customers on their network.

  • For a novice network admin trying to understand the issue, is the
    most likely culprit here a problem with the DSL equipment or an
    insufficient router to handle their traffic?
  • How can I test to better understand what is going on?
  • What would be the best way to demonstrate the problem to them (and
    that it is happening for all their customers), so as to provide
    their networking guy with the necessary info to resolve the problem?

Best Answer

There are a number of possibilities here, including but not limited to:

  1. The CPE's CPU is maxing out. (CPE = Customer Premises Equipment)

    Check the specifications for the router you are using to make sure it can support the level of traffic you're trying to push. If you are unsure, try graphing the CPE's CPU usage.

  2. Bufferbloat in your ISP

    If your ISP has configured traffic shaping for DSL tails (to avoid tail-dropping packets), they may have configured their buffer size too large. A good way to tell is to max out your connection in the downstream (download) direction while watching latency. If your latency goes up by 200ms or more, this is probably a problem with your ISP's shaping configuration for DSL tails, and you should complain. They should reduce their buffer depth.

    I'd suggest that more than 50ms of buffer is harmful.

  3. The upstream of your xDSL is being maxed out.

    Remember that on most DSL technologies, bandwidth is asymmetric, so it takes much less traffic to congest the upstream (upload) than the downstream. Make sure you aren't congesting the upstream.

  4. Remember tools that graph links are usually averaging samples.

    Tools like rrdtool (and cacti and NMIS and...) are usually showing you a 5-minute sample average of your links. This makes them poor for identifying situations where a user congests a link for 10 seconds or so. That'll just appear as a small bump on a 5-minute-sampled graph. Look for tools which can give you a 'high water mark' of the link as well as a rolling average.
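
To illustrate point 4, here is a minimal Python sketch of how a 5-minute average hides a short burst. The trace is made up for illustration: an assumed 1 Mbit/s upstream sampled once per second, mostly idle, with a single 10-second burst at line rate. The function name `summarize` is hypothetical.

```python
from statistics import mean

def summarize(samples_mbps, window):
    """Return an (average, peak) pair for each consecutive window of samples."""
    out = []
    for start in range(0, len(samples_mbps), window):
        chunk = samples_mbps[start:start + window]
        out.append((mean(chunk), max(chunk)))
    return out

# Hypothetical trace: 300 one-second samples (5 minutes) on an assumed
# 1 Mbit/s upstream, light traffic except for one 10-second burst.
LINK_MBPS = 1.0
trace = [0.1] * 300
trace[120:130] = [LINK_MBPS] * 10

(avg, peak), = summarize(trace, window=300)
print(f"5-minute average: {avg:.2f} Mbit/s, high-water mark: {peak:.2f} Mbit/s")
```

In this made-up trace the link is completely saturated for 10 seconds, yet the 5-minute average sits at roughly 13% utilisation. Only the high-water mark shows the congestion.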

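For the bufferbloat check in point 2, one way to put numbers on it is to capture ping output once while the line is idle and once while a download is saturating it, then compare the two. The sketch below assumes Linux-style ping reply lines containing `time=... ms`. The sample text is invented for illustration (not a real capture), `ping_stats` is a hypothetical helper, and the 200ms threshold is the rule of thumb from point 2.

```python
import re
from statistics import median

# Matches the round-trip time field in a typical ping reply line.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")

def ping_stats(output, sent):
    """Return (loss_percent, median_rtt_ms) from ping reply lines.

    `sent` is the number of echo requests issued; replies are counted
    from the lines that carry a 'time=... ms' field.
    """
    rtts = [float(m.group(1)) for m in RTT_RE.finditer(output)]
    loss = 100.0 * (sent - len(rtts)) / sent
    return loss, (median(rtts) if rtts else None)

# Invented sample output, four pings sent in each case.
idle = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=21.0 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=57 time=23.0 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=57 time=22.0 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=57 time=24.0 ms
"""
loaded = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=240.0 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=57 time=260.0 ms
"""

idle_loss, idle_rtt = ping_stats(idle, sent=4)
load_loss, load_rtt = ping_stats(loaded, sent=4)
added_ms = load_rtt - idle_rtt

print(f"idle:   {idle_loss:.0f}% loss, median {idle_rtt:.1f} ms")
print(f"loaded: {load_loss:.0f}% loss, median {load_rtt:.1f} ms")
if added_ms >= 200:  # rule of thumb from point 2
    print(f"latency rose by {added_ms:.1f} ms under load: likely oversized buffers")
```

Running the same comparison from several customer sites, with timestamps, gives the ISP something concrete to look at and shows that the behaviour is not specific to one customer's configuration.
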