Router – Diagnose Intermittent Disconnect Issues

firewall internet modem pppoe router

We have a WatchGuard SOHO 6 with the latest firmware (6.4.1) and an ISP-supplied DSL modem. The firebox's PPPoE settings look like this:

[image: screenshot of the PPPoE configuration]

Lately our internet service has been going down intermittently, with no pattern in time of day or load, and the only way to get it back up is to reset the modem, the router, or both. Sometimes resetting just the router is enough; other times it has to be the modem and then the router.

It does not seem to be related to our external IP changing; the logs indicate that this is handled fine at least sometimes. (I say at least sometimes because we often get a new IP after a modem reset.)

Both boxes are pretty old but otherwise are working fine.

Last night our connection dropped:

2010-02-24 22:33:38 Local0.Warning      PPPoE: PPPoE connection terminated, service unavailable @35813ms
2010-02-24 22:33:39 Local0.Error        MONITOR: Protocol Down @35814ms
2010-02-24 22:33:40 Local0.Info     MONITOR: Ethernet Link Up @35815ms
2010-02-24 22:33:47 Local0.Warning      PPPoE: Timeout locating PPPoE server, will retry @35821ms
2010-02-24 22:34:36 Local0.Warning      PPPoE: entry duplicated 2 times @35821ms
2010-02-24 22:34:36 Local0.Error        PPPoE: Unable to locate PPPoE server @35871ms
2010-02-24 22:34:36 Local0.Error        MONITOR: Protocol Error @35871ms
2010-02-24 22:34:37 Local0.Info     MONITOR: Ethernet Link Up @35872ms
2010-02-24 22:34:44 Local0.Warning      PPPoE: Timeout locating PPPoE server, will retry @35878ms
2010-02-24 22:35:33 Local0.Warning      PPPoE: entry duplicated 2 times @35878ms
2010-02-24 22:35:33 Local0.Error        PPPoE: Unable to locate PPPoE server @35928ms

…the last five lines repeated ad nauseam every minute until this morning, when I reset the modem. That still didn't work. At the time of the modem reset, the syslog showed a momentary break in the pattern it had been repeating:

2010-02-25 08:32:17 Local0.Error        MONITOR: Protocol Error @71734ms
2010-02-25 08:32:18 Local0.Info     MONITOR: Ethernet Link Up @71735ms
2010-02-25 08:32:24 Local0.Warning      PPPoE: Timeout locating PPPoE server, will retry @71741ms
2010-02-25 08:33:04 Local0.Warning      PPPoE: entry duplicated 2 times @71741ms
2010-02-25 08:33:07 Local0.Error        MONITOR: Ethernet Link Down @71784ms
2010-02-25 08:33:11 Local0.Error        MONITOR: entry duplicated 3 times @71784ms
2010-02-25 08:33:11 Local0.Info     MONITOR: Ethernet Link Up @71788ms
2010-02-25 08:33:12 Local0.Info     CONFIG: Config file updated @71789ms
2010-02-25 08:33:12 Local0.Info     PPPoE: PPPoE service established @71789ms
2010-02-25 08:33:12 Local0.Info     PPP: Starting login @71789ms
2010-02-25 08:34:12 Local0.Error        PPP: PPP negotiation failed during LCP @71849ms

…after that it continued the same sequence of 'error, link up, timeout, duplicated, unable to locate' that it had been doing all night. Then I reset the router, and it worked fine. What's the most likely cause?
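When the same few lines repeat all night, it can help to collapse the syslog into counts per message before reading it. The following is a minimal sketch, assuming every line follows the layout seen in the excerpts above (date, time, facility.severity, component, message, then a trailing `@<n>ms` counter). The names `parse_line` and `summarize` are hypothetical helpers written for this illustration, not part of any WatchGuard tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the excerpts above:
# "<date> <time> <facility>.<severity>  <component>: <message> @<n>ms"
LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<facility>\w+)\.(?P<severity>\w+)\s+"
    r"(?P<component>\w+):\s+(?P<message>.*?)\s+@(?P<tick>\d+)ms$"
)


def parse_line(line):
    """Return a dict for one syslog line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry.pop("stamp"), "%Y-%m-%d %H:%M:%S")
    return entry


def summarize(lines):
    """Count lines per (component, message), skipping 'entry duplicated' notes."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None or "entry duplicated" in entry["message"]:
            continue
        counts[(entry["component"], entry["message"])] += 1
    return counts


# The first excerpt from the question, copied verbatim.
sample = [
    "2010-02-24 22:33:38 Local0.Warning      PPPoE: PPPoE connection terminated, service unavailable @35813ms",
    "2010-02-24 22:33:39 Local0.Error        MONITOR: Protocol Down @35814ms",
    "2010-02-24 22:33:40 Local0.Info     MONITOR: Ethernet Link Up @35815ms",
    "2010-02-24 22:33:47 Local0.Warning      PPPoE: Timeout locating PPPoE server, will retry @35821ms",
    "2010-02-24 22:34:36 Local0.Warning      PPPoE: entry duplicated 2 times @35821ms",
    "2010-02-24 22:34:36 Local0.Error        PPPoE: Unable to locate PPPoE server @35871ms",
    "2010-02-24 22:34:36 Local0.Error        MONITOR: Protocol Error @35871ms",
    "2010-02-24 22:34:37 Local0.Info     MONITOR: Ethernet Link Up @35872ms",
    "2010-02-24 22:34:44 Local0.Warning      PPPoE: Timeout locating PPPoE server, will retry @35878ms",
    "2010-02-24 22:35:33 Local0.Warning      PPPoE: entry duplicated 2 times @35878ms",
    "2010-02-24 22:35:33 Local0.Error        PPPoE: Unable to locate PPPoE server @35928ms",
]

if __name__ == "__main__":
    for (component, message), count in summarize(sample).most_common():
        print(f"{count:3d}  {component}: {message}")
```

Run against the full night's log, a summary like this makes any line that breaks the pattern (such as the LCP failure after the modem reset) stand out from the repeated PPPoE discovery timeouts.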

Best Answer

I presume that you have already worked with the ISP and are convinced that it isn't the DSL line or the equipment on their side. If not, I would start THERE. You will spend a bunch of time on the phone, but if the problem is ISP-related it will take a while to get fixed, so you should start now. It will likely take several outages before they admit there is a problem, because they tend to close tickets once an equipment reset restores service.

I would obtain (buy or badger the ISP) a replacement DSL modem and see if that fixes the problem. If it does, you're done. If it doesn't, reset the DSL modem to factory settings and return it, because the problem is the WatchGuard.

If you need to replace the WatchGuard (and it isn't under warranty or service), I recommend Astaro. $1000 will get you a great firewall appliance for a small office.

You can also test the WatchGuard directly. If you can put up with some internet downtime, replace it with a standalone PC. Or if your firewall config is simple, get a free trial from Astaro. If the connection remains alive long enough (say three or four times the typical time to failure) then you know the WatchGuard was the problem.
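To turn "three or four times the typical time to failure" into a concrete number, you can average the gaps between past drops and multiply. This is only a rough sketch: `trial_duration` is a hypothetical helper, the drop times below are made up for illustration (they are not taken from the logs in the question), and a plain mean is just one reasonable reading of "typical".

```python
from datetime import datetime, timedelta


def trial_duration(drop_times, multiplier=3):
    """Return `multiplier` times the mean gap between consecutive drops."""
    if len(drop_times) < 2:
        raise ValueError("need at least two drops to estimate a gap")
    ordered = sorted(drop_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    return mean_gap * multiplier


# Hypothetical drop times, for illustration only.
drops = [
    datetime(2010, 2, 20, 9, 0),
    datetime(2010, 2, 22, 9, 0),
    datetime(2010, 2, 24, 21, 0),
]

if __name__ == "__main__":
    print("Run the replacement for at least:", trial_duration(drops))
    print("Or, to be more conservative:", trial_duration(drops, multiplier=4))
```

In practice you would feed in the timestamps of the "PPPoE connection terminated" lines from your own syslog. With only a handful of drops the estimate is loose, so err toward the longer trial.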

Note #1: You don't need to do these serially; you can move forward on all three at the same time.

Note #2: You can build a great firewall on the cheap from a PC with multiple NICs and something like FS-Security or SmoothWall, but I wouldn't unless you are into rolling your own stuff or are horribly strapped for cash.
