HP Procurve 5412zl – Management Unresponsive via IP When CPU is High

hp-procurve management switch

I've got at least one problem: consistently high CPU on an HP 5412. The switch itself keeps working fine while the management interface is unavailable, but I'm unable to log in via SSH or the web interface. It seems to happen at random, and once the CPU goes over ~80% the switch becomes inaccessible.

I can log in via serial and run commands, but it's painfully slow to display anything.
I've looked at the logs and they're pretty sparse. The only thing I can think of is the trunk between this switch and a Cisco UCS. It's reporting a PVID mismatch, but according to the logs it's been that way for over a year, while the CPU issue has only been occurring for the past few weeks.

I'm really at a loss as to why the CPU would be so high. Normally this switch runs at <20%. I've not dealt with networking, let alone switches, in years; it's been my experience that you just set them up and they run.

I guess it boils down to this: any idea why, after 1+ years of running, a switch would go from a 20% CPU average to an 80% average? Also, why would high CPU knock out the IP management interfaces?

FWIW, I've rebooted the switch to no avail. If anything, it's made it worse.

==Update==

I've updated the firmware and fixed the PVID issues. The CPU issue still seems to be present. The difference definitely seems to relate to usage, as the day or so it ran fine was during a weekend. I'm not seeing any egregious port usage or anything like that. Our Toshiba phones do tend to generate errors, but nothing has really changed recently with them. I've updated HP as well.

Best Answer

There's a very simple answer. Management traffic such as SSH, HTTP, and ICMP is handled by the CPU. The main purpose of the switch is forwarding traffic, and management traffic has the lowest priority. So when the CPU is maxed out, all management access gets trampled to preserve the network services.
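To make the starvation effect concrete, here is a toy Python model of a strict-priority CPU. It is only a sketch: the function name `simulate`, the per-tick budget of 100 units, and the management request cost of 30 units are all made-up numbers for illustration, not measurements or internals of the 5412zl.

```python
def simulate(high_priority_load, mgmt_cost=30, ticks=10, budget=100):
    """Count management requests served by a toy strict-priority CPU.

    Assumptions (hypothetical, for illustration only):
    - each tick the CPU has `budget` units of work available
    - higher-priority work takes `high_priority_load` units first
    - one management request (e.g. an SSH packet) arrives per tick
      and needs `mgmt_cost` units from whatever is left over
    - leftover capacity does not carry over to the next tick
    """
    served = 0
    backlog = 0
    for _ in range(ticks):
        backlog += 1  # one new management request per tick
        spare = max(0, budget - high_priority_load)
        # Management work only runs in the capacity left after
        # higher-priority work has been handled.
        while backlog and spare >= mgmt_cost:
            spare -= mgmt_cost
            backlog -= 1
            served += 1
    return served


if __name__ == "__main__":
    print(simulate(20))  # prints 10: every request is served
    print(simulate(85))  # prints 0: management is starved
```

In this model the switch's main work is unaffected at 85% load, but management requests never get enough leftover CPU to complete, which matches the symptom of SSH and the web interface timing out while traffic still flows.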

If you're running old code you might be hitting a software issue. Download the current software from the HPN website and upgrade. If you don't need the latest and greatest features and are more interested in stability, go for the maintenance release; the current one is K.15.10.0015m. It has a longer lifespan than the others, and no new features are added to it, so it is only updated when there's a bug fix.

The PVID mismatch is also a simple one. On a ProCurve, the PVID of a port is the VLAN ID of its untagged VLAN. A PVID mismatch shouldn't affect the CPU, but it will fill up your log with unnecessary noise. Mismatched VLAN tagging also lets broadcasts from one subnet spill into another, which isn't behavior you want to see. So make sure the VLAN tagging matches on both ends of the link.
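If you keep a record of the untagged VLAN on each side of your uplinks, the check is easy to script. The following is a minimal sketch in Python; the function name `find_pvid_mismatches`, the dictionary layout, and the port names and VLAN IDs are all hypothetical and would need to be filled in from your own switch and UCS configuration.

```python
def find_pvid_mismatches(links):
    """Return the links whose two ends disagree on the untagged VLAN.

    `links` is a list of dicts with these (hypothetical) keys:
      local_port, local_pvid   - port and untagged VLAN on the ProCurve
      remote_port, remote_pvid - port and native VLAN on the far end
    """
    return [
        link for link in links
        if link["local_pvid"] != link["remote_pvid"]
    ]


if __name__ == "__main__":
    # Example data only; these ports and VLAN IDs are invented.
    links = [
        {"local_port": "A1", "local_pvid": 1,
         "remote_port": "uplink-1", "remote_pvid": 10},
        {"local_port": "A2", "local_pvid": 10,
         "remote_port": "uplink-2", "remote_pvid": 10},
    ]
    for link in find_pvid_mismatches(links):
        print(
            f"{link['local_port']} (PVID {link['local_pvid']}) != "
            f"{link['remote_port']} (PVID {link['remote_pvid']})"
        )
```

With the example data above, only the first link is reported, since its two ends use untagged VLANs 1 and 10.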