Troubleshooting HP ProCurve – CPU at 100% After Reboot

I have been doing firmware upgrades on HP switches. Two different models were upgraded:

  • ProCurve Switch 5406zl Intelligent Edge (J8697A): upgraded from K.15.06.0008 to K.15.12.0012
  • HP 2520-24G-PoE Switch (J9299A): upgraded from J.14.54 to J.15.09.0021

Checking each switch right after it booted the new image, I observed the following:

  1. The switches loaded the new firmware image without errors, and connectivity was restored as soon as each switch booted. At this point CPU usage was low (under 10%).
  2. A few seconds later CPU usage rose to 100% and stayed there for several minutes. I could not detect any issue at this point apart from the CLI over SSH being a little sluggish: normal connectivity, no log messages…
  3. After five to ten minutes at 100%, CPU usage returned to normal without any change on my side.

Both models showed this behaviour. I rolled back one unit of each model to the previous firmware image and they behaved the same way.

Although this CPU spike right after boot caused no issues, I wonder whether normal network behaviour could be the cause, though I do not think so. I have considered the following aspects:

  1. Right after boot STP starts running, generating BPDUs and cycling all ports in the switch through the Blocking, Listening, Learning and Forwarding states. However, even with 802.1D this process takes no longer than 1 minute with default timers. Furthermore, I was checking the switches through SSH, so all STP computations were already done by the time I could connect to the switch.

  2. Right after boot the MAC address table is empty, so the first frames have to be flooded until addresses are learned. But I doubt this flooding would take 100% CPU, much less for 5 minutes, on a 24-port switch.

  3. All switches act as L2 devices with no L3 functionality enabled, so I rule out routing and other L3 processes.
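
As a rough check of the "no longer than 1 minute" estimate in point 1, here is a minimal sketch of the arithmetic. It assumes the standard 802.1D default timers (max age 20 s, forward delay 15 s), not values read from these switches, and the function names are purely illustrative.

```python
# Rough 802.1D (legacy STP) timing with the standard default timers.
# These are assumed defaults, not values read from the switches in question.
MAX_AGE_S = 20        # default max age
FORWARD_DELAY_S = 15  # default forward delay, used once for Listening and once for Learning


def stp_port_up_seconds(forward_delay=FORWARD_DELAY_S):
    """Time for a port to go from Listening through Learning to Forwarding."""
    return 2 * forward_delay


def stp_worst_case_seconds(max_age=MAX_AGE_S, forward_delay=FORWARD_DELAY_S):
    """Worst case: wait for max age to expire, then Listening and Learning."""
    return max_age + 2 * forward_delay


print(stp_port_up_seconds())     # 30
print(stp_worst_case_seconds())  # 50
```

Both figures are well under the five to ten minutes observed, which is why STP alone does not seem to explain the spike.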

Am I missing something "normal" in network operations that may explain this CPU usage for 5 minutes after reboot while keeping connectivity? Maybe it is some kind of background process the switch runs right after reboot?

Best Answer

This took less than 2 seconds on Google: HP networking portal

HP ProCurve 5400zl Switch Series - High CPU Utilization (99-100%) after Startup Issue

Right after the switch is started, high (99-100%) CPU utilization is observed.

Solution

When the switch boots, one of the initialization tasks is creation of encryption keys. Having the keys created in advance means that later, when a feature such as SSL or SSH that uses the keys is configured or used, there is minimal delay in the availability. This task takes a few seconds before it ramps up, depending on the switch model, the configuration being loaded, and the software revision. Even though the task consumes up to 100% of the CPU, it runs at a very low priority. Therefore, if another task requiring CPU cycles is started, this low priority task will back off. If this initialization/key generation task runs uninterrupted by higher priority tasks, it takes about 10 minutes to complete. If the CPU is busy with other tasks, the completion time will be extended.

To verify that the elevated CPU being seen is in fact what has been described here and not something else that requires troubleshooting, please use the commands documented below.

task-monitor cpu (this command was introduced in K.13.04)

show uptime

show cpu

The output will look like the following.

Switch# task-monitor cpu
Switch# show uptime
0000:00:01:42.36
Switch# show cpu

99 percent busy, from 27 sec ago
1 sec ave: 100 percent busy
5 sec ave: 100 percent busy
1 min ave: 66 percent busy

Task usage for last 5 sec
 % CPU | Description
-------+--------------------------
   0.3 | Sessions & I/O
  99.7 | System Services
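
If you collect this output from several switches, a small script can pick out the busiest task for you. This is a minimal sketch that assumes the "Task usage" table looks exactly like the sample above. The helper names (`parse_task_usage`, `top_task`) are made up for illustration and are not part of any HP tool, and the text is pasted in rather than fetched from a switch.

```python
import re

# Sample "show cpu" output, copied from the listing above.
SAMPLE = """\
99 percent busy, from 27 sec ago
1 sec ave: 100 percent busy
5 sec ave: 100 percent busy
1 min ave: 66 percent busy

Task usage for last 5 sec
 % CPU | Description
-------+--------------------------
   0.3 | Sessions & I/O
  99.7 | System Services
"""

# A task row is "<number> | <description>"; header and separator rows do not match.
TASK_ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\|\s*(.+?)\s*$")


def parse_task_usage(text):
    """Return {task description: percent CPU} for every task row in the text."""
    usage = {}
    for line in text.splitlines():
        match = TASK_ROW.match(line)
        if match:
            usage[match.group(2)] = float(match.group(1))
    return usage


def top_task(text):
    """Return (description, percent) for the task using the most CPU."""
    usage = parse_task_usage(text)
    return max(usage.items(), key=lambda item: item[1])


print(top_task(SAMPLE))  # ('System Services', 99.7)
```

In the sample, nearly all of the load falls under System Services shortly after boot, which matches the behaviour described above. If a different task dominates, or the load persists long after startup, that would point to something else worth troubleshooting.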