A friend of mine asked this question at Cisco Live in Vegas, and Cisco cited lack of customer demand and thermal constraints as the main reasons.
CMP is absolutely the right solution, the only proper OOB, and we need it in every vendor's switches and routers. Server guys have had this for over a decade.
So please start adding this as a scoring item in your RFQs, so vendors know there is demand.
Because of the thermal comment, I emailed Freescale (most routers and switches use their SoCs in the control plane) and asked whether they are planning to support anything like Intel vPro, which helps implement OOB functionality. Their reply:
From our marketing team updates I heard something similar will be
implemented in the next generation of our multicore processors, but
this is under discussion yet, so I do not have any details. Ok, I will
pass your suggestion to our core design team, hope they will take it
into account.
There is no guarantee whatsoever that Cisco will implement OOB on future Nexus 7k supervisors, but I thought it was interesting that Freescale is considering adding a feature like this to their silicon.
Worst case scenario, your HW has gone bad.
Best case scenario, it's a cosmetic failure due to a software defect. Luckily you are on SRE, which will be supported until 2015, so maybe upgrade to the latest rebuild.
There are two bug IDs that cause this error in a very benign way:
- CSCsk03373, due to large packets, fixed in SXH
- CSCsw32280, due to CRC errors, fixed in SXH
You should probably check 'show diag events'; its output should correlate with these messages.
GOLD gives us a description of 'TestErrorCounterMonitor', which explains the fields in the message:
ID -- Asic Identification
IN -- Asic Instance
PO -- Asic Port Number
RE -- Register Identification
RM -- Register Identification More
EG -- Error Group
DV -- Delta Value
CF -- Consecutive Failure
TF -- Total Failure
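If you want to pull those fields out of a log line programmatically, a rough Python sketch along these lines should do. It assumes the counters appear as whitespace-separated KEY:VALUE pairs with two-letter keys; the function name and the sample line are hypothetical, and apart from PO 255 the sample values are made up for illustration.

```python
import re

# Field meanings as listed in the GOLD description of TestErrorCounterMonitor.
FIELD_NAMES = {
    "ID": "Asic Identification",
    "IN": "Asic Instance",
    "PO": "Asic Port Number",
    "RE": "Register Identification",
    "RM": "Register Identification More",
    "EG": "Error Group",
    "DV": "Delta Value",
    "CF": "Consecutive Failure",
    "TF": "Total Failure",
}

# Assumption: each counter is logged as a two-letter key, a colon, and a number.
PAIR_RE = re.compile(r"\b([A-Z]{2}):(\d+)\b")


def decode_error_counter(line):
    """Map each known KEY:VALUE pair in the line to {description: integer}."""
    decoded = {}
    for key, value in PAIR_RE.findall(line):
        if key in FIELD_NAMES:
            decoded[FIELD_NAMES[key]] = int(value)
    return decoded


# Hypothetical sample line, not taken from a real device.
sample = "ID:47 IN:3 PO:255 RE:200 RM:255 DV:2 EG:2 CF:10 TF:117"

for name, value in decode_error_counter(sample).items():
    print(f"{name:30} {value}")
```

Unknown keys are skipped rather than rejected, so the same function can be run over whole syslog lines without stripping the prefix first.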
Unfortunately I don't have any CEF256 cards, so I can't check which ASIC it was, but you should be able to do it with:
remote command switch show platform hardware asic-versions | i 47
IN will tell you which instance of the ASIC it is. As there are at least 4 of them, I'm guessing it is the 'pinnacle' ASIC, which is the port ASIC in CEF256, as I don't think CEF256 has 4 of any other ASIC.
If it is pinnacle, you should be able to use 'sh int capabilities module X' and 'sh int X capabilities' to determine which ports are sharing the 4th port ASIC.
However as the 'Asic Port Number' is 255, it seems to contradict it being 'pinnacle' as no physical port would have this number.
There are some special ports on the card, like EOBC, RBUS, DBUS and fabric. Unfortunately I don't know what 255 means; it might refer to one of these special ports, or it might just be a placeholder value.
If 'Total Failure' (TF) correlates with interface CRC errors, it might be CSCsw32280. On the other hand, CSCsw32280 should show a sensible PO number.
If everything else fails, buy SMARTnet for the card for a year. I'd be curious to hear the root cause, so please answer your own question once you solve this, especially if you find out what port 255 is.
"Yes, it's completely impact-free.*
*Your experience may vary."
That's the impression I get from Cisco and the Nexus platform in general. As a general rule, I always try to secure a maintenance window for work on critical infrastructure as a CYA measure. Even if it works flawlessly 99% of the time, there's still that 1% that it won't, and "But Cisco told us it would be fine" does little to placate angry customers.