I am receiving the following syslog messages on a 7609-S.
Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:47 IN:4 PO:255 RE:169 RM:255 DV:5 EG:2 CF:10 TF:472
Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 6 Error counter exceeds threshold, system operation continue.
The card in slot 6 is as follows:
router1#show module 6
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  6    48 SFM-capable 48 port 10/100/1000mb RJ45 WS-X6548-GE-TX     XXXXXXXX
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  6 000e.d771.8550 to 000e.d771.857f   10.1   7.2(1)       8.7(0.22)FW2 Ok
Mod  Online Diag Status
---- -------------------
  6  Pass
router1#show ver
Cisco IOS Software, c7600rsp72043_rp Software (c7600rsp72043_rp-ADVENTERPRISEK9-M), Version 12.2(33)SRE3, RELEASE SOFTWARE (fc1)
- 2013-06-02: I received this message once, for the first time
- 2013-06-06: I received the message again, only once
- 2013-06-11: I received the message again, only once
- 2013-06-17: I have received this message three times today, within a 2-hour period
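A short script can count the warnings per day from lines exported off the syslog server, which is more reliable than reading the dates by eye. This is a minimal Python sketch. `warnings_per_day` is a made-up name, and the script assumes every line begins with the "Mon DD hh:mm:ss" timestamp shown above.

```python
import re
from collections import Counter

# Leading "Mon DD" of a line such as "Jun 17 11:52:27.560 BST: ..." (days 1-9 may be space-padded).
DATE_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def warnings_per_day(lines):
    """Count ERROR_COUNTER_WARNING messages per calendar day (the syslog lines carry no year)."""
    counts = Counter()
    for line in lines:
        if "ERROR_COUNTER_WARNING" not in line:
            continue  # skip the companion ERROR_COUNTER_DATA lines and unrelated messages
        match = DATE_RE.match(line)
        if match:
            month, day = match.group(1), int(match.group(2))
            counts[f"{month} {day:02d}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: ID:47 IN:4 PO:255 RE:169 RM:255 DV:5 EG:2 CF:10 TF:472",
        "Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: Module 6 Error counter exceeds threshold, system operation continue.",
    ]
    # Alphabetical order of "Mon DD" keys, which is chronological within a single month.
    for day, n in sorted(warnings_per_day(sample).items()):
        print(day, n)
```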
Searching the Internet, I see other people reporting this issue, and it seems to indicate an impending hardware failure. Has anyone experienced this error before? To the best of my knowledge, it simply means that the line card is seeing a high volume of errors, above a threshold that causes the system to log a syslog message. Should I be worried about this line card?
I have some graphs of interface error counters, traffic, etc. that I will post here when I get some time over the next day or two, although I'm not finding much correlation at this point!
Best Answer
Worst case scenario, your HW has gone bad.
Best case scenario, it's a cosmetic failure caused by a software defect. Luckily you are on SRE, which will be supported until 2015, so maybe upgrade to the latest rebuild.
There are two bug IDs that can cause this error in a very benign way.
You should probably check 'show diag events'; it should correlate with these messages.
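One way to check that correlation is to line up the timestamps. The sketch below has several assumptions. The event times are copied out of 'show diag events' by hand, since I don't know its exact output format and the code doesn't parse it. Both clocks are taken to be in the same time zone. The year is taken to be 2013. The 5-second tolerance is an arbitrary choice.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2013):
    """Parse the leading 'Mon DD hh:mm:ss.mmm' of a syslog line.

    The year is an assumption because the router does not log it."""
    month, day, clock = line.split()[:3]  # e.g. "Jun", "17", "11:52:27.560"
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def matched_events(syslog_times, diag_times, tolerance=timedelta(seconds=5)):
    """Pair every syslog timestamp with every diag event that lies within `tolerance` of it."""
    return [
        (s, d)
        for s in syslog_times
        for d in diag_times
        if abs(s - d) <= tolerance
    ]


if __name__ == "__main__":
    warning = ("Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_WARNING: "
               "Module 6 Error counter exceeds threshold, system operation continue.")
    syslog_times = [parse_syslog_time(warning)]
    # Placeholder only: replace with the times copied from 'show diag events'.
    diag_times = [datetime(2013, 6, 17, 11, 52, 27)]
    for s, d in matched_events(syslog_times, diag_times):
        print(f"syslog {s:%H:%M:%S} <-> diag event {d:%H:%M:%S}")
```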
GOLD gives a description for 'TestErrorCounterMonitor', which provides some data for understanding the message.
Unfortunately I don't have CEF256 cards, so I can't check which ASIC it was, but you should be able to do it with:
IN will tell you which ASIC it is. I'm guessing, as there are at least 4 of them, that it is the 'pinnacle' ASIC, which is the port ASIC in CEF256, as I don't think CEF256 has 4 of any other ASIC.
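The fields in the ERROR_COUNTER_DATA line can be pulled out programmatically with a minimal Python sketch like this one. The labels in the dictionary are my reading of the GOLD description of TestErrorCounterMonitor. Only PO ('Asic Port Number') and TF ('Total Failure') are named elsewhere in this answer, so treat the rest as an assumption to verify against Cisco's documentation.

```python
import re

# Field labels as I understand the GOLD TestErrorCounterMonitor description (verify against Cisco docs).
FIELD_NAMES = {
    "ID": "ASIC ID",
    "IN": "ASIC instance",
    "PO": "ASIC port number",
    "RE": "register ID",
    "RM": "register mask",
    "DV": "delta value",
    "EG": "error group",
    "CF": "consecutive failures",
    "TF": "total failures",
}

FIELD_RE = re.compile(r"\b([A-Z]{2}):(\d+)")


def parse_error_counter_data(line):
    """Return the numeric fields of a %CONST_DIAG-...-ERROR_COUNTER_DATA message as a dict."""
    marker = "ERROR_COUNTER_DATA:"
    if marker not in line:
        raise ValueError("not an ERROR_COUNTER_DATA message")
    payload = line.split(marker, 1)[1]  # everything after the mnemonic, e.g. " ID:47 IN:4 ..."
    return {key: int(value) for key, value in FIELD_RE.findall(payload) if key in FIELD_NAMES}


if __name__ == "__main__":
    line = ("Jun 17 11:52:27.560 BST: %CONST_DIAG-SP-4-ERROR_COUNTER_DATA: "
            "ID:47 IN:4 PO:255 RE:169 RM:255 DV:5 EG:2 CF:10 TF:472")
    for key, value in parse_error_counter_data(line).items():
        print(f"{key} ({FIELD_NAMES[key]}): {value}")
```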
If it is pinnacle, you should be able to use 'sh int capabilities module X' and 'sh int X capabilities' to determine which ports are sharing the 4th port ASIC.
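If you know how many ports hang off each port ASIC, mapping an instance number to front-panel ports is simple arithmetic. This sketch makes three assumptions: the port groups are contiguous and equal-sized, each group has 8 ports, and instances are numbered from 0. All three are guesses to confirm against the 'sh int capabilities' output. `ports_on_asic` is a hypothetical helper.

```python
def ports_on_asic(instance, ports_per_asic=8, total_ports=48, zero_based=True):
    """Front-panel port numbers served by one port ASIC.

    ASSUMES ports are split into contiguous, equal-sized groups; ports_per_asic and
    the 0- or 1-based numbering of the IN field are guesses to check on the card."""
    index = instance if zero_based else instance - 1
    first = index * ports_per_asic + 1  # front-panel ports start at 1
    if index < 0 or first > total_ports:
        raise ValueError(f"ASIC instance {instance} is out of range for this card")
    last = min(first + ports_per_asic - 1, total_ports)
    return list(range(first, last + 1))


if __name__ == "__main__":
    slot, instance = 6, 4  # slot 6 and IN:4 from the syslog message
    print(", ".join(f"Gi{slot}/{port}" for port in ports_on_asic(instance)))
```

If the capabilities output shows a different group size or numbering, change `ports_per_asic` or `zero_based`. The arithmetic stays the same.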
However, as the 'Asic Port Number' is 255, this seems to contradict it being 'pinnacle', as no physical port would have this number.
There are some special ports on the card, such as EOBC, RBUS, DBUS and fabric. Unfortunately I don't know what 255 means; it might refer to one of these special ports, or it might just be a placeholder value.
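For what it's worth, 255 is 0xFF, the all-ones value of an 8-bit field. That is a common choice for a "not applicable" marker. It doesn't prove anything, but a quick check shows that RM in the same message has that value too. The 8-bit width is an assumption, because I don't know how wide these fields really are.

```python
def is_all_ones(value, width_bits=8):
    """True if every bit of a width_bits-wide field is set (255 for 8 bits)."""
    return value == (1 << width_bits) - 1


# Fields from the ERROR_COUNTER_DATA message in the question.
fields = {"ID": 47, "IN": 4, "PO": 255, "RE": 169, "RM": 255,
          "DV": 5, "EG": 2, "CF": 10, "TF": 472}

suspicious = [name for name, value in fields.items() if is_all_ones(value)]
print(suspicious)  # prints ['PO', 'RM']
```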
If 'Total Failure' (TF) correlates with interface CRC errors, it might be CSCsw32280; on the other hand, CSCsw32280 should show a sensible PO number.
If everything else fails, buy SMARTnet for the card for a year. If you solve this, I'd be curious to see you answer your own question with the root cause, and especially to learn what port 255 is.