Electronic – CANBUS multi-node communication fails when electing bus master

can

I'm maintaining a legacy system (created by my predecessors) which uses CANBUS. The system has multiple nodes, each run by an MC9S08DZ60 microcontroller with its own separate isolator and transceiver. Every node is configured with a unique piece of hardware information; let's call it CANMAC.

The system is designed so that on a "cold start" the nodes try to agree among themselves on who is the bus master. Each node sends its CANMAC to the bus using the same CANID (001) and listens on the bus. The node with the highest CANMAC is elected bus master.

With a small number of nodes, everything is more or less OK. At least the election finishes and the bus master starts its job.

But with a larger number of nodes the election goes on forever, because some nodes have not seen some of the other nodes and their higher CANMACs.

This is what I get in the logs (I have obfuscated it a bit):

14531ms 001: "Valid election candidate message with CANMAC, 7 bytes payload"
14532ms 001: "Valid election candidate message with CANMAC, 7 bytes payload"
14532ms 000: ???                       00 08 00 00
14532ms 000: ???                       00 08 00 00
14533ms 000: ???                       00 08 00 00
14533ms 000: ???                       00 08 00 00
14533ms 000: ???                       00 08 00 00
14534ms 000: ???                       00 08 00 00
14534ms 000: ???                       00 08 00 00
14534ms 000: ???                       00 08 00 00
14534ms 000: ???                       00 08 00 00
14535ms 000: ???                       00 08 00 00
14535ms 000: ???                       00 08 00 00
14535ms 000: ???                       00 08 00 00
14536ms 000: ???                       00 08 00 00
14536ms 000: ???                       00 08 00 00
14536ms 000: ???                       00 08 00 00
14537ms 000: ???                       00 08 00 00
14537ms 001: "Valid election candidate message with CANMAC, 7 bytes payload"
14538ms 001: "Valid election candidate message with CANMAC, 7 bytes payload"

Now my question is:

Is it possible that the senders of these messages, transmitting with the same CANID 001 at almost the same time, circumvent the built-in CANBUS collision avoidance/arbitration by broadcasting concurrently, essentially overwriting each other's bits on the bus so that an observer sees unknown messages?

Best Answer

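Yes, it is possible to get all manner of strange things if several nodes send the same CAN identifier with different data at the same time. Only the identifier part of the frame participates in bus arbitration, so each of those nodes believes it has won arbitration and keeps transmitting. As soon as the data bits differ, a node that sends a recessive bit reads back a dominant one, detects a bit error and sends an error frame.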
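To make the paragraph above concrete, here is a minimal sketch in C of what happens on the wire when two nodes pass arbitration with the same identifier and then send different data. It models the bus as a wired-AND (dominant = 0, recessive = 1) and reports the first bit at which one transmitter reads back something other than what it sent. The names `collision_t`, `get_bit` and `overlay_data` are made up for this illustration, and bit stuffing, DLC and CRC are deliberately ignored, so treat it as a simplified model rather than a full CAN frame simulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of overlaying two simultaneous transmissions (hypothetical type). */
typedef struct {
    int    collided;     /* 1 if the two bit streams differ somewhere */
    size_t bit_index;    /* index of the first differing bit, MSB-first */
    int    losing_node;  /* 0 or 1: node that sent recessive but read dominant */
} collision_t;

/* Return bit i (MSB-first) of a byte buffer. */
static int get_bit(const uint8_t *buf, size_t i)
{
    return (buf[i / 8] >> (7 - (i % 8))) & 1;
}

/*
 * Assumption: both nodes used the same identifier, so neither backed off
 * during arbitration. Compare the data fields bit by bit.
 * The bus level is the AND of both outputs (0 = dominant wins).
 */
static collision_t overlay_data(const uint8_t *a, const uint8_t *b,
                                size_t len_bytes)
{
    collision_t r = {0, 0, -1};

    for (size_t i = 0; i < len_bytes * 8; i++) {
        int bit_a = get_bit(a, i);
        int bit_b = get_bit(b, i);
        int bus   = bit_a & bit_b;

        if (bit_a != bus || bit_b != bus) {
            /* One node sent recessive (1) but sees dominant (0): bit error. */
            r.collided    = 1;
            r.bit_index   = i;
            r.losing_node = (bit_a != bus) ? 0 : 1;
            return r;
        }
    }
    return r;  /* identical payloads: no error is detected */
}
```

With two 7-byte CANMAC payloads that differ only in their last bit, the collision is reported at that bit. In a real controller, the node that notices the mismatch would then transmit an error frame, which destroys the message for every listener.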

This could be why some nodes don't recognize the one with the highest priority, if they attempt to send at the same time. Or it could simply be that their start-up times are very different.

This is also the main reason why networks like CANopen use the "node id" as an offset to the identifier of data packages. In your case, it should probably have been implemented so that CAN ids 0 to 127 are reserved for this "challenge the current master" package, with 0 being the highest priority:

  • Each time a node wakes up, it sends out a message claiming to be the master. If nobody challenges it, it is now the master.
  • Each time a node sees another node with higher priority making that claim, it takes note of who is now the master.
  • Each time a node is the master and sees someone else with lower priority claiming to be master, it responds by sending out its own message once again.
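
The three rules above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: every node has a unique claim identifier in the range 0 to 127, a lower identifier means higher priority, and the actual frame transmission is left to the caller. The names `election_t`, `election_start`, `election_on_claim` and `ELECTION_ID_MAX` are hypothetical and do not come from any existing CAN driver or from the original firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CAN ids 0..127 are reserved for master claims. */
#define ELECTION_ID_MAX 127u

typedef struct {
    uint16_t own_id;     /* this node's claim id; lower = higher priority */
    uint16_t master_id;  /* id of the node currently believed to be master */
    bool     is_master;
} election_t;

/* Rule 1: on wake-up, claim to be master.
 * Returns the CAN identifier the caller should transmit. */
static uint16_t election_start(election_t *e, uint16_t own_id)
{
    e->own_id    = own_id;
    e->master_id = own_id;
    e->is_master = true;   /* stays true unless somebody challenges */
    return own_id;
}

/* Handle a received claim.
 * Returns true if this node must re-send its own claim. */
static bool election_on_claim(election_t *e, uint16_t rx_id)
{
    if (rx_id > ELECTION_ID_MAX || rx_id == e->own_id) {
        return false;      /* not an election frame, or our own id */
    }
    if (rx_id < e->master_id) {
        /* Rule 2: a higher-priority node claims mastership; note it. */
        e->master_id = rx_id;
        e->is_master = false;
        return false;
    }
    /* Rule 3: a lower-priority claim; only the current master answers. */
    return e->is_master;
}
```

Because every node claims with its own identifier, two simultaneous claims are resolved by normal bus arbitration instead of colliding in the data field.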