Electronic – How to avoid duplicate messages in a mesh network of mcus

meshmicrocontrollerNetworknetworkinguart

I'm working on a project, where each pcb has a connector on every side, so you can connect these board in a grid. It's basically a mesh network of modules.

These modules need to communicate with each other. Since they are connected in an arbitrary way, there is no predictable path that we can relay on.
The grid can have missing modules, for example a 3×3 arrangement can have the middle module missing, or it can be two 2×2 grids connected with one module.

Messages can be either directly addressed to a node or be a broadcast.

My initial ideas:

I2C:
Length of the line would be pretty long (and pretty much undefined) it's not a good contender. Adding I2C drivers would increase the cost and complexity of the boards.
It would be too slow, because even if two adjacent modules would need high speed communication, it would hog down the whole system. There's also a danger of interference, especially in high speed mode, since because it's a mash network, there could be multiple paths to the same destinations.

UART:
Current solution of choice.
The MCU i'm using has 4 UART modules, so there is a dedicated UART peripheral for each edge of the board.
Each message is received by the MCU and processed. If that MCU was the target, it ends there, if not or it was a broadcast message, it sends it to the other 3 UART peripherals.

The problem:
Each message can get to each mcu multiple times, through multiple paths.
This causes severe problems: we don't no if we acted on a message, and because we don't know that we handled the message, it will be kept repeating, potentially indefinably.

The solutions:
I have not come up with a satisfactory solution yet, here are my ideas so far.

The first idea was to always send a message id, and each mcu will keep track of the last index received for each sender. This way we know which messages are new. We only forward and process messages that have a higher id than the last one.
This has downsides:
We need to send additional data with each message, the id.
Even if the id is a 32 bit number, which is not trivially small for each message, it may overflow. I'm not really sure how could that be handled, since receiving messages delayed on another path does not make that straight forward.
If a message is missing on the faster path, the previous message will be lost, since the counter is already incremented.
With a large number of nodes

An other solution is to store the last n message ids.
This has a downside of higher memory consumption, even just storing the last id for each sender address consumes relatively high amounts of memory, which is a scarce resource on an mcu. Also, if a module receives n+1 messages on a faster path, the messages coming from the slower path will be processed again.

My last idea is to create a map on startup.
This would involve each module sending a message to it's neighbors to has their address, after they've got all the addresses, each node should broadcast their neighbors. Filtering out duplicates is easy on the initial broadcast, since each sender should only send exactly one message. After that all nodes should have the same information, and using the same algorithm the should arrive at the same path, that makes sure that each module get's each message exactly once.
The downside is that this wrecks the ability to hotplug, although that was never a goal, so i don't mind loosing that. This is something that makes software update a bit trickier if the routing part is update, since unless they all run the exact same routing algorithm, some modules may not get the intended messages.

Is there something i'm missing? Is there an easier way to solve this?

Best Answer

One approach is to put a TTL (time to live) field in the message. This is a small integer, and its value is decremented each time the message is forwarded. If the value reaches zero, the message is discarded.

The node creating the message sets the initial TTL to a number that's high enough to get it forwarded to the farthest node in the network. An 8-bit field should be more than enough for any reasonably-sized mesh.

No state information needs to be stored on any of the nodes.