Per Olin Lathrop's suggestion, I'll expand on bit-stuffing.
CAN uses NRZ coding and is therefore not happy with long runs of ones or zeroes: without signal edges, the receiver loses track of where the clock edges ought to be. CAN solves this potential problem by bit-stuffing. When transmitting, after a run of 5 successive identical bits it inserts one bit of the opposite polarity. When receiving, after 5 successive identical bits it discards the next bit, unless that bit has the same polarity as the preceding five, in which case it raises an error flag (a stuff error).
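As a minimal sketch (in Python, with hypothetical function names), the transmit and receive rules look like this. It works on a plain list of 0/1 values and ignores which frame fields are actually stuffed. In a real CAN frame only the bits from SOF through the CRC sequence are subject to stuffing.

```python
def stuff_bits(bits):
    """Insert a bit of opposite polarity after every run of 5 identical bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuff = 1 - b
            out.append(stuff)
            # The stuff bit itself counts as the first bit of the next run.
            run_bit, run_len = stuff, 1
    return out


def destuff_bits(bits):
    """Drop the bit that follows every run of 5 identical bits.

    Raises ValueError on a stuff error, i.e. when that bit has the same
    polarity as the preceding run (a sixth identical bit).
    """
    out = []
    run_bit, run_len = None, 0
    i = 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        i += 1
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5 and i < len(bits):
            stuff = bits[i]
            if stuff == b:
                raise ValueError(f"stuff error at bit {i}")
            i += 1  # skip the stuff bit, it carries no data
            run_bit, run_len = stuff, 1
    return out
```

Stuffing 64 identical bits inserts 12 extra bits, while alternating bits ([0, 1] * 32) pass through unchanged.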
If you send all zeroes or all ones as test data, the string of 64 identical data bits results in the insertion of 12 stuffed bits. This increases the total frame length to 140 bits, with a best-case frame rate of 874 frames/sec. Since 64 = 12 × 5 + 4, the last four data bits are still part of an open run, so if they are the same as the MSB of the CRC, you get another stuffed bit there and the frame rate drops to 868 frames/sec. If the CRC has long runs of ones or zeroes, that reduces the frame rate even further. The same consideration applies to your identifiers.
A total of 16 stuffed bits produces an ideal frame rate of 850.3 frames/sec, so stuffing is worth accounting for. A quick test is to use test data with alternating bits and see what happens to your frame rate.
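The figures above can be reproduced with a bit of arithmetic. The bit rate and frame format aren't stated, so this sketch assumes what the numbers imply: a 125 kbit/s bus and an extended (29-bit identifier) data frame with 8 data bytes, which is 128 bits from SOF through EOF before stuffing, plus the 3-bit intermission between back-to-back frames. The constant and function names are hypothetical.

```python
# Assumptions inferred from the quoted figures, not stated in the answer:
BIT_RATE = 125_000      # bus bit rate in bits per second
BASE_FRAME_BITS = 128   # extended data frame, 8 data bytes, SOF..EOF, unstuffed
INTERFRAME_BITS = 3     # intermission between back-to-back frames


def max_frame_rate(stuffed_bits, bit_rate=BIT_RATE):
    """Best-case frames per second when frames are sent back to back."""
    return bit_rate / (BASE_FRAME_BITS + stuffed_bits + INTERFRAME_BITS)


for n in (0, 12, 13, 16):
    print(f"{n:2d} stuffed bits: {BASE_FRAME_BITS + n} bits/frame, "
          f"{max_frame_rate(n):.1f} frames/s")
```

With 12, 13 and 16 stuffed bits this gives about 874.1, 868.1 and 850.3 frames/sec, matching the numbers above.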
You should avoid interrupts when possible. Issues:
1) They mess up predictable real-time behavior.
2) Numerous interrupts may lead to unpredictable stack usage.
3) It is easy to write very subtle and very severe bugs when sharing data between an ISR and the background program.
That being said, you can avoid these problems with careful system design.
1) is only a problem for non-cyclic interrupts that may arrive at any point in time. As long as the interrupts have deterministic behavior and are triggered cyclically, you can use them. In that case they are no different from a high-priority process, and you can still predict the real-time behavior of the system.
2) can be avoided by reducing the number of interrupt sources as far as possible. Other safety measures are to always allocate a stack that is larger than necessary and, most importantly, to place the stack so that upon overflow it doesn't fail-cascade into other RAM segments like .bss or .data! Here is a good article about this.
3) is the hardest one to protect yourself from. Every variable shared between an ISR and the background program has to be handled with lots of care. Two issues exist: re-entrancy and compiler optimizer problems.
Re-entrancy has to be solved on a case-by-case basis with atomic access, semaphores/mutexes, or by temporarily disabling interrupts. This is always tricky: you have to ensure that you have considered every scenario, and that the generated machine code actually does what you think it does.
The other issue is that the compiler does not realize that your ISR is called by the MCU rather than from your code, and therefore fails to understand that any variable used by the ISR can be updated at any point in time. The compiler may then optimize the background code incorrectly, for example by keeping the variable in a register or removing reads of it, since it assumes the variable never changes outside the code it can see. This bug can be avoided by always declaring variables shared with an ISR as volatile.
Both of these issues are common sources of very subtle, but often severe, bugs. There's no standard way to protect yourself against them; the closest thing to a safety measure is to let only your most hardened C veteran write all the ISRs. Intermediate programmers, not to mention beginners, write these bugs over and over again.
Because of this, it is very hard to justify the use of interrupts in safety-critical applications. You would have to spend lots of time on design, tests and documentation to verify that no such interrupt causes problems. Therefore I can understand why some safety standards ban the use of interrupts entirely.
As for the specific issue of CAN, it sounds a bit like you have either picked the wrong MCU for the task or you are not using the CAN controller correctly. More advanced CAN controllers have rx buffers that you can set up for dedicated messages, plus an rx FIFO where the rest of the messages go. I'm pretty sure NXP has such CAN controllers in their Cortex-M families; at least the LPC11C does.
With such an approach and a carefully designed application-layer CAN protocol, you should not need rx interrupts. All safety-critical CAN networks must be designed to send messages over and over again periodically. If you know that a certain message arrives at most once every 5 ms, then you merely have to ensure that your background program is fast enough to handle it before the next one arrives.
For SIL4 you would likely have more than one CAN bus: you would dedicate one bus for safety-critical real-time messages and put everything else on another non-critical bus. Redundancy solutions with multiple CAN buses transmitting the same critical data are also used sometimes.
Except for the first message, this is typical SDO traffic (CANopen), with request/response pairs:
For the first pair, a dead giveaway is the 0x4B in the first byte of the response. It indicates that the returned data is two bytes long (for one byte and four bytes, it is 0x4F and 0x43, respectively). The 0x40 in the first byte of the request indicates a read request. The standard calls this an "upload", which is the opposite of the usual Internet meaning (where it would be a download), because the term is from the perspective of the addressed device.
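The size can also be computed from the command byte instead of memorizing the values. A sketch, assuming the expedited SDO upload response layout from CiA 301 (server command specifier in bits 7-5, number of unused data bytes n in bits 3-2, expedited flag in bit 1, size-indicated flag in bit 0); the function name is hypothetical.

```python
def expedited_upload_size(byte0):
    """Data size in bytes of an expedited SDO upload response, else None.

    Command byte layout: bits 7-5 = server command specifier (2 = upload),
    bit 4 = reserved (0), bits 3-2 = n (unused data bytes),
    bit 1 = e (expedited), bit 0 = s (size indicated).
    """
    if (byte0 & 0xF3) != 0x43:  # scs = 2, reserved = 0, e = 1, s = 1
        return None
    n = (byte0 >> 2) & 0x3
    return 4 - n


for cmd in (0x4F, 0x4B, 0x47, 0x43, 0x40):
    print(f"0x{cmd:02X}: {expedited_upload_size(cmd)}")
```

The request byte 0x40 returns None here because it is not a response; which direction a frame travels is determined by its CAN ID.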
The request CAN ID is 0x600 + node ID. The response CAN ID is 0x580 + node ID. Thus, for node ID 0x65, requests use CAN ID 0x665 and responses use CAN ID 0x5E5.
For SDOs, the CANopen index and subindex are in the second, third and fourth bytes (least significant byte first for the CANopen index). So for the first pair, 40 78 60 00, the requestor says: "Device at node ID 0x65 (101), give me your stored value at 6078sub0".
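Decoding a frame along those lines could look like this. It assumes the default (predefined connection set) SDO CAN IDs, a frame padded to eight bytes, and a request CAN ID of 0x665 (0x600 + 0x65) for the first pair; parse_sdo is a hypothetical helper, not a library function.

```python
SDO_REQUEST_BASE = 0x600   # client -> server
SDO_RESPONSE_BASE = 0x580  # server -> client


def parse_sdo(can_id, data):
    """Return (direction, node_id, index, subindex) for an SDO frame."""
    if SDO_REQUEST_BASE < can_id <= SDO_REQUEST_BASE + 0x7F:
        direction, node_id = "request", can_id - SDO_REQUEST_BASE
    elif SDO_RESPONSE_BASE < can_id <= SDO_RESPONSE_BASE + 0x7F:
        direction, node_id = "response", can_id - SDO_RESPONSE_BASE
    else:
        raise ValueError(f"0x{can_id:03X} is not a default SDO CAN ID")
    index = data[1] | (data[2] << 8)  # index is least significant byte first
    subindex = data[3]
    return direction, node_id, index, subindex


frame = bytes([0x40, 0x78, 0x60, 0x00, 0, 0, 0, 0])
direction, node, index, sub = parse_sdo(0x665, frame)
print(f"{direction} to node 0x{node:02X} ({node}): {index:04X}sub{sub}")
# prints: request to node 0x65 (101): 6078sub0
```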
In this case the information flows from the addressed device to whoever made the request. The requestor cannot be seen in the CAN bus log, but it is usually a central controller in the system or a service tool running on a PC (usually connected through a USB-to-CAN adapter).
Thus, for the shown traffic, read requests are made for (the response for the last one is not included in the posted CAN bus log):
Strangely, the request for 6041sub0 is repeated.
Furthermore, although SDOs are usually used only for configuration information, the CANopen index range 0x6000 to 0x6FFF usually holds non-configuration information, like measured quantities or status.
Diving into the manual
The SDO indexes / subindexes can be looked up in the manual (I have included the actual values from the sample CAN bus log):
The first message
Assuming the first message, 4E5 F0 16 00 00, is also CANopen: in the CANopen predefined connection set, the 11-bit CAN ID is a four-bit function code (0-15) followed by a seven-bit node ID. In this case, 0x4E5 = 1001 1100101b. Thus the function code is 1001b = 9, meaning "PDO4, transmit". Despite the name "transmit", the direction of information flow for PDOs is a matter of definition and depends on the application. The node ID is 1100101b = 0x65.
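The bit split is just a shift and a mask. A minimal sketch; the function-code names follow the CANopen predefined connection set, and split_cob_id and FUNCTION_CODES are hypothetical names.

```python
# Function codes of the CANopen predefined connection set.
FUNCTION_CODES = {
    0: "NMT",
    1: "SYNC/EMCY",
    2: "TIME",
    3: "PDO1 transmit",
    4: "PDO1 receive",
    5: "PDO2 transmit",
    6: "PDO2 receive",
    7: "PDO3 transmit",
    8: "PDO3 receive",
    9: "PDO4 transmit",
    10: "PDO4 receive",
    11: "SDO transmit (response)",
    12: "SDO receive (request)",
    14: "NMT error control",
}


def split_cob_id(can_id):
    """Split an 11-bit CAN ID into (function code, node ID)."""
    if not 0 <= can_id <= 0x7FF:
        raise ValueError("not an 11-bit CAN ID")
    return can_id >> 7, can_id & 0x7F  # top 4 bits, low 7 bits


func, node = split_cob_id(0x4E5)
print(func, FUNCTION_CODES.get(func, "unknown"), hex(node))
# prints: 9 PDO4 transmit 0x65
```

The same split applied to the SDO IDs 0x665 and 0x5E5 gives function codes 12 and 11 with the same node ID 0x65.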
The node ID for the PDO is the same as for the SDOs.
The content of this PDO, "Transmit PDO 4", is defined by object 0x1A03, "Transmit PDO Mapping Parameter 4". If the mapping has not been changed from the default, the data in the PDO is the same as object 0x60FAsub0, a signed 32-bit integer:
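Since CANopen sends multi-byte values least significant byte first, the four data bytes of the first message can be decoded with Python's struct module. This assumes, as above, that the default mapping is in effect so the PDO carries exactly the 32-bit value of 0x60FAsub0; the unit depends on the device.

```python
import struct

payload = bytes([0xF0, 0x16, 0x00, 0x00])  # data bytes of the 0x4E5 PDO
# "<i" = little-endian signed 32-bit integer
(control_effort,) = struct.unpack("<i", payload)
print(control_effort)  # prints 5872
```

int.from_bytes(payload, "little", signed=True) gives the same result; the signed format matters for negative values such as F0 FF FF FF, which decodes to -16.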
Conclusion
The motor device with node ID 0x65 sends out the control effort (likely at regular time intervals) using a PDO.
A controller or a real-time monitor window in a service tool reads and displays other measured quantities / status from the same motor device using SDOs.