The short answer is that a node must monitor the CAN lines and see them idle for a certain time before it attempts to transmit. So if another node is transmitting, it must keep quiet until the other node is done.
A CAN bus is based on differential signalling. The two lines, CAN-High (CAN+) and CAN-Low (CAN-), are both at the same potential when the bus is idle. To send bits, a CAN transmitter puts a differential voltage on the lines of about 2 volts.
A CAN transmitter first checks that the bus is idle and, if it is, starts to transmit. Arbitration works by having each transmitter monitor the bus while it is transmitting. As described above, a bit is sent by either leaving the two lines at the same potential (recessive) or driving them to a differential potential (dominant). So if a transmitter sends a recessive bit by leaving the lines at the same potential, but sees that the lines are at a differential potential, that means some other node is transmitting, and the first transmitter has lost the arbitration. It must then stop transmitting.
When nodes first start transmitting, the bits they send are identical until the arbitration field (the message ID) is reached, which necessarily differs. If two nodes start transmitting together, they transmit in sync until the arbitration field is reached. When the bits differ, one node will notice a potential difference on the lines even though it is not putting one there. It then knows it has lost and has to try again later. The winning node continues transmitting without ever knowing that another node was trying as well. Of course, this logic extends to more than two nodes.
I hope this helps.
Background Information
I have used CAN a few times now for multiple devices distributed over a physically small area, within a few tens of meters. In each case, the CAN bus was internal to the system and we could specify exactly what the protocol over CAN would be. None of these systems had to, for example, interface with OBDII, NMEA2000, etc, where a specific protocol was already defined. One case was a large industrial machine that required lots of distributed sensors and actuators. The outside world interface just dealt with the overall operation of the machine. How the controller got the sensor information and caused the actuators to do stuff was an internal implementation choice that we happened to use CAN for. In another case, a company needed a good way for their customers to control multiple (up to a few dozen) of their gizmos within a single larger system. In this case we specified CAN as one communication means and documented the protocol. This protocol would be implemented by the controller of this system, but not surfaced to the end customer, who bought this system as a whole and communicated with it through different means at a higher level.
The EmCan solution
I have converged on a way of dealing with this over several implementations. I am now in the middle of two more such implementations, and this time I decided to use the previous experience to create a formal spec for an infrastructure layer immediately above CAN. CAN is a well-designed protocol as far as it goes, and is directly implemented in a number of microcontrollers nowadays. It seems a natural way to connect multiple little devices over a limited physical distance as long as the data rate isn't too high. Basically, it can do everything you probably would have used RS-485 for 20 years ago, except that more protocol layers are specified, the specification makes sense, and hardware implementations are available built into low cost microcontrollers.
The result of this is what I call EmCan (EMbed CAN). I am slowly filling out the formal protocol specification as I migrate code from the previous implementations, generalize the concepts a bit, and make re-usable firmware modules where the EmCan protocol code can be used without change across a variety of projects. I'm not really ready to officially publish the spec yet and provide the reference implementations, but you can look at what is there to see where things are heading. The current document is a work in progress, as it itself says.
So far I have PIC 18 and dsPIC 33 implementations of the EmCan device side, a stripped-down host implementation for the PIC 18, and a fuller implementation (more things handled locally) for the dsPIC 33. Everything documented in the current version is implemented and seems to be working. I am working on the byte stream interface right now. I did this before in one of the previous systems, but it was more tied into the application and not a nice separable layer like EmCan.
The issue with a switched load
I think trying to switch the CAN bus with FETs or analog switches is a really bad idea. The main reason for the bit rate versus length tradeoff is not the total resistance of the cable, but the round-trip propagation time. Look at how CAN detects collisions, and you will see this mechanism assumes signal propagation from one end to the other within a fraction of a bit time. The CAN bus needs to be treated as a transmission line. For most implementations, such as when using the common MCP2551 bus driver, the characteristic impedance should be close to 120 Ω. That means a 120 Ω resistor at each end of the bus, so any point on the bus looks like a 60 Ω load.
How EmCan fixes this
EmCan solves the node address problem without requiring special hardware. For details, see the EmCan spec, but basically, each node has a globally unique 7-byte ID. Each node periodically requests a bus address and sends this ID. The collision detection mechanism guarantees that the bus master sees only one of these requests, even if multiple nodes send an address request at the same time. The bus master sends an address assignment message that includes the 7-byte ID and the assigned address, so only one node is assigned a new address at a time.
If you are interested in this concept and are willing to discuss details of your system, talk to me. My biggest fear right now is specifying something that will be awkward later or prohibit certain usage that I hadn't considered. Having another implementation in progress as the spec is being finalized would be good for spec development, and would test out the reference implementation if you plan to implement it on Microchip PICs.
Best Answer
Funny that with so many correct answers, I still feel like something is amiss or not clear enough. Even the most complete answer, by @Nick, does not correct some wrong assumptions in the question. So, I'll try to make it simpler.
Wrong. The CAN physical layer is unique among differential buses because it uses wired-AND signalling. While most such buses indeed pull the data lines in two opposite directions, CAN drivers work as open-drain (CAN-L) and open-source (CAN-H). So CAN-L can be either low or high-Z, and CAN-H can be either high or high-Z (well... technically, the transceivers include weak biasing resistors pulling the common mode to mid-supply). This prevents electrical collisions, since nodes either actively pull the lines in the same direction or let them go and allow the termination resistors to equalize the voltage between the lines.
The downside of this, of course, is that the slew rate of the dominant-to-recessive transition cannot be increased beyond a certain point, effectively limiting the bus speed.
CAN nodes do not have an ID. IDs are usually introduced by higher-level CAN-based protocols, such as CANopen. But keep reading...
Wrong. The priority of the messages is defined by the arbitration field, which includes the message ID (either 11 or 29 bits) and the RTR bit. I believe CAN FD includes even more bits in arbitration, but I am a little bit fuzzy on that newer standard.
The bit arbitration is done by the CAN controller, which monitors the bus while it is sending. If a node detects a dominant level when it is sending a recessive level itself, it will immediately quit the arbitration process and become a receiver.
Since the dominant bit is logically 0, it follows that the message with the numerically lowest arbitration field will win the arbitration, i.e. a message with ID=1 has priority over a message with ID=4.
Now, back to node IDs. Some CAN-based protocols include either sender or target node ID as part of their message ID format. This will effectively create node hierarchy, so in this case node IDs will affect the priority. But again, this is not done on CAN data link layer.
The answer is: it does not. Or, more precisely, not completely. We have already established that electrical collisions are avoided by wired-AND signalling.
Logical collisions can be avoided completely by making the source node ID a part of the arbitration field and enforcing node ID uniqueness. However, this is rarely done in practice.
More often, message IDs are carefully mapped by their priority in the particular application and further distributed among nodes with different functions, so that each node can only send messages within its own unique range. This approach further reduces the chances of collision.
This scenario does not necessarily corrupt the data. If the messages have different arbitration fields, the node losing arbitration will stop transmitting, allowing the other node to send a complete, correct message.
This, of course, does not eliminate collisions entirely. If two nodes try to send messages with identical arbitration fields, both will remain active after arbitration and may collide in the data fields. If this happens, the CRC will detect the frame error and the message will be discarded by the receivers, prompting retransmission by the senders.
In short, ACK confirmation bits, error frames, and CRC validation are used to ensure data integrity and deal with the consequences of logical collisions.
There is nothing in the CAN data link layer to help with this. Any node can start sending as soon as it detects an idle bus.
There is, however, one important detail in the CAN specification: an error-passive transmitting node is required to send 8 recessive bits (suspend transmission) after the 3-bit intermission at the end of its last frame, 11 bits total. Nodes with pending messages wait only for the 7 recessive EOF bits plus the 3-bit intermission, 10 bits total, before they can attempt to transmit. This gives waiting nodes a head start, so the same node cannot send more than one remote or data frame in a row while other nodes are waiting for the bus to become idle.
However, there are several methods to improve this behavior in higher-layer protocols. For example, a variable delay between the bus going idle and the start of transmission can be introduced, making sure nodes have equal opportunities to start talking.
More complicated mechanisms include round-robin scenarios or centralized bus management nodes that orchestrate communication.