I think implementing the CAN protocol in firmware only will be difficult and will take a while to get right. It is not a good idea.
However, your prices are high. I just checked, and a dsPIC 33FJ64GP802 in QFN package sells for 3.68 USD on microchipdirect for 1000 pieces. The price will be lower for real production volumes.
The hardware CAN peripheral does some real things for you, and the price increment for it is nowhere near what you are claiming.
Added:
Since you seem to be determined to try the firmware route, here are some of the obvious problems that pop to mind. There will most likely be other problems that haven't occured to me yet.
You want to do CAN at 20 kbit/s. That's a very slow rate for CAN, which go up to 1Mbit/s for at least 10s of meters. To give you one datapoint, the NMEA 2000 shipboard signalling standard is layerd on CAN at 200 kbits/s, and that's meant to go from one end of a large ship to the other.
You may think that all you need is one interrupt per bit and you can do everything you need in that interrupt. That won't work because there are several things going on in each CAN bit time. Two things in particular need to be done at the sub-bit level. The first is detecting a collision, and the second is adjusting the bit rate on the fly.
There are two signalling states on a CAN bus, recessive and dominant. Recessive is what happens when nothing is driving the bus. Both lines are pulled together by a total of 60 Ω. A normal CAN bus as implemented by common chips like the MCP2551, should have 120 Ω terminators at both ends, hence a total of 60 Ω pulling the two differential lines together passively. The dominant state is when both lines are actively pulled apart, somewhere around 900mV from the recessive state if I remember right. Basically, this is like a open collector bus, except that it's implemented with a differential pair. The bus is in recessive state if CANH-CANL < 900mV and dominant when CANH-CANL > 900mV. The dominant state signals 0, and the recessive 1.
Whenever a node "writes" a 1 to the bus (lets it go), it checks to see if some other node is writing a 0. When you find the bus in dominant state (0) when you think you're sending and the current bit you're sending is a 1, then that means someone else is sending too. Collisions only matter when the two senders disagree, and the rule is that the one sending the recessive state backs off and aborts its message. The node sending the dominant state doesn't even know this happened. This is how arbitration works on a CAN bus.
The CAN bus arbitration rules mean you have to be watching the bus partway thru every bit you are sending as a 1 to make sure someone else isn't sending a 0. This check is usually done about 2/3 of the way into the bit, and is the fundamental limitation on CAN bus length. The slower the bits rate, the more time there is for the worst case propagation from one end of the bus to the other, and therefore the longer the bus can be. This check must be done every bit where you think you own the bus and are sending a 1 bit.
Another problem is bit rate adjustment. All nodes on a bus must agree on the bit rate, more closely than with RS-232. To prevent small clock differences from accumulating into significant errors, each node must be able to do a bit that is a little longer or shorter than its nominal. In hardware, this is implemented by running a clock somewhere around 9x to 20x faster than the bit rate. The cycles of this fast clock are called time quanta. There are ways to detect that the start of new bits is wandering with respect to where you think they should be. Hardware implementations then add or skip one time quanta in a bit to re-sync. There are other ways you could implement this as long as you can adjust to small diferences in phase between your expected bit times and actual measured bit times.
Either way, these mechanisms require various things be done at various times within a bit. This sort of timing will get very tricky in firmware, or will require the bus to be run very slowly. Let's say you implement a time quanta system in firmware at 20 kbits/s. At the minimum of 9 time quanta per bit, that would require 180 kHz interrupt. That's certainly possible with something like a dsPIC 33F, but will eat up a significant fraction of the processor. At the max instruction rate of 40 MHz, you get 222 instruction cycles per interrupt. It shouldn't take that long to do all the checking, but probably 50-100 cycles, meaning 25-50% of the processor will be used for CAN and that it will need to preempt everything else that is running. That prevents many applications these processors often run, like pulse by pulse control of a switching power supply or motor driver. The 50-100 cycle latency on every other interrupt would be a complete show stopper for many of the things I've done with chips like this.
So you're going to spend the money to do CAN somehow. If not in the dedicated hardware peripheral intended for that purpose, then in getting a larger processor to handle the significant firmware overhead and then deal with the unpredictable and possible large interrupt latency for everything else.
Then there is the up front engineering. The CAN peripheral just works. From your comment, it seems like the incremental cost of this peripheral is $.56. That seems like a bargain to me. Unless you've got a very high volume product, there is no way you're going to get back the considerable time and expense it will take to implement CAN in firmware only. If your volumes are that high, the prices we've been mentioning aren't realistic anyway, and the differential to add the CAN hardware will be lower.
I really don't see this making sense.
CAN sounds the most applicable in this case. The distances inside a house can be handled by CAN at 500 kbits/s, which sounds like plenty of bandwidth for your needs. The last node can be a off the shelf USB to CAN interface. That allows software in the computer to send CAN messages and see all the messages on the bus. The rest is software if you want to present this to the outside world as a TCP server or something.
CAN is the only communications means you mentioned that is actually a bus, except for rolling your own with I/O lines. All the others are point to point, including ethernet. Ethernet can be made to logically look like a bus with switches, but individual connections are still point to point and getting the logical bus topology will be expensive. The firmware overhead on each processor is also considerably more than CAN.
The nice part about CAN is that the lowest few protocol layers are handled in the hardware. For example, multiple nodes can try to transmit at the same time, but the hardware takes care of detecting and dealing with collisions. The hardware takes care of sending and receiving whole packets, including CRC checksum generation and validation.
Your reasons for avoiding PICs don't make any sense. There are many designs for programmers out there for building your own. One is my LProg, with the schematic available from the bottom of that page. However, building your own won't be cost effective unless you value your time at pennies/hour. It's also about more than just the programmer. You'll need something that aids with debugging. The Microchip PicKit 2 or 3 are very low cost programmers and debuggers. Although I have no personal experience with them, I hear of others using them routinely.
Added:
I see some recommendations for RS-485, but that is not a good idea compared to CAN. RS-485 is a electrical-only standard. It is a differential bus, so does allow for multiple nodes and has good noise immunity. However, CAN has all that too, plus a lot more. CAN is also usually implemented as a differential bus. Some argue that RS-485 is simple to interface to electrically. This is true, but so is CAN. Either way a single chip does it. In the case of CAN, the MCP2551 is a good example.
So CAN and RS-485 have pretty much the same advantages electrically. The big advantage of CAN is above that layer. With RS-485 there is nothing above that layer. You are on your own. It is possible to design a protocol that deals with bus arbitration, packet verification, timeouts, retries, etc, but to actually get this right is a lot more tricky than most people realize.
The CAN protocol defines packets, checksums, collision handling, retries, etc. Not only is it already there and thought out and tested, but the really big advantage is that it is implemented directly in silicon on many microcontrollers. The firmware interfaces to the CAN peripheral at the level of sending and receiving packets. For sending, the hardware does the colllision detection, backoff, retry, and CRC checksum generation. For receiving, it does the packet detection, clock skew adjusting, and CRC checksum validation. Yes the CAN peripheral will take more firmware to drive than a UART such as is often used with RS-485, but it takes a lot less code overall since the silicon handles so much of the low level protocol details.
In short, RS-485 is from a bygone era and makes little sense for new systems today. The main issue seems to be people who used RS-485 in the past clinging to it and thinking CAN is "complicated" somehow. The low levels of CAN are complicated, but so is any competent RS-485 implementation. Note that several well known protocols based on RS-485 have been replaced by newer versions based on CAN. NMEA2000 is one example of such a newer CAN-based standard. There is another automotive standard J-J1708 (based on RS-485) that is pretty much obsolete now with the CAN-based OBD-II and J-1939.
Best Answer
I think SPI would be my first choice. I2C and UART are way slower, USB is overkill. 1 MHz is not slow, but certainly doable for a short distance. You might need matched impedances on your lines, and a scope will be very handy for troubleshooting.
I'd assume that errors are rare. On top of SPI, you could implement a go-back protocol: send data as a big block with an ID and a checksum. With each SPI exchange, the ST returns the ID of the last succesfully received block. When that doesn't match what the RaPi hoped for, it must go back to transmitting the last not-yet-succesfully-received block.
When a block is not received OK, the RaPi will notice this with the transmission of the next block, so two blocks are lost. If this is deemed too much the TS could buffer one extra block, or the acknowledge could be piggybacked on a tail part (dummy bytes) of the block itself.