I understand that you wanted to choose a development environment that you were familiar with such that you can hit the ground running, but I think the hardware/software trade off may have boxed you in by sticking with Arduino and not picking a part that had all the hardware peripherals that you needed and writing everything in interrupt-driven C instead.
I agree with @Matt Jenkins' suggestion and would like to expand on it.
I would've chosen a uC with 2 UARTs. One connected to the Xbee and one connected to the camera. The uC accepts a command from the server to initiate a camera read and a routine can be written to transfer data from the camera UART channel to the XBee UART channel on a byte per byte basis - so no buffer (or at most only a very small one) needed. I would've tried to eliminate the other uC all together by picking a part that also accommodated all your PWM needs as well (8 PWM channels?) and if you wanted to stick with 2 different uC's taking care of their respective axis then perhaps a different communications interface would've been better as all your other UARTs would be taken.
Someone else also suggested moving to an embedded linux platform to run everything (including openCV) and I think that would've been something to explore as well. I've been there before though, a 4 month school project and you just need to get it done ASAP, can't be stalled from paralysis by analysis - I hope it turned out OK for you though!
EDIT #1 In reply to comments @JGord:
I did a project that implemented UART forwarding with an ATmega164p. It has 2 UARTs. Here is an image from a logic analyzer capture (Saleae USB logic analyzer) of that project showing the UART forwarding:

The top line is the source data (in this case it would be your camera) and the bottom line is the UART channel being forwarded to (XBee in your case). The routine written to do this handled the UART receive interrupt. Now, would you believe that while this UART forwarding is going on you could happily configure your PWM channels and handle your I2C routines as well? Let me explain how.
Each UART peripheral (for my AVR anyways) is made up of a couple shift registers, a data register, and a control/status register. This hardware will do things on its own (assuming that you've already initialized the baud rate and such) without any of your intervention if either:
- A byte comes in or
- A byte is placed in its data register and flagged for output
Of importance here is the shift register and the data register. Let's suppose a byte is coming in on UART0 and we want to forward that traffic to the output of UART1. When a new byte has been shifted in to the input shift register of UART0, it gets transferred to the UART0 data register and a UART0 receive interrupt is fired off. If you've written an ISR for it, you can take the byte in the UART0 data register and move it over to the UART1 data register and then set the control register for UART1 to start transferring. What that does is it tells the UART1 peripheral to take whatever you just put into its data register, put that into its output shift register, and start shifting it out. From here, you can return out from your ISR and go back to whatever task your uC was doing before it was interrupted. Now UART0, after just having its shift register cleared, and having its data register cleared can start shifting in new data if it hasn't already done so during the ISR, and UART1 is shifting out the byte you just put into it - all of that happens on its own without your intervention while your uC is off doing some other task. The entire ISR takes microseconds to execute since we're only moving 1 byte around some memory, and this leaves plenty of time to go off and do other things until the next byte on UART0 comes in (which takes 100's of microseconds).
This is the beauty of having hardware peripherals - you just write into some memory mapped registers and it will take care of the rest from there and will signal for your attention through interrupts like the one I just explained above. This process will happen every time a new byte comes in on UART0.
Notice how there is only a delay of 1 byte in the logic capture as we're only ever "buffering" 1 byte if you want to think of it that way. I'm not sure how you've come up with your O(2N)
estimation - I'm going to assume that you've housed the Arduino serial library functions in a blocking loop waiting for data. If we factor in the overhead of having to process a "read camera" command on the uC, the interrupt driven method is more like O(N+c)
where c
encompasses the single byte delay and the "read camera" instruction. This would be extremely small given that you're sending a large amount of data (image data right?).
All of this detail about the UART peripheral (and every peripheral on the uC) is explained thoroughly in the datasheet and it's all accessible in C. I don't know if the Arduino environment gives you that low of access such that you can start accessing registers - and that's the thing - if it doesn't you're limited by their implementation. You are in control of everything if you've written it in C (even more so if done in assembly) and you can really push the microcontroller to its real potential.
I'll assume you want to do this wireless, otherwise the solution is obvious.
I would go for infrared. The transmitter can be a small microcontroller sending a pair of bytes to an IR receiver module.
These receiver modules are tuned to a particular protocol. You can use an RC-5 module; RC-5 transmits 14-bit at a time using Manchester encoding. You can easily add a pair of bits to it, and define your own codes.
A 3 m range should not be a problem. I've used Vishay RC-5 receivers at 15 m range. At shorter distances you probably won't even have to direct the LED to the receiver. In a normal size room the signal will reflect off the walls.
A cheap RF solution is the RFM70:
Less than 5 dollar at Digikey. That's far less than an Xbee.
Best Answer
You are effectively describing a "single master, multiple slave" arrangement. When working with such a setup, the first two approaches that come to mind are SPI and I2C.
Personally, I veer away from SPI unless I'm only concerned with a small number of slave devices (low chip count), or it's the only interface available. This is primarily because you'll need a dedicated slave-select line for each slave device, consuming another GPIO pin on your MCU/CPU, or you'll need additional multiplexing circuitry in place to save your spare pins. The one added benefit of this is that you can "broadcast" to groups of devices at the same time (assuming they don't reply to you) rather than having to broadcast to the entire bus like you do with I2C.
I2C isn't bad at all, even for a large number of devices. If you want 50 slaves, you'll need at least one of:
Additionally, I2C has the added benefit of being able to handle voltage mismatches easily if the master and slaves communicate at different logic levels (ie: 3.3VDC vs 5.0VDC).