While your assumption about how the RX works is correct, care needs to be taken with two TX outputs driving the same line. At a minimum, you want to buffer each TX with a reverse diode and a pull-up resistor, like this:
Doing something like this is nothing new, and there are lots of references on the internet. This configuration also helps detect transmission errors, because each transmitter receives its own data looped back.
You might want to look up some stuff on the internet such as this and this.
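The loop-back check can be sketched in a few lines of C. This is a simplified model, not driver code: it assumes the diode and pull-up make the line behave as a wired-AND (either transmitter can pull it low) and that both frames are bit-aligned, which real traffic need not be. The helper names shared_line and echo_ok are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Level seen on the shared line when two diode-buffered TX outputs drive it.
 * Either output can pull the line low, so the result is a bitwise AND.
 * Assumption: both bytes are sent bit-aligned (a simplification). */
static uint8_t shared_line(uint8_t tx_a, uint8_t tx_b) {
    return tx_a & tx_b;
}

/* Returns true when the byte read back on RX equals the byte that was sent. */
static bool echo_ok(uint8_t sent, uint8_t echoed) {
    return sent == echoed;
}
```

With the other transmitter idle (all ones), echo_ok(0x55, shared_line(0x55, 0xFF)) holds. If the other side drives 0x0F at the same time, the line reads 0x05 and the mismatch flags a collision.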
The ATmega series has a 9-bit UART
The ATmega series of microcontrollers (datasheet) has the ability to use 9 data bits without messing with the parity bit. This functionality is described in the timing diagram:
Note that the bits in the figure are numbered 0-8, which is a total of 9 bits. The caption refers to this numbering; it does not mean that you can use 0-8 data bits. The minimum number of data bits in a character is 5, and the maximum is 9.
You can set the width of the data section by the UCSZ bits. The settings are described in table 19-7, pictured below:
To set this in C using AVR Libc, you would need to execute the code:
#include <avr/io.h> // _BV() macro, register definitions
// Set the Uart Character SiZe to 9 bits as described in table 19-7
UCSR1B |= _BV(UCSZ12);
UCSR1C |= _BV(UCSZ11) | _BV(UCSZ10);
Note that you'll probably want to specify the other bits in these registers while you're at it.
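As a rough, host-compilable sketch of how the character size maps onto the three UCSZ bits: the encoding below is my reading of the ATmega character-size table (5 to 8 bits are 0 to 3, 9 bits is 7, the codes in between are reserved), so check it against the table for your part. The function names are hypothetical helpers, not part of AVR Libc.

```c
/* Map a character width (5..9 data bits) to the 3-bit UCSZn2:0 code.
 * Returns -1 for widths the USART does not support. */
static int ucsz_code(int data_bits) {
    switch (data_bits) {
    case 5: return 0x0;
    case 6: return 0x1;
    case 7: return 0x2;
    case 8: return 0x3;
    case 9: return 0x7;   /* codes 0x4..0x6 are reserved */
    default: return -1;
    }
}

/* The top bit (UCSZn2) lives in UCSRnB; the low two bits live in UCSRnC. */
static int ucsz_high_bit(int code) { return (code >> 2) & 1; }
static int ucsz_low_bits(int code) { return code & 0x3; }
```

For 9 bits this gives a high bit of 1 and low bits of 3, which matches setting UCSZ12, UCSZ11 and UCSZ10 in the code above.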
Many other processors also have this
There are almost certainly other processors that support this feature set. Atmel's ATtiny processors have the same USART as the ATmega and are code-compatible. Their AVR32 processors have the same true 9-bit support, but a different programming interface. The dsPIC processors support it, but without a proper parity bit (see page 243 of this datasheet; set bits 1 and 2, PDSEL, of the UxMODE register). The list goes on. The first processor that I checked which did not support it was a Stellaris Cortex-M3 part, which supports 5-8 data bits, but not 9 bits.
But you should use your other constraints to narrow the options first.
In the end, though, you should do your processor selection based on other factors first. You wrote:
I strongly prefer SD card support, and Ethernet / Wifi would be nice (I don't care too much about BlueTooth or USB, so long as they don't significantly increase price).
Most people will access the SD card in SPI mode, and almost everything has an SPI port or two. Ethernet/WiFi is too generic a spec and a much harder requirement to meet. Do you want an integrated MAC with an MII interface? An integrated PHY? Would you prefer to do all the TCP/IP stuff on-chip, or offload practically everything to something like a WIZnet W5100 or Lantronix XPort? You can also use components like the Microchip ENC28J60 to move the MAC and PHY to an external chip, accessed over SPI. Your other requirements are much more exacting than a 9-bit UART.
In fact, you could probably use a $1.50 ATtiny as an SPI/I2C<->9-bit UART converter if you wanted to. That would be much less expensive than choosing a sub-optimal processor for your other requirements.
Best Answer
Take a look at the Atmel SAM3S. It is not a 16-bit device, but I'm not sure why that is a key requirement. I'm also not sure what "low" power means to you; it will be in direct conflict with the requirement for a fast baud rate. To hit 3.25 Mbps you have to run this part at 52 MHz (it runs at 64 MHz max). Another nice feature is DMA, which helps move the high-speed serial data. It also meets your memory requirements.
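The 52 MHz figure is consistent with a UART that samples each bit 16 times and uses a clock divisor of 1, since 52 MHz / 16 = 3.25 MHz. A minimal sketch of that arithmetic follows; the 16x oversampling and the divisor of 1 are assumptions on my part, and the function names are made up, so check the SAM3S datasheet for the baud generator options it actually offers.

```c
#include <stdint.h>

/* Baud rate of a UART whose bit clock is clock_hz / (oversample * divisor). */
static uint32_t uart_baud(uint32_t clock_hz, uint32_t oversample,
                          uint32_t divisor) {
    return clock_hz / (oversample * divisor);
}

/* Peripheral clock needed to reach `baud` with the given settings. */
static uint32_t uart_clock_for(uint32_t baud, uint32_t oversample,
                               uint32_t divisor) {
    return baud * oversample * divisor;
}
```

Under these assumptions, uart_clock_for(3250000, 16, 1) comes out at 52000000, and running at the 64 MHz maximum would top out at 4000000 baud.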