The ATmega series has a 9-bit UART
The ATmega series of microcontrollers (datasheet) has the ability to use 9 data bits without messing with the parity bit. This functionality is described in the timing diagram:
Note that the bits are numbered in the figure from 0-8, which is a total of 9 bits. The caption refers to this numbering; it does not mean that you can use anywhere from 0 to 8 data bits. The minimum number of bits in a character is 5, and the maximum is 9.
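To make the frame layout concrete, here is a minimal host-side sketch in plain C (not AVR code) that lays out one frame as a sequence of line levels: a low start bit, 5 to 9 data bits sent LSB first, an optional parity bit, and one or two high stop bits. The names `build_frame` and `enum parity` are made up for this illustration and do not come from the datasheet or AVR Libc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum parity { PARITY_NONE, PARITY_EVEN, PARITY_ODD };

/* Writes the line level for each bit time of one frame into out[]
 * and returns the number of bit times, or 0 if the settings are invalid.
 * The longest frame is 1 start + 9 data + 1 parity + 2 stop = 13 bits. */
size_t build_frame(uint16_t data, unsigned data_bits, enum parity parity,
                   unsigned stop_bits, uint8_t out[13])
{
    if (data_bits < 5 || data_bits > 9 || stop_bits < 1 || stop_bits > 2)
        return 0;

    size_t n = 0;
    uint8_t ones = 0;                      /* running XOR of the data bits */

    out[n++] = 0;                          /* start bit: line goes low */
    for (unsigned i = 0; i < data_bits; i++) {
        uint8_t bit = (uint8_t)((data >> i) & 1u);   /* LSB first */
        ones ^= bit;
        out[n++] = bit;
    }
    if (parity == PARITY_EVEN)
        out[n++] = ones;                   /* makes the total count of 1s even */
    else if (parity == PARITY_ODD)
        out[n++] = (uint8_t)(ones ^ 1u);   /* makes the total count of 1s odd */
    for (unsigned i = 0; i < stop_bits; i++)
        out[n++] = 1;                      /* stop bit(s): line returns high */
    return n;
}
```

With 9 data bits, parity, and one stop bit the frame is 12 bit times long, and the parity bit is still there in addition to the ninth data bit.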
You set the width of the data field with the UCSZ bits. The settings are described in table 19-7, pictured below:
To set this in C using AVR Libc, you would need to execute the code:
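#include <avr/io.h> // _BV() macro, register definitions
// Set the USART1 character size (UCSZ) to 9 bits, as described in table 19-7.
// The function name is only illustrative.
static inline void usart1_set_9bit(void)
{
    UCSR1B |= _BV(UCSZ12);               // UCSZ12 lives in UCSR1B
    UCSR1C |= _BV(UCSZ11) | _BV(UCSZ10); // UCSZ11 and UCSZ10 live in UCSR1C
}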
Note that you'll probably want to specify the other bits in these registers while you're at it.
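If you want to see how the character size maps onto the two registers, the following host-side sketch (plain C, compilable on a PC) encodes the UCSZ table as I remember it from the ATmega datasheet: 5 to 8 bits use codes 0 to 3, and 9 bits uses code 7, with UCSZ2 in UCSRnB and UCSZ1:0 in UCSRnC. The bit positions and the names `ucsz_for_width` and `struct ucsz_bits` are assumptions made for this example; check them against the datasheet for your part, and on the target use the names from `<avr/io.h>` instead.

```c
#include <stdint.h>

/* Assumed bit positions; verify against your part's datasheet. */
#define UCSZ0_BIT 1  /* in UCSRnC */
#define UCSZ1_BIT 2  /* in UCSRnC */
#define UCSZ2_BIT 2  /* in UCSRnB */

/* Hypothetical container for the bits to OR into each register. */
struct ucsz_bits {
    uint8_t ucsrb;
    uint8_t ucsrc;
};

/* Fills *out with the UCSZ bits for the requested character size.
 * Returns 0 on success, -1 if the width is not one of 5..9. */
int ucsz_for_width(unsigned data_bits, struct ucsz_bits *out)
{
    uint8_t code;

    switch (data_bits) {
    case 5: code = 0x0; break;
    case 6: code = 0x1; break;
    case 7: code = 0x2; break;
    case 8: code = 0x3; break;
    case 9: code = 0x7; break;   /* codes 4..6 are reserved */
    default: return -1;
    }

    out->ucsrb = (uint8_t)(((code >> 2) & 1u) << UCSZ2_BIT);
    out->ucsrc = (uint8_t)((((code >> 1) & 1u) << UCSZ1_BIT) |
                           ((code & 1u) << UCSZ0_BIT));
    return 0;
}
```

For 9 bits this yields one bit for UCSRnB and two bits for UCSRnC, which matches the three `_BV()` terms in the snippet above.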
Many other processors also have this
There are almost certainly other processors which support this feature set. Atmel's ATtiny processors have the same USART as the ATmega and are code-compatible. Their AVR32 processors have the same true 9-bit support, but a different programming interface. The dsPIC processors support it, but without a proper parity bit (see page 243 of this datasheet; set bits 1 and 2, PDSEL of the UxMODE register)... the list goes on. The first processor that I checked which did not support it was a Stellaris Cortex-M3 part, which supports 5-8 data bits, but not 9 bits.
But you should use your other constraints to narrow the options first.
In the end, though, you should do your processor selection based on other factors first. You wrote:
I strongly prefer SD card support, and Ethernet / Wifi would be nice (I don't care too much about BlueTooth or USB, so long as they don't significantly increase price).
Most people will access the SD card in SPI mode, and almost everything has an SPI port or two. Ethernet/WiFi is too generic a spec and a much harder requirement to meet. Do you want an integrated MAC with an MII interface? Integrated PHY? Would you prefer to do all the TCP/IP stuff on-chip, or offload practically everything to something like a WIZnet W5100 or Lantronix XPort? You can also use components like the Microchip ENC28J60 to move the MAC and PHY to an external chip, accessed over SPI. Your other requirements are much more exacting than a 9-bit UART.
In fact, you could probably use a $1.50 ATtiny as an SPI/I2C<->9-bit UART converter if you wanted to. That would be much less expensive than choosing a sub-optimal processor for your other requirements.
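If you went the converter route, the main processor would need some way to carry 9-bit words over an 8-bit bus. One possible approach, purely an assumption on my part and not an established protocol, is to send two bytes per word: the first carries bit 8 and the second carries bits 7..0. The helper names `pack9` and `unpack9` are hypothetical.

```c
#include <stdint.h>

/* Splits a 9-bit word into two bytes: out[0] holds bit 8, out[1] holds bits 7..0.
 * This wire format is invented for illustration. */
void pack9(uint16_t word, uint8_t out[2])
{
    out[0] = (uint8_t)((word >> 8) & 0x01u);
    out[1] = (uint8_t)(word & 0xFFu);
}

/* Rebuilds the 9-bit word from the two-byte form produced by pack9(). */
uint16_t unpack9(const uint8_t in[2])
{
    return (uint16_t)(((uint16_t)(in[0] & 0x01u) << 8) | in[1]);
}
```

On the ATtiny side, the byte holding bit 8 would feed the ninth data bit of the USART and the other byte would go into the data register.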
Considering your request, a classic serial interface will do the job perfectly.
From an HDL perspective a serial transceiver is easy to implement, so I think you should start from here. Your board has no RS232 connectors, but you can easily use the expansion headers to connect an FT232 chip, which converts your serial interface to USB.
With a solution like this you can use a plain serial interface on both the PC and the FPGA side, and the FT232 will manage all the USB stuff, which is not so simple if you are just starting with HDL. This way you do not have to worry about USB, and at the same time you are using a modern interface that every PC has, unlike RS232.
Best Answer
The SPI peripheral should be the perfect solution for this. Since the SPI serial output is 1 for 1 with the data loaded into the transmit register, you should be good to go. The main thing to consider is that the software only has 8 clocks at 20 MHz (if dealing with byte data) to get the next byte from memory ready to be output.
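To put a number on that budget, here is a small sketch in plain C that works out how long one SPI word takes and how many CPU cycles fit in that window. The function names are made up for this example, and the 80 MHz CPU clock in the note below is only an assumed figure, since the CPU clock is not given.

```c
#include <stdint.h>

/* Time needed to shift out one SPI word, in nanoseconds. */
uint32_t spi_word_time_ns(uint32_t sck_hz, uint32_t bits_per_word)
{
    return (uint32_t)(((uint64_t)bits_per_word * 1000000000ull) / sck_hz);
}

/* CPU cycles available while one SPI word is being shifted out. */
uint32_t cpu_cycles_per_word(uint32_t cpu_hz, uint32_t sck_hz,
                             uint32_t bits_per_word)
{
    return (uint32_t)(((uint64_t)bits_per_word * cpu_hz) / sck_hz);
}
```

For 8 bits at a 20 MHz SPI clock, `spi_word_time_ns(20000000, 8)` gives 400 ns per byte. With a hypothetical 80 MHz core, `cpu_cycles_per_word(80000000, 20000000, 8)` gives 32 cycles to fetch and load the next byte, which is tight for an interrupt-driven loop.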
You may want to investigate whether your MCU has DMA. Some MCUs have internal DMA channels built in which can be used to move data from memory to a peripheral. This would be an excellent fit for your application.