Infrared emitters and receivers can be used to transmit and receive data. However, interference from other IR devices such as remote controls can be a problem, so this is probably not a great option if the link needs to be really reliable. You might be able to limit the interference by finding an emitter and receiver that work at an uncommon wavelength.
These parts are common and very cheap: packs of 20 emitters for $1 and packs of 100 receivers for $13 can be found on eBay (not including shipping). Many electronics stores stock them as well.
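On the interference concern above: corrupted data can at least be detected in software. Below is a minimal sketch, assuming a hypothetical 4-byte frame in which each byte is followed by its bitwise inverse (the same idea the NEC remote-control protocol uses). The names ir_frame_t, ir_frame_make and ir_frame_valid are made up for illustration, and the code does not cover modulation or the actual bit timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame layout: each payload byte is followed by its inverse. */
typedef struct {
    uint8_t addr;
    uint8_t addr_inv;
    uint8_t cmd;
    uint8_t cmd_inv;
} ir_frame_t;

/* Sender side: build a frame with the redundant inverted bytes. */
static ir_frame_t ir_frame_make(uint8_t addr, uint8_t cmd)
{
    ir_frame_t f = { addr, (uint8_t)~addr, cmd, (uint8_t)~cmd };
    return f;
}

/* Receiver side: accept the frame only if both byte pairs are exact
 * complements. A bit flipped by interference in one byte breaks the pair. */
static bool ir_frame_valid(const ir_frame_t *f)
{
    return (uint8_t)(f->addr ^ f->addr_inv) == 0xFF &&
           (uint8_t)(f->cmd ^ f->cmd_inv) == 0xFF;
}
```

A receiver would drop any frame for which ir_frame_valid() returns false and wait for a retransmission. This only detects damage; it does not make the link itself more reliable.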
In principle this is a good candidate for an FPGA-based design. Regarding your requirements:
ad 1. The FPGA will most likely be more expensive; by how much depends on the device you choose. At first glance, the smallest Spartan 3 from Xilinx (XC3S50AN) will be more than enough for this task (~£10 from Farnell). I think you can take this as the upper bound for the cost (it has 56kB RAM inside, so it is more than you need). You may find a cheaper device either in the Xilinx range or from their competitors Altera and Lattice.
ad 2. The package is the tough issue; I have not seen an FPGA with a smaller footprint either. However, you may be able to use a CPLD (for the sake of argument, CPLDs are small FPGAs), which may come in a smaller package (PLCC or QFN). On the plus side, CPLDs are cheaper (some cost only a dollar or so); on the minus side, they most likely have no RAM inside, so with a CPLD you would probably need an external SRAM chip.
ad 3. The current consumption of FPGAs and CPLDs depends heavily on the programmed design. However, there is a good chance that an FPGA design, and especially a CPLD design, would consume less than your current solution.
ad 4. FPGAs do have that kind of memory inside; CPLDs almost certainly do not. This can be solved with an external SRAM chip (or two). For example:
|SRAM 1| <--> |CPLD| <--> |uC|
|SRAM 2| <-->
In such an arrangement, while the uC is writing to SRAM 1, the CPLD displays data from SRAM 2. The CPLD should be able to handle both tasks simultaneously.
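The two-SRAM arrangement is a double (ping-pong) buffer. Here is a rough software model of the idea in C, only to show the bookkeeping; in the real design this logic would sit in the CPLD and the banks would be external chips. The frame size, the type pingpong_t and the pp_* functions are hypothetical, and the swap after a completed frame is my assumption about how the roles would alternate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 64 /* hypothetical frame size, not from the question */

typedef struct {
    uint8_t bank[2][FRAME_BYTES]; /* stand-ins for SRAM 1 and SRAM 2 */
    int write_bank;               /* index of the bank the uC is filling */
} pingpong_t;

static void pp_init(pingpong_t *pp)
{
    memset(pp, 0, sizeof *pp);
}

/* uC side: store one byte into the bank currently being filled. */
static void pp_write(pingpong_t *pp, size_t addr, uint8_t value)
{
    if (addr < FRAME_BYTES)
        pp->bank[pp->write_bank][addr] = value;
}

/* Display side: always read from the other bank, so a half-written
 * frame is never shown. Out-of-range reads return 0. */
static uint8_t pp_read(const pingpong_t *pp, size_t addr)
{
    return addr < FRAME_BYTES ? pp->bank[1 - pp->write_bank][addr] : 0;
}

/* Exchange the roles of the two banks once the uC has finished a frame. */
static void pp_swap(pingpong_t *pp)
{
    pp->write_bank = 1 - pp->write_bank;
}
```

The uC would call pp_write() for a whole frame and then pp_swap(); the display keeps reading the previous frame through pp_read() until the swap happens.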
Of course, you can solve this in other ways too:
1) use a faster microcontroller (an ARM, for example)
2) use a device that combines some programmable fabric with a uC (for example FPSLIC from Atmel; however, I have never used such devices and know very little about them)
Standard disclaimer: designs are open problems with many constraints and possible solutions, so whatever I wrote above may not hold for your case. I believe these options are worth checking, though.
Best Answer
Given your modest requirements, the PIC10F200 is worth a look.
Flash: 256 words (12-bit)
RAM: 16 B
Package: SOT23-6
Price@1: $0.41
Price@1k: $0.32
Datasheet