In principle this is a good candidate for an FPGA-based design. Regarding your requirements:
ad 1. The FPGA will most likely be more expensive; by how much depends on the device you choose. At first glance the smallest Spartan 3 from Xilinx (XC3S50AN) will be more than enough for this task (about £10 from Farnell). I think you can take this as the upper bound for the cost (it has 56kB RAM inside, which is more than you need). You may find a cheaper device either in the Xilinx range or from their competitors Altera and Lattice.
ad 2. The package is the tough issue; I have not seen an FPGA with a smaller footprint either. However, you may be able to use a CPLD (for the sake of argument, CPLDs are small FPGAs), which may come in a smaller package (PLCC or QFN). On the plus side they are cheaper (even a single dollar); on the minus side they most likely have no RAM inside. With a CPLD you would probably need an external SRAM chip.
ad 3. FPGA and CPLD current consumption depends heavily on the programmed design. However, there is a good chance that an FPGA design, and especially a CPLD design, would consume less than your current solution.
ad 4. FPGAs do have that kind of memory inside; CPLDs almost certainly do not. This can be solved with an external SRAM chip (or two). For example:
|SRAM 1| <--> |CPLD| <--> |uC|
|SRAM 2| <-->
In such an arrangement, while the uC is writing to SRAM 1, the CPLD displays data from SRAM 2. The CPLD should be able to handle both tasks simultaneously.
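The two-SRAM arrangement is a form of double (ping-pong) buffering. Below is a minimal software model of the idea, not CPLD code: the class name `PingPongBuffer`, its methods, and the 4-byte bank size are all made up for illustration, and it assumes the two banks swap roles once the uC has finished writing a frame.

```python
class PingPongBuffer:
    """Models two equal-sized SRAMs: one is written while the other is read."""

    def __init__(self, size):
        # banks[0] stands in for SRAM 1, banks[1] for SRAM 2 (hypothetical sizes)
        self.banks = [bytearray(size), bytearray(size)]
        self.write_bank = 0  # bank the uC currently writes to

    @property
    def read_bank(self):
        # the display side always reads the bank that is NOT being written
        return 1 - self.write_bank

    def write(self, addr, value):
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        return self.banks[self.read_bank][addr]

    def swap(self):
        # assumed to happen at a frame boundary, once the uC is done writing
        self.write_bank = 1 - self.write_bank


buf = PingPongBuffer(4)
for addr, value in enumerate([1, 2, 3, 4]):
    buf.write(addr, value)                        # uC fills the first bank

before_swap = [buf.read(a) for a in range(4)]     # display still sees the other bank
buf.swap()
after_swap = [buf.read(a) for a in range(4)]      # display now sees the new frame
```

In hardware the swap would amount to the CPLD switching which SRAM's address and data lines it routes to the uC and which it uses for display.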
Of course you can solve this in other ways too:
1) use a faster microcontroller (an ARM, for example)
2) use a device with some programmable fabric and a uC inside (for example FPSLIC from Atmel; however, I have never used such devices and know very little about them)
Standard disclaimer: designs are open problems with many constraints and possible solutions, so whatever I wrote above may not be true for your case. I believe it is worth checking those options, though.
If you're already pulling more than 2mA from the supply, then the 2mA from the clamping diode will simply reduce the load on the supply by that much.
From a DC standpoint, it only matters when you normally draw less than the clamping current because then you're backfeeding the supply, which may allow the voltage to rise.
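The DC condition can be checked with simple arithmetic. This is only a sketch: the function names are made up, and the 10 mA and 0.5 mA load figures are illustrative assumptions, not values from the question.

```python
def net_supply_current_ma(load_ma, clamp_ma):
    """Current the regulator still has to source once the clamp diode
    injects clamp_ma into the rail (negative means current flows back)."""
    return load_ma - clamp_ma


def is_backfeeding(load_ma, clamp_ma):
    """True when the clamp current exceeds what the circuit itself draws."""
    return net_supply_current_ma(load_ma, clamp_ma) < 0


# Assumed example loads, both with the 2 mA clamp current:
heavy_load = net_supply_current_ma(10.0, 2.0)   # supply load is merely reduced
light_load = net_supply_current_ma(0.5, 2.0)    # negative: the rail is being backfed
```

If the second case can occur (for example in a sleep mode), something has to absorb the excess current, or the rail voltage may rise.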
From an AC standpoint, you might need some filtering depending on what the 24V input is actually doing and how sensitive the circuit is to that kind of supply noise.
Other than that, I think the first version is good.
Best Answer
If the pins have clamp diodes, you need only a series resistor to limit the current. You can see in application note AN521 from Microchip that it connects the microcontroller pin directly to the AC line this way.
I have used a voltage divider to connect a PIC to a GPRS module at speeds above 19,200 baud and had no problem.
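Sizing the series resistor is a one-line calculation: the resistor has to drop whatever voltage remains once the clamp diode conducts. The sketch below assumes a 24 V input, a 5 V supply, a 0.6 V diode drop and a 2 mA clamp current limit. These numbers are illustrative only; take the real limit from your microcontroller's datasheet.

```python
def min_series_resistor_ohms(v_in, v_dd, v_diode, i_clamp_max_a):
    """Smallest resistor that keeps the clamp diode current at or below
    i_clamp_max_a when the input sits at v_in."""
    # With the diode conducting, the pin sits at about v_dd + v_diode,
    # so the resistor sees the rest of the input voltage.
    return (v_in - v_dd - v_diode) / i_clamp_max_a


# Assumed values: 24 V input, 5 V rail, 0.6 V diode drop, 2 mA limit
r_min = min_series_resistor_ohms(24.0, 5.0, 0.6, 0.002)   # roughly 9.2 kOhm
```

In practice you would pick the next standard value above the result to leave some margin.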