In principle this is a good candidate for an FPGA-based design. Regarding your requirements:
ad 1. The FPGA will most likely be more expensive; by how much depends on the device you choose. At first glance the smallest Spartan 3 from Xilinx (XC3S50AN) will be more than enough for this task (~10£ from Farnell). I think you can take this as the upper bound on cost (it has 56kB RAM inside, so it is more than you need). You may find a cheaper device either in the Xilinx range or from their competitors Altera and Lattice.
ad 2. The package is the tough issue; I have not seen an FPGA with a smaller footprint either. However, you may be able to use a CPLD (for the sake of argument, CPLDs are small FPGAs), which may come in a smaller package (PLCC or QFN). On the plus side they are cheaper (even a single $); on the negative side they most likely will not have RAM inside, so with a CPLD you would probably need an external SRAM chip.
ad 3. FPGA and CPLD current consumption is highly dependent on the programmed design. However, there is a good chance that an FPGA, and especially a CPLD, design would consume less than your current solution.
ad 4. FPGAs do have that kind of memory inside; CPLDs almost certainly do not. This may be solved with an external SRAM chip (or two). For example:
|SRAM 1| <--> |CPLD| <--> |uC|
|SRAM 2| <-->
In such an arrangement, while the uC is writing to SRAM 1, the CPLD displays data from SRAM 2. Once the uC has finished a frame, the two SRAMs swap roles. The CPLD should be able to handle both tasks simultaneously.
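The two-SRAM arrangement is a classic ping-pong (double) buffer. Below is a minimal software model of the idea, not CPLD code: the class name `PingPongBuffer` and its methods are hypothetical, and the 4-byte "frame" is chosen only to keep the example short.

```python
class PingPongBuffer:
    """Two equally sized banks: one is written while the other is displayed."""

    def __init__(self, size):
        # banks[0] models SRAM 1, banks[1] models SRAM 2 (assumed sizes)
        self.banks = [bytearray(size), bytearray(size)]
        self.write_bank = 0  # bank the uC currently fills

    def write(self, offset, data):
        # uC side: store new data in the bank that is NOT being displayed
        self.banks[self.write_bank][offset:offset + len(data)] = data

    def read_display(self):
        # display side: always read the bank the uC is not writing to
        return bytes(self.banks[1 - self.write_bank])

    def swap(self):
        # called once the uC has finished writing a complete frame
        self.write_bank = 1 - self.write_bank


buf = PingPongBuffer(4)
buf.write(0, b"\x01\x02\x03\x04")  # fill SRAM 1 while SRAM 2 is displayed
before_swap = buf.read_display()   # still the old (all-zero) frame
buf.swap()                         # frame complete: the banks exchange roles
after_swap = buf.read_display()    # the new frame is now displayed
```

In hardware the swap would typically be a single select bit that the CPLD uses to route the address and data buses, ideally toggled between display refreshes so that a frame is never shown half-updated.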
Of course you can solve this in other ways too:
1) use a faster microcontroller (ARM for example)
2) use a device with some programmable fabric and a uC inside (for example FPSLIC from Atmel; however, I have never used such devices and know very little about them)
Standard disclaimer: designs are open problems with many constraints and possible solutions, so whatever I wrote above may not be true for your case. I believe it is worth checking those options, though.
My first reaction is that trying to externally turn off the micro is the wrong way to go about this. Perhaps you are using the wrong micro, but there are micros that take very little power when in full sleep mode. Take a look at some of the "nanoWatt" (marketing term) PICs and MSP430s. The latest PICs are basically down to a tiny amount of leakage current in sleep, less than 1uA for some.
How low a current do you need? What does the rest of the circuit draw? What is the CPLD current when it's not switching? It's hard to give a good answer without some real numbers.
Saying that something is a "huge issue" is no spec at all. For example, if you are trying to run something as long as possible on a CR2032 battery, then 1uA sleep current is fine since the effective self-discharge current is more than that. If you want 3 years from a single AA battery, then even more would be acceptable.
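To put rough numbers on those two cases, here is a back-of-the-envelope sketch. The capacities are assumptions (about 220 mAh for a CR2032 and a conservative 2000 mAh for an alkaline AA), not figures from the question, and self-discharge is ignored, so real lifetimes will be shorter.

```python
HOURS_PER_YEAR = 24 * 365

def battery_life_years(capacity_mah, avg_current_ua):
    """Ideal lifetime in years for a given average load (no self-discharge)."""
    hours = capacity_mah * 1000.0 / avg_current_ua  # mAh -> uAh, then / uA
    return hours / HOURS_PER_YEAR

def max_avg_current_ua(capacity_mah, years):
    """Largest average current (uA) that still lasts the requested time."""
    return capacity_mah * 1000.0 / (years * HOURS_PER_YEAR)

CR2032_MAH = 220.0  # assumed nominal capacity
AA_MAH = 2000.0     # assumed, conservative alkaline AA capacity

cr2032_years = battery_life_years(CR2032_MAH, 1.0)  # 1 uA sleep current
aa_budget_ua = max_avg_current_ua(AA_MAH, 3)        # 3-year target

print(f"CR2032 at 1 uA: about {cr2032_years:.0f} years (ideal)")
print(f"AA for 3 years: about {aa_budget_ua:.0f} uA average budget")
```

Under these assumptions a 1 uA load would take decades to drain a CR2032, so the cell's own ageing dominates, and a 3-year AA target leaves an average budget in the tens of microamps.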
EDIT: Something else I should have added. If you really are going to switch power to the micro (I still think that's a bad idea, get the right micro instead), you probably need to switch the ground instead of the power. You say there are IIC and UART lines connected to the micro. IIC has passive pullups, so these will either draw current or power up the micro thru the protection diodes if you try to switch off the power instead of the ground. Logic level UART signals idle high, so there could be a similar issue there too.
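As a rough illustration of why the pull-ups matter, the sketch below estimates the worst-case current one pull-up could push into an unpowered micro through a protection diode. Every value is an assumption made for the example: a 3.3 V bus, 4.7 kOhm pull-ups, a 0.6 V diode drop, and the switched-off rail sitting at 0 V. In practice the rail floats up and the current is lower.

```python
def pullup_leak_ua(v_bus, r_pullup_ohm, v_diode=0.6, v_rail=0.0):
    """Upper-bound current (uA) through one pull-up into an unpowered pin.

    Assumes the only path is pull-up -> protection diode -> switched-off rail.
    """
    v_across = max(v_bus - v_diode - v_rail, 0.0)  # voltage left across resistor
    return v_across / r_pullup_ohm * 1e6

per_line_ua = pullup_leak_ua(3.3, 4700.0)  # assumed bus voltage and pull-up
total_ua = 2 * per_line_ua                 # SDA and SCL together

print(f"per line: {per_line_ua:.0f} uA, both IIC lines: {total_ua:.0f} uA")
```

Even as a loose upper bound, this is hundreds of times the roughly 1 uA sleep current of a low-power micro, which is why switching the supply off can end up costing more than simply sleeping.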
In any case, you can apparently redesign the board, so I don't understand how you're stuck on that particular micro, whatever it is. If power is really such a "huge issue", then everything else should be on the table.
One possible technique is to take advantage of a commercial test instrument. Specifically, I'm referring to something called a "Leak Seeker" from Electronic Design Specialists (EDS). If you do your PCB layout such that the power pin(s) for each FPGA is available on a via as close as possible to the chip, finding shorted chips is relatively easy. If the FPGA chips are fed from a power plane in the PCB, ensure that the plane feeds this via from a short trace and that the via is what feeds the power pins for the chip.
The Leak Seeker is a device that is used for finding shorted components on PCBs. The original version is a primarily analog design that injects a small test current into a node on the board, then uses a sensitive voltage-controlled oscillator to guide the user to move the test probe along the trace searching for the short. I have one and it works very well.
The unit has recently been updated and I don't have any experience with the new version. However, the original version was capable of finding shorted chips even if those chips were being fed from a dedicated power plane in the PCB. Leak Seeker page
The reason for having a dedicated via for each chip is that it gives you an easily measured point when tracking down damaged chips. The real advantage of this technique is that no extra components are required.