There are two criteria you can use to evaluate a digital project and decide which part fits it best. The first is design size/complexity: how much logic is involved. The second is the input and output requirements in terms of pin count. Speed can be factored in if you can estimate what your slowest function would be. The vendor tools (Altera Quartus II, Xilinx ISE, etc.) will help you once you get in the right ballpark.
PAL/PLA/GAL: These are intended to replace small to medium size circuits that you might normally implement with SSI/MSI logic chips (7400, 4000 series). They can offer better board layouts because the I/O can be remapped, and they handle lots of simple logic functions. These chips contain non-volatile memory (or one-time programmable fuses) and require no power-up configuration time. They may not contain data storage elements.
CPLD: These are larger cousins of the PLA. The designs can be small state machines, or even a very simple microprocessor core. Most of the CPLD chips that I have seen do not have any on-chip SRAM, although the large Cypress CPLD you linked does. CPLDs are more likely to be re-programmable with flash memory, and they also do not require configuration time on power-up.
FPGA: Unlike the CPLD, the logic blocks are based on SRAM instead of flash memory, resulting in faster logic operations. The major downside with FPGAs is that since the configuration is stored in SRAM, every time the device is powered up the FPGA must load its programming into this SRAM. Depending on the size of your design and the speed of your non-volatile storage, this can cause a noticeable delay from power-on to fully functioning. Some FPGAs have on-chip flash for storing their configuration, but most use separate memory chips. FPGAs will often have hard-wired multipliers, PLLs, and other logic functions to improve computing speed. Large blocks of on-chip RAM are also available. You will also be able to use high-performance I/O standards like LVDS, PCI, and PCI-Express.
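To get a rough feel for that power-up delay, here is a back-of-the-envelope sketch. The function name and the example numbers are made up for illustration; the real bitstream size and configuration clock come from the device datasheet, and this ignores power-on reset and any device start-up time.

```python
def config_time_ms(bitstream_bits: int, clock_hz: float, bus_width_bits: int = 1) -> float:
    """Rough time to shift a configuration bitstream into an SRAM-based FPGA.

    Assumes one bus-width word is transferred per configuration clock cycle
    and ignores power-on reset and start-up delays (assumption, not a
    datasheet formula).
    """
    if bitstream_bits < 0 or clock_hz <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes and clock must be positive")
    cycles = -(-bitstream_bits // bus_width_bits)  # ceiling division
    return cycles * 1e3 / clock_hz


# Hypothetical example: 4 Mbit bitstream, 20 MHz serial configuration clock.
serial_ms = config_time_ms(4_000_000, 20e6, bus_width_bits=1)
# Same bitstream over a hypothetical 8-bit parallel configuration bus.
parallel_ms = config_time_ms(4_000_000, 20e6, bus_width_bits=8)
print(f"serial: {serial_ms:.0f} ms, parallel: {parallel_ms:.0f} ms")
```

A flash-based CPLD has no equivalent of this step, which is why it can be working almost immediately after power is applied.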
FPGA with Microprocessor Hard Core: I'm not familiar with these, but I would imagine that your design would center around the microcontroller programming, and the FPGA would augment the microcontroller. The parts you identified make it look like you would start your design with a microcontroller and an FPGA, and then combine the two into one chip/package.
How to decide which is right for you:
The best way is to have your code (Verilog/VHDL) finished, and then use the vendor's tools to try and fit it into the smallest part possible. I know Altera's tool lets you change programming targets fairly easily, so you could keep picking smaller FPGAs, and then smaller CPLDs until your design usage gets close to about 75%. If you require performance, then try to pick devices that have features (fast multipliers) that decrease the speed requirements of the logic. Again, the vendor tools will help you identify if you need to upgrade or if you can downgrade.
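The "keep picking smaller parts until you hit about 75%" idea can be sketched as a simple filter. Everything here is hypothetical: the part names, cell counts, and pin counts are invented, and CPLD macrocells are not directly comparable to FPGA logic elements, so in practice the usage figure has to come from running the vendor's fitter against each target.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Part:
    name: str          # hypothetical part name
    logic_cells: int   # macrocells or logic elements (not truly comparable)
    user_io: int       # usable I/O pins


def smallest_fit(parts: Sequence[Part], cells_needed: int, io_needed: int,
                 max_utilization: float = 0.75) -> Optional[Part]:
    """Return the smallest part that fits the design at or below the
    target utilization, or None if nothing in the list is big enough."""
    candidates = [
        p for p in parts
        if io_needed <= p.user_io
        and cells_needed <= p.logic_cells * max_utilization
    ]
    return min(candidates, key=lambda p: p.logic_cells, default=None)


# Invented catalog, for illustration only.
CATALOG = [
    Part("cpld_small", 64, 34),
    Part("cpld_large", 512, 100),
    Part("fpga_small", 5000, 90),
    Part("fpga_large", 20000, 150),
]

choice = smallest_fit(CATALOG, cells_needed=400, io_needed=60)
print(choice.name if choice else "no part fits")
```

With these made-up numbers a 400-cell design is just over 75% of the large CPLD, so the sketch moves up to the small FPGA. Note that the pin count can force a bigger part even when the logic would fit in a smaller one.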
Another factor in choosing a part is ease of use. Using PAL/PLA/GAL logic is probably more effort than constructing the function using discrete logic gates (74HC*, 4000, etc). CPLDs typically require only a single supply voltage, and don't require additional circuitry. They are effectively stand-alone. FPGAs bring in multiple power supplies for the I/O and the logic core, complex I/O standards, separate program memory, multi-layer (>2) PCBs, and BGA packages.
Steps to narrowing down your design requirements would include:
- Identify all inputs and outputs for your FPGA/CPLD. This is usually an easy part of the design stage. This way you know what package you're looking at, and how close you can cut it to that margin.
- Draw a block diagram of the internal logic. If your blocks look simple (each block would have a handful of logic gates and registers), then you probably can use a CPLD. If, however, your blocks have labels such as "Ethernet transceiver", "PCI-Express x16 interface", "DDR2 Controller", or "h264 Encode/Decode", then you are almost certainly looking at an FPGA and using HDL.
- Look and see if your interfaces have special I/O requirements, such as special voltages, LVDS, DDR, or high speed SERDES. It's easier to get a chip that supports it than to get an additional translator chip.
Example CPLD Applications:
- Multi-channel PWM with SPI interface
- I/O Expander
- CPU Address Space Decoding
- Clocks (Time keeping)
- Display Multiplexers
- Simple DSP
- Some simple programs can be converted into a CPLD design
Example Hobbyist FPGA Applications:
- Small System-on-Chip (SoC) designs
- Complex protocol bridges
- Signal processing
- Legacy system emulation
- Logic Analyzer/Pattern Generator
For most hobbyist work, you'll be limited to relatively small FPGAs unless you want to solder BGA packages. I would choose between a large CPLD or a cheap FPGA, and the size/speed requirements would dictate which one I needed.
I'll add a bit to Brian Carlton's answer.
Within an FPGA, he's correct: gated clocks are not at all recommended. The flip-flops have a separate enable (EN) input, so gating the clock isn't necessary.
In your case, though, because your gated clock only goes to the output pin and isn't used internally to the FPGA, you can gate your clock without penalty. The way to do it is to make sure the clock gating is done in the output block. Assuming you're using Xilinx, instead of instantiating an OBUF for your clock output, use an OBUFT, and you'll get access to the tristate pin of the output buffer. If you're using another vendor's FPGAs there will be an equally easy way to do this.
If you prefer to do this using inference rather than instantiation, you'll need to be sure to enable an option during compiling to push logic into IO blocks. If the gated clock does actually fan-out (but you didn't show it in your diagram), you'll also need to enable an option that allows duplicate logic to be generated.
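For logic that stays inside the FPGA, the clock-enable alternative to a gated clock can be illustrated with a small behavioral model. This is plain Python rather than HDL, and the class and function names are made up; it only shows the idea that the clock keeps running on every edge while the enable decides whether the register updates.

```python
class EnabledFlipFlop:
    """Behavioral model of a D flip-flop with a synchronous clock enable."""

    def __init__(self, initial: int = 0):
        self.q = initial

    def rising_edge(self, d: int, enable: int) -> int:
        # The clock edge always arrives; the enable gates the data, not the clock.
        if enable:
            self.q = d
        return self.q


def run(samples):
    """Apply one (d, enable) pair per clock edge and collect Q after each edge."""
    ff = EnabledFlipFlop()
    return [ff.rising_edge(d, en) for d, en in samples]


# Q follows D only on edges where enable is 1, and holds otherwise.
print(run([(1, 1), (0, 0), (0, 1), (1, 0)]))
```

Because every flip-flop still sees the same clean clock, the tools can analyze timing normally, which is the point of using the enable input instead of gating.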
A Logic Element (LE) is made up of a number of gates. Exactly how long it takes to propagate data from the inputs to the outputs depends just on the combination of gates that the signals pass through, and that depends on how the LE is programmed.
The main controlling factor of the LE is the Look-up Table (LUT). In the Cyclone IV LE this has 4 data inputs and 1 data output, and is basically a little block of static RAM (16 × 1 bit). The RAM is loaded when the device is configured from the configuration chip. The incoming data signals are the address lines of the SRAM, and the output signal is the data line of the SRAM. A change of address (incoming data) yields a pre-set output value (outgoing data).
For modelling the timing you would really need to know which gates are in use for which values of the LUT and factor the propagation of those into your calculation. For the LUT itself, it's just SRAM, so you can just model that as a small piece of SRAM.
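As a purely functional illustration of the "LUT is a small SRAM" view, here is a minimal sketch that assumes a 4-input LUT with a single output bit, as in the Cyclone IV LE. The class name, the bit ordering of the address, and the helper are my own choices for the example, and it models behavior only, not timing.

```python
class Lut4:
    """4-input, 1-output look-up table modeled as a 16 x 1-bit memory."""

    def __init__(self, init_mask: int):
        if not 0 <= init_mask <= 0xFFFF:
            raise ValueError("init_mask must fit in 16 bits")
        self.init_mask = init_mask  # bit n holds the output for address n

    def read(self, a: int, b: int, c: int, d: int) -> int:
        # The four data inputs form the SRAM address (a is the LSB here,
        # which is an arbitrary choice for this sketch).
        address = ((d & 1) << 3) | ((c & 1) << 2) | ((b & 1) << 1) | (a & 1)
        return (self.init_mask >> address) & 1

    @classmethod
    def from_function(cls, fn) -> "Lut4":
        """Build the memory contents by evaluating fn for all 16 addresses,
        roughly what the synthesis tool does when it maps logic to a LUT."""
        mask = 0
        for address in range(16):
            bits = [(address >> i) & 1 for i in range(4)]
            if fn(*bits):
                mask |= 1 << address
        return cls(mask)


xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(xor4.init_mask), xor4.read(1, 0, 0, 0), xor4.read(1, 1, 0, 0))
```

Any function of four inputs ends up as just a different 16-bit pattern, which is why the LUT delay itself does not depend on the function being implemented.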
The actual layout of a LE in different modes of operation can be found in the document Logic Elements and Logic Array Blocks in Cyclone IV Devices.