Electronic – Why are multipliers 18×18 bit in FPGAs

fpga

I'm looking at different FPGAs for my dissertation project, and I keep seeing that the multiplier blocks are 18×18 bits. Why is this? Why are they not 16-bit?
Related Solutions
FPGAs are more expensive than CPLDs, but they can do so much more, too. I only use CPLDs where "instant on" behavior (i.e., no programming time) is required, or when I need something that costs less than US$5 per chip. For everything else, an FPGA is preferable to a CPLD, in my opinion.
Rarely does it make sense to put a CPU INSIDE the FPGA. When it comes to bang for the buck, you just can't compete with an off-the-shelf ARM or something similar. The only times it makes sense are when you only need a tiny 8-bit micro (à la the Xilinx PicoBlaze), or when you are using an FPGA that includes a hard core and space is more important than cost. For your application, you'll need a reasonable 32-bit CPU to render the graphics for the LCD. So, at the very least, you'll have a two-chip solution (CPU + FPGA/CPLD).
Yes, FPGAs need to be programmed at power-up. This can be a good or a bad thing. It's bad because it means that, at the very least, you need a small flash EEPROM chip for them to "boot" from. It's good because it means you can easily do in-the-field firmware upgrades. With a CPLD, you'd have to drag out the CPLD programmer hardware/software every time you need to do an upgrade.
For most of the boards I've designed, I've always had a CPU that took care of programming the FPGA. Those CPUs have included PowerPCs, ARMs, and an Intel Atom. Since that CPU already had flash, RAM, Ethernet, and Linux, it was a piece of cake to implement the FPGA drivers and the FPGA programming logic.
In your case, I would lean toward CPLDs rather than FPGAs. Here's why:
Implementing video output circuitry in an FPGA is not trivial. If I were assigning that task to an engineer, I would not assign it to a junior engineer; it would have to be a senior engineer with 10+ years of experience. A junior engineer might be able to "make it work," but it wouldn't be very good. Since you need a 32-bit CPU anyway, you might as well get one with a video output, as some TI ARMs and the Freescale i.MX parts have.
For those big shift registers, you need very little logic and a lot of I/O pins. More specifically, you need about 4 flip-flops per I/O pin. A medium-sized FPGA will have maybe 300 I/O pins but 50,000+ flip-flops, so you would really be paying for things you will never use. In the end, I think you will spend two or three times as much on an FPGA solution as on a CPLD solution.
For someone who is just starting out with programmable logic, CPLDs have a much easier learning curve. There is no sense in making things more complicated before you have to.
And here is another bit of unsolicited advice: stick with the big FPGA/CPLD vendors (Xilinx and Altera), and avoid the second- and third-tier manufacturers like Lattice, Actel, QuickLogic, etc. The big guys are no more expensive (or not significantly so), their tools and support are better (and free!), and the skills you learn will be easier to apply on future projects. Most importantly, it will look better on a resume!
I've done this a few times myself.
Generally, the design tools will choose between a fabric implementation and a DSP slice based on the synthesis settings.
For instance, in Xilinx ISE, the synthesis process settings under HDL Options include a setting "-use_dsp48" with the options Auto, AutoMax, Yes, and No. As you can imagine, this controls how hard the tools try to place DSP slices. I once had a problem where I multiplied an integer by 3, which inferred a DSP slice - except I was already manually instantiating every DSP slice in the chip, so synthesis failed! I changed the setting to No, because I was already using every DSP slice.
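If you only need to override the behavior for one block, XST also accepts USE_DSP48 as an HDL attribute, which is finer-grained than the global process setting. A minimal sketch (double-check the attribute name and legal values in the XST user guide for your ISE version):

    // Keep this one multiply out of the DSP slices, regardless of
    // the global -use_dsp48 synthesis setting.
    (* use_dsp48 = "no" *)
    module times_three (
        input  wire [15:0] x,
        output wire [17:0] y
    );
        assign y = x * 18'd3;   // XST builds this from LUTs and carry chains
    endmodule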
Here is probably a good rule of thumb (one I just made up): if your design is clocked at less than 50 MHz and you are likely to use less than 50% of the DSP slices in the chip, just use the *, +, and - operators. This will infer DSP slices with no pipeline registers, which really limits the top speed. (I have no idea what happens when you use division.)
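At those clock rates, a plain multiply-add written with the operators is all you need. A minimal sketch (the module and signal names are mine, not from any Xilinx template):

    // Operator-based inference: the tools can map this onto one DSP48A1
    // (18x18 multiplier plus post-adder) with no pipeline registers,
    // which is why the top clock speed is limited.
    module mac_comb (
        input  wire signed [17:0] a, b,
        input  wire signed [47:0] c,
        output wire signed [47:0] p
    );
        assign p = a * b + c;
    endmodule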
However, if it looks like you're going to run the slices closer to the maximum speed of the DSP slice (333 MHz for a Spartan-6 in the normal speed grade), or you're going to use all of the slices, you should instantiate them manually.
In this case, you have two options.
Option 1: use the raw DSP instantiation template by hand. Option 2: use an IP block from Xilinx Core Generator. (I would use this option; at the same time, you will learn all about Core Generator, which will help in the future.)
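A middle ground worth knowing before committing to either option: if you register the inputs, the raw product, and the output, XST can usually pull those flip-flops into the slice's own pipeline registers, which is what gets you near the 333 MHz ceiling. A hedged sketch (the register names are illustrative, not a Xilinx template):

    module mult_pipe3 (
        input  wire               clk,
        input  wire signed [17:0] a, b,
        output reg  signed [35:0] p
    );
        reg signed [17:0] a_r, b_r;   // absorbed into the A/B input registers
        reg signed [35:0] m_r;        // absorbed into the M (product) register

        always @(posedge clk) begin
            a_r <= a;
            b_r <= b;
            m_r <= a_r * b_r;
            p   <= m_r;               // absorbed into the P output register
        end
    endmodule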
Before you do either of these, read the first couple of pages of the DSP slice user guide. For the Spartan-6 (DSP48A1), that is Xilinx document UG389: http://www.xilinx.com/support/documentation/user_guides/ug389.pdf
Consider the Core Generator option first. I usually create a scratch project in Core Generator for the part I'm working with, where I create any number of IP blocks just to learn the system. Then, when I'm ready to add one to my design in ISE, I right-click in the Design Hierarchy, click New Source, and select "IP (CORE Generator & Architecture Wizard)" so that I can edit and regenerate the block directly from my project.
In Core Generator, take a look at the different IP blocks you can choose from - there are a few dozen, most of which are pretty cool.
The Multiplier core is the one to look at first. Check out every page, and click the datasheet button. The important parts are the integer bit widths, the pipeline stages (latency), and any control signals. Stripping away all the ports you don't need produces the simplest possible block.
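Once the core is generated, dropping it into your design is a plain module instantiation. The sketch below assumes the usual Multiplier core defaults (a clock, two operands, a product); the authoritative port list is in the .veo instantiation template that Core Generator writes next to the core, so copy from that rather than from here:

    // Hypothetical instance of a generated 18x18 Multiplier core named
    // "my_mult"; the product appears after the latency you configured.
    my_mult u_mult (
        .clk (clk),   // rising-edge clock
        .a   (a),     // 18-bit operand A
        .b   (b),     // 18-bit operand B
        .p   (p)      // 36-bit product
    );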
When I was building a 5-by-3-order IIR filter last year, I had to use the manual instantiation template, since I was building a very custom implementation with 2 DSP slices clocked 4x faster than the sample rate. It was a total pain.
Best Answer
Why not? Really, it's completely arbitrary. The cost in terms of chip area for 18×18 bits vs. 16×16 bits is negligible when compared to the area used for all the other resources (especially routing) on an FPGA. If you don't need the extra bits, just ignore them.
However, I think the common practice of making multiplier blocks 18×18 bits has its basis in the fact that the on-chip memories of many FPGAs are multiples of 9 bits wide.
Why are memories 9 bits wide? Well, 9 bits allows you to store 8 bits plus parity, or 64 bits plus 8 bits of ECC (72 bits total). Applications that care about data integrity can make good use of such memories.
However, there are lots of applications that don't need the 9th bit for protection and would prefer to use it for extra precision in the data. It just wouldn't make sense to design a chip that can store data in 18-bit chunks, but can only process 16-bit chunks.
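In other words, the widths line up by design: two 9-bit memory lanes feed one 18-bit operand, and the multiplier consumes it at full precision. A minimal sketch:

    // Two 9-bit BRAM lanes concatenate into one 18-bit operand, so no
    // stored bit is wasted on its way into the 18x18 multiplier.
    module mult18x18 (
        input  wire signed [17:0] a, b,   // e.g. {lane_hi, lane_lo} from a 9-bit-wide RAM
        output wire signed [35:0] p      // full-precision product
    );
        assign p = a * b;
    endmodule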