Electronic – How to convert the number of DSP48s/BRAMs to the number of LUTs and FFs in an FPGA

fpga vhdl

I am having trouble estimating logic utilization.

I am a Ph.D. student researching efficient implementations of signal processing algorithms, so I have to compare the logic utilization of a proposed method against a conventional one.

Comparing gate counts for each method would be the best way to evaluate the efficiency of logic utilization.
Unfortunately, as you know, the Xilinx tools no longer report gate counts.

Instead of gate counts, we can estimate logic utilization from the number of LUTs and FFs.
So I disabled the DSP48s and block memory through the synthesis settings, so that all logic would be implemented with LUTs and FFs.

However, although I set -max_dsp 0,
the decimation filter and FFT generated by the IP cores (FIR Compiler and the FFT core) still used DSP48s and BRAM.

Here are the questions.

1. How can I generate a decimation filter without DSP48s using FIR Compiler?
I would like to generate the decimation filter using only CLBs.

2. Is there any criterion for converting a number of DSP48s into a number of LUTs and FFs?

3. In addition, is there any criterion for converting a number of block RAMs into a number of LUTs or memory LUTs?

Best Answer

How can I generate a decimation filter without DSP48s using FIR Compiler? I would like to generate the decimation filter using only CLBs.

Configure your FIR IP core and/or your synthesis settings not to use DSP slices. There are constraints and attributes that let you do that.
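For example (a sketch, assuming Vivado; the cell name `fir_inst` is illustrative), you can cap DSP usage globally at synthesis time, or per instance with the `USE_DSP` property:

```tcl
## Globally: allow zero DSP48 slices for the whole synthesis run
synth_design -top my_design -max_dsp 0

## Per instance: forbid DSP inference on one cell (XDC / Tcl)
set_property USE_DSP no [get_cells fir_inst]
```

With either of these in place, the utilization report then shows what that IP costs in LUTs and FFs instead.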

By the way, I doubt this is a sensible thing to do. You're doing DSP, so use the DSP hardware. Comparing the methods as if it didn't exist unfairly favors whichever method happens not to use DSP slices.

Is there any criterion for converting a number of DSP48s into a number of LUTs and FFs?

Sure: implement a multiplier in fabric logic. In fact, your synthesizer will do exactly that when you forbid it from using your DSP slices.

As with everything in FPGAs: there's a tradeoff between speed, amount of resources used and latency. Without defining boundaries for all three, you can't tell how you should be implementing something. You can build a thousands-of-LUTs single-clock multiplier; you can build a much smaller multi-clock-cycle one, or you can build a multi-clock-cycle pipelined one. It all depends on what you define to be appropriate.
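As a concrete sketch (assuming Vivado, whose `use_dsp` synthesis attribute keeps a multiply out of the DSP48s; entity and signal names are illustrative), a single-cycle registered 18x18 multiplier might look like:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_mult is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    p   : out signed(35 downto 0)
  );
end entity;

architecture rtl of lut_mult is
  signal p_reg : signed(35 downto 0) := (others => '0');
  attribute use_dsp : string;
  attribute use_dsp of p_reg : signal is "no";  -- forbid DSP48 inference
begin
  process (clk)
  begin
    if rising_edge(clk) then
      p_reg <= a * b;  -- multiply implemented in LUTs and carry chains
    end if;
  end process;
  p <= p_reg;
end architecture;
```

Synthesize this once with the attribute and once without, and the difference in the utilization report gives you the LUT/FF cost of one DSP48-sized multiply on your target part; that is about as honest a "conversion criterion" as you can get.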

In addition, is there any criterion for converting a number of block RAMs into a number of LUTs or memory LUTs?

This should answer itself by now, but: I'm sure you can model a bit stored in block RAM as an FF, or re-target the memory to LUTs and measure it.
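For instance (again a sketch, assuming Vivado, whose `ram_style` synthesis attribute steers memory inference), a small inferred RAM can be forced into memory LUTs instead of block RAM, and the utilization report then gives the LUT-RAM equivalent directly:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lutram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(5 downto 0);           -- 64 entries
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of lutram is
  type ram_t is array (0 to 63) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
  attribute ram_style : string;
  attribute ram_style of ram : signal is "distributed";  -- memory LUTs, not BRAM
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      dout <= ram(to_integer(addr));           -- registered read
    end if;
  end process;
end architecture;
```

Note the same caveat as with the multiplier: the LUT cost depends on depth, width, and port configuration, so measure it for memories shaped like the ones your design actually uses.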