This design does not fit into the number of slices available in this device

planahead synthesis xilinx

Below is the device utilization summary for the design (Zynq 7010); the number of Slice LUTs used exceeds the number available. Utilization was previously 82%, and it went over the limit after I added a block of checksum code four times. Is there any tweak to merge LUTs and reduce their consumption, or do I need to optimize the code manually?

Device Utilization Summary

Below are the synthesis settings:

Synthesis settings

I have used the following implementation settings to reduce resource utilization, especially of LUTs:

Implementation settings

*I got some information from this website: Xilinx parameters

After using the above settings, the problem still persists. I am not sure whether some of these settings are valid for Zynq. Any help?

Best Answer

Turn off register duplication. It increases the speed of your design but has a very negative impact on LUT utilization.

That option is mainly for designs that have high fanout and need to duplicate some resources in order to meet timing.

Also, look through your code and see whether you can remove the reset from some of your logic, especially parts that could be packed into SRLs (LUT-based shift registers) or RAM. Keeping unnecessary resets is one of the common mistakes people make: removing them helps the tools pack some of your logic into BRAM or SRLs, and you should see a significant decrease in the number of LUTs used.

If none of that works, maybe your design is just too big for the FPGA you are using!
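As a rough illustration of the reset point, here is a minimal sketch of a delay line written without a reset so that synthesis is free to map it onto a LUT-based shift register. The module name, port names, and DEPTH parameter are hypothetical, and whether the tool actually infers an SRL depends on the tool version and its shift-register extraction settings.

```verilog
// Hypothetical 1-bit delay line; names and DEPTH are illustrative only.
// Assumes DEPTH >= 2.
module delay_line_srl #(
    parameter DEPTH = 16
) (
    input  wire clk,
    input  wire ce,
    input  wire din,
    output wire dout
);
    // The initial value stands in for a reset: it is applied when the
    // bitstream is loaded, so no reset logic is needed for power-up state.
    reg [DEPTH-1:0] shreg = {DEPTH{1'b0}};

    // No reset branch. A reset on every stage generally forces one
    // flip-flop per stage; without it the chain can map to an SRL.
    always @(posedge clk) begin
        if (ce)
            shreg <= {shreg[DEPTH-2:0], din};
    end

    assign dout = shreg[DEPTH-1];
endmodule
```

If the same register were written with a reset branch in the always block, the tools would usually have to keep each stage as a separate flip-flop, so comparing the two utilization reports is a quick way to see how much the resets cost.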

Related Solutions

Electronic – Pipelining and clock frequency issue

Your design will not function correctly if it runs at 100 MHz but is only spec'd (by the tools) to run at 50 MHz. If it does, then it's a one-off miracle that wouldn't work when you make a change and rerun the tools. Don't do it. Don't even do it if your clock is 100 MHz and the tools tell you the design can run at 99.5 MHz.

To solve your problem you can either write a simple 'divide by power of 2' clock divider to reduce the clock frequency (something like this in Verilog):

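reg [n:0] count = 0;  // free-running counter; n and m are constants you choose

always @(posedge CLK_100) begin
  count <= count + 1'b1;
end

// count[m] toggles at 100 MHz / 2^(m+1), so m = 0 gives 50 MHz
BUFG bg_0 (.I(count[m]), .O(CLK_DIV));

(where m <= n, and BUFG is a global clock buffer, which must be used for synchronous designs) or use a Digital Clock Manager (DCM).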

Hopefully that solves your pipelining issues as well unless you absolutely have to run the entire design at 100 MHz. Other than pipelining you can consider using FIFOs if you have part of the design running at 50 MHz and the other at 100 MHz, but you'll have to say a bit more about what you're doing to get more meaningful help here.

Electronic – Why does this Verilog hog down 30 macrocells and hundreds of product terms

The code you show is essentially a priority encoder. That is, it has an input of many signals, and its output indicates which of those signals is set, giving priority to the left-most set signal if more than one is set.

However, I see conflicting definitions of the standard behavior for this circuit in the two places I checked.

According to Wikipedia, the standard priority encoder numbers its inputs from 1. That is, if the least significant input bit is set, it outputs 1, not 0. The Wikipedia priority encoder outputs 0 when none of the input bits are set.

Xilinx's XST User Guide (p. 80), however, defines a priority encoder closer to what you coded. The inputs are numbered from 0, so when the input's lsb is set it gives a 0 output. However, the Xilinx definition gives no spec for the output when all input bits are clear (Your code will output 3'd7).

The Xilinx user guide, of course, determines what the Xilinx synthesis software expects. The main point is that a special directive, (* priority_extract="force" *), is required for XST to recognize this structure and generate optimal synthesis results.

Here's Xilinx's recommended form for an 8-to-3 priority encoder:

(* priority_extract="force" *)
module v_priority_encoder_1 (sel, code);
input [7:0] sel;
output [2:0] code;
reg [2:0] code;
always @(sel)
begin
    if (sel[0]) code = 3'b000;
    else if (sel[1]) code = 3'b001;
    else if (sel[2]) code = 3'b010;
    else if (sel[3]) code = 3'b011;
    else if (sel[4]) code = 3'b100;
    else if (sel[5]) code = 3'b101;
    else if (sel[6]) code = 3'b110;
    else if (sel[7]) code = 3'b111;
    else code = 3'bxxx;
end
endmodule

If you can rearrange your surrounding logic to let you use Xilinx's recommended coding style, that's probably the best way to get a better result.

I think you can get this by instantiating the Xilinx encoder module with

v_priority_encoder_1 pe_inst (.sel({~|{RL[6:0]}, RL[6:0]}), .code(rlever));

I've NORed together all bits of RL[6:0] to get an 8th input bit that will trigger the 3'b111 output when all RL bits are low.

For the llever logic, you can probably reduce the resource usage by making a modified encoder module, following the Xilinx template, but requiring only 7 input bits (your 6 bits of LL plus an additional bit that goes high when the other 6 are all low).
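A 7-input variant might look something like the sketch below. This is only an illustration that follows the same template: the module name priority_encoder_7 is made up, and I'm assuming LL is 6 bits wide, llever is 3 bits wide, and that 3'b110 is an acceptable output when all LL bits are low. Adjust it to whatever your original code produces in that case.

```verilog
// Hypothetical 7-to-3 priority encoder modeled on the Xilinx template.
(* priority_extract="force" *)
module priority_encoder_7 (sel, code);
input [6:0] sel;
output [2:0] code;
reg [2:0] code;
always @(sel)
begin
    // Lowest-numbered set bit wins, as in the 8-to-3 template.
    if (sel[0]) code = 3'b000;
    else if (sel[1]) code = 3'b001;
    else if (sel[2]) code = 3'b010;
    else if (sel[3]) code = 3'b011;
    else if (sel[4]) code = 3'b100;
    else if (sel[5]) code = 3'b101;
    else if (sel[6]) code = 3'b110;
    else code = 3'bxxx;
end
endmodule

// Possible instantiation (signal names and widths are assumptions):
// priority_encoder_7 pe_ll (.sel({~|LL[5:0], LL[5:0]}), .code(llever));
```

With the NORed bit wired to sel[6], one input is always high, so the final else branch is never reached in practice and only serves as a don't-care for the synthesizer.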

Using this template assumes the version of ISE you have is using the XST synthesis engine. It seems like they change synthesis tools on every major rev of ISE, so check that the document I linked actually corresponds to your version of ISE. If not, check the recommended style in your documentation to see what your tool expects.
