module averager(
clk,
rst,
n,
sum,
cnt,
out,
avg
);
input [9:0] n;
input clk;
input rst;
output reg [19:0] out;
output reg [9:0] cnt;
output reg [19:0] sum;
output reg [9:0] avg;
integer i = 0;
always @(posedge clk )
if (rst == 1) begin
sum = 20'b0;
cnt = 10'b0;
out = 20'b0;
avg = 10'b0;
end else if (rst == 0) begin
sum = sum + n;
out = sum;
cnt = cnt + 1;
avg = 0;
for (i=0; i<641; i=i+1) begin
if(out >= cnt) begin
out = out - cnt;
avg = avg + 1;
end
end
end
endmodule
The above is the code to implement a cumulative moving average filter. The for loop is used for division to find the average and involves repeated subtraction. However I am getting the following warning and error:
WARNING:Xst:2254 - Area constraint could not be met for block <averager>, final ratio is 509.
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
ERROR:Pack:18 - The design is too large for the given device and package.
Please check the Design Summary section to see which resource requirement for
your design exceeds the resources available in the device.
This must be because I am using large values in the for loop and thus am getting a large circuit that can't be implemented. I am looking for an alternative of the for loop, which could find the average for me. I just need the quotient value.
Design Properties:
Family: Spartan3E
Device: XC3S500E
Best Answer
You are correct with the guessing the for loop.
The for-loop logic is huge when it after it static unrolls. With your current code, you cannot handle the worst case scenario where n=1023. To cover this with your current code you'd need a for loop with 1024 iterations.
Instead of a up counter, you can use a down counter and only examine a slice of the array, where the index represents lsb of the array slice. For example:
This for loop unravels to 10 iterations (9 to 0), each iteration only looks at a 11-bit slice of
out
and only one bit ofavg
. You might not be familiar with the+:
operator. It is a bit-slice operator introduced in IEEE Std 1364-2001. Left side if the start index (dynamic is allowed) and the right side is the bit with offset (must be a static constant). You can read more about it here.Since it is a count down, we can can safely assume (proven mathematically) the upper bits of the slice are zeros and we will never have underflow with the guarding if condition. So we now have ten 11-bit subtracters each with 1-bit assigners which is much smaller logic then the original 642 (should be 1024) 20-bit subtracters each with 10-bit adder.
Few other suggestions:
<=
) in your synchronous logic block. This follows best coding practices and prevents flop-to-flop race conditions in simulation.out
can be simplified to a 10-bit register, assuming you did suggestions 2 & 3. This is because we know the upper bits will always be zeros, so we can save 10 flops.proof of concept: with a simple SystemVerilog testbench here:
If you still experience area issues or performance is not good enough, then you should pipeline your design and/or seeing if there are there is dedicated divider+remainder module defined in your FPGA data sheet that you can instantiate.