Assume we have a module with a 32-bit output like this:
module ModuleLow(foo,...);
output [31:0] foo;
Now we want to use it in another module (a very simple example!):
module ModuleHigh( ..,reset,..);
  input reset;
  wire [31:0] fooWire;
  reg [31:0] fooReg;
  ModuleLow module1(.foo(fooWire), ...);
  always @(posedge GCLK)
    if(reset)
    begin
      fooReg<= fooWire; // TRIAL-1: fooReg<= 12345.....; => 2.3ns
    end
    else
      fooReg<=fooReg+1; // TRIAL-2: fooReg<=fooWire+1; => 18ns
This is a very common method of passing a value between modules (wire -> reg). But in my case, it leads to a 2028-bit wire that noticeably slows the design on a Spartan-III, down to 12 ns.
I tried these:
1- When I replace the statement fooReg<= fooWire with a number (like fooReg<=12345....; TRIAL-1 in the code), the performance jumps high (GCLK timing constraint value < 2.5 ns).
2- When I use the wire itself (using fooReg<=fooWire+1; TRIAL-2 in the code example), the performance drops even more (18 ns).
From these experiments I concluded that it is much more design-friendly to use registers inside a block instead of wires (routing and DRC problems?).
I was wondering whether there is a way to omit that intermediate wire between modules. This could remove the wiring in the "TRIAL-1" part (initial assignment), which should lead to higher performance. Something like this:
ModuleLow module1(.foo(fooReg), ...); // using registry without a wire.
I think this is illegal in Verilog (ISE WebPack v14.7 reports an lvalue assignment error), but I am looking for a trick, if one exists.
Best Answer
Your conclusion is not correct. It doesn't matter to the hardware whether you use a reg or a wire, the issues you are discussing are part of the Verilog syntax.
The reason your design gets so much faster when you replace fooWire with a number is that your logic isn't really doing anything and it all gets optimized away. The assignment of fooWire to fooReg requires that signals actually propagate from one part of the chip to another, and that takes time. Changing the assignment of fooWire to fooWire+1 forces the tools to create a 32-bit adder and insert it in the delay path, so of course the design will get slower.
By the way, it's register, not registry.
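To make the reg-versus-wire point concrete, here is a minimal sketch, not the asker's actual design. It assumes ModuleLow can register its own output. The clk port, the reset port on ModuleLow, and the counter behaviour inside it are hypothetical stand-ins for the elided parts of the question. The output is declared as output reg inside the lower module, and the upper module still connects to it through a wire, which is only a name for the connection.

```verilog
// Hypothetical lower module: foo is driven by a flip-flop declared
// inside the module (output reg). The clk and reset ports and the
// counter logic are assumptions for illustration only.
module ModuleLow(
    input  wire        clk,
    input  wire        reset,
    output reg  [31:0] foo
);
    always @(posedge clk)
        if (reset)
            foo <= 32'd0;
        else
            foo <= foo + 32'd1;
endmodule

module ModuleHigh(
    input  wire        GCLK,
    input  wire        reset,
    output reg  [31:0] fooReg
);
    // fooWire is just the net joining the two modules. Declaring it as
    // a wire does not create any hardware beyond the connection itself.
    wire [31:0] fooWire;

    ModuleLow module1(.clk(GCLK), .reset(reset), .foo(fooWire));

    always @(posedge GCLK)
        if (reset)
            fooReg <= fooWire;       // real register-to-register path
        else
            fooReg <= fooReg + 32'd1;
endmodule
```

In this sketch the path from foo to fooReg is a plain register-to-register connection with no logic in between. Whatever delay remains comes from routing between the two flip-flops, not from the choice of keyword in the source.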