According to the datasheet, the hard multiplier takes between 4 and 5 ns to propagate from inputs to outputs in combinational mode. You'll lose a few hundred more ps routing between the multiplier and the rest of your logic. If that's fast enough, then just make use of it.
If not, build your LUT-based multiplier by just writing some code with the * operator in it, synthesise it, place and route, and see if that's fast enough. You may need an attribute to stop the tools from using the hard multipliers (see the MULT_STYLE attribute in the XST manual). You could even try just forcing a single LUT-based (non-constant) multiplier with that constraint and see what the result is - that's a very quick test.
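As a rough sketch of that quick test, the attribute can be attached to a small module that wraps the * operator. The module name, port names and WIDTH parameter are placeholders of mine, and the attribute spelling and "lut" value are as I recall them from the XST constraint documentation, so check them against the manual for your tool version:

```verilog
// Hypothetical test module: names and widths are placeholders.
// The attribute asks XST to build the multiplier from LUTs
// instead of a hard multiplier block (assumed syntax, see the XST manual).
(* mult_style = "lut" *)
module lut_mult #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0]   a,
    input  wire [WIDTH-1:0]   b,
    output wire [2*WIDTH-1:0] p
);
    // Plain * operator; the synthesiser chooses the LUT structure.
    assign p = a * b;
endmodule
```

The synthesis report should then tell you whether a hard multiplier was still inferred, and the post-route timing report gives you the number to compare against the 4 to 5 ns figure.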
Only if those fail should you go down the route of hand-building a LUT-based structure - and even then only if you've looked at the output of the synthesiser and are pretty sure you can beat it for some reason. The synthesisers have been tuned to work out constant coefficient multipliers very well in my experience - I doubt coregen will gain much.
Wet finger estimate: A LUT delay is ~0.7ns. Assuming routing delays are of a similar magnitude, you can afford a chain of only 3-4 LUTs in the delay of the hard multiplier. It seems unlikely to me that you'll achieve what you need in that depth of logic.
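Spelling that estimate out, using the ~0.7 ns LUT delay and the assumption above that routing costs about the same per level (a guess, not a datasheet figure):

```latex
\[
t_{\text{level}} \approx t_{\text{LUT}} + t_{\text{route}} \approx 0.7\,\text{ns} + 0.7\,\text{ns} = 1.4\,\text{ns}
\]
\[
N_{\text{levels}} \approx \frac{4\ \text{to}\ 5\,\text{ns}}{1.4\,\text{ns}} \approx 2.9\ \text{to}\ 3.6
\]
```

That is where the figure of roughly 3-4 LUT levels comes from.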
Your code describes two multiplexers. These are actually asynchronous (combinational) components. The fact that Verilog requires data1_temp and data2_temp to be declared as reg is a quirk of Verilog syntax and your choice of coding style, and doesn't mean these signals would be the outputs of storage elements in a physical implementation.
If you want to capture these values in actual registers, you need to add those explicitly:
reg [7:0] data1, data2;
always @(posedge someclock) begin
    data1 <= data1_tmp;
    data2 <= data2_tmp;
end
But I would like to know what this mini register file would be made of in hardware. Particularly, the 4x8 bit array consisting of k0,k1,k2,k3.
You haven't shown how these variables are assigned, so it's not possible to say how they are implemented. As your code showed, just declaring them as reg doesn't guarantee they are implemented with actual storage elements. If you assign them inside a block that begins always @(posedge clk) then very likely they are flip-flops, but there are ways you could code them that would make them synthesize differently.
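For illustration only, here is one way k0 to k3 could be written so that they would very likely become flip-flops. The module name and the we, waddr and wdata ports are invented for this sketch, since the original assignments weren't shown:

```verilog
// Hypothetical write logic for k0..k3; all port names are assumptions.
module k_regs (
    input  wire       clk,
    input  wire       we,      // write enable
    input  wire [1:0] waddr,   // selects which of k0..k3 to update
    input  wire [7:0] wdata,
    output reg  [7:0] k0, k1, k2, k3
);
    // Assignments inside a clocked block: each k<n> holds its value
    // between clock edges, which is what implies storage.
    always @(posedge clk) begin
        if (we) begin
            case (waddr)
                2'd0: k0 <= wdata;
                2'd1: k1 <= wdata;
                2'd2: k2 <= wdata;
                2'd3: k3 <= wdata;
            endcase
        end
    end
endmodule
```

Written this way, each of the four values would typically map to eight flip-flops with a clock enable, though the exact result depends on the tool and the target device.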
I thought when it came to registers and arrays like this, you need a clock to read out data, like RAM?
You need a clock to update a (physical) register. You can read it out at any time. For example:
wire [8:0] sum;
assign sum = k0 + k1;
is perfectly valid code. sum will change whenever any of its inputs changes. If k0 and k1 are the outputs of flip-flops, their values will only change when there is a clock edge.
For another example, you could equally well describe your multiplexers with code like this:
reg [7:0] k0, k1, k2, k3;
wire [7:0] data1_tmp;
reg [1:0] reg1;
// k<n> and reg1 are assigned elsewhere.
assign data1_tmp = (reg1 == 0) ? k0 :
                   (reg1 == 1) ? k1 :
                   (reg1 == 2) ? k2 : k3;
how do I read from this tag_array and do the comparison all within the same clock cycle?
Let me repeat a key point for emphasis: You need to use a clock to assign a new value to a register (an actual hardware register or group of flip-flops). Its output is available at any time.
RAMs are different and how you access the contents of a RAM will depend on details of the type of RAM you use.
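As a sketch of what this can look like when tag_array is built from registers rather than a dedicated RAM block: the write is clocked, while the read and compare are purely combinational, so hit settles within the same cycle in which index and tag_in are presented. Every name other than tag_array, and both widths, are assumptions on my part:

```verilog
// Hypothetical tag lookup; TAG_W, IDX_W and all ports are placeholders.
module tag_check #(
    parameter TAG_W = 8,   // assumed tag width
    parameter IDX_W = 2    // assumed index width (4 entries)
) (
    input  wire             clk,
    input  wire             we,
    input  wire [IDX_W-1:0] index,
    input  wire [TAG_W-1:0] tag_in,
    output wire             hit
);
    reg [TAG_W-1:0] tag_array [0:(1<<IDX_W)-1];

    // Updating a stored tag needs a clock edge.
    always @(posedge clk) begin
        if (we)
            tag_array[index] <= tag_in;
    end

    // Reading the stored tag and comparing it needs no clock.
    assign hit = (tag_array[index] == tag_in);
endmodule
```

Whether a synthesiser maps an array like this to flip-flops or to some form of RAM depends on the tool and the device, so check the synthesis report if it matters.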
I got confused because frankly I don't know enough about memory hardware and how that's possible.
Another key strategy: When you are learning digital logic, I recommend you learn about the physical hardware first, and then work out or study how to simulate it in HDL second. So first, learn what a physical flip-flop is, then learn the standard Verilog methods of describing a flip-flop. Especially if you are trying to write HDL for synthesis, trying to write good code before you learn the capabilities of the underlying hardware will lead you down a lot of dead-end paths.
Best Answer
Try putting a register on the output as well. Generally the timing analysis is done register-to-register, so without an output register it may not be able to give you a good answer.
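A minimal sketch of that arrangement, with a placeholder expression standing in for your logic. The module and signal names are made up for illustration:

```verilog
// Hypothetical wrapper: registers on both sides of the logic under test.
module timing_wrap (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [8:0] q
);
    reg [7:0] a_r, b_r;

    // Placeholder for the combinational logic being timed.
    wire [8:0] result = a_r + b_r;

    always @(posedge clk) begin
        a_r <= a;       // input registers launch the path
        b_r <= b;
        q   <= result;  // output register captures it
    end
endmodule
```

With both ends of the path registered on the same clock, a period constraint on clk should give the timing analyser a register-to-register path to report on.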