Rather than addressing the many problems in your source code, let me just show how I'd implement the module you describe.
First, I wouldn't use a sub-module to build the adder; synthesis tools are perfectly able to create adders from behavioral code. Secondly, an elaborate state machine isn't required; the module can simply produce a final result four clocks after each activation of the `start` signal. I've added a `done` signal to the module interface to make this explicit.
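```verilog
module seq_mult_4bit (
  output reg [7:0] product,
  output           done,
  input      [3:0] a,
  input      [3:0] b,
  input            clock,
  input            start
);
  reg [3:0] multiplicand;
  reg [3:0] delay;   // one-hot step tracker; bit 0 is set after the last step

  // Upper half of the partial product plus the multiplicand; bit 4 is the carry
  wire [4:0] sum = {1'b0, product[7:4]} + {1'b0, multiplicand};

  assign done = delay[0];

  always @(posedge clock) begin
    if (start) begin
      // Step 1: load the operands and do the first add/shift in the same clock
      delay        <= 4'b1000;
      multiplicand <= a;
      if (b[0]) begin
        product <= {1'b0, a, b[3:1]};
      end else begin
        product <= {1'b0, 4'b0, b[3:1]};
      end
    end else begin
      delay <= {1'b0, delay[3:1]};
      // Steps 2-4: add if the current multiplier bit is 1, then shift right.
      // After the last step, product is held instead of being shifted further.
      if (|delay[3:1]) begin
        if (product[0]) begin
          product <= {sum, product[3:1]};
        end else begin
          product <= {1'b0, product[7:1]};
        end
      end
    end
  end
endmodule
```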
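To see why four add/shift steps yield the full 8-bit product, here is a simulation-only sketch of the same shift-and-add algorithm written as a plain Verilog function and compared against the `*` operator for all 256 input pairs. The names `shift_add_ref` and `shift_add_ref_tb` are hypothetical and not part of the module above; the function only mirrors its datapath (`{carry, upper nibble, remaining multiplier bits}`).

```verilog
// Simulation-only reference model of the shift-and-add algorithm.
// shift_add_ref and this testbench are hypothetical illustrations.
module shift_add_ref_tb;
  function [7:0] shift_add_ref(input [3:0] a, input [3:0] b);
    reg [7:0] p;
    reg [4:0] s;
    integer i;
    begin
      p = {4'b0, b};                      // upper half 0, lower half = multiplier
      for (i = 0; i < 4; i = i + 1) begin
        s = {1'b0, p[7:4]} + {1'b0, a};   // 5-bit sum keeps the carry
        if (p[0])
          p = {s, p[3:1]};                // add, then shift right with carry
        else
          p = {1'b0, p[7:1]};             // shift right only
      end
      shift_add_ref = p;
    end
  endfunction

  integer x, y, errors;
  initial begin
    errors = 0;
    for (x = 0; x < 16; x = x + 1)
      for (y = 0; y < 16; y = y + 1)
        if (shift_add_ref(x[3:0], y[3:0]) !== x * y) begin
          errors = errors + 1;
          $display("mismatch: %0d * %0d", x, y);
        end
    $display("%0d mismatches", errors);
  end
endmodule
```

Note that the first loop iteration adds to an all-zero upper half, which is why the module can fold step 1 into the `start` branch as `{1'b0, a, b[3:1]}`.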
If you really want to use an external module for the adder (which is really the point of your question), simply replace the `sum` wire declaration above with the following block of code:
```verilog
wire [4:0] sum;

// rca_4bit is the ripple-carry adder module from the question
rca_4bit adder (
  .sum   (sum[3:0]),
  .c_out (sum[4]),
  .a     (multiplicand),
  .b     (product[7:4]),
  .c_in  (1'b0)
);
```
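The question's `rca_4bit` isn't shown here, so as a sketch only: assuming it has the ports `sum`, `c_out`, `a`, `b` and `c_in` used in the instantiation (inferred from that instantiation, not from the original code), a structural 4-bit ripple-carry adder built from a hypothetical `full_adder` cell could look like this.

```verilog
// Hypothetical one-bit full adder cell.
module full_adder (
  output sum,
  output c_out,
  input  a,
  input  b,
  input  c_in
);
  assign sum   = a ^ b ^ c_in;
  assign c_out = (a & b) | (c_in & (a ^ b));
endmodule

// Hypothetical 4-bit ripple-carry adder with the assumed port names.
module rca_4bit (
  output [3:0] sum,
  output       c_out,
  input  [3:0] a,
  input  [3:0] b,
  input        c_in
);
  wire [4:0] c;            // c[i] is the carry into bit i
  assign c[0]  = c_in;
  assign c_out = c[4];

  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : stage
      full_adder fa (
        .sum   (sum[i]),
        .c_out (c[i+1]),
        .a     (a[i]),
        .b     (b[i]),
        .c_in  (c[i])
      );
    end
  endgenerate
endmodule
```

Functionally this matches the behavioral `+` in the module above; the behavioral version simply leaves the choice of adder structure to the synthesis tool.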
Let me know if you have any specific questions about how this implementation works.
First of all, throw out this concept of 'instructions'. They do not exist in Verilog. Nothing is executed. Verilog, VHDL, SystemVerilog, etc. are what are called hardware description languages. They are not executed. They are not interpreted. They define hardware components (logic gates, flip-flops, registers, etc.) and their interconnections. (Not entirely accurate, I suppose; but the only Verilog that you can put on an FPGA, synthesizable Verilog, will not be executed or interpreted. Testbenches are a different animal.)
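A tiny illustration of that point: the two continuous assignments below are written "out of order" (`y` uses `t` before `t` is assigned), yet they describe the same AND-OR gate network either way, because each `assign` is a wire connection rather than a step in a program. The module and signal names are made up for this example.

```verilog
// Describes y = (a & b) | c as two gates wired together.
// Swapping the two assign lines describes exactly the same circuit.
module and_or (
  input  a,
  input  b,
  input  c,
  output y
);
  wire t;                 // output of the AND gate

  assign y = t | c;       // OR gate; reads t "before" t is assigned
  assign t = a & b;       // AND gate
endmodule
```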
Clocks are used to drive flip-flops and registers: data is shifted into them on the edges of the clock. So inside an `always @(posedge clk)` block, the statements are all 'executed' together and the results are latched into the registers on the clock edge, according to the rules of how the HDL statements are interpreted. Be very careful where you use `=` and `<=`, though; the two assignment operators mean very different things inside an always block. A blocking assignment (`=`) takes effect immediately, in order of appearance, so later statements in the block see the new value; in hardware this is just logic that settles at the propagation speed of the gates. A non-blocking assignment (`<=`) evaluates its right-hand side when it is reached, but its target is only updated at the end of the current time step, so all of the `<=` targets change at the same time, storing their values into registers. The only thing the clock affects in this case is precisely when the registers are updated. If you are running a simulation, it won't matter how many operations need to occur between registers, but on an FPGA the clock will have to be slow enough to ensure that any changes have been able to propagate through the logic.
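To make the `=`/`<=` difference concrete, here is a small sketch with two clocked blocks that read the same input. With `<=`, `nb_b` receives the value `nb_a` held before the edge, giving a two-stage shift register; with `=`, `bl_b` sees the value just written to `bl_a`, so both registers simply capture `d`. All names are invented for illustration, and the blocking pair is there only to show the contrast, not as recommended style.

```verilog
module assign_demo (
  input            clk,
  input      [7:0] d,
  output reg [7:0] nb_a, nb_b,   // non-blocking pair
  output reg [7:0] bl_a, bl_b    // blocking pair
);
  // Non-blocking: both right-hand sides are sampled before either target
  // updates, so nb_b gets the old nb_a (a 2-stage pipeline).
  always @(posedge clk) begin
    nb_a <= d;
    nb_b <= nb_a;
  end

  // Blocking: bl_a is updated immediately, so bl_b copies the new value
  // of d on the same edge (effectively a single stage).
  always @(posedge clk) begin
    bl_a = d;
    bl_b = bl_a;
  end
endmodule

// Drives three values of d, one per clock, and prints all four registers.
module assign_demo_tb;
  reg        clk = 0;
  reg  [7:0] d;
  wire [7:0] nb_a, nb_b, bl_a, bl_b;
  integer n;
  assign_demo dut (.clk(clk), .d(d), .nb_a(nb_a), .nb_b(nb_b),
                   .bl_a(bl_a), .bl_b(bl_b));
  initial begin
    for (n = 1; n <= 3; n = n + 1) begin
      d = n;
      #5 clk = 1;   // rising edge
      #5 clk = 0;
      $display("d=%0d  nb_a=%0d nb_b=%0d  bl_a=%0d bl_b=%0d",
               d, nb_a, nb_b, bl_a, bl_b);
    end
  end
endmodule
```

In simulation `nb_b` lags `nb_a` by one clock (it is `x` after the first edge), while `bl_b` always equals `bl_a`.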
Faster clocks can be generated using a device called a phase-locked loop (PLL). PLLs are not synthesizable in Verilog, but there is generally a way to instantiate a dedicated PLL component on the FPGA you are using. Actually, I take that back: you can certainly make a digital PLL in Verilog, but you can only use it to generate signals at frequencies lower than its clock. A PLL contains a voltage-controlled oscillator (VCO), one or more frequency dividers, a phase comparator, and some control circuitry. The output of the VCO is divided down and phase-compared with the reference input. The VCO control voltage is adjusted until the divided-down VCO output precisely matches the frequency and phase of the reference signal. If you set the divider to 5 and use 50 MHz for the reference frequency, the PLL will generate a 250 MHz signal (5 × 50 MHz) that is precisely phase-locked to the 50 MHz reference. There are several reasons for doing this. Using a PLL allows generation of multiple clocks, so different logic can run at different speeds, e.g. for specific peripheral interfaces or for slow, complex combinatorial logic. It can also allow the device to control its own clock frequency to save power.
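The "lower than the clock frequency" direction is easy to sketch in plain Verilog. The counter-based divider below is not a PLL and cannot multiply frequency; it toggles `clk_out` every `HALF_PERIOD` input cycles, so `f_out = f_clk / (2 * HALF_PERIOD)`. The module and parameter names are assumptions for illustration. On many FPGAs a divided signal like this is better used as a clock enable than routed as a clock, so treat it as an illustration of the frequency relationship only.

```verilog
// Counter-based clock divider (illustrative; not a PLL).
// f_out = f_clk / (2 * HALF_PERIOD), e.g. 50 MHz / (2 * 25) = 1 MHz.
module clk_div #(
  parameter HALF_PERIOD = 25
) (
  input      clk,
  input      rst,                // synchronous, active-high reset
  output reg clk_out
);
  reg [15:0] count;              // supports HALF_PERIOD up to 65536

  always @(posedge clk) begin
    if (rst) begin
      count   <= 16'd0;
      clk_out <= 1'b0;
    end else if (count == HALF_PERIOD - 1) begin
      count   <= 16'd0;
      clk_out <= ~clk_out;       // toggle once per half period
    end else begin
      count   <= count + 16'd1;
    end
  end
endmodule
```

For example, `clk_div #(.HALF_PERIOD(25)) div (.clk(clk50), .rst(rst), .clk_out(clk1m));` (hypothetical signal names) would derive a 1 MHz square wave from a 50 MHz input.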
Blocking assignments inside always blocks generally produce combinatorial logic, as long as each such variable is always assigned before it is read and (in a clocked block) isn't read from outside the block; otherwise storage is inferred to hold its value. Again, this logic is effectively always 'executing', regardless of the clock, because it defines actual logic gates. It can be beneficial to use a few temporary variables, but take care that the extra logic doesn't grow so much that the timing requirements are no longer met.
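As a sketch of the temporary-variable point: below, `sum_tmp` is written with `=` and then read in the same clocked block, and nothing else uses it, so synthesis would typically turn it into adder logic feeding the `avg` register rather than a flip-flop of its own. The module computes a registered average of two 8-bit inputs; all names are illustrative.

```verilog
module avg2 (
  input            clk,
  input      [7:0] a,
  input      [7:0] b,
  output reg [7:0] avg
);
  reg [8:0] sum_tmp;   // temporary: assigned with =, then read immediately

  always @(posedge clk) begin
    sum_tmp = a + b;          // combinatorial adder; 9 bits keep the carry
    avg     <= sum_tmp[8:1];  // only avg is registered
  end
endmodule
```

The adder sits in the path between the input pins and `avg`, so its delay counts against the clock period, which is the timing concern mentioned above.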
Best Answer
This is mainly opinion-based; below is mine.
Each always block should implement either combinatorial logic or sequential logic, not a mix. A sequential block may still contain expressions that result in combinatorial logic, as long as all of the block's results are of one kind: all combinatorial or all sequential. Your two examples already adhere to this rule.
Use blocking assignments for combinatorial logic and non-blocking assignments for sequential logic, to avoid race conditions between different always blocks driven by the same clock.
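A minimal sketch of these rules, using a hypothetical 4-bit counter with enable (not the code from the question): `counter_two_blocks` keeps a purely combinatorial `always @*` block with blocking assignments separate from a purely sequential block with non-blocking assignments, while `counter_one_block` describes the same hardware in a single clocked block, where the increment is still combinatorial logic but the only thing assigned is the register.

```verilog
// Style 1: separate combinatorial and sequential always blocks.
module counter_two_blocks (
  input            clk,
  input            rst,    // synchronous, active-high
  input            en,
  output reg [3:0] count
);
  reg [3:0] count_next;

  // Combinatorial: blocking assignments; count_next is assigned on every path
  always @* begin
    count_next = count;
    if (en)
      count_next = count + 4'd1;
  end

  // Sequential: non-blocking assignments only
  always @(posedge clk) begin
    if (rst) count <= 4'd0;
    else     count <= count_next;
  end
endmodule

// Style 2: one clocked block; the adder is still combinatorial logic,
// but every result of the block is a register.
module counter_one_block (
  input            clk,
  input            rst,
  input            en,
  output reg [3:0] count
);
  always @(posedge clk) begin
    if (rst)     count <= 4'd0;
    else if (en) count <= count + 4'd1;
  end
endmodule
```

Both modules should produce identical `count` sequences for the same `clk`, `rst` and `en` stimulus; the single-block form is simply shorter to read.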
Finally, take advantage of an advanced synthesis/simulation tool and write the code in the way that is most obvious and readable, to reduce the risk of bugs and make future maintenance easier. The point of an advanced synthesis/simulation tool is that it handles the tedious task of using resources in the best way, so you can write the code the easy way. So don't hand-tailor code to the hardware for the sake of resource utilization unless you are aware of a specific problem, which is probably not the case with your example code.
With your example, I think the single always block is most obvious and readable, so that should be preferred.