First, note that not all Verilog designs are synthesizable. Usually, only a very specific subset of constructs can be used in a design that is to be realized in hardware.
One important restriction that pops up is that every reg variable can only be assigned to in at most one always statement. In other words, regs have affinity to always blocks.
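As a rough illustration of that restriction, here is a minimal sketch. The module name single_driver and the signals set, clr and flag are made up for this example, and giving clr priority over set is just one possible choice.

```verilog
// Hypothetical module; all names are illustrative only.
module single_driver (
    input  wire clk,
    input  wire set,
    input  wire clr,
    output reg  flag
);
    // Typically rejected by synthesis: flag assigned from two always blocks.
    //   always @(posedge clk) if (set) flag <= 1'b1;
    //   always @(posedge clk) if (clr) flag <= 1'b0;

    // Instead, a single always block owns flag.
    // Assumption for this sketch: clr takes priority over set.
    always @(posedge clk) begin
        if (clr)
            flag <= 1'b0;
        else if (set)
            flag <= 1'b1;
    end
endmodule
```

Merging the two drivers into one block forces you to state which condition wins when both are active, which is exactly the information the hardware needs.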
The following types of always blocks can generally be used.
    always @(*) begin
        // combinational
    end

    always @(posedge clk) begin
        // sequential
    end
In the former case, the * indicates that the block should be executed whenever any signal read in the block changes or, equivalently, that the block should be executed continuously. Therefore, regs that have affinity to combinational always blocks are implemented as signals computed from other signals using combinational logic, i.e. gates.
In contrast, regs that have affinity to always blocks of the latter type are outputs of D flip-flops that are clocked on the rising edge of clk (falling edge if negedge is used). Inputs to the flip-flops are, again, computed with combinational logic from other signals.
Consider the following, somewhat contrived example.
    reg out, out_n;

    always @(*) begin
        out_n = !out;
    end

    always @(posedge clk) begin
        out <= !out;
    end
Here, out_n is associated with the first always block, out with the second. out_n will be implemented with a single NOT gate that drives out_n and is driven from out (note that this is purely combinational logic). On the other hand, out will be driven by a flip-flop clocked from clk. The input to the flip-flop will again be computed by a NOT gate from out (which is driven by the aforementioned flip-flop). Optimizing synthesizers will combine the two NOT gates and use one NOT gate and one flip-flop.
Depending on the hardware you have available, other types of constructs can be used. For example, if the flip-flops have asynchronous resets, the following construct is also synthesizable.
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            // reset
        end else begin
            // sequential
        end
    end
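To make the skeleton above concrete, here is a small sketch of a counter with an asynchronous reset. The module name counter_async_rst, the 4-bit width and the active-high reset polarity are assumptions for illustration, not something the target hardware is guaranteed to support.

```verilog
// Hypothetical example; names, width and reset polarity are assumptions.
module counter_async_rst (
    input  wire       clk,
    input  wire       rst,    // active-high asynchronous reset (assumed)
    output reg  [3:0] count
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            count <= 4'd0;          // takes effect without waiting for clk
        else
            count <= count + 4'd1;  // normal clocked behaviour
    end
endmodule
```

Because rst appears in the sensitivity list, the reset branch is taken as soon as rst rises, which is what maps onto the flip-flop's asynchronous clear input.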
First of all, throw out this concept of 'instructions'. They do not exist in Verilog. Nothing is executed. Verilog, VHDL, SystemVerilog, etc. are what are called hardware description languages. They are not executed. They are not interpreted. They define hardware components (logic gates, flip-flops, registers, etc.) and their interconnections. (Not entirely accurate, I suppose, but the only Verilog that you can put on an FPGA - synthesizable Verilog - will not be executed or interpreted. Testbenches are a different animal.)
Clocks are used to drive flip-flops and registers. Data can be shifted into flip-flops and registers on the edges of the clock. So inside of an always @(posedge clk) block, all of the statements will be 'executed' simultaneously and the results will be latched into the registers on the clock edge, according to the rules of how the HDL statements are interpreted. Be very careful where you are using = and <=, though. The meaning of these two assignment operations is very different inside of an always block. The basic idea is that statements are evaluated in order of appearance: a blocking = takes effect immediately, so later statements in the block see the new value, and in hardware this becomes logic that settles at the propagation speed of the gates. A nonblocking <= evaluates its right-hand side in the same order but defers the update, so all of the <= targets are stored into registers at the same time. The only thing the clock affects in this case is precisely when the registers are updated. If you are running a simulation, it won't matter how many operations need to occur between registers, but on an FPGA the clock will have to be slow enough to ensure that any changes have been able to propagate through the logic.
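A small sketch may help show how much = versus <= matters. The two modules below are made up for illustration (the names shift_nonblocking and shift_blocking are not from any library) and differ only in the assignment operator.

```verilog
// Hypothetical modules; names are illustrative only.
module shift_nonblocking (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) begin
        q1 <= d;   // q1 takes d at the clock edge
        q2 <= q1;  // q2 takes the OLD q1: a two-stage shift register
    end
endmodule

module shift_blocking (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    always @(posedge clk) begin
        q1 = d;    // q1 is updated immediately
        q2 = q1;   // q2 sees the NEW q1, so q2 also gets d
    end
endmodule
```

In the first module d needs two clock edges to reach q2. In the second, q1 and q2 both receive d on the same edge, so the intended second stage of delay is gone.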
Faster clocks can be generated using a device called a phase-locked loop (PLL). PLLs are not synthesizable in Verilog, but generally there is a way to instantiate a dedicated PLL component on the FPGA you are using. Actually, I take that back: you can certainly make a digital PLL in Verilog, but you can only use it to generate signals lower than the clock frequency. A PLL contains a voltage-controlled oscillator (VCO), one or more frequency dividers, a phase comparator, and some control circuitry. The output of the VCO is divided down and phase compared with the input frequency. The VCO control voltage is adjusted until the divided-down VCO output precisely matches the frequency and phase of the reference signal. If you set the divider to 5 and use 50 MHz for the reference frequency, the PLL will generate a 250 MHz signal that is precisely phase locked to the 50 MHz reference. There are several reasons for doing this. Using a PLL allows generation of multiple clocks so different logic can be run at different speeds, e.g. for specific peripheral interfaces or for slow, complex combinatorial logic. It can also allow the device to control its own clock frequency to save power.
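To illustrate the "lower than the clock frequency" point, here is a rough sketch of a divider written in plain Verilog. It is not a PLL. The module name clk_enable_div, the DIVIDE parameter, the synchronous reset and the 16-bit counter width are all assumptions for this example. It produces a one-cycle tick rather than a new clock, which is a common way to run logic at a slower rate.

```verilog
// Hypothetical divider; names, reset style and counter width are assumptions.
module clk_enable_div #(
    parameter DIVIDE = 5   // one tick every DIVIDE clk cycles (must fit in 16 bits)
) (
    input  wire clk,
    input  wire rst,       // synchronous, active-high reset (assumed)
    output reg  tick       // high for one clk cycle out of every DIVIDE
);
    reg [15:0] count;

    always @(posedge clk) begin
        if (rst) begin
            count <= 16'd0;
            tick  <= 1'b0;
        end else if (count == DIVIDE - 1) begin
            count <= 16'd0;
            tick  <= 1'b1;
        end else begin
            count <= count + 16'd1;
            tick  <= 1'b0;
        end
    end
endmodule
```

With a 50 MHz clk and DIVIDE set to 5, tick pulses at 10 MHz. Going the other way, from 50 MHz up to 250 MHz, is what needs the dedicated PLL hardware.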
Blocking assignments inside of always blocks will generate combinatorial logic. Again, this logic will generally always be 'executed' regardless of the clock because it defines actual logic gates. It can be beneficial to use a few temporary variables, but care must be taken to ensure that there isn't so much extra logic that the timing requirements are not met.
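As a sketch of the temporary-variable idea, consider the following. The module name sum_of_three, the signal names and the bit widths are invented for this example.

```verilog
// Hypothetical module; names and widths are illustrative only.
module sum_of_three (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [7:0] c,
    output reg  [9:0] sum
);
    reg [8:0] partial;  // temporary: written and then read inside the same block

    always @(posedge clk) begin
        partial = a + b;        // blocking: an adder, result usable right away
        sum    <= partial + c;  // nonblocking: registered on the clock edge
    end
endmodule
```

Both adders sit between the inputs and the sum register, so their combined delay has to fit within one clock period. Some coding guidelines discourage mixing = and <= in one block, so treat this as one possible style rather than a recommendation.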
Best Answer
Nonblocking assignments simply defer the actual update of the value until all of the statements in the current always block are evaluated. It has the appearance that all of the statements run "concurrently" or "in parallel", but if this were actually the case, it would create an ambiguity: what happens when you assign the same reg two different values in the same always block? If things were truly concurrent, this would be a race condition and the new value would be unpredictable. However, the language semantics dictate something else: that the statements must be evaluated sequentially. If you assign the same reg from multiple places in the same always block, the last one takes precedence. Hence, you can consider that the statements are "evaluated" sequentially, but the regs are all updated with new values concurrently.
The synthesizer will convert the HDL code into logic that implements the equivalent functionality. In hardware, things will naturally be evaluated in parallel if there are no data dependencies, but the ordering of the statements would determine the precedence - which value is selected to be loaded into the next register or logic gate.
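Here is a minimal sketch of the "last one takes precedence" rule. The module name last_wins and its ports are hypothetical.

```verilog
// Hypothetical module; names are illustrative only.
module last_wins (
    input  wire       clk,
    input  wire       load,
    input  wire       clear,
    input  wire [7:0] data,
    output reg  [7:0] q
);
    always @(posedge clk) begin
        q <= q;          // default: hold (redundant, shown for clarity)
        if (load)
            q <= data;   // overrides the default
        if (clear)
            q <= 8'd0;   // last in source order, so it wins over load
    end
endmodule
```

If load and clear are both high on the same clock edge, q becomes 0. A synthesizer would typically build this as a priority selection in front of the register, with clear checked ahead of load.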