I can't comment on your specific application (not being a cryptography expert), but placing a processor alongside an FPGA is an exceedingly common thing to do. The main reason is that it frees up FPGA fabric for what the FPGA is good at, while the less expensive separate processor does what it is good at, perhaps even faster than a soft CPU running in the FPGA could. In addition, larger FPGAs can get quite expensive, compared to fast ARM processors, which can be fairly reasonably priced.
Basically, I think you should use the two chips, but it's hard to make a firm recommendation without knowing the details of your specific application.
The code you show is essentially a priority encoder.
That is, it has an input of many signals, and its output indicates which of those signals is set, giving priority to the left-most set signal if more than one is set.
However, I see conflicting definitions of the standard behavior for this circuit in the two places I checked.
According to Wikipedia, the standard priority encoder numbers its inputs from 1. That is, if the least significant input bit is set, it outputs 1, not 0. The Wikipedia priority encoder outputs 0 when none of the input bits are set.
Xilinx's XST User Guide (p. 80), however, defines a priority encoder closer to what you coded: the inputs are numbered from 0, so when the input's LSB is set it gives a 0 output. However, the Xilinx definition gives no spec for the output when all input bits are clear (your code will output 3'd7).
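To make the difference between the two conventions concrete, here's what the Wikipedia-style (1-indexed) encoder might look like. This is just a sketch for comparison; the module and port names are my own, and I've kept the same lowest-bit-wins priority as the Xilinx template below so the two are easy to compare side by side:

```verilog
// Sketch of the Wikipedia-style convention: inputs numbered from 1,
// output 0 reserved for "no input set". Names are illustrative.
module wiki_style_encoder (sel, code);
    input  [6:0] sel;        // only 7 inputs fit, since code 0 means "none"
    output reg [2:0] code;

    always @(sel)
    begin
        if      (sel[0]) code = 3'd1;   // lowest input -> 1, not 0
        else if (sel[1]) code = 3'd2;
        else if (sel[2]) code = 3'd3;
        else if (sel[3]) code = 3'd4;
        else if (sel[4]) code = 3'd5;
        else if (sel[5]) code = 3'd6;
        else if (sel[6]) code = 3'd7;
        else             code = 3'd0;   // all inputs clear -> 0
    end
endmodule
```

Note the trade-off: reserving code 0 for "nothing set" means a 3-bit output can only distinguish 7 inputs instead of 8.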
The Xilinx user guide, of course, determines what the Xilinx synthesis software expects. The main point is that a special directive, (* priority_extract = "force" *), is required for XST to recognize this structure and generate optimal synthesis results.
Here's Xilinx's recommended form for an 8-to-3 priority encoder:
(* priority_extract = "force" *)
module v_priority_encoder_1 (sel, code);
    input  [7:0] sel;
    output [2:0] code;
    reg    [2:0] code;

    always @(sel)
    begin
        if      (sel[0]) code = 3'b000;
        else if (sel[1]) code = 3'b001;
        else if (sel[2]) code = 3'b010;
        else if (sel[3]) code = 3'b011;
        else if (sel[4]) code = 3'b100;
        else if (sel[5]) code = 3'b101;
        else if (sel[6]) code = 3'b110;
        else if (sel[7]) code = 3'b111;
        else             code = 3'bxxx;
    end
endmodule
If you can rearrange your surrounding logic to let you use Xilinx's recommended coding style, that's probably the best way to get a better result.
I think you can get this by instantiating the Xilinx encoder module with
v_priority_encoder_1 pe_inst (.sel({~|{RL[6:0]}, RL[6:0]}), .code(rlever));
Here I've reduction-NORed all bits of RL[6:0] (the ~|{RL[6:0]} term) to get an 8th input bit that triggers the 3'b111 output when all RL bits are low.
For the llever logic, you can probably reduce the resource usage by making a modified encoder module, following the Xilinx template, but requiring only 7 input bits (your 6 bits of LL plus an additional bit that goes high when the other 6 are all low).
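Following that suggestion, the trimmed-down 7-input encoder might look like the sketch below. This is untested, and the module name, port names, and the assumption that your LL signal is 6 bits wide are all mine:

```verilog
// Sketch of a 7-input variant of the Xilinx template. sel[5:0] would
// carry LL[5:0]; sel[6] is the "all LL bits low" flag.
(* priority_extract = "force" *)
module v_priority_encoder_7 (sel, code);
    input  [6:0] sel;
    output reg [2:0] code;

    always @(sel)
    begin
        if      (sel[0]) code = 3'b000;
        else if (sel[1]) code = 3'b001;
        else if (sel[2]) code = 3'b010;
        else if (sel[3]) code = 3'b011;
        else if (sel[4]) code = 3'b100;
        else if (sel[5]) code = 3'b101;
        else if (sel[6]) code = 3'b110;
        else             code = 3'bxxx;
    end
endmodule
```

It could then be instantiated the same way as the RL version, e.g. v_priority_encoder_7 pe_ll (.sel({~|{LL[5:0]}, LL[5:0]}), .code(llever));, with the reduction NOR supplying the all-low flag.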
Using this template assumes the version of ISE you have is using the XST synthesis engine. It seems like they change synthesis tools on every major rev of ISE, so check that the document I linked actually corresponds to your version of ISE. If not, check the recommended style in your documentation to see what your tool expects.
Your question is rather broad.
To start with: the good news is that you don't need to buy an FPGA board to find out how big your design is. The development tool will tell you. It will also tell you if you exceed the available resources (memories, LUTs, registers, DSPs, or I/O pins). If the design does not fit, you select a bigger FPGA in the tool settings, until you get to the really BIG ones you probably can't afford because they cost, e.g., $15,000 each.
The second piece of good news is that most FPGA development tools are free, at least for the smaller FPGAs. And 'small' is still rather big.
The not-so-good news is that HLS is still maturing. We ran some tests, and the HLS-generated designs still markedly under-performed hand-written Verilog or VHDL. But for just comparing algorithms they are probably good enough.
Now, as to "flow, parallelism", you get into difficult areas. The more logic runs in parallel, or the more pipeline stages you add, the faster the algorithm will run. But resource utilization (area) will also go up. It is one of the many tasks of an HDL designer to find a balance between speed and area.
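As a small illustration of that trade-off, here's the same multiply-accumulate written two ways (a sketch with illustrative names, not code from any particular design): one version computes everything in a single cycle through one long combinational path, the other splits the work across two pipeline stages, which raises the achievable clock rate at the cost of extra registers and a cycle of latency.

```verilog
// One-cycle version: a * b + c in a single long combinational path.
module mac_comb (input clk, input [15:0] a, b, input [31:0] c,
                 output reg [31:0] y);
    always @(posedge clk)
        y <= a * b + c;       // critical path: multiplier plus adder
endmodule

// Two-stage pipelined version: shorter paths, higher fmax,
// but more registers and one extra cycle of latency.
module mac_pipe (input clk, input [15:0] a, b, input [31:0] c,
                 output reg [31:0] y);
    reg [31:0] p, c_d;
    always @(posedge clk) begin
        p   <= a * b;         // stage 1: multiply
        c_d <= c;             // delay c so it stays aligned with p
        y   <= p + c_d;       // stage 2: add
    end
endmodule
```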
Getting to "array width/length": that is the fastest way I have found to fill an FPGA. I recently designed code for convolution matrices. It was a module that took the matrix width/height as parameters. With little trouble I managed to fill 60% of the FPGA with that module alone (it was supposed to use 15%).
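To show why parameterized array dimensions eat area so quickly, here is a sketch of that kind of module (not the actual design; all names and widths are illustrative). The for loop is fully unrolled by synthesis, so the inferred multiplier count grows as WIDTH*HEIGHT, and doubling either parameter roughly doubles the area:

```verilog
// Sketch: a parameterized window multiply-accumulate. Synthesis
// unrolls the loop, inferring WIDTH*HEIGHT parallel multipliers.
module window_mac #(parameter WIDTH = 3, HEIGHT = 3, DW = 8)
   (input clk,
    input  [WIDTH*HEIGHT*DW-1:0] pixels,   // flattened input window
    input  [WIDTH*HEIGHT*DW-1:0] coeffs,   // flattened coefficients
    output reg [2*DW+7:0] acc);            // headroom for the sum

    integer i;
    reg [2*DW+7:0] sum;

    always @(posedge clk) begin
        sum = 0;
        for (i = 0; i < WIDTH*HEIGHT; i = i + 1)   // unrolled in hardware
            sum = sum + pixels[i*DW +: DW] * coeffs[i*DW +: DW];
        acc <= sum;
    end
endmodule
```

Bumping WIDTH and HEIGHT from 3 to 6 takes the module from 9 multipliers to 36, which is exactly how a module budgeted at 15% of the chip ends up using 60%.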