Electronic – Moving brute-force search to FPGA

compilercomponent-selectionfpgavhdl

I am currently working on a scientific hobby project about computing the error detection capabilities of CRCs. Unfortunately the C++ code used for such computations has up to years of run time on normal x64 CPUs, even on multi core systems. Also the power consumption of such systems is a pain.

It came to my mind that the common way of x64 brute-force-searching isn't the best. I would like to move the algorithm to an FPGA. Alas I have worked very little with FPGAs and I lost the minimal knowledge after working in C/C++ software engineering for decades. So I need a little help about the feasibility of my idea before burying myself into the technology.

The algorithm I want to run in hardware is a specialized ~1000 line C++ code that could easily be ported to C. No floating point operations. No standard libraries required. High frequent loops. Lots of basic 64 bit integer arithmetic. Even more binary operations (shift, or, xor, bit-counting, etc.) and some array operations. A few kB of RAM and ROM should be sufficient. No peripherals required. Very few memory allocations are used that could be removed by adapting the code. The computation results can be easily filtered internally so a serial interface should be enough to pass the results to a PC.

I would like to compile the C++ or C code into VHDL code and let it run on a FPGA as fast as possible. Also, since this is a hobby project, the FPGA (including software and a developer board) should be affordable.

My questions:

  • Can I expect a significant speedup? By which order of magnitude?
  • Is there a C/C++ compiler suited for the purpose?
  • Which FPGAs are suitable?

Best Answer

Can I expect a significant speedup? By which order of magnitude?

Sure, by quite a lot. CRCs can be computed on data a byte a a time using a straightforward table lookup. A moderate-sized FPGA (say, a Xilinx XC6SLX75) will have a hundred or more blocks of internal dual-port RAM that allow 200 data streams to be processed in parallel at a rate of one byte per clock cycle, where the clock could be 200 MHz or more. That's a throughput of at least 40 GB/s. How fast is your "x64" CPU?

Is there a C/C++ compiler suited for the purpose?

Not really. If you want to get the most out of your FPGA, you'll want to use an HDL to define the hardware datapath directly. Implementations derived from programming languages are possible, but the performance ranges from lousy to useless.

Which FPGAs are suitable?

That's bordering on a product recommendation, which would be off-topic for this site, but look at the midrange offerings from Xilinx (such as the Spartan-6 series) or Intel (formerly Altera, such as their Cyclone IV series). Inexpensive development boards for these families are readily available from places like Digilent.

Related Topic