I am a programmer new to electronics. I wanted to get a perspective on whether programmable logic is a feasible way to accelerate a basic math algorithm.
I want to solve a ray-intersection algorithm (a few multiplications and subtractions) over a grid of 800×600 (480,000) numbers. [I understand that integers would be ideal and that control logic is frowned upon; I can work around these constraints.]
Everything I have researched so far says that I can offload this processing from a processor onto an FPGA, which could be programmed to calculate the problem space very efficiently.
A few questions:
I am thinking of using a CPLD, perhaps an Altera MAX® 10. Is this the right scale of device for this type of problem?
If I wanted to calculate the problem set 100 times per second, would that be possible?
(Assuming performance is a problem) Can the problem be easily divided among different chips, where each chip writes its solution to a different region of RAM?
Is this sort of project feasible to attach to a Thunderbolt, USB 3, SATA or PCI Express port in a PC?
– what type of investment does it take to create a board that sophisticated (timing-wise)?
Could this be done with DSP chips? (I'm having a hard time understanding where DSPs stop being usable. I understand their typical use in filters etc., but how much more widely applicable are they? Can they perform simple integer math operations?)
Could this be done with discrete logic chips? I was looking at this EE project (http://www.ele.uri.edu/~vijay/ProjectELE447.pdf), which described all manner of ALU and multiplier chips. Could these simply be piped together to express this algorithm efficiently?
How would my needs change if I needed to solve a grid of 4000×2000?
Thanks for your time
Best Answer
This is, presumably, for ray tracing acceleration?
See also Can FPGA out perform a multi-core PC?
The MAX 10 is a range of FPGAs of varying sizes. They are certainly capable of multiplication: up to 144 separate 18x18 multiplications per cycle. With FPGA designs, take care not to end up limited by the speed of your DRAM. Learning to program one from scratch is also a fairly substantial project, and the tools can be frustrating.
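As a rough sanity check on whether arithmetic or memory is the likely bottleneck, the budget can be sketched as below. Only the 144 multipliers, the two grid sizes and the 100 frames per second come from this thread; the 100 MHz clock, the 10 multiplications per ray and the 4 bytes per result are assumptions picked for illustration, not datasheet figures, and `budget` is a hypothetical helper name.

```python
# Back-of-envelope throughput budget. Values marked ASSUMED are illustrative
# guesses, not datasheet figures.

MULTIPLIERS = 144          # 18x18 multipliers per cycle (figure quoted in the answer)
FRAMES_PER_SEC = 100       # target rate from the question
CLOCK_HZ = 100_000_000     # ASSUMED fabric clock
MULTS_PER_RAY = 10         # ASSUMED cost of one intersection test
BYTES_PER_RESULT = 4       # ASSUMED size of each stored result


def budget(width, height):
    """Return (rays/s needed, rays/s the multipliers allow, output bytes/s)."""
    rays_needed = width * height * FRAMES_PER_SEC
    rays_available = MULTIPLIERS * CLOCK_HZ // MULTS_PER_RAY
    output_bytes = rays_needed * BYTES_PER_RESULT
    return rays_needed, rays_available, output_bytes


for w, h in [(800, 600), (4000, 2000)]:
    needed, available, out_bytes = budget(w, h)
    print(f"{w}x{h}: need {needed:,} rays/s, "
          f"multipliers allow {available:,} rays/s, "
          f"output {out_bytes / 1e6:.0f} MB/s")
```

Under these assumptions the 800×600 case needs 48 million rays per second against a multiplier ceiling of 1.44 billion, so arithmetic has plenty of headroom. The 4000×2000 case needs 800 million rays per second and writes 3.2 GB/s of results, which is where memory bandwidth rather than multiplication is likely to decide things.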
Can the problem be easily divided amongst different chips? Probably: this kind of tile-based solution is not unusual. How to partition it is for you to work out, and is really a software question. Remember that coordination between devices is much slower than communication within a device.
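One simple way to partition the grid, sketched here under assumptions, is to give each chip a horizontal band of rows and a matching byte offset into a shared output buffer, so no two chips ever write the same region. The function name `split_rows`, the row-major layout and the 4 bytes per result are hypothetical choices for illustration, not something specified in the question.

```python
def split_rows(height, width, chips, bytes_per_result=4):
    """Split a height x width grid into horizontal bands, one per chip.

    Returns a list of (first_row, row_count, base_offset) tuples, where
    base_offset is the byte offset of the band in a row-major output buffer.
    bytes_per_result is an ASSUMED result size.
    """
    base, extra = divmod(height, chips)
    bands = []
    row = 0
    for chip in range(chips):
        # The first `extra` chips take one leftover row each.
        count = base + (1 if chip < extra else 0)
        bands.append((row, count, row * width * bytes_per_result))
        row += count
    return bands


# Example: the 800x600 grid from the question shared between four chips.
for first_row, row_count, offset in split_rows(600, 800, 4):
    print(f"rows {first_row}..{first_row + row_count - 1} -> offset {offset}")
```

Because the bands do not overlap, the chips need no coordination while computing; the cost moves to distributing the scene data and collecting the results, which is the slow inter-device part.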
PCIe: I was involved in a project that built some multichannel fast ADC boards with FPGAs mounted in PCIe cards. We had half a dozen soldered by hand by experts, resulting in them costing about $1000 each. The layout took a few weeks; it was done by a graduate student with an experienced engineer looking over his shoulder regularly.
DSPs are built to do integer maths, especially multiply-accumulate (a = b*c + d). See What is the difference between a DSP and a standard microcontroller? They generally offer functionality similar to SIMD instructions; Intel CPUs have offered a fused multiply-add instruction since Haswell.
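The multiply-accumulate pattern is the core of a dot product, which is the kind of operation ray-intersection tests are typically built from. A minimal sketch in integer fixed-point follows. The 8 fractional bits and the names `to_fixed` and `mac_dot` are assumptions for illustration, and Python is used only to show the arithmetic, not how a DSP or FPGA would actually be programmed.

```python
FRAC_BITS = 8  # ASSUMED fixed-point format: 8 fractional bits


def to_fixed(x):
    """Convert a real number to a fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def mac_dot(a, b):
    """Dot product of two fixed-point vectors as a chain of multiply-accumulates."""
    acc = 0
    for x, y in zip(a, b):
        acc = acc + x * y  # one multiply-accumulate step
    # Each product carries 2 * FRAC_BITS fractional bits, so shift back down.
    # Python's >> is an arithmetic shift (rounds toward negative infinity).
    return acc >> FRAC_BITS


direction = [to_fixed(0.5), to_fixed(0.0), to_fixed(1.0)]
normal = [to_fixed(0.0), to_fixed(0.0), to_fixed(2.0)]
print(mac_dot(direction, normal) / (1 << FRAC_BITS))
```

Keeping the accumulator wider than the inputs and rescaling once at the end is the usual reason hardware MAC units have an accumulator with more bits than the multiplier operands.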
Could this be done with discrete logic chips?
The paper you linked looks like someone's thesis project doing actual silicon design. Taking that out of the simulator and building a chip from it is again a multi-thousand-dollar operation, and very rarely worth it.
The other approach, assembling chips of varying functions on a PCB, has not been sensibly fast since the early 80s.