Python – Hardware acceleration for Python dict

Tags: cpu, microcontroller, python

I wondered whether it would be possible to accelerate the Python dict in hardware. Dicts are mappings between keys and values that form a large part of the backbone of how Python works: everything is an object, and objects are built with dicts.
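To make the "objects are built with dicts" point concrete, here is a quick CPython illustration (the `Point` class is just an example of mine):

```python
# Most Python objects store their attributes in a plain dict,
# so ordinary attribute access is, conceptually, a dict lookup.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
print(p.__dict__)               # {'x': 3, 'y': 4}
print(p.__dict__['x'] == p.x)   # True

# Classes are dict-backed too (exposed as a read-only mappingproxy):
print(type(Point.__dict__).__name__)  # mappingproxy
```

So every attribute read and method call ultimately probes one or more dicts, which is why accelerating dict lookup would touch nearly everything the interpreter does.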

I have heard of CAMs (content-addressable memories), but apart from the similarity in name, I don't know whether they could help in a thought experiment to design a computer optimised to run Python (an equivalent of the machines optimised for Forth, Java and Lisp).
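The connection is closer than just the name: a CAM compares a search key against every stored entry in parallel and returns the matching address in a single cycle, whereas a dict does a hash computation plus a probe sequence. Here is a toy functional model of CAM behaviour (all names are mine, and the software loop only simulates what hardware does simultaneously):

```python
# Toy functional model of a content-addressable memory (CAM).
# In real hardware, every comparison in lookup() happens at once;
# the Python loop is only a simulation of that parallel compare.
class ToyCAM:
    def __init__(self, size):
        self.entries = [None] * size  # each slot holds a (key, value) pair

    def write(self, addr, key, value):
        self.entries[addr] = (key, value)

    def lookup(self, key):
        # Hardware performs all of these comparisons in parallel
        # and priority-encodes the matching address.
        for addr, entry in enumerate(self.entries):
            if entry is not None and entry[0] == key:
                return addr, entry[1]
        return None  # miss

cam = ToyCAM(8)
cam.write(0, "x", 3)
cam.write(1, "y", 4)
print(cam.lookup("y"))  # (1, 4)
print(cam.lookup("z"))  # None
```

A CAM-backed dict would trade silicon area (one comparator per entry) for constant-time lookup with no hashing or collision handling, which is precisely the kind of trade the question is about.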

Any ideas and references appreciated.

Best Answer

The answer is a guarded "yes": it is possible, but probably not commercially worthwhile. Hardware acceleration for OO languages was once a hot topic, but it died down about 25 years ago.

One such project was the Linn Rekursiv. It coincided with the rapid rise of RISC hardware and didn't last long enough to see RISC fall back from the leading edge. Probably the best published article about it is Dick Pountain's Byte article. Picture of the Rekursiv board here...

So while the Rekursiv project proved the feasibility of its ideas, its added complexity (about 70,000 gates instead of 20,000 for a RISC, and all those extra pins!) made it financially unattractive on the small ASICs of the time. Now, with gate budgets in the high millions, you could afford that extra logic and barely notice the cost, but the industry is so heavily entrenched in current practice that you would have to demonstrate some huge advantage - and even then, like many better technologies, it would barely be noticed, then (at best) politely ignored. (Disclosure of interest: if this account appears bitter and twisted, it's because I was one of the Rekursiv team.)

Today, if you need dynamic binding in a C++ program, you can watch your CPU stepping painfully through method table after method table looking for the right function to call, instead of a hardware-accelerated method hash despatching in 6 cycles (with the commonest "primitives" not just despatched but completed faster than that).
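Python's own despatch is already a chain of dict lookups: attribute access walks the method resolution order, probing one class dict per step. A simplified sketch of what that lookup does (ignoring descriptors, `__slots__`, and the caches CPython uses in practice):

```python
# Simplified model of Python method despatch: one dict probe per
# class in the MRO until the name is found. Real CPython also
# handles descriptors, instance dicts, and per-type caches.
class A:
    def f(self):
        return "A.f"

class B(A):
    pass

def lookup(obj, name):
    for cls in type(obj).__mro__:       # B, A, object
        if name in cls.__dict__:        # one dict probe per class
            return cls.__dict__[name]
    raise AttributeError(name)

b = B()
print(lookup(b, "f") is A.f)   # True: found in A's class dict
print(lookup(b, "f")(b))       # "A.f"
```

Each step of that walk is exactly the kind of hash-probe a method-hash unit like the Rekursiv's could despatch in hardware.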

An FPGA on a board with a PCIe interface can act as a coprocessor to a regular CPU, and you can offload computationally heavy work to it. However, the PCIe interface adds considerable latency, so the cost of the offloaded operation has to be quite high before this is worthwhile.
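A back-of-envelope model shows why: the fixed round-trip latency has to be amortised over enough work. Every number below is an illustrative assumption, not a measurement of any real part:

```python
# Back-of-envelope break-even model for PCIe offload.
# All constants are illustrative assumptions, not measurements.
LATENCY_NS = 1000.0        # assumed PCIe round-trip latency (1 us)
BYTES_PER_ITEM = 8
BW_BYTES_PER_NS = 8.0      # assumed effective bandwidth (~8 GB/s)
CPU_NS_PER_ITEM = 5.0      # assumed CPU cost per item
FPGA_NS_PER_ITEM = 0.5     # assumed FPGA fabric cost per item

def offload_wins(n):
    """True if sending n items to the FPGA beats doing them on the CPU."""
    cpu_time = CPU_NS_PER_ITEM * n
    fpga_time = (LATENCY_NS                                # fixed latency
                 + (BYTES_PER_ITEM * n) / BW_BYTES_PER_NS  # transfer
                 + FPGA_NS_PER_ITEM * n)                   # compute
    return fpga_time < cpu_time

# With these assumptions the crossover is a few hundred items:
crossover = next(n for n in range(1, 10_000) if offload_wins(n))
print(crossover)
```

Below the crossover the fixed latency dominates and the CPU wins; individual dict lookups are far too cheap to offload one at a time, so only batched or streaming workloads would benefit.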

Some FPGAs incorporate a CPU close to the FPGA fabric, and these may serve as a way to prototype your ideas with less overhead than the PCIe bus (but with a lower-performance CPU). I wish this had been around in the Rekursiv days!
