I'd say you're dreaming. The main problem will be the limited RAM.
In 2004, Eric Biederman managed to get a kernel booting with 2.5MB of RAM, with a lot of functionality removed.
However, that was on x86, and you're talking about ARM. So I tried to build the smallest possible ARM kernel, for the 'versatile' platform (one of the simplest). I turned off all configurable options, including the ones that you're looking for (USB, WiFi, SPI, I2C), to see how small it would get. Now, I'm just referring to the kernel here, and this does not include any userspace components.
The good news: it will fit in your flash. The resulting zImage is 383204 bytes.
The bad news: with 256kB of RAM, it won't be able to boot:
$ size obj/vmlinux
   text    data     bss     dec     hex filename
 734580   51360   14944  800884   c3874 obj/vmlinux
The .text segment is bigger than your available RAM, so the kernel can't decompress, let alone allocate memory to boot, let alone run anything useful.
One workaround would be to use the execute-in-place support (CONFIG_XIP_KERNEL), if your system supports that (i.e., it can fetch instructions directly from Flash). However, that means your kernel needs to fit uncompressed in flash, and 734kB > 700kB. Also, the .data and .bss sections total 66kB, leaving about 190kB for everything else (i.e., all dynamically-allocated data structures in the kernel).
That's just the kernel. Without the drivers you need, or any userspace.
So, yes, you're going to need a bit more RAM.
This depends on the device.
RAM can be built faster than Flash; this starts to become important in about the 100MHz range.
Simple microcontrollers
Small slow microcontrollers execute directly out of Flash. These systems usually have more Flash than SRAM too.
Midrange systems
Once your device gets faster, the situation is a little different. Midrange ARM systems may also execute directly out of Flash, or they may have a mask-ROM bootloader that does something smarter: perhaps downloading code from USB or external EEPROMs into internal SRAM.
Large systems
Larger, faster systems will have external DRAM and external Flash. This is typical of a mobile phone architecture. At this point, there is plenty of RAM available and it's faster than the Flash, so the bootloader will copy the code from Flash into RAM and execute it there. This may involve shovelling it through the CPU registers, or it may involve a DMA transfer if a DMA unit is available.
Harvard architectures are typically small, so they don't bother with the copying phase. I've seen an ARM with a "hybrid Harvard" arrangement: a single address space containing various memories, but two different fetch units. Code and data can be fetched in parallel, as long as they are not from the same memory. So you could fetch code from Flash and data from SRAM, or code from SRAM and data from DRAM, etc.
First of all, 1 MS/s at 16 bits is just 2MB/s – that's really not much for USB2 to carry. In my opinion, there's no need for dual-port RAM if we're talking about devices that lend themselves to visualization or have PCIe, as your Arch 2 suggests.
The fact that you're doing visualization implies you don't care about latency – what's half a millisecond to the human eye? So, you're pretty free with respect to choice of sample transport.
So:
Arch 1
Lots of components, including an FPGA that does nothing but write a lowly 1 million samples per second to a RAM interface. I'd say, if you go that way, use a feasibly fast bus (simple SPI or QSPI would do), and a bit of RAM with the FPGA to implement a ring buffer. No need for dual-port RAM: you'd need to communicate information like "OK, there are new samples available for you" or "no, nothing to fetch right now" anyway.
Arch 2
PCIe sounds like a huge overhead here. Again, the rate we're talking about is 2MB/s.
Arch 3
If your ADC and your SoC allow you to do that, start with that! It certainly sounds like the easiest, lowest-component-count solution. Often, this doesn't work for electrical reasons. SPI is absolutely a normal interface for an embedded system to have, so I'd assume that it'd be rather easy to find a controller that has it.
The problem remains that you'd still need something to, e.g., generate your sample clock.
Arch 4
Well, yeah: as you say, a less great version of Arch 3.
Arch 5
1MS/s isn't really high-throughput. In fact, I remember writing firmware for a now-defunct ARM Cortex-M0 project that ran the internal ADC at 500kS/s and pushed the data through USB2 to a PC. With a slightly more capable MCU, you should be able to do the same. That way, you'd have a cheap-as-hell device dedicated to handling ADC data and stuffing it into USB packets, and you'd just have to write a couple of lines of Python or C to run on your embedded device to ask the microcontroller for USB bulk packets full of data. Bonus: you can clock down your main CPU whenever you want to, and it will have no effect on the sampling.
Arch 6
Kinda easy. You can do it all: minimal visualization, sampling at several megasamples per second (complex samples), and a bit of analysis, on an ARM Cortex-M4 with the help of a small glue FPGA (without its own RAM, IIRC). The open design of the HackRF One proves this. I think it might be worth your while to look into it. From my perspective, it sounds like you'd basically just want to throw out all the RF stuff and use the rest as is. You'd even get drivers and firmware for free!
HackRF hardware components diagrams from the project wiki
The diagram above is simplified: as mentioned, there's a small "glue" FPGA between the ADC/DAC hybrid and the LPC Cortex-M4, as the schematic will tell you.