Electronic – How to design a bare-metal Zynq PS-PL system with an accelerator/coprocessor in the PL

fpga zynq

I am new to FPGA development and am trying to build a simple system using the Zynq SoC (on the ZedBoard). It will consist of an IP block generated using Vivado HLS, which will accept arrays of data, operate on them, and produce result arrays. A bare-metal application (running on an ARM core in the Processing System (PS)) will use the IP block (in the Programmable Logic (PL)).

I am not sure how to manage data-transfer between the PS and the PL. From the information that I have seen so far, one option is to include AXI Stream interfaces on the IP block and use an AXI DMA to transfer data to and from memory.

-For such a design, what would the programming sequence in the bare-metal application look like?

-What would the structure of the C code used to generate the IP be (for the IP to be able to accept and return arrays using AXIS ports)?

I would really appreciate any pointers or links to relevant information.

Thank you.

{Edit (23/05/16): Neural network training is the specific application I am working on. So the input arrays to the IP block are network parameters. They are processed, updated, and returned. This involves multiply-accumulate operations on the rows of the arrays. And network training involves many such "input-process-output" iterations. This can of course be done on a processor (it is actually done using GPUs in most cases) but doing it on an FPGA could be faster and more energy efficient.

There is no need for any user I/O. (I will just be printing to a terminal for debugging purposes.)

I was mainly looking for some guidelines/pointers on managing the interaction between the processor and the IP (designing the interface in HLS and the programming sequence for the bare-metal application) as well as any suggestions on other approaches that I could take (alternatives to using the AXI DMA and AXIS).

(I am using the Xilinx Vivado Design Suite and Xilinx SDK. The design flow would be:
1. Create and export IP using Vivado HLS.
2. Design PS-PL Zynq System using Vivado IP Integrator.
3. Export hardware to Xilinx SDK and write bare-metal application in SDK.)}

Best Answer

Can you be more specific about what you want to do with these arrays? Are you outputting them to the terminal? What is it that you are trying to accomplish with your design? Is it something that could be done with a processor only or are you using the FPGA for a specific hardware algorithm?

Have you read the Xilinx documentation or seen the Vivado training videos? Xilinx has a ton of great documentation and tutorials.

-For such a design, what would the programming sequence in the bare-metal application look like?

I would recommend you use the automated tools to configure the FPGA. You can use the Vivado IP Integrator to configure the internal bus connections with block diagrams.

This tool will create a design file which can then be exported to the SDK, where you can write your C code. Communication between the FPGA and the ARM core should use the data types defined in the generated C header files.
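To make the ordering concrete, here is a rough sketch of what one "input-process-output" round trip could look like if you go the AXI DMA route. Everything named here (cache_flush, cache_invalidate, dma_start_receive, dma_start_send, dma_busy, run_accelerator) is a hypothetical stand-in that I made up so the sketch compiles and runs on a PC; the "accelerator" is just a software model that doubles each word. On the real board, I believe the Xilinx standalone BSP covers these roles with Xil_DCacheFlushRange, Xil_DCacheInvalidateRange, XAxiDma_SimpleTransfer and XAxiDma_Busy, after a one-time XAxiDma_LookupConfig / XAxiDma_CfgInitialize, but check the driver documentation for the exact signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the platform layer, so this runs on a host PC. */
static int32_t *rx_dst = NULL;   /* buffer armed for the PL-to-memory channel */
static size_t   rx_len = 0;

/* No-ops on the host; on the ARM these would be real cache maintenance calls. */
static void cache_flush(const void *buf, size_t bytes) { (void)buf; (void)bytes; }
static void cache_invalidate(void *buf, size_t bytes)  { (void)buf; (void)bytes; }

/* Arm the PL-to-memory (S2MM) channel so it is ready before data arrives. */
static int dma_start_receive(int32_t *dst, size_t n)
{
    rx_dst = dst;
    rx_len = n;
    return 0;
}

/* Start the memory-to-PL (MM2S) channel. The model "accelerator" doubles
 * each word and streams it straight back into the armed receive buffer. */
static int dma_start_send(const int32_t *src, size_t n)
{
    if (rx_dst == NULL || rx_len < n)
        return -1;               /* receive side was not armed */
    for (size_t i = 0; i < n; i++)
        rx_dst[i] = 2 * src[i];
    rx_dst = NULL;
    rx_len = 0;
    return 0;
}

/* The model finishes instantly; real hardware would be polled or use an IRQ. */
static int dma_busy(void) { return 0; }

/* One round trip: returns 0 on success, -1 if a transfer could not start. */
int run_accelerator(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    size_t bytes = n * sizeof(int32_t);

    cache_flush(in, bytes);                 /* 1. push input data out to DDR   */
    if (dma_start_receive(out, n) != 0)     /* 2. arm the return channel first */
        return -1;
    if (dma_start_send(in, n) != 0)         /* 3. stream the input to the IP   */
        return -1;
    while (dma_busy()) { }                  /* 4. wait for both channels       */
    cache_invalidate(out, bytes);           /* 5. discard stale cached results */
    return 0;
}
```

The cache steps matter because the DMA reads and writes DDR directly, while the ARM core may still be holding old copies of the buffers in its data cache. Arming the receive channel before starting the send is a common ordering so the IP is never stalled with nowhere to put its output.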

-What would the structure of the C code used to generate the IP be (for the IP to be able to accept and return arrays using AXIS ports)?

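The IP itself is generated by Vivado HLS (as in step 1 of your flow) and is then connected to the rest of the system in the Vivado IP Integrator. If you are new to designing, it is better to use the automated tools.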
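As a starting point for the HLS side, a minimal sketch of a top-level C function with AXI Stream array ports might look like the one below. The function name scale_accumulate, the length N, the gain parameter and the arithmetic are all hypothetical placeholders of mine, not your algorithm. The pragmas are Vivado HLS interface directives; an ordinary C compiler ignores them (possibly with a warning), so the same file can be checked on a PC first. Note that arrays mapped to axis ports have to be read and written strictly in order.

```c
#include <assert.h>
#include <stdint.h>

#define N 8   /* hypothetical fixed array length */

/* Hypothetical HLS top-level function: streams N words in, applies a
 * running multiply-accumulate, and streams N words out. */
void scale_accumulate(const int32_t in[N], int32_t out[N], int32_t gain)
{
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS INTERFACE s_axilite port=gain
#pragma HLS INTERFACE s_axilite port=return
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        acc += in[i] * gain;   /* each input element is read once, in order */
        out[i] = acc;          /* each output element is written once, in order */
    }
}

/* Small C-simulation style check, returning 0 on success. */
int check_scale_accumulate(void)
{
    const int32_t in[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int32_t out[N];
    scale_accumulate(in, out, 2);
    assert(out[0] == 2);
    assert(out[N - 1] == 72);
    return 0;
}
```

One thing to verify before connecting this to an AXI DMA: as far as I know, the DMA's receive side expects a TLAST signal to mark the end of a packet, and plain array arguments like these do not produce one. The HLS documentation describes stream types with side-channel signals for that case.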

Remember that the FPGA is a blank slate and must be configured before it is used, unlike the processor, which is hardwired to fetch, decode, and execute.

Note: I have used Xilinx ISE and designed with VHDL and IP. I am less familiar with the newer tools.