Electronic – AXI DMA correct parameters

dma, fpga, vivado, xilinx

I'm building my design with Vivado HLS and Vivado, and I'm doing some fairly large transfers between DDR and my custom IP block, and vice versa.

Each transfer from DDR to the custom IP is 256x256x4 = 262144 bytes, and it happens 4 times.

My MM2S (Memory Mapped to Stream) throughput is 350 MB/s and my S2MM (Stream to Memory Mapped) throughput is 200 MB/s.

I know I can get better throughput, and I guess these slow rates are related to the parameters of the AXI DMA block.

That's what I came here to ask: please help me understand what the correct parameters should be, since I still can't work it out from reading the LogiCORE product guide.


Width of buffer length n
From what I understand, this sets the maximum length of a transfer in bytes, as 2^n. So in my case, since 2^18 = 262144, should I put 18 here?

Memory Map Data Width
Data width in bits of the AXI MM2S Memory Map Read data bus. I have no idea here. My words are 32 bits wide and I defined the input stream of my block to be 32 bits wide, but what is this?

Stream Data Width
I guess I should put 32 here, correct?

Max Burst Size

Burst partition granularity setting. This setting specifies the maximum size of the burst cycles on the AXI4-Memory Map side of MM2S. Valid values are 2, 4, 8, 16, 32, 64, 128, and 256.

Again, I have no idea what to put here.

I could take a trial-and-error approach and change parameters until I find the best ones, but the problem is that each re-synthesis and re-implementation in Vivado takes a lot of time…

Best Answer

Width of buffer length n: This is exactly what you think: it sets the largest transfer, in bytes, that the IP can perform with a single command. 18 bits may be enough, but you likely need 19 bits, since the value 2^18 itself takes 19 bits to represent. Check the datasheet to make sure.
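As a quick check of that arithmetic, here is a minimal Python sketch. The helper name `bits_to_represent` is made up for illustration, and the sketch assumes the length field simply stores the byte count as an unsigned integer. Whether the core's real limit is 2^n or 2^n - 1 bytes still has to be confirmed in the product guide.

```python
def bits_to_represent(byte_count: int) -> int:
    """Bits needed to store byte_count as an unsigned integer (hypothetical helper)."""
    if byte_count < 0:
        raise ValueError("byte_count must be non-negative")
    # int.bit_length() returns 0 for the value 0, so clamp to at least one bit.
    return max(1, byte_count.bit_length())


# Transfer size from the question: 256 x 256 words of 4 bytes each.
transfer_bytes = 256 * 256 * 4

print(transfer_bytes)                         # 262144
print(bits_to_represent(transfer_bytes))      # 19
print(bits_to_represent(transfer_bytes - 1))  # 18
```

So an 18-bit field tops out one byte short of a 262144-byte transfer if it stores the count directly, which is why 19 is the safer choice here.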

Memory Map Data Width: This is on the AXI side. You can put what you want (the AXI infrastructure will upsize/convert as needed), but in my experience it's better to avoid width conversion and clock conversion as much as possible. That means that if your AXI memory port is 128 bits wide at 100 MHz, you should use the same 100 MHz clock here with a 128-bit-wide port. The Zynq expects 32 or 64 bits, and I guess the upsizing/conversion is "free" there since it's done in the fixed hardware.

Max Burst Size: This also affects the AXI side. It's the maximum number of beats, each Memory Map Data Width bits wide, that the core will perform in a single transfer request. Higher is usually better, because of the way memories work with bursts. However, it will affect your system's performance (arbitration) and possibly inflate the core's size if you use store-and-forward (which I'm pretty sure the IP core now forces you to use; it used to be optional). The impact of that option depends mostly on the AXI infrastructure and load. On a lightly loaded infrastructure with large write/read acceptance, you won't see any impact.
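To get a feel for what the burst setting means for the transfer in the question, here is a rough Python sketch. The function names are invented for illustration, and the model is deliberately simple: it assumes every burst is full length and ignores address alignment, the AXI4 rule that a burst may not cross a 4 KB boundary, and any splitting done by the interconnect.

```python
def bytes_per_burst(data_width_bits: int, burst_beats: int) -> int:
    """Bytes moved by one full burst: bytes per beat times beats per burst."""
    return (data_width_bits // 8) * burst_beats


def bursts_for_transfer(transfer_bytes: int, data_width_bits: int, burst_beats: int) -> int:
    """Number of bursts needed to move transfer_bytes (ceiling division)."""
    per_burst = bytes_per_burst(data_width_bits, burst_beats)
    return (transfer_bytes + per_burst - 1) // per_burst


transfer_bytes = 256 * 256 * 4  # 262144 bytes, as in the question

# (Memory Map Data Width in bits, Max Burst Size in beats): illustrative combinations only.
for width, beats in [(32, 16), (64, 16), (64, 256)]:
    print(width, beats,
          bytes_per_burst(width, beats),
          bursts_for_transfer(transfer_bytes, width, beats))
```

Under these assumptions, going from 32 bits x 16 beats to 64 bits x 256 beats cuts the number of read or write requests for one transfer from 4096 to 128. Fewer, longer requests is the usual reason a larger burst helps throughput.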

Stream Data Width: This is the AXI-Stream side. This is what your own IP needs; in your case it seems to be 32 bits.

Don't forget that the AXI-Stream and AXI ports don't have to use the same width and clock. However, for maximum throughput, the AXI port must have higher throughput than the AXI-Stream side.

For instance, if your AXI-Stream interface (and thus your core) uses 32 bits with a 150 MHz clock, it effectively has a throughput of 4.8 Gbit/s. If your AXI port runs at 100 MHz, it can't be 32 bits wide, since it won't have enough throughput (3.2 Gbit/s < 4.8 Gbit/s). At 64 bits (6.4 Gbit/s), you would have enough to feed your IP core continuously.
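The same comparison can be written as a small Python sketch, so you can plug in your own clocks before committing to a Vivado run. The function names and the list of candidate widths are assumptions made for illustration; check which widths your version of the core actually offers. The figures are raw bus bandwidth (width times clock), with no allowance for protocol overhead or DDR latency.

```python
def raw_throughput_gbit_s(width_bits: int, clock_mhz: int) -> float:
    """Raw bandwidth of a bus moving width_bits on every clock cycle, in Gbit/s."""
    return width_bits * clock_mhz / 1000.0


def min_memory_map_width(stream_bits, stream_mhz, mm_mhz,
                         candidate_widths=(32, 64, 128, 256, 512, 1024)):
    """Smallest candidate memory-map width whose raw bandwidth exceeds the stream's.

    candidate_widths is an assumed list, not taken from the product guide.
    Returns None if no candidate is wide enough.
    """
    needed_mbit_s = stream_bits * stream_mhz
    for width in candidate_widths:
        # Strictly greater, following "must have higher throughput" in the answer.
        if width * mm_mhz > needed_mbit_s:
            return width
    return None


print(raw_throughput_gbit_s(32, 150))      # 4.8  (AXI-Stream side)
print(raw_throughput_gbit_s(32, 100))      # 3.2  (32-bit AXI port, too slow)
print(raw_throughput_gbit_s(64, 100))      # 6.4  (64-bit AXI port, enough)
print(min_memory_map_width(32, 150, 100))  # 64
```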