AXI Multi-channel DMA best practices for minimizing latency

I'm using the AXI Multi-Channel DMA IP for 16 S2MM (PL->PS) channels. Each of the 16 channels provides a steady 2.5 MBytes/second of data. This is on the ZCU111.

My technique for multiplexing those 16 channels into the MCDMA AXIS slave is a custom "circular switch" that round-robins through the channels, advancing from one channel to the next on TLAST. The TLAST generator asserts TLAST every 2048 AXIS transfers (32 bits wide). The MCDMA AXIS slave runs on a 300 MHz clock.

So, the resulting traffic pattern into the MCDMA looks like:
(1) Channel 0 clocks in at a steady 2.5 MBytes/sec (so there are many idle cycles) for 2048 active cycles.
(2) Channel 1 clocks in 2048 active cycles as fast as the MCDMA raises TREADY; these beats were buffered while Channel 0 was active.
(3) The same applies for channels 2 through 15.
(4) Channel 0 quickly clocks in the few beats that accumulated while channels 1 through 15 held the switch, then completes the rest of its 2048 active cycles at a steady pace.
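For what it's worth, the switching behavior above can be captured in a small C behavioral model (just a sketch for discussion, not the actual HDL; the names are mine):

```c
#include <assert.h>

#define NUM_CH    16    /* S2MM channels */
#define PKT_BEATS 2048  /* AXIS transfers per TLAST packet */

/* Behavioral model of the "circular switch": stay on the current
 * channel until TLAST (every PKT_BEATS accepted beats), then rotate
 * the grant to the next channel, wrapping after channel 15. */
typedef struct {
    int ch;    /* channel currently granted */
    int beats; /* beats accepted in the current packet */
} rr_switch;

static void rr_init(rr_switch *s) { s->ch = 0; s->beats = 0; }

/* One accepted AXIS beat (TVALID && TREADY); returns 1 on TLAST. */
static int rr_beat(rr_switch *s)
{
    int tlast = (++s->beats == PKT_BEATS);
    if (tlast) {          /* packet complete: rotate the grant */
        s->beats = 0;
        s->ch = (s->ch + 1) % NUM_CH;
    }
    return tlast;
}
```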

Everything is working, but what I'm observing is that I need to wait roughly 250 milliseconds between the MCDMA interrupt (as received in the kernel IRQ handler) and the point when I reliably find those data updated in my circular buffers (DMA-coherent memory). If possible, I really need to shorten this apparent latency as much as possible.
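For scale, a quick sanity check on my own numbers: one TLAST packet is 2048 beats x 4 bytes = 8192 bytes, so at 2.5 MBytes/second a channel fills a packet roughly every 3.3 ms; the ~250 ms I'm seeing is about 75 packet periods.

```c
#include <assert.h>

/* Packet period per channel implied by the figures above:
 * 2048 beats x 4 bytes, at 2.5 MBytes/second per channel. */
static double pkt_period_ms(void)
{
    const double beats_per_pkt  = 2048.0;
    const double bytes_per_beat = 4.0;   /* 32-bit AXIS beats */
    const double chan_rate      = 2.5e6; /* bytes/second */
    return beats_per_pkt * bytes_per_beat / chan_rate * 1e3;
}
```

That works out to about 3.28 ms per packet per channel, which is why the 250 ms delay seems far too large to be data still in flight.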

When the MCDMA engine issues an interrupt indicating completion of a particular transfer, I'd hope to access those data within a few milliseconds at most.

Does anyone have any tips/best practices on how to better use MCDMA to minimize this latency that I'm seeing?

What should my input width be on the MCDMA AXIS slave (I'm using 64 bits right now)?
Should I cycle through the channels faster in my "circular switch" to reduce the "burstiness" of the traffic?
Should my TLAST packets be shorter/longer?

I realize it's difficult to give me the "right answer" here, but if there's anything I'm doing that's glaringly sub-optimal (again, from the perspective of reducing DMA transfer latency), I'd really appreciate the advice!

Thanks!


Accepted Solutions

Hi @mjm5977 

  The firmware design looks fine as you describe it, and it's similar to my own designs that work well.

  I suspect the latency is entirely due to how Linux is processing the buffers with regard to coherency. By the time the interrupt is sent, the buffer should already be sitting in memory. My suggestion is to move this post to the Embedded boards to benefit from their expertise.
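To make that concrete, here is the first thing I'd double-check. If your "DMA coherent memory" is actually a *streaming* mapping rather than a true dma_alloc_coherent() buffer, the handler has to hand the buffer back to the CPU before reading it. The kernel call is stubbed out below so the sketch stands alone, and read_first_byte() is just an illustrative name:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's dma_sync_single_for_cpu()
 * (<linux/dma-mapping.h>) so this sketch compiles on its own. */
typedef unsigned long dma_addr_t;
enum dma_data_direction { DMA_FROM_DEVICE = 2 };

static int g_synced; /* test hook: records that the sync ran */

static void dma_sync_single_for_cpu(dma_addr_t handle, size_t len,
                                    enum dma_data_direction dir)
{
    (void)handle; (void)len; (void)dir;
    g_synced = 1; /* the real call invalidates the CPU's cache lines */
}

/* IRQ-handler pattern for a streaming (non-coherent) S2MM buffer:
 * sync for the CPU *before* reading the data. If the sync is missing,
 * the CPU can keep seeing stale cache contents until the lines happen
 * to be evicted, which looks exactly like a long interrupt-to-data
 * latency. */
static unsigned char read_first_byte(dma_addr_t handle,
                                     unsigned char *cpu_buf, size_t len)
{
    dma_sync_single_for_cpu(handle, len, DMA_FROM_DEVICE);
    return cpu_buf[0]; /* safe to read after the sync */
}
```

If the buffers really do come from dma_alloc_coherent(), no explicit sync is needed and I'd look elsewhere (e.g. the MCDMA's interrupt coalescing settings).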

Thanks,

-Pat

 

https://tuxengineering.com/blog


2 Replies

Thanks for the vote of confidence on the FPGA strategy. I'll ask over in embedded as you suggest.
