Visitor aptcore
Registered: ‎02-09-2016

memcpy to programmable logic not using DMA

My system records data from ADCs to DRAM. I then transfer this to a buffer in programmable logic, where it is processed, and then out to a buffer allocated by the CPU.

 

I have configured Linux to leave a 32 MB hole in DRAM at physical address 0x0e000000 and I have 256 KB of memory in programmable logic (PL) accessible at physical address 0x84000000.

 

I use mmap() to obtain virtual addresses for these two locations. I can also allocate buffers with malloc(), which gives me a virtual address for the CPU buffer (probably in DRAM, but cached). I have checked the address alignment and the transfer size (256 KB) and they look good.
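
For readers following along, here is a minimal sketch of how such a mapping is typically obtained. It assumes the regions are mapped through /dev/mem, which the post does not state; `map_region` is a hypothetical helper name, not something from the original code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes starting at byte offset `phys` of the
 * file at `path`. With path = "/dev/mem" (an assumption), `phys` is a
 * physical address. `phys` must be page-aligned. Returns NULL on failure. */
static void *map_region(const char *path, off_t phys, size_t len)
{
    /* O_SYNC is commonly passed when opening /dev/mem to avoid a normally
     * cached mapping; the exact effect depends on the architecture. */
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return va == MAP_FAILED ? NULL : va;
}
```

With the addresses from this post, the calls would look like `map_region("/dev/mem", 0x84000000, 256 * 1024)` for the PL buffer and `map_region("/dev/mem", 0x0e000000, 32 * 1024 * 1024)` for the DRAM hole. On a 32-bit system it is probably safest to build with `-D_FILE_OFFSET_BITS=64` so that `off_t` can hold addresses above 2 GB.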

 

I can pass the virtual addresses to memcpy() and transfer data successfully between the three locations:

From    To CPU       To DRAM      To PL
CPU     1250 MB/s    115 MB/s     20 MB/s
DRAM     212 MB/s     77 MB/s     19 MB/s
PL        61 MB/s     42 MB/s     15 MB/s

 

 

The problem is that I need a higher transfer rate to the programmable logic.

 

The programmable logic is clocked at 35 MHz with a 32 bit interface, so I believe this limits the transfer rate to 70 MB/s. If I write a 'for' loop and transfer 64 bit words I get the same data rate as memcpy (20 MB/s), so I think memcpy is not using a DMA engine.
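
The word-by-word test described above can be sketched roughly as follows. This is an illustration rather than the poster's actual code: the function names are hypothetical, and 1 MB is taken to mean 10^6 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Copy `nbytes` (assumed to be a multiple of 8) one 64-bit word at a time.
 * `volatile` on the destination stops the compiler from dropping or
 * merging the stores, which matters when dst is device memory. */
static void copy_words64(volatile uint64_t *dst, const uint64_t *src, size_t nbytes)
{
    for (size_t i = 0; i < nbytes / sizeof(uint64_t); i++)
        dst[i] = src[i];
}

/* Convert a byte count and an elapsed time into MB/s (1 MB = 10^6 bytes). */
static double mb_per_s(size_t nbytes, double seconds)
{
    return (double)nbytes / 1e6 / seconds;
}

/* Time a single copy with the monotonic clock and return the rate in MB/s. */
static double time_copy64(volatile uint64_t *dst, const uint64_t *src, size_t nbytes)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    copy_words64(dst, src, nbytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double dt = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return mb_per_s(nbytes, dt);
}
```

Either way the CPU issues every store itself, so a loop like this and memcpy() landing on the same figure is consistent with neither of them involving a DMA engine.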

 

I see the following printed in the kernel log by the PL330 DMA driver:

dma-pl330 f8003000.ps7-dma: Loaded driver for PL330 DMAC-267056
dma-pl330 f8003000.ps7-dma: DBUFF-128x8bytes Num_Chans-8 Num_Peri-4 Num_Events-16

 

My device tree contains the following entry for the DMA engine:

ps7-dma@f8003000 {
    #dma-cells = <0x1>;
    #dma-channels = <0x8>;
    #dma-requests = <0x4>;
    arm,primecell-periphid = <0x41330>;
    compatible = "xlnx,ps7-dma-1.00.a", "arm,primecell", "a$
    interrupt-parent = <0x1>;
    interrupts = <0x0 0xd 0x4 0x0 0xe 0x4 0x0 0xf 0x4 0x0 0$
    reg = <0xf8003000 0x1000>;
};

 

I am using Linux 3.9, although I have tried backporting the pl330 driver from 3.17 and the speed is the same. Changing the compiler optimisation level does not have any effect.

 

Is memcpy() using the PL330 DMA?

If not, am I doing something that prevents the DMA engine being used?

Is there an alternative interface to use the DMA (other than writing the registers directly)?


Accepted Solutions
Scholar rfs613
Registered: ‎05-28-2013

memcpy() does not use DMA. It is a generic facility provided by the C library. This library is compiled with knowledge of the target processor (e.g. ARM) and perhaps knowledge of the CPU variant (e.g. Cortex-A9), but nothing more hardware-specific.

In contrast, the available DMA controllers, and the memory ranges they can access, are highly system-specific and may well change at runtime as peripherals and/or drivers are loaded and unloaded. So to make use of these facilities, software must explicitly request DMA. There are many resources on this; here are a few:

http://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/10693/1/Linux%20DMA%20from%20User%20Space-public.pdf
http://events.linuxfoundation.org/sites/events/files/slides/ripard-dmaengine.pdf
https://github.com/bmartini/zynq-xdma

3 Replies
Visitor aptcore
Registered: ‎02-09-2016

To correct my original post, the programmable logic is clocked at 35 MHz across the 32 bit M_AXI_GP1 master interface. The memory buffer in the PL is also 32 bits wide, so I believe this limits the transfer rate to 35*4=140 MB/s. The ~70 MB/s I see on transfers from the PL is OK, but I need to speed up transfers to the PL.
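
As a quick check of the arithmetic above, here is a small sketch. It assumes the bus moves one full-width beat on every clock cycle, which is a best case; the function names are hypothetical.

```c
#include <stdint.h>

/* Upper bound in MB/s for a bus that transfers one `bus_bits`-wide beat per
 * clock cycle (best-case assumption, no protocol overhead or stalls). */
static double peak_mb_per_s(double clock_mhz, unsigned bus_bits)
{
    return clock_mhz * (double)(bus_bits / 8);
}

/* Fraction of the theoretical peak that a measured rate achieves. */
static double bus_utilisation(double measured_mb_per_s, double peak)
{
    return measured_mb_per_s / peak;
}
```

`peak_mb_per_s(35.0, 32)` gives the 140 MB/s ceiling quoted above, so the 20 MB/s measured from the CPU buffer to the PL is roughly 14% of it.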

 

We are using Vivado 2015.1. Here is an image of the interconnect:

[Image: gpconnection.PNG]

Visitor aptcore
Registered: ‎02-09-2016

Thank you for the explanation. I assumed that because the PL330 driver is now compatible with the Linux DMA Engine API, Linux would use the hardware whenever it is available, but, as you say, memcpy() always uses the CPU.

It appears that there isn't a standard user space interface to the Linux DMA Engine drivers, so I will need to bridge my user space code to the kernel driver with my own module. The first link is particularly useful for this, and there are some existing solutions, e.g. https://github.com/jeremytrimble/ezdma
