Observer paulburnsuk
6,677 Views
Registered: ‎08-25-2016

Very large streaming DMA transfers with Microblaze/Petalinux and dma_proxy driver

Hi,

I have a Microblaze design with a single 512MB DDR3 SDRAM bank, built with Vivado 2016.1 and running PetaLinux 2016.1 (kernel 4.4). I'm trying to capture a large block of streaming data into DRAM and make it accessible to a user-space application using zero-copy operations. The data stream is ~128MB in total and cannot be stalled for very long before it gets into DRAM (BRAM FIFOs give a few microseconds of buffering, at most). I have an axi_dma configured for scatter-gather, using the maximum 8MB transfer length per descriptor, and ample AXI/DRAM bandwidth, so provided I can pass a suitable sg list to the DMA engine everything should be OK. (What I can't get away with is multiple small DMA operations scheduled in software, since the overhead of interrupt handling and setup will mean data is lost before it gets into DRAM.)

 

I have been experimenting with the dma_proxy driver (described here https://forums.xilinx.com/xlnx/attachments/xlnx/ELINUX/10693/1/Linux%20DMA%20from%20User%20Space-public.pdf) operating in cached_buffer mode since the entirety of DRAM is cached, and for performance reasons once the data is passed to the application I will need cached access anyway.

With smaller transfers (3MB) this all seems to work exactly as described. The DMA buffers are being allocated using kzalloc(), which seems to have a limit on the amount of contiguous memory it will allocate. To work with larger buffers I have tried reserving an area at the top of DRAM by limiting the amount visible to the kernel, then replacing the kzalloc() call with ioremap() to get a virtual address for the reserved area. That operation completes without error and the buffers get set up normally, but when the transfer happens and dma_map_single() is called, it returns a DMA address which is NOT the physical address I gave to ioremap(). In fact it isn't in DRAM at all, so the DMA engine stops with an error.
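(For anyone following along, the failing substitution looks roughly like this; the names are illustrative, not the actual dma_proxy source.)

```c
/* Original allocation in the driver: */
buf = kzalloc(size, GFP_KERNEL);

/* Replaced with a mapping of the reserved region at the top of DRAM: */
buf = ioremap(0x90000000, size);

/* dma_map_single() on this kernel derives the bus address with
 * virt_to_phys(), which is only valid for directly mapped (lowmem)
 * addresses. An ioremap() pointer lives in the vmalloc/ioremap window,
 * so the arithmetic produces a bogus address -- the 0xA8xxxxxx values
 * seen in the log below. */
dma_addr = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);
```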

 

Any ideas what I might be doing wrong?

Is there a better way of allocating a dedicated large DMA buffer?

Or a better way of generating a single large SG DMA chain?

Can the kzalloc limits be turned up anywhere?

 

Here's what I get when the (modified) dma_proxy runs its internal test:

(DRAM physical address is 0x80000000-0x9FFFFFFF. The kernel has 256MB (0x80000000-0x8FFFFFFF), DMA buffer is at 0x90000000 upwards).

 

# modprobe dma_proxy cached_buffers=1
[ 42.755476] dma_proxy module initialized
[ 42.760386] dmaengine: __dma_request_channel: success (dma0chan0)
[ 42.801103] create_channel(): ioremap(phys_addr=0x90000000, size=0x00300000) returned virt_addr=0xE8380000
[ 42.810959] dmaengine: private_candidate: dma0chan0 busy
[ 42.817126] dmaengine: __dma_request_channel: success (dma0chan1)
[ 42.901135] create_channel(): ioremap(phys_addr=0x90300000, size=0x00300000) returned virt_addr=0xE8700000
[ 42.910963] Starting dma_proxy Test...
[ 44.002722] Starting transfer...
[ 44.006022] length = 3145720 rx direction
[ 44.010171] transfer(): dma_map_single(addr=0xe8700000, size=0x002ffff8, direction=2) returned dma_addr=0xa8700000. 
[ 44.026172] transfer(): virt_to_phys(addr)=0xa8700000, virt_to_bus(addr)=0xe8700000
[ 44.033961] start_transfer(): len=0x002ffff8, dir=2
[ 44.038906] xilinx_dma_prep_slave_sg(): sg_len=1
[ 44.043683] Creating segment at 0xe0266000:
[ 44.047922] hw->next_desc=0x8e480080
[ 44.051683] hw->buf_addr=0xa8700000 <------------ NOT  A VALID DMA buffer ADDRESS
[ 44.055311] hw->control=0x002ffff8
[ 44.058854] xilinx_dma_prep_slave_sg(): list_len=1, sg_used=3145720
[ 44.065264] Starting DMA transfer with desc[0]=0xe0266000:
[ 44.070808] desc[0].next_desc=0x8e480080
[ 44.074776] desc[0].buf_addr=0xa8700000
[ 44.078657] desc[0].control=0x002ffff8
[ 44.083005] length = 3145720 tx direction
[ 44.087077] transfer(): dma_map_single(addr=0xe8380000, size=0x002ffff8, direction=1) returned dma_addr=0xa8380000.

[ 44.103129] transfer(): virt_to_phys(addr)=0xa8380000, virt_to_bus(addr)=0xe8380000
[ 44.110903] start_transfer(): len=0x002ffff8, dir=1
[ 44.115839] xilinx_dma_prep_slave_sg(): sg_len=1
[ 44.120593] Creating segment at 0xe025d000:
[ 44.124831] hw->next_desc=0x8e478080
[ 44.128536] hw->buf_addr=0xa8380000  <------------ NOT  A VALID DMA buffer ADDRESS
[ 44.132221] hw->control=0x002ffff8
[ 44.135765] xilinx_dma_prep_slave_sg(): list_len=1, sg_used=3145720
[ 44.142163] Starting DMA transfer with desc[0]=0xe025d000:
[ 44.147700] desc[0].next_desc=0x8e478080
[ 44.151668] desc[0].buf_addr=0xa8380000
[ 44.155549] desc[0].control=0x0c2ffff8
[ 44.163962] xilinx-dma 40410000.dma: Channel cf87b02c has errors 10049 cdr 8e478000 cdr msb 0 tdr 8e478000 tdr msb 0

 

 

Any suggestions appreciated,

 

Thanks,

 

Paul

4 Replies
Scholar milosoftware
6,662 Views
Registered: ‎10-26-2012

Re: Very large streaming DMA transfers with Microblaze/Petalinux and dma_proxy driver

Any ideas what I might be doing wrong?

Is there a better way of allocating a dedicated large DMA buffer?

 

dma_alloc_coherent()

It can give you a big contiguous buffer.

Use CMA for reserving the memory.

You can also plug "holes" in RAM in the devicetree using "reserved-memory". CMA + dma_alloc is preferred.
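A minimal sketch of that approach (assuming CMA is enabled and `dev` is a device with valid DMA ops; this is not the actual dma_proxy code):

```c
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

dma_addr_t dma_handle;
void *buf;

/* One physically contiguous 128MB buffer, backed by CMA when the
 * request is too large for the normal page allocator: */
buf = dma_alloc_coherent(dev, SZ_128M, &dma_handle, GFP_KERNEL);
if (!buf)
        return -ENOMEM;

/* dma_handle goes straight to the DMA engine; buf is the kernel
 * virtual address. No dma_map_single() is needed for a coherent
 * buffer (though the mapping will typically be uncached). */

dma_free_coherent(dev, SZ_128M, buf, dma_handle);
```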

 

Or a better way of generating a single large SG DMA chain?

Can the kzalloc limits be turned up anywhere?

 

Yes, there's a kernel config option for it. Its name ends with "ORDER" (I forget the exact name); it's under "System settings", I think.

Observer paulburnsuk
6,651 Views
Registered: ‎08-25-2016

Re: Very large streaming DMA transfers with Microblaze/Petalinux and dma_proxy driver

Thanks for the quick reply. I looked into using CMA initially, but I couldn't get it to work. I have enabled CMA (Kernel Features -> Contiguous Memory Allocator) and added cma=128M to the kernel command line args, but no CMA memory seems to be allocated:

 

[ 0.000000] Kernel command line: console=ttyS0,115200 earlyprintk cma=128M
[ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.000000] Memory: 441964K/458752K available (3871K kernel code, 103K rwdata, 1020K rodata, 6714K init, 527K bss, 16788K reserved, 0K cma-reserved)
[ 0.000000] Kernel virtual memory layout:
[ 0.000000] * 0xffffe000..0xfffff000 : fixmap
[ 0.000000] * 0xffffe000..0xffffe000 : early ioremap
[ 0.000000] * 0xf0000000..0xffffe000 : vmalloc & ioremap

 

Is CMA definitely supported under Microblaze?

Are there any other CONFIG settings needed to make it work?

 

Thanks again,

 

Paul

 

Visitor mmcgregor
4,720 Views
Registered: ‎11-30-2014

Re: Very large streaming DMA transfers with Microblaze/Petalinux and dma_proxy driver

Hi Paul,

 

I'm running into exactly the same problem. Did you ever solve the problem of enabling CMA for Microblaze?

 

I do see that Contiguous Memory Allocator needs to be enabled under Kernel Features, but there is no option there to set the CMA size (a video by John Linn showed the CMA size being set at that point). In my case I try to set it in the kernel bootargs with CMA=512M, but like you dmesg shows "0K cma-reserved".

 

Without CMA working I can only get dma_alloc_coherent() to work for very small sizes.

Observer paulburnsuk
4,686 Views
Registered: ‎08-25-2016

Re: Very large streaming DMA transfers with Microblaze/Petalinux and dma_proxy driver

I never got CMA working. It would appear that CMA support for Microblaze is lacking.

And the original problem I had with using ioremap() was that dma_map_single() needs the physical address to be in the kernel's normal (direct-mapped) virtual address space, and that doesn't happen if you limit the physical memory visible to the kernel.

 

In the end I managed to enable physical memory reservation from the device-tree by simply patching arch/microblaze/mm/init.c to call early_init_fdt_reserve_self() and early_init_fdt_scan_reserved_mem() immediately before it reserves allocated blocks.

Then you just need to add a reserved-memory{} node to your device-tree to specify physical memory regions to be reserved. The kernel then maps page table entries for the reserved memory but is prevented from allocating it.
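For reference, such a node might look like the sketch below; the label and size are illustrative, and the addresses follow the memory map quoted earlier in this thread. Note the absence of "no-map", so the region stays in the kernel's direct mapping:

```dts
/ {
        reserved-memory {
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                /* 256MB at 0x90000000, withheld from the allocator
                 * but still page-mapped by the kernel */
                dma_reserved: buffer@90000000 {
                        reg = <0x90000000 0x10000000>;
                };
        };
};
```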

In principle, you should then be able to use device-tree lookups in your driver to find the allocated memory at run time. But since the dma_proxy driver is not written as an open platform driver, I just #defined the address into dma_proxy.c. Kludgy but it worked.

I call request_mem_region() to get the physical address and phys_to_virt() to get a kernel virtual address mapping (in place of kzalloc).
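Roughly, that path looks like this (a sketch with illustrative constants, not the patched dma_proxy source):

```c
#define RESERVED_PHYS 0x90000000
#define RESERVED_SIZE 0x08000000

if (!request_mem_region(RESERVED_PHYS, RESERVED_SIZE, "dma_proxy"))
        return -EBUSY;

/* Valid only because the reserved region is still covered by the
 * kernel's direct mapping (see the init.c patch above): */
buf = phys_to_virt(RESERVED_PHYS);

/* With a direct-mapped address, dma_map_single() now computes the
 * correct bus address and performs the cache maintenance needed
 * for cached_buffers mode: */
dma_addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
```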

 

When it works, during boot, the "Memory:" line shows a corresponding increase in the amount of reserved memory.

 

Hope that helps.
