petervuto
Observer
1,966 Views
Registered: ‎07-08-2017

AXI read/write checklist?


I have an IP that reads some data from DRAM, then calculates and writes back an output. This is accomplished over an AXI Interconnect connected to the slave HP port of the processing system.

 

There's a working bare-metal test for the IP. It simply declares a source and a destination array and passes their addresses to the IP using Xil_Out32(). No DMA engines, no nothing. Trying to port it to a Linux kernel module, I naively took the same approach: declare two arrays, pass their virt_to_phys()-converted addresses to the IP and wait for the IP to write its output. It doesn't work.

 

What other Linux-specific adjustments could I be missing, aside from the virt_to_phys() translation? I see people bringing up the device tree a lot, but as far as I understand that's only an optional thing that lets you avoid hardcoding the IP's base address.


Accepted Solutions
petervuto
Observer
2,544 Views
Registered: ‎07-08-2017

I have solved my problem. Thanks again to hpoetzl for sending me in the direction of /dev/mem, which confirmed that my problem was with caching.

 

Perhaps it will help others with similar issues in the future if I list what I've learned:

 

 

If you pass the physical address of a buffer to the PL, the buffer has to be flushed from the cache before starting the transfer and then invalidated after the transfer has completed but before the PS reads it. This is what you would do for bare-metal, but I have been unable to find an exported function that does this for Linux. There were a couple of finds in other forum threads with the same issue, but none worked for me.

 

The usual and correct solution to this is to use the DMA API, which takes care of this for you. I had seen this recommended many times, but I wasn't sure how to use it and whether it works for designs that consist of Interconnects only. My main question was: all of the functions in the API need to be passed a device struct, so how would I obtain one?

 

I found an example at:

http://mercury.pr.erau.edu/~siewerts/extra/code/operating-systems/EXAMPLES/Cooperstein-Drivers/s_23/lab1_dma.c

 

Turns out, for the alloc functions (pool and dma_alloc_coherent), you can just pass NULL for the device. For the alternative, streaming approach, you can create a device from scratch. In case the link goes down:

 

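#include <linux/device.h>

/* Called when the last reference to the device is dropped. */
static void my_release(struct device *dev)
{
	pr_info("releasing DMA device\n");
}

/* Standalone device, used only as a handle for the DMA API. */
static struct device dev = {
	.release = my_release
};

...
/* Later, e.g. in the module's init function: */
dev_set_name(&dev, DEVICE_NAME);
device_register(&dev);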

The entire device tree thing was a red herring for my issue. All I needed were these four lines, and now I had a device to pass to dma_map_single(). Both the streaming [dma_map_single()] and alloc [dma_alloc_coherent()] approaches worked perfectly.
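
For readers who want to see the streaming approach end to end, here is a minimal sketch of how the mapping calls might fit together. It is not the code from the post. The register offsets, the status bit, the buffer size and the function name run_ip_once() are all hypothetical placeholders, and regs is assumed to be the ioremap()ed AXI-Lite control block. The dma_coerce_mask_and_coherent() call is also an addition: recent kernels reject mappings for a device that has no DMA mask, and they no longer accept a NULL device in the alloc functions either, so treat both as things to check against your kernel version.

```c
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/iopoll.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Hypothetical register map; substitute the IP's real AXI-Lite offsets. */
#define REG_CTRL	0x00
#define REG_STATUS	0x04
#define REG_SRC_ADDR	0x10
#define REG_DST_ADDR	0x18
#define STATUS_DONE	0x1
#define BUF_BYTES	4096

static int run_ip_once(struct device *dev, void __iomem *regs)
{
	void *src = kmalloc(BUF_BYTES, GFP_KERNEL);
	void *dst = kmalloc(BUF_BYTES, GFP_KERNEL);
	dma_addr_t src_dma, dst_dma;
	u32 status;
	int ret;

	if (!src || !dst) {
		ret = -ENOMEM;
		goto out_free;
	}
	memset(src, 0xA5, BUF_BYTES);	/* test pattern for the IP to read */

	/* Assumption: a 32-bit mask suits an HP port on a 32-bit SoC. */
	ret = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(32));
	if (ret)
		goto out_free;

	/* Mapping performs the cache maintenance the PL needs to see src. */
	src_dma = dma_map_single(dev, src, BUF_BYTES, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, src_dma)) {
		ret = -EIO;
		goto out_free;
	}
	dst_dma = dma_map_single(dev, dst, BUF_BYTES, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dst_dma)) {
		ret = -EIO;
		goto out_unmap_src;
	}

	/* Hand the bus addresses to the IP and start it. */
	iowrite32(lower_32_bits(src_dma), regs + REG_SRC_ADDR);
	iowrite32(lower_32_bits(dst_dma), regs + REG_DST_ADDR);
	iowrite32(1, regs + REG_CTRL);

	/* Poll a (hypothetical) done bit: every 10 us, give up after 100 ms.
	 * On timeout the IP may still be writing; a real driver should stop it. */
	ret = readl_poll_timeout(regs + REG_STATUS, status,
				 status & STATUS_DONE, 10, 100000);

	/* Unmapping makes the PL's writes visible to the CPU. */
	dma_unmap_single(dev, dst_dma, BUF_BYTES, DMA_FROM_DEVICE);
out_unmap_src:
	dma_unmap_single(dev, src_dma, BUF_BYTES, DMA_TO_DEVICE);
	if (!ret)
		pr_info("first output word: 0x%08x\n", *(u32 *)dst);
out_free:
	kfree(dst);
	kfree(src);
	return ret;
}
```

The CPU must not touch src or dst between the map and unmap calls. That window is what lets the DMA API decide when to flush and when to invalidate.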

6 Replies
hpoetzl
Voyager
1,926 Views
Registered: ‎06-24-2013

Hey @petervuto,

 

I naively took the same approach: declare two arrays, pass their virt_to_phys()-converted addresses to the IP and wait for the IP to write its output.

This should work, as long as the physical addresses are reachable from the PL side (which they usually are).

 

I see people bringing up device tree a lot, but as far as I understand that's only an optional thing that lets you avoid hardcoding the base IP address.

Correct, the device tree is not required to make it work.

 

I would suggest simply reserving a little memory space in Linux (e.g. via the kernel command line or the bootloader) and manually writing the addresses with devmem/devmem2 into the (AXI Lite?) config registers for your IP. Then check whether the specified memory gets read from/written to.
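
If devmem/devmem2 is not on the target, the same check can be done with a few lines of user-space C. The following is a minimal sketch of what such a tool does, not the source of either utility. The function names are made up for illustration, it needs root and a kernel that permits /dev/mem access to the range in question, and it assumes 4-byte-aligned addresses.

```c
#define _FILE_OFFSET_BITS 64	/* physical addresses above 2 GiB on 32-bit */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* mmap() needs a page-aligned offset: split the physical address into
 * the base of its page and the byte offset inside that page. */
static uint64_t page_base(uint64_t phys, uint64_t page_size)
{
	return phys & ~(page_size - 1);
}

static uint64_t page_offset(uint64_t phys, uint64_t page_size)
{
	return phys & (page_size - 1);
}

/* Map the page containing phys; returns a pointer to the word or NULL.
 * *map_out receives the mapping base so the caller can munmap() it. */
static volatile uint32_t *map_word(uint64_t phys, void **map_out, long *psz_out)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd;
	void *map;

	if (psz <= 0)
		return NULL;
	/* O_SYNC asks for an uncached mapping, so the access bypasses the
	 * CPU caches; that is why this test works without any flushing. */
	fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0)
		return NULL;
	map = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, (off_t)page_base(phys, (uint64_t)psz));
	close(fd);	/* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return NULL;
	*map_out = map;
	*psz_out = psz;
	return (volatile uint32_t *)((uint8_t *)map +
				     page_offset(phys, (uint64_t)psz));
}

/* Write one 32-bit word to a physical address. Returns 0 or -1. */
int poke32(uint64_t phys, uint32_t value)
{
	void *map;
	long psz;
	volatile uint32_t *p = map_word(phys, &map, &psz);

	if (!p)
		return -1;
	*p = value;
	munmap(map, (size_t)psz);
	return 0;
}

/* Read one 32-bit word from a physical address. Returns 0 or -1. */
int peek32(uint64_t phys, uint32_t *value)
{
	void *map;
	long psz;
	volatile uint32_t *p = map_word(phys, &map, &psz);

	if (!p)
		return -1;
	*value = *p;
	munmap(map, (size_t)psz);
	return 0;
}
```

A test would poke32() the reserved buffer's address into the IP's source and destination registers, start the IP, and then peek32() the destination buffer to see whether the output arrived.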

 

Hope this helps,

Herbert

-------------- Yes, I do this for fun!
petervuto
Observer
1,919 Views
Registered: ‎07-08-2017

Thank you, that was a quick and very helpful response. I'll try your suggestion out and write back later. Yeah, the control registers are AXI-Lite, slaved to GP. That part seems to work fine.

petervuto
Observer
1,806 Views
Registered: ‎07-08-2017
It works when I write and read the data manually with devmem, so I'm pretty sure I'm not flushing the caches correctly (AXI over HP port). I used flush_cache_vmap(virt_addr_start, virt_addr_end), but it must be the wrong function for memory allocated with kmalloc().
hpoetzl
Voyager
1,794 Views
Registered: ‎06-24-2013

Hey @petervuto,

 

I used flush_cache_vmap(virt_addr_start, virt_addr_end) ...

flush_cache_vmap() is used when creating mappings (eg, via vmap, vmalloc, ioremap etc) in kernel space for pages.

 

For DMA memory you want to use dmac_flush_range(), and for paged memory flush_dcache_page(), but note that proper use of the DMA API should already take care of the cache flushing and invalidation.
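
To make the last point concrete: for a buffer that stays mapped across several transfers, the DMA API's counterparts to a manual flush and invalidate are the dma_sync_single_* calls. A minimal sketch, assuming the buffer was already mapped with dma_map_single() in the matching direction; the wrapper names are hypothetical:

```c
#include <linux/device.h>
#include <linux/dma-mapping.h>

/* CPU has finished writing the buffer; make the data visible to the PL.
 * Assumes the buffer was mapped with DMA_TO_DEVICE. */
static void hand_buffer_to_pl(struct device *dev, dma_addr_t handle,
			      size_t len)
{
	dma_sync_single_for_device(dev, handle, len, DMA_TO_DEVICE);
}

/* PL has finished writing the buffer; make the data visible to the CPU.
 * Assumes the buffer was mapped with DMA_FROM_DEVICE. */
static void take_buffer_from_pl(struct device *dev, dma_addr_t handle,
				size_t len)
{
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
}
```

These are exported to modules, unlike dmac_flush_range(), which is an architecture-internal helper.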

 

Best,

Herbert

-------------- Yes, I do this for fun!
petervuto
Observer
1,769 Views
Registered: ‎07-08-2017

Hi @hpoetzl,

 

I tried dmac_flush_range(), but am getting a "missing symbol" error, presumably because it is not exported. Which means that I do have to use the DMA API. The calls there require a device struct, which it seems I have to get by setting up the device tree correctly.

 

Another reason I didn't use the DMA API, besides that complexity, was that I'm not sure if all AXI reads/writes can be handled by the DMA API or just ones that also use DMA IPs in the design (like the Xilinx® LogiCORE™ IP AXI Direct Memory Access (AXI DMA) core, which I'm pretty sure is not in the design).
