athukoralakasun
Observer
9,968 Views
Registered: 10-18-2013

Using DMA from both ARM Cores

Hi,

 

I am using a ZC706 board with SDK 2014.1.

 

I am currently developing a bare-metal application that runs on both ARM cores, and I need to use the DMA from both cores, but I could not figure out how to do this.

 

I used the example given in "xdmaps_example_w_intr.c" and implemented it on a single core, and it worked fine. But when I used the same code on the other core as well, it did not work.

 

Can anyone tell me how to do this?

 

I have attached the Xilinx example code here.

 

Thank you.

9 Replies
celeron2000
Adventurer
9,952 Views
Registered: 10-21-2013

DMA is just transferring data between RAM and a device via a DMA controller (CDMA, AXI_DMA, etc.). IMHO you can start a DMA transaction from either ARM core, using semaphores. First, be sure that both cores have access to the FPGA resources.

It's a bare-metal application, isn't it?
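A minimal sketch of such a lock between exactly two bare-metal cores, using Peterson's algorithm so no exclusive-access (LDREX/STREX) support is required; it assumes the flag and turn words sit in memory both cores see uncached, and the OCM addresses here are hypothetical:

```c
#include <stdint.h>

/* Shared lock state; must be visible to both cores without caching,
   e.g. a reserved corner of OCM (these addresses are hypothetical). */
static volatile uint32_t *const flag = (volatile uint32_t *)0xFFFF0000u;
static volatile uint32_t *const turn = (volatile uint32_t *)0xFFFF0008u;

void lock(int me)                              /* me: 0 on CPU0, 1 on CPU1 */
{
    int other = 1 - me;
    flag[me] = 1;                              /* announce intent to enter */
    *turn = (uint32_t)other;                   /* yield priority to the peer */
    __asm__ volatile("dmb" ::: "memory");      /* publish the stores first */
    while (flag[other] && *turn == (uint32_t)other)
        ;                                      /* busy-wait */
    __asm__ volatile("dmb" ::: "memory");      /* order the critical section */
}

void unlock(int me)
{
    __asm__ volatile("dmb" ::: "memory");      /* drain critical-section writes */
    flag[me] = 0;                              /* release */
}
```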

vkosar
Visitor
9,865 Views
Registered: 09-17-2014

It is possible to use a single DMA from both CPUs. For the AXI DMA there are two possible solutions:

  1. With interrupts:
    1. CPU0 allocates memory for the RX and TX descriptor rings. Those rings should be allocated in non-cached memory accessible by both cores.
    2. The RX and TX descriptor rings are initialized by CPU0.
    3. The RX and TX semaphores are initialized by CPU0; those semaphores must be in memory accessible by both cores.
    4. Interrupt handlers are provided by CPU0.
    5. The DMA is initialized by CPU0.
    6. CPU0 informs CPU1 about the addresses of the descriptor rings and semaphores.
    7. CPUx locks the RX or TX semaphore, sets up the appropriate descriptors in the ring, updates the tail pointer and releases the semaphore.
    8. When an interrupt is handled by CPU0, CPU0 locks the appropriate semaphore, processes the finished descriptors and releases the semaphore. Optionally it can notify the CPUx that initiated the DMA operation that the operation is finished.
  2. Without interrupts:
    1. CPU0 allocates memory for the RX and TX descriptor rings. Those rings should be allocated in non-cached memory accessible by both cores.
    2. The RX and TX descriptor rings are initialized by CPU0.
    3. The RX and TX semaphores are initialized by CPU0; those semaphores must be in memory accessible by both cores.
    4. The DMA is initialized by CPU0.
    5. CPU0 informs CPU1 about the addresses of the descriptor rings and semaphores.
    6. CPUx locks the RX or TX semaphore, processes the finished descriptors, sets up the appropriate descriptors in the ring, updates the tail pointer and releases the semaphore. Optionally it can notify the CPUx that initiated the DMA operation that the operation is finished.

The RX or TX lock must be held when writing the RX or TX descriptor ring or the AXI DMA registers specific to the RX or TX direction. Registers that are not direction-specific (e.g. reset) must be written only when both locks are held. A sketch of the submit path under these rules follows below.
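A minimal sketch of the lock discipline for the TX submit path (step 6 of the polled variant), assuming a lock()/unlock() pair such as the one sketched earlier in the thread; the base address, register offset, ring length and variable names are illustrative rather than the Xilinx driver API, while the descriptor layout and the TXSOF/TXEOF control bits follow PG021:

```c
#include <stdint.h>

#define DMA_BASE      0x40400000u  /* assumed AXI DMA base address */
#define MM2S_TAILDESC 0x10u        /* MM2S tail-descriptor register (PG021) */
#define RING_LEN      16u          /* assumed ring size */

extern void lock(int core_id);     /* e.g. the Peterson lock sketched above */
extern void unlock(int core_id);

/* Simplified AXI DMA scatter-gather descriptor (PG021); descriptors
   must be aligned on a 16-word (0x40) boundary. */
typedef struct {
    uint32_t next;        /* 0x00: next descriptor pointer */
    uint32_t next_msb;    /* 0x04 */
    uint32_t buffer;      /* 0x08: buffer address */
    uint32_t buffer_msb;  /* 0x0C */
    uint32_t reserved[2]; /* 0x10-0x14 */
    uint32_t control;     /* 0x18: SOF/EOF flags and transfer length */
    uint32_t status;      /* 0x1C: completion status */
    uint32_t app[5];      /* 0x20-0x30 */
    uint32_t pad[3];      /* pad to the 0x40 alignment boundary */
} sg_desc_t;

/* Ring and head index set up by CPU0; both must live in the non-cached
   shared region, since either core may update them under the lock. */
extern volatile sg_desc_t *tx_ring;
extern volatile uint32_t  *tx_head;

void submit_tx(int core_id, uint32_t buf, uint32_t len)
{
    lock(core_id);                           /* take the TX semaphore */

    volatile sg_desc_t *d = &tx_ring[*tx_head];
    d->buffer  = buf;
    d->status  = 0;
    d->control = (len & 0x03FFFFFFu)         /* bits 25:0: transfer length */
               | (1u << 27)                  /* TXSOF: start of packet */
               | (1u << 26);                 /* TXEOF: end of packet */

    /* Writing the tail pointer hands everything up to and including
       this descriptor to the DMA engine. */
    *(volatile uint32_t *)(DMA_BASE + MM2S_TAILDESC) = (uint32_t)(uintptr_t)d;

    *tx_head = (*tx_head + 1) % RING_LEN;
    unlock(core_id);                         /* release the TX semaphore */
}
```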

 

If the memory buffers are allocated in cacheable regions, then the corresponding cache lines must be flushed/invalidated before a write/read DMA operation is started, for example:
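For example, with the Xilinx standalone BSP cache routines from xil_cache.h (the buffer names are illustrative, and older BSPs take plain unsigned addresses rather than UINTPTR):

```c
#include "xil_cache.h"

void sync_caches_for_dma(void *tx_buf, unsigned tx_len,
                         void *rx_buf, unsigned rx_len)
{
    /* TX (memory -> device): push pending CPU writes out to DDR so
       the DMA reads current data. */
    Xil_DCacheFlushRange((UINTPTR)tx_buf, tx_len);

    /* RX (device -> memory): drop stale cached copies so later CPU
       reads fetch what the DMA wrote. */
    Xil_DCacheInvalidateRange((UINTPTR)rx_buf, rx_len);
}
```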

 

Note that semaphore locking/unlocking is not a cheap operation, and accessing the DMA from both CPUs can be slower than accessing it from a single core.

 

          VK

 

--
Ph.D. student at Brno University of Technology | System Architect at RehiveTech spin-off company
muzaffer
Teacher
9,834 Views
Registered: 03-31-2012

>> Those rings should be allocated in non-cached memory accessible by both cores.

Why non-cached memory? There is a cache controller with snooping, so the processors would get the right data even if it is cached, no?
- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.
vgokhale
Explorer
9,804 Views
Registered: 10-25-2011

>> There is a cache controller with snooping so the processors would get the right data even if it is cached, no?

 

No. The cache controllers only snoop the processor bus, which covers data going to/from the private and shared caches. They do not snoop the bus to main memory. So if a hardware peripheral changed the contents of main memory while the data at that location was cached in one of the L1s, there would be no way for the processors to know that they now hold a stale copy. For this reason, any memory accessed by the PL through the AXI DMA should not be cached.
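One way to satisfy this on the Zynq standalone BSP is to remap the buffer's 1 MB MMU section as non-cacheable; a minimal sketch, where the section address is hypothetical and the attribute value should be verified against the Cortex-A9 short-descriptor section format for your design:

```c
#include "xil_mmu.h"

/* Hypothetical 1 MB-aligned DDR region reserved for DMA buffers. */
#define DMA_BUF_SECTION 0x18000000u

void make_dma_region_uncached(void)
{
    /* Xil_SetTlbAttributes() (xil_mmu.h) rewrites the MMU entry for
       the 1 MB section containing the given address; 0x14de2 is
       intended to encode normal memory, non-cacheable, shareable. */
    Xil_SetTlbAttributes(DMA_BUF_SECTION, 0x14de2);
}
```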

muzaffer
Teacher
9,793 Views
Registered: 03-31-2012

>> any memory accessed by the PL through the AXI DMA should not be cached

 

What if the AXI master is connected to the ACP of the interconnect?

 

vgokhale
Explorer
9,752 Views
Registered: 10-25-2011

The ACP will ensure coherency, but then you are limited to transferring 512 kB of data. Using DMA the "traditional way", i.e. accessing data in system memory, will let you access a larger chunk of data.

muzaffer
Teacher
9,732 Views
Registered: 03-31-2012

>> The ACP will ensure coherency, but then you are limited to transferring 512 kB of data.

 

Please clarify how this limitation arises. References to the TRM would be helpful.

 

>> Using DMA the "traditional way", i.e. accessing data in system memory will let you access a larger chunk of data.

 

How does the ACP not allow one to "access data in system memory"? (Other than thrashing the cache a little bit. :-)

 

- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.
vgokhale
Explorer
9,726 Views
Registered: 10-25-2011

>> please clarify how this limitation arises. references to trm would be helpful.

 

The L2 cache is 512 kB. To access larger chunks, you'd have to load them from memory and add cache access latency, however small, on top of DDR DMA latency.

 

>>how does ACP not allow one "access data in system memory" ?

My statement does not say this. 

 

If you are going to frequently access data > 512 kB, you'd be better off using AXI DMA over the HP ports. 

muzaffer
Teacher
9,716 Views
Registered: 03-31-2012

>> How does the ACP not allow one to "access data in system memory"?

My statement does not say this. 

 

 

You said: "ACP ... but you are limited to transferring 512 kB of data. Using DMA ... will let you access a larger chunk of data."

 

As you accept, with the ACP one is NOT limited to 512 kB; one can actually access all of the address space the processor or any DMA controller can access. The only potential issue with the ACP is cache thrashing, which also can be solved. There is a reason the next-generation Zynq has a fully coherent bus.

 
