01-16-2019 05:50 AM
I'm trying to use an AXI DMA with SG in cyclic mode on a Zynq-7020 device to transfer data from PL to DDR (S2MM). Based on existing material I've successfully configured the peripheral, but to get normal operation in cyclic mode I have to match the BD ring size to the number of BDs allocated, even though my last BD points back to the first BD. For instance:
Thus, my questions are: can we use cyclic mode by simply pointing back to the first BD in the last BD? Is it mandatory to match the BD ring size to the number of BDs allocated for a cyclic transfer?
01-24-2019 02:59 PM
To initiate cyclic mode three things must happen in addition to regular SG:
1. Tail BD points to the First BD (page 72) (yes this is correct)
2. Tail Descriptor register is programmed with a value not in the BD chain (page 72)
3. Turn on the Cyclic BD Enable bit (page 26)
The specification does not require setting the BD ring size when doing a cyclic mode transfer. All the pages noted above come from PG021: LogiCORE IP DMA Product Guide, which explains how to set up Cyclic DMA Mode, mainly on page 72.
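The three steps above can be sketched at register level as follows. This is a minimal host-side illustration, not the Xilinx driver: the struct name `sg_bd_t`, the helper names, and the choice of "one slot past the last BD" as the tail value are all assumptions made for the example. The register offsets and the Cyclic BD Enable bit follow the S2MM register map in PG021 and should be checked against the version of the guide you are using.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* S2MM register offsets and control bits (per PG021; verify for your core). */
#define S2MM_DMACR     0x30u
#define S2MM_CURDESC   0x38u
#define S2MM_TAILDESC  0x40u
#define DMACR_RS       0x00000001u  /* Run/Stop */
#define DMACR_CYCLIC   0x00000010u  /* Cyclic BD Enable */

/* Scatter-gather descriptor, 32-bit addressing, padded to 64 bytes. */
typedef struct {
    uint32_t next;        /* 0x00 next descriptor pointer */
    uint32_t next_msb;    /* 0x04 */
    uint32_t buf;         /* 0x08 buffer address */
    uint32_t buf_msb;     /* 0x0C */
    uint32_t reserved[2]; /* 0x10, 0x14 */
    uint32_t control;     /* 0x18 buffer length in the low bits */
    uint32_t status;      /* 0x1C */
    uint32_t app[5];      /* 0x20..0x30 */
    uint32_t pad[3];      /* pad to 64 bytes */
} sg_bd_t;

/* Link n descriptors into a closed ring: bd[i] -> bd[i+1], bd[n-1] -> bd[0].
 * bd_phys and buf_phys are hypothetical physical addresses. */
static void bd_ring_link_cyclic(sg_bd_t *bd, uint32_t n, uint32_t bd_phys,
                                uint32_t buf_phys, uint32_t buf_len)
{
    assert(n > 0 && (bd_phys % 64u) == 0);
    memset(bd, 0, n * sizeof *bd);
    for (uint32_t i = 0; i < n; i++) {
        bd[i].next    = bd_phys + ((i + 1u) % n) * (uint32_t)sizeof *bd;
        bd[i].buf     = buf_phys + i * buf_len;
        bd[i].control = buf_len;
    }
}

/* Enable cyclic mode, load the first descriptor, run, then write the tail. */
static void s2mm_start_cyclic(volatile uint32_t *regs, uint32_t bd_phys,
                              uint32_t n)
{
    regs[S2MM_DMACR / 4]   |= DMACR_CYCLIC;
    regs[S2MM_CURDESC / 4]  = bd_phys;
    regs[S2MM_DMACR / 4]   |= DMACR_RS;
    /* Assumption: any aligned address that is not a BD of the chain will do;
     * here the slot just past the last descriptor is used. */
    regs[S2MM_TAILDESC / 4] = bd_phys + n * 64u;
}
```

On real hardware `regs` would point at the mapped DMA base address, and the descriptors would have to be flushed from the data cache before the channel is started.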
An example design that has Cyclic DMA Mode enabled is available on either the Xilinx GitHub as xaxidma_example_sgcyclic_intr.c or can be found on Windows (with an SDK install) at: C:\Xilinx\SDK\<SDK_version>\data\embeddedsw\XilinxProcessorIPLib\drivers\<AxiDma_version>\examples\xaxidma_example_sgcyclic_intr.c.
That design relies on the AXI DMA Bare Metal driver which has useful documentation on ring management. The driver can be found at C:\Xilinx\SDK\<SDK_version>\data\embeddedsw\XilinxProcessorIPLib\drivers\<AxiDma_version>\src\xaxidma.h.
02-06-2019 07:19 AM
In the design example you mention, the number of BDs allocated for the cyclic transfer is equal to the BD ring size, so in that situation it works well.
In my case, if I create a chain of 10 BDs, with the tail BD pointing back to the first BD, the tail descriptor register pointing outside the BD ring, and the cyclic bit enabled, I still need to adjust the BD ring size to 10; otherwise it stops after 10 transfers.
// Create BD ring (2048 BD)
uint16_t bd_count = XAxiDma_BdRingCntCalc(XAXIDMA_BD_MINIMUM_ALIGNMENT, RADIO_IF_DMA_RX_BD_HIGH_S2MM - RADIO_IF_DMA_RX_BD_BASE_S2MM + 1);
bd_count = 10; // FIXME : why not 2048 ? why need to be set to 10 for 10 BD cyclic transfers ?
status = XAxiDma_BdRingCreate(s2mm_bd_ring, RADIO_IF_DMA_RX_BD_BASE_S2MM, RADIO_IF_DMA_RX_BD_BASE_S2MM, XAXIDMA_BD_MINIMUM_ALIGNMENT, bd_count);
02-12-2019 09:55 AM - edited 02-12-2019 10:00 AM
Again, regarding the DMA SG cyclic mode with interrupt, I found that the example is a little simplistic, since the number of BDs allocated is equal to the total BD ring size. Based on what I can read in the documentation on page 70, "The last descriptor in the chain then points back to the first descriptor in the chain", so whatever the configuration, since the tail descriptor points outside the chain, the DMA should indeed loop until the end condition is reached. In my case I'm using the same code, but the total BD ring size (2048) is larger than the number of BDs allocated (10) for the cyclic transfer. After the first cycle is done the IOC interrupt occurs, but no BDs are returned from hardware in the interrupt callback. However, if the total BD ring size is exactly equal to the number of BDs allocated for the cyclic transfers (10), I can indeed read the completed BDs at each IOC interrupt. There is one more strange behavior with the coalesce parameter: with it set to 2, I get an IOC every 2 transfers during the first cycle as expected, but then an IOC every 10 transfers.
Please be aware also that I read the documentation many times...
02-12-2019 02:25 PM
Can I ask what the purpose of creating a large (2048) BD Ring without utilizing the full ring would be? If the BD Ring is allocated but not used then it wastes resources, which seems like a bad thing.
That Xilinx provided driver, although useful in some situations, may not be adequate to all scenarios. I believe in your case, the provided driver doesn't seem like it will meet your design criteria, but the option to create a similar driver is available. I think beyond understanding how the provided driver works, you may need to create a different driver for your purposes. The xaxidma.h file contains a lot of good documentation of how the Xilinx provided driver manages the BD Rings and various related pointers. This would be useful if building your own driver was the way in which you wanted to proceed.
In addition, how are you changing the driver to have a total BD ring size of 2048, but only allocate 10 BDs for the cyclic transfer? Since the driver doesn't seem to be created with that use case in mind, I do not see how to easily get that task to work.
To your last point about the coalesce parameters, I believe what you are seeing has a simple fix. In the xaxidma_bdring.c file, when the BD ring is retrieved from HW the completed bit is set and never cleared. Cyclic mode ignores that bit, but the driver relies on that completed bit to check whether a BD is done. After the driver checks and adds completed BDs to BdCount (line 1305 would be a good place to add code), check whether the BD ring is in cyclic mode. If so, at that point it should be acceptable to clear the completed bit in the BD status word. Make sure to flush the cache after writing the status word to memory.
02-13-2019 01:40 AM - edited 02-13-2019 02:19 AM
Thanks for the quick reply.
The reason I was using a 2048-BD ring for a 10-BD cyclic transfer is that, for me, the number of BDs is a parameter that defines the granularity of the transfers. In practice, say I want to capture 1000 ms of ADC data in blocks of 100 ms: I define a cyclic chain of 10 BDs of 100 ms each. If I want blocks of 10 ms instead, I define a cyclic chain of 100 BDs (the memory allocation stays constant). So it's really just a matter of firmware architecture and I can deal with it. For now, my best workaround is, as you suggest, to set the BD ring size to the BD chain length.
Regarding the coalesce parameters, I've followed your recommendation and it does explain what I was observing. The fix I'm using in xaxidma_bdring.c at line 1305 is:
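The granularity arithmetic described here can be written down as a small helper. This is only an illustrative sketch: `capture_plan_t`, `plan_capture`, and the `bytes_per_ms` data rate are hypothetical names and parameters, not part of the AXI DMA driver.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t bd_count;      /* descriptors in the cyclic chain */
    uint32_t bytes_per_bd;  /* buffer length carried by each descriptor */
} capture_plan_t;

/* Split a fixed capture window into equal blocks, one BD per block.
 * bytes_per_ms is a hypothetical ADC data rate. Returns 0 on success. */
static int plan_capture(uint32_t window_ms, uint32_t block_ms,
                        uint32_t bytes_per_ms, capture_plan_t *out)
{
    if (block_ms == 0 || window_ms % block_ms != 0)
        return -1;  /* the window must be a whole number of blocks */
    out->bd_count     = window_ms / block_ms;
    out->bytes_per_bd = block_ms * bytes_per_ms;
    return 0;
}
```

Whatever block size is chosen, `bd_count * bytes_per_bd` stays equal to the size of the capture buffer, which is the "constant memory space" property mentioned above.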
// Clear BD completed flag
XAxiDma_BdWrite(CurBdPtr, XAXIDMA_BD_STS_OFFSET, BdSts & ~XAXIDMA_BD_STS_COMPLETE_MASK);
Xil_DCacheFlushRange(CurBdPtr, 64);
Now at each IOC interrupt the returned BD count is equal to the actual number of transfers, and no longer the count from the first cycle.
02-13-2019 08:41 AM
To your example, wouldn't you need to define your granularity before even defining the BD ring? Even before writing firmware? The Cyclic mode ring would go through the same set of BDs (a set of 10 in your first example, or a set of 100 in your second), but would not be changing on the fly even in both your examples. Is that correct?
I don't believe changing the size of the BD ring on the fly would work, but if you wanted different rings you could probably create a new BD ring and manage two (or more) at the same time. That would again require more work from a SW perspective and would be tricky, but it is theoretically doable.
One more note about the coalescing and correctly counting BDs. There should be a check to determine if you are in Cyclic Mode (the RingPtr struct contains a value called Cyclic that tells this information). In your application I don't think this is a problem since you are always in Cyclic Mode, but if not in Cyclic mode I don't believe it would be appropriate to clear the Completed bit.
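A guarded version of that clean-up might look like the following. This is a simplified host-side model rather than the driver's code: `ring_model_t` and `ring_harvest` are hypothetical stand-ins for the driver's ring structure and its BD retrieval routine, and the only hardware detail assumed is that the completed flag is bit 31 of the BD status word.

```c
#include <assert.h>
#include <stdint.h>

#define BD_STS_COMPLETE 0x80000000u  /* completed flag, bit 31 of BD status */

/* Hypothetical stand-in for the driver's ring state. */
typedef struct {
    uint32_t *status;  /* status word of each BD in the chain */
    uint32_t  count;   /* number of BDs in the chain */
    uint32_t  head;    /* next BD expected to complete */
    int       cyclic;  /* non-zero when the ring runs in cyclic mode */
} ring_model_t;

/* Count the BDs completed since the last call, walking forward from head.
 * The completed flag is cleared only in cyclic mode, so a BD that is reused
 * on the next lap is not mistaken for a fresh completion. */
static uint32_t ring_harvest(ring_model_t *r, uint32_t limit)
{
    uint32_t done = 0;
    while (done < limit && done < r->count &&
           (r->status[r->head] & BD_STS_COMPLETE)) {
        if (r->cyclic) {
            r->status[r->head] &= ~BD_STS_COMPLETE;
            /* On the target, flush this BD from the data cache here. */
        }
        r->head = (r->head + 1u) % r->count;
        done++;
    }
    return done;
}
```

In non-cyclic mode the flag is left untouched, which matches the behaviour the stock driver expects.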
It sounds like you might have a solution for now. May I ask if one of my answers provided a solution (if so an accepted solution could be potentially useful for others)? If you found the answer you were looking for yourself, could you share that for others (and accept that solution)?
02-13-2019 09:20 AM
I think Caleb's point about the limited utility of the Xilinx driver should be considered. The ring pointer structure and the interrupt handling routines don't anticipate what you are trying to do. In cyclic mode, completed or stale descriptors aren't pertinent; the core is just fetching BDs (and stopping if it fetches one that matches the tail descriptor). Your interrupt handler should still address the interrupt status via e.g. XAxiDma_BdRingGetIrq and XAxiDma_BdRingAckIrq, BUT note that these calls take a ring pointer as an argument. You can do this bare knuckled:
// Read pending interrupts
// irq_status = XAxiDma_BdRingGetIrq(JESD_bd_ring_ptr);
irq_status = *(volatile u32 *)(0x41E10034) & 0x00007000; // Optimized
// Acknowledge pending interrupts
// XAxiDma_BdRingAckIrq(JESD_bd_ring_ptr, irq_status);
*(volatile u32 *)(0x41E10034) = irq_status; // Optimized
That snippet is from a MicroBlaze system where 0x41E10034 is the address of the S2MM status register (offset 0x34), which implies a DMA core base address of 0x41E10000.
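The same read-and-acknowledge pattern can be wrapped in two small helpers that take the register base as a parameter instead of a hard-coded address. This is a sketch under assumptions: the helper names are made up for the example, and the offset and mask (IOC, delay, and error interrupt bits of the S2MM status register) are taken from the snippet above and PG021.

```c
#include <assert.h>
#include <stdint.h>

#define S2MM_DMASR      0x34u
#define DMASR_IRQ_MASK  0x00007000u  /* IOC_Irq | Dly_Irq | Err_Irq */

/* Return the interrupt bits currently pending on the S2MM channel. */
static inline uint32_t s2mm_irq_pending(volatile uint32_t *regs)
{
    return regs[S2MM_DMASR / 4] & DMASR_IRQ_MASK;
}

/* Acknowledge interrupts by writing the pending bits back to the status
 * register (the bits are write-1-to-clear in hardware). */
static inline void s2mm_irq_ack(volatile uint32_t *regs, uint32_t irq)
{
    regs[S2MM_DMASR / 4] = irq & DMASR_IRQ_MASK;
}
```

In an interrupt handler, `regs` would be the mapped base of the DMA core, for example `(volatile uint32_t *)0x41E10000` on the system described above.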
However, as could be argued from a typical (Ethernet) example, the ring pointer structure has multiple pointers (HwHead, etc.) that facilitate keeping track of what has been processed.
I think you might be able to twist the Xilinx provided data structures to your goal, but you might just be better off rolling your own. The core is pretty single minded about fetching and processing BDs. I suspect the issue can be resolved with a proper interrupt handler.