04-22-2021 12:44 AM - edited 04-22-2021 01:06 AM
For high-performance data transfer, a Xilinx AXI DMA is used to transfer streaming data (S2MM only, single channel, SG enabled) to the PS DDR.
The PS is running Linux, and a custom Linux driver (heavily inspired by the Xilinx axidmatest.c application, which in turn seems to be a modified version of the vanilla dmatest.c) takes care of setting up the SG list.
The stream to the DMA is formatted so that TLAST is inserted when the driver-specified buffer size (SG buffer size) is reached, and TVALID is asserted only AFTER the core's SG list has been issued. However, the core still locks up (TREADY low) after a couple of BDs.
Within the ISR of the Xilinx DMA Driver, the following error is reported, which seems to correspond to SGIntErr in the S2MM_DMASR register (pg021):
xilinx-vdma 40400000.dma: Channel (ptrval) has errors 100, cdr f049180 tdr f049600
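For reference, the error value from such a log line can be decoded against the DMASR error bits listed in PG021. The following is only a minimal sketch: it assumes the driver prints the value in hex, and `dmasr_decode_errors` is a hypothetical helper name, not part of the Xilinx driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Error bits of MM2S_DMASR / S2MM_DMASR as listed in PG021. */
#define DMASR_DMA_INT_ERR (1u << 4)
#define DMASR_DMA_SLV_ERR (1u << 5)
#define DMASR_DMA_DEC_ERR (1u << 6)
#define DMASR_SG_INT_ERR  (1u << 8)
#define DMASR_SG_SLV_ERR  (1u << 9)
#define DMASR_SG_DEC_ERR  (1u << 10)

/*
 * Hypothetical helper: writes the space-separated names of the error bits
 * set in sr into buf and returns how many names were written.
 */
static int dmasr_decode_errors(uint32_t sr, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } bits[] = {
        { DMASR_DMA_INT_ERR, "DMAIntErr" },
        { DMASR_DMA_SLV_ERR, "DMASlvErr" },
        { DMASR_DMA_DEC_ERR, "DMADecErr" },
        { DMASR_SG_INT_ERR,  "SGIntErr"  },
        { DMASR_SG_SLV_ERR,  "SGSlvErr"  },
        { DMASR_SG_DEC_ERR,  "SGDecErr"  },
    };
    size_t used = 0;
    int n = 0;

    if (len > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if ((sr & bits[i].mask) == 0u)
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
                         n ? " " : "", bits[i].name);
        if (w < 0 || (size_t)w >= len - used)
            break; /* output buffer too small, stop here */
        used += (size_t)w;
        n++;
    }
    return n;
}
```

Calling `dmasr_decode_errors(0x100, buf, sizeof buf)` on the value from the log above leaves `"SGIntErr"` in `buf`, since bit 8 is the only error bit set.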
According to PG021:
"This error occurs if a descriptor with the Complete bit already set is fetched. This indicates to the SG Engine that the descriptor is a stale descriptor."
This makes me believe that there is something wrong with the BDs or how they are set up, but I was not able to find anything wrong.
I do not touch the BDs during DMA, only after completion (same as the Xilinx driver)...
With a single BD in the list, everything works. With multiple BDs in the list, the driver also works IF the block size is small or if the time between issuing the BDs is very long (~100 ms).
The core does not always get stuck at the first BD in the list; however, the packet (TLAST and length) in the PL always looks correct.
Filling only half of the provided buffer also causes the DMA to get stuck.
Each trigger (red) marks the beginning of a transfer of one SG list. The following shows the beginning of the last three SG lists and the one that gets stuck (infinite).
The last transfer will never finish due to TREADY going low.
Any ideas on how to debug this further or what might cause the issue?
04-22-2021 05:18 AM
Just shooting in the dark here: The S2MM controller has a problem (bug) whereby it can accept data prior to being configured. This data can then lock up the design later on once it has been configured. The SGDMA is (likely) just a wrapper around the S2MM driver. So, my suggestion might be to try holding VALID low for about 5-10 cycles after VALID && READY && LAST to allow the SGDMA driver to reprogram the S2MM, and then try again.
04-29-2021 02:28 AM
I fought with SG-DMA myself for a few weeks. Here are my observations:
1. SG-DMA can fetch descriptors en bloc rather than waiting until one descriptor is done. I've seen exactly the behaviour you describe: it runs a few BDs and software takes care of the "complete bit", but after a few runs SG-DMA fetches BDs one right after another. In this short time you have to clear your "complete bit". If you haven't done so yet, SG-DMA hangs with SGIntErr.
2. The most annoying fact about SG-DMA: you have no way to clear SGIntErr and continue with the next BDs! You have to reset the S2MM!
3. The second annoying fact about SG-DMA: resetting the S2MM ALSO resets the MM2S side of SG-DMA!
4. In Simple-DMA mode the IRQs arrive right after TLAST; in SG-DMA mode the IRQs arrive as soon as the "complete bit" is written and after the next BD has been read.
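The recovery step from points 2 and 3 could look roughly like the sketch below. It is not taken from the Xilinx driver: `s2mm_soft_reset` and `regs` (a pointer to the memory-mapped AXI DMA register block) are assumptions for illustration, and a real kernel driver would use its own I/O accessors and a time-based timeout. The register offset and the self-clearing Reset bit follow PG021.

```c
#include <stdint.h>

#define S2MM_DMACR_OFFSET 0x30u      /* S2MM DMA control register (PG021) */
#define DMACR_RESET       (1u << 2)  /* soft reset bit, self-clearing in hardware */

/*
 * Hypothetical helper: requests a soft reset through S2MM_DMACR and polls
 * until the Reset bit reads back as 0.
 * Returns 0 on success, -1 if the bit is still set after max_polls reads.
 */
static int s2mm_soft_reset(volatile uint32_t *regs, unsigned max_polls)
{
    volatile uint32_t *dmacr = regs + S2MM_DMACR_OFFSET / 4u;

    *dmacr |= DMACR_RESET;
    while (max_polls--) {
        if ((*dmacr & DMACR_RESET) == 0u)
            return 0; /* core has come out of reset */
    }
    return -1;
}
```

Since the reset affects the whole core, both channels would need to be reprogrammed afterwards (current descriptor pointer, run/stop and interrupt enables, tail descriptor pointer).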
To debug such an issue you have to look at the SG part (the fetching and writing of SG BDs). Then you can be confident that the BD that was read has its "complete bit" cleared.
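On the software side, a sanity check along these lines can be run just before the tail descriptor pointer is advanced. This is a sketch under assumptions: the struct mirrors the 32-bit S2MM descriptor layout from PG021, while `bd_ring_first_stale` and `bd_recycle` are hypothetical helper names. Cache maintenance and memory barriers, which a real driver needs before handing BDs back to the hardware, are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* S2MM scatter-gather descriptor, 64 bytes, fields per PG021. */
struct axidma_bd {
    uint32_t next_desc;     /* 0x00 */
    uint32_t next_desc_msb; /* 0x04 */
    uint32_t buf_addr;      /* 0x08 */
    uint32_t buf_addr_msb;  /* 0x0C */
    uint32_t reserved[2];   /* 0x10, 0x14 */
    uint32_t control;       /* 0x18 */
    uint32_t status;        /* 0x1C */
    uint32_t app[5];        /* 0x20 .. 0x30 */
    uint32_t pad[3];        /* pad to the 64-byte descriptor alignment */
};

#define BD_STS_CMPLT (1u << 31) /* "complete bit" in the status word */

/*
 * Hypothetical helper: returns the index of the first BD whose complete bit
 * is still set, or -1 if every BD is safe to hand to the SG engine.
 */
static long bd_ring_first_stale(const struct axidma_bd *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ring[i].status & BD_STS_CMPLT)
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: clears the status word so the BD can be fetched again. */
static void bd_recycle(struct axidma_bd *bd)
{
    bd->status = 0u;
}
```

If `bd_ring_first_stale` returns an index at that point, the SG engine would fetch that BD with the complete bit set and halt with SGIntErr.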