I have a driver roughly based on XAPP1129 that performs DMA transfers to my hardware. I'm doing timing analysis in the driver code by reading the timebase register with the Linux call get_tbl(), and I'm seeing much longer TX times than RX times, which doesn't make sense to me. When I measure the TX transfer time in hardware, from the LocalLink SOF to EOF, it comes out to around 1060 cycles at 100MHz for 4096 bytes in my setup, which is expected. Roughly the same time is measured in hardware for RX. However, when I measure in the kernel, the TX time is much longer than the RX time. For TX, I measure from just after XLlDma_BdRingToHw to the time the TX interrupt comes back from the DMA controller indicating completion. For RX, I measure from the time I tell the hardware to send up a result via a PLB register write to the time the RX interrupt arrives from the DMA controller. For a 4096-byte packet, I am seeing ~18500 timebase cycles for TX vs ~5500 for RX. If anyone has any insight into the large difference between the two, it would be greatly appreciated.
I worked on this some more over the weekend but still haven't figured out what's going on. One thing I tried was to poll the DMA status register until the DMA_COMPLETE bit is set, after giving the hardware control of the buffer descriptor for TX. It seems, however, that this polling time is always constant and does not depend on the size of the buffer being transferred. According to UG200, this bit should be set when the buffer length decrements to 0. The polling only amounts to a few hundred cycles at 400MHz, and the TX interrupt still takes a long time to come back after that. I'm still looking for advice from anyone who has had similar experiences.
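
Since the hardware numbers are counted at 100MHz and the kernel numbers are timebase ticks, it may help to put both on the same time scale before comparing them. The sketch below is only an illustration: the helper names ticks_to_ns and tbl_delta are made up for this post, and it assumes the timebase ticks at the 400MHz mentioned above, which should be checked against the actual system. Under that assumption, ~18500 ticks is about 46 us for TX and ~5500 ticks is about 14 us for RX, against about 10.6 us for the 1060 cycles measured in hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a tick count to nanoseconds for a counter running at clock_hz.
 * Hypothetical helper, not part of the driver or of XAPP1129.
 * The intermediate product fits in 64 bits for counts up to roughly 1.8e10.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t clock_hz)
{
    assert(clock_hz != 0);
    return (ticks * 1000000000ULL) / clock_hz;
}

/*
 * Elapsed ticks between two samples of the lower 32 bits of the timebase
 * (such as the values read with get_tbl()). Unsigned subtraction gives the
 * right answer across a single wrap of the 32-bit counter.
 */
static uint32_t tbl_delta(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, ticks_to_ns(1060, 100000000ULL) gives 10600 ns for the hardware measurement, and ticks_to_ns(18500, 400000000ULL) gives 46250 ns for the kernel TX measurement if the 400MHz assumption holds.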