Showing results for 
Show  only  | Search instead for 
Did you mean: 
Registered: ‎08-24-2017

C2H Streaming XDMA Linux Driver Broken

Jump to solution

I have spent the last several weeks fixing issues in the XDMA Linux driver (source here:  Specifically, my application requires C2H streaming mode, which I found to be extremely buggy.  The code seems to work for the default parameters of a single transfer of 1024 bytes, but the second you want to do multiple transfers and/or transfers larger than 4096 bytes (at least on my machine), things break in spectacular fashion.


Studying the source code, it appears that the AXI-MM mode was written first and then later on the streaming mode (and the associated circular buffer) were added (this is based on seeing the different code styles in the source code) and the resulting code was not written well.  As a software engineer, I would be embarrassed if my company put code like this out publicly.  Don't get me wrong, I have seen worse code, but never from a company as large as Xilinx.  Was this code ever peer reviewed?  How was it released without even basic (e.g., transfer count greater than 1) testing/QA?


Specifically, I found the following issues:


1. The code assumes that the rx_buffer will always have so many "blocks".  However, this ignores the fact that pci_map_sg() has the ability to combine adjacent memory regions.  On my machine, this occurs, which caused an error to be tripped since the size of the descriptor does not match the "expected" size.

2. The rx_buffer does not use coherent memory and no where is there an effort made to call the appropriate PCI functions to sync the buffer to the CPU.  As a result, on certain machines such as ARM, data corruption is a guarantee.

3. There are obvious places where code is copy/pasted and not properly edited.  Line 4026 in xdma-core.c is a prime example (wrong pointer checked for NULL).  Another place is in char_sgdma_read() (also in xdma-core.c), where several lines are repeated.

4. The read() syscall for streaming mode ignores the size of userspace buffer and instead will read until EOP (buffer overflows anyone?).  How is the userspace application supposed to know how many bytes happen between EOPs?  Also, even if this logic made any sense, it is not implemented properly.  The eop_found flag is set once and then never cleared, so after the first EOP, the code always thinks there is data ready.

5. The overrun condition is never reported to userspace and the logic for it is not properly implemented.  My personal favorite is the while loop on line 3545 of xdma-core.c, which will loop forever (in kernel space) if the overrun condition is detected.

6. BUG_ON is overused and abused throughout the code.  It should not be used for recoverable errors such as failed memory allocations, where instead something like ENOMEM should be returned to userspace.  Line 600 in xdma-core.c is a prime example.


I am only bringing this up to the attention of Xilinx so that the code can be fixed and that other people/organizations do no need to struggle like I did (it was fun explaining to my management why we had to add weeks to our development cycle so that I can re-work and fix a vendor's code).  I have since addressed these issues in my fork of the XDMA driver and things are working.

39 Replies
Registered: ‎04-27-2018

I am using a custom board with an NVidia Tegra TK1 SoC on it. Linux is 3.10.40 or so. The board is mated to a Xilinx Kintex Ultrascale FPGA Dev board. The fpga has an XDMA core where the S_AXIS_C2H0 and S_AXIS_H2C_0 are tied together (for loopback). The v2_xdma driver works for reliable data transmission (now, after I modified it).


If you're asking because you want to fix the 20180420 release of your driver on my setup I'm delighted to assist.


0 Kudos
Xilinx Employee
Xilinx Employee
Registered: ‎05-10-2018

Hi, dashxdr,


Thanks for sharing the information. Currently, XDMA does not support the embedded platform, such as the one you using. We only support X86. However, we will progressively add more platform support in coming releases. Please stay tuned.

Registered: ‎05-18-2018


What is the schedule for ARM platform porting ? 

Thank you.

0 Kudos
Not applicable

Even tho the driver was developed by an intern, Intel's PCIe dma IP core and driver are just amazing.

Xilinx doesn't even provide a proper memory map for the BAR control. Such a mess and wasting time.

0 Kudos
Registered: ‎12-22-2014

Is there any progress on this issue ?

I am facing the same behaviour when runnig the tests with the xdma driver from

I am running ubuntu 18.04 LTS on a x86_64 host.

The script runs smoothly, but the runs at the most once, and then different things may happen if I run the script again, in the best case I just get fake numbers of getting transfers with a data rate of 1.000, and in the worst case the system gets automatically rebooted

Following are the last lines displayed when running the, and then  the log when I attempt to execute the script, (after these messages, the host stopped responding and automatically rebooted)

./ 2 2
(data duty cycle = 88%)
perf.transfer_size = 4096 bytes
perf.iterations = 3436007
(data transferred = 14073884672 bytes)
perf.clock_cycle_count = 498515472
perf.data_cycle_count = 439809087
(data duty cycle = 88%)
data rate ***** bytes length = 4096, rate = 0.882238
perf.pending_count = 0


Info: Number of enabled h2c channels = 2
Info: Number of enabled c2h channels = 2
Info: The PCIe DMA core is memory mapped.
Info: Running PCIe DMA memory mapped write read test
transfer size: 1024
transfer count: 1
Info: Writing to h2c channel 0 at address offset 0.
Info: Writing to h2c channel 1 at address offset 1024.
Info: Wait for current transactions to complete.
/dev/xdma0_h2c_1, W off 0x400, 0xffffffffffffffff != 0x400.
/dev/xdma0_h2c_0, W off 0x0, 0xffffffffffffffff != 0x400.
write file: Unknown error 512
write file: Unknown error 512
Info: Writing to h2c channel 0 at address offset 2048.
Info: Writing to h2c channel 1 at address offset 3072.
Info: Wait for current transactions to complete.

0 Kudos
Registered: ‎12-28-2018
0 Kudos
Registered: ‎10-19-2015

@dashxdr, did you post the updates to v2_xdma to gitlab?  I'm seeing a mess as well for C2H. 

H2C performance was bad with the blocking approach per packet and had to be modified to work at a decent rate.  Sounds like Xilinx isn't going to do anything about it.  FYI, Northwest Logic (NWL) has some PCIe IP that works with Xilinx PCIe hard IP core, includes drivers and provides better capability.  Looks like we'll go that way next time if we don't switch in the middle of this project because of all the XDMA driver issues.  The XDMA engine HW also has a nubmer of issues with the documentation and general approach to pacing.

0 Kudos
Registered: ‎12-28-2018
Xilinx technical support really rubbish, sooner or later collapse.I'm switching to an Intel fpga
0 Kudos
Registered: ‎09-10-2019
Intel's IP core and their driver are amazingly easy to understand and to use.
0 Kudos
Registered: ‎11-14-2019


This is a bit late, but I solved the "Scheduling while atomic" bug as referenced in your quote here:

"I was prepared to start down that path. Then bkuschak pushed the v2_xdma driver. So I invested enough time in that driver to realize it has a missing piece. Meanwhile Xilinx released April 20 or so and what a wonderful world it would be all our problems are solved. Except that driver just kernel panics, it seems to have its own set of fundamental flaws. The panic I found was "BUG: scheduling while atomic" appeared with a stack trace. Like inside an interrupt service handler the kernel driver tries to schedule() or whatever... suggesting to me "OMG these guys blew it again". So if I am going to fix Xilinx's stuff in any case, maybe it's better to mess with v2_xdma which at least has been used by someone."


It seems this is not a ARM related issue but a kernel version issue.  For example on X86, we use a 3.10 kernel and on the ARM I was using a 4.9 kernel.  The bug is because in the xdma_xfer_submit function (from Xilinx version 2 of the driver), they lock a spin_lock called desc_lock and then they go to sleep until the transfer is complete.  This is not allowed in later kernels.  If you change desc_lock to a semaphore the problem will go away.  You can not lock a spin lock and go to sleep.  You can do a semaphore and go to sleep.  I can see in the kernel tree when this BUG was enabled, so I know it is not ARM related.

For me, once this was fixed, version 2 of the Xilinx driver worked fine for AXI-Memory-Map H2C and C2H, on X86 or ARM.  I have done no testing for AXI-Streaming.  In my opinion version 2 of the Xilinx Driver is much more thorough than version 1 and it is professionally written. 

You are probably committed to the version put out by WZ but hopefully this helps anyone else who might want to use the actually Xilinx Example Driver (v2).