dwd_pete
Visitor
Visitor
11,744 Views
Registered: ‎08-24-2017

C2H Streaming XDMA Linux Driver Broken


I have spent the last several weeks fixing issues in the XDMA Linux driver (source here: https://www.xilinx.com/support/answers/65444.html).  Specifically, my application requires C2H streaming mode, which I found to be extremely buggy.  The code seems to work for the default run_test.sh parameters of a single transfer of 1024 bytes, but the second you want to do multiple transfers and/or transfers larger than 4096 bytes (at least on my machine), things break in spectacular fashion.

 

Studying the source code, it appears that the AXI-MM mode was written first and then later on the streaming mode (and the associated circular buffer) were added (this is based on seeing the different code styles in the source code) and the resulting code was not written well.  As a software engineer, I would be embarrassed if my company put code like this out publicly.  Don't get me wrong, I have seen worse code, but never from a company as large as Xilinx.  Was this code ever peer reviewed?  How was it released without even basic (e.g., transfer count greater than 1) testing/QA?

 

Specifically, I found the following issues:

 

1. The code assumes that the rx_buffer will always have so many "blocks".  However, this ignores the fact that pci_map_sg() has the ability to combine adjacent memory regions.  On my machine, this occurs, which caused an error to be tripped since the size of the descriptor does not match the "expected" size.

2. The rx_buffer does not use coherent memory, and nowhere is there an effort made to call the appropriate PCI functions to sync the buffer to the CPU.  As a result, on certain machines such as ARM, data corruption is a guarantee.

3. There are obvious places where code is copy/pasted and not properly edited.  Line 4026 in xdma-core.c is a prime example (wrong pointer checked for NULL).  Another place is in char_sgdma_read() (also in xdma-core.c), where several lines are repeated.

4. The read() syscall for streaming mode ignores the size of userspace buffer and instead will read until EOP (buffer overflows anyone?).  How is the userspace application supposed to know how many bytes happen between EOPs?  Also, even if this logic made any sense, it is not implemented properly.  The eop_found flag is set once and then never cleared, so after the first EOP, the code always thinks there is data ready.

5. The overrun condition is never reported to userspace and the logic for it is not properly implemented.  My personal favorite is the while loop on line 3545 of xdma-core.c, which will loop forever (in kernel space) if the overrun condition is detected.

6. BUG_ON is overused and abused throughout the code.  It should not be used for recoverable errors such as failed memory allocations, where instead something like ENOMEM should be returned to userspace.  Line 600 in xdma-core.c is a prime example.
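
Issue 1 can be illustrated in plain userspace C. The sketch below is a simplified model, not the kernel's pci_map_sg() implementation: it merges address-adjacent entries the way a DMA mapping layer is allowed to, so the number of mapped segments can come out smaller than the number of pages submitted. The names `struct seg` and `coalesce_segments` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one scatter-gather entry: bus address plus length. */
struct seg {
    uint64_t addr;
    uint32_t len;
};

/*
 * Merge entries that are contiguous in address space, in place.
 * Returns the number of entries left after merging, which may be
 * smaller than n. A driver has to build its descriptors from this
 * returned count and the merged lengths, not from the page count.
 */
size_t coalesce_segments(struct seg *sg, size_t n)
{
    if (n == 0)
        return 0;

    size_t out = 0; /* index of the last merged entry */
    for (size_t i = 1; i < n; i++) {
        if (sg[out].addr + sg[out].len == sg[i].addr) {
            sg[out].len += sg[i].len;   /* adjacent: extend current entry */
        } else {
            sg[++out] = sg[i];          /* gap: start a new entry */
        }
    }
    return out + 1;
}
```

With three 4096-byte pages of which the first two are adjacent, this returns two segments, the first 8192 bytes long. Code that expects every descriptor to cover exactly one fixed-size block would reject that result.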

 

I am only bringing this to the attention of Xilinx so that the code can be fixed and other people/organizations do not need to struggle like I did (it was fun explaining to my management why we had to add weeks to our development cycle so that I could re-work and fix a vendor's code).  I have since addressed these issues in my fork of the XDMA driver and things are working.

39 Replies
hbucher
Scholar
Scholar
11,229 Views
Registered: ‎03-22-2016

@dwd_pete I hit a few of these glitches along the way. The AR says it has been updated for 2016.4. 

I saw yet another project on github

https://github.com/RHSResearchLLC/XilinxAR65444

I'd be interested in knowing how your patches differ. 

Would you be willing to put the code on github?

 

 

vitorian.com --- We do this for fun. Always give kudos. Accept as solution if your question was answered.
I will not answer to personal messages - use the forums instead.
0 Kudos
bkuschak
Adventurer
Adventurer
11,109 Views
Registered: ‎10-01-2013

@dwd_pete would you be willing to make your fork of this driver available?  It seems that I'm following the same path right now..

 

0 Kudos
dwd_pete
Visitor
Visitor
11,100 Views
Registered: ‎08-24-2017

Unfortunately, I am not at liberty to release the source code at this time.

 

0 Kudos
dwd_pete
Visitor
Visitor
11,058 Views
Registered: ‎08-24-2017

I am not only asking "questions to a community of volunteers", but this is also the official way to contact Xilinx support regarding such matters (the sales engineer at the distributor I am working with suggested I post here).  Is it not helpful to the community to let them know that there are glaring bugs in the official Xilinx provided code?

 

I don't want to get into the gory legal details regarding my current arrangement, but suffice it to say, someone paid me to fix this driver to get their product to work and the code is not mine to release publicly.  Said people would be rather upset if all of a sudden their competition got the same code for free.  I know that they are still working out licensing and distribution details, so if the code is ever available in a public repo, I'll let you know.  In the meantime, I at least wanted to make people aware of these issues so maybe Xilinx will patch the driver (and actually provide usable, working code) and all these open source fixes to XDMA will be a moot point.

dwd_pete
Visitor
Visitor
11,055 Views
Registered: ‎08-24-2017

Also, I can't speak to the quality or viability of this code, but I had this repo pointed out to me by a colleague: https://gitlab.com/WZab/Artix-DMA1

0 Kudos
bkuschak
Adventurer
Adventurer
11,021 Views
Registered: ‎10-01-2013

I understand that you have restrictions on code release at this time.  Hopefully your employer will see the value of getting these fixes integrated into the open source code, if for no other reason than to save you the time and effort of merging your fixes later.

 

Thanks for the link to this other driver.   I'll take a look. 

Unfortunately my KCU105 just died (power-good failed on the MGT rails) so I'll probably be down for a while. 

 

0 Kudos
dwd_pete
Visitor
Visitor
12,400 Views
Registered: ‎08-24-2017

Two weeks passed and no response from Xilinx... looks like I now know how seriously they take their software support.

 

Intel/Altera it is from now on.

venkata
Moderator
Moderator
10,817 Views
Registered: ‎02-16-2010
@dwd_pete, your queries are getting reviewed by us. We will update you as soon as we have some update.
------------------------------------------------------------------------------
Don't forget to reply, give kudo and accept as solution
------------------------------------------------------------------------------
0 Kudos
bethe
Xilinx Employee
Xilinx Employee
10,592 Views
Registered: ‎12-10-2013

Wanted to update the thread and let you know that your feedback was delivered to the driver development team and will be incorporated in upcoming releases.  We do appreciate the feedback.

 

As a note, this is a reference driver, so certain areas are written specifically to match the example design logic (streaming in particular, which assumes a direct feedback loop with no storage in between).  This means the driver expects certain configuration settings to hold (the H2C and C2H to be matched in size).  The driver is intended as a reference and demonstration of the interaction between the driver and FPGA, not as a fully functional out-of-the-box driver for all possible uses.

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
10,099 Views
Registered: ‎01-22-2018

Has there been any resolution to these issues? I have encountered the same issues mentioned in this thread (and many other posts).  Thanks.

0 Kudos
dashxdr
Observer
Observer
10,020 Views
Registered: ‎04-27-2018

April 20 2018 there appears to be a new xdma driver package for linux here:

https://www.xilinx.com/support/answers/65444.html

I managed to build the xdma kernel module into xdma.ko but when I try to run tests I get this:

user.err kernel: [33631.264517] BUG: scheduling while atomic: dma_to_device/3019/0x00000002
user.warn kernel: [33631.273992] Modules linked in: xdma(O) [last unloaded: xdma]
user.warn kernel: [33631.282659] CPU: 1 PID: 3019 Comm: dma_to_device Tainted: G           O 3.10.40+g71cc3bf #45
user.warn kernel: [33631.296649] [<c0016370>] (unwind_backtrace+0x0/0x13c) from [<c0012c0c>] (show_stack+0x18/0x1c)
user.warn kernel: [33631.310898] [<c0012c0c>] (show_stack+0x18/0x1c) from [<c07ee984>] (__schedule_bug+0x58/0x6c)
user.warn kernel: [33631.325115] [<c07ee984>] (__schedule_bug+0x58/0x6c) from [<c07f5dbc>] (__schedule+0x7e4/0x89c)
user.warn kernel: [33631.339516] [<c07f5dbc>] (__schedule+0x7e4/0x89c) from [<c07f3e88>] (schedule_timeout+0x158/0x250)
user.warn kernel: [33631.354373] [<c07f3e88>] (schedule_timeout+0x158/0x250) from [<bf04cb40>] (xdma_xfer_submit+0x434/0x954 [xdma])
user.warn kernel: [33631.370362] [<bf04cb40>] (xdma_xfer_submit+0x434/0x954 [xdma]) from [<bf04fa7c>] (char_sgdma_read_write+0x2cc/0x380 [xdma])
user.warn kernel: [33631.387446] [<bf04fa7c>] (char_sgdma_read_write+0x2cc/0x380 [xdma]) from [<bf04fb4c>] (char_sgdma_write+0x1c/0x24 [xdma])
user.warn kernel: [33631.404393] [<bf04fb4c>] (char_sgdma_write+0x1c/0x24 [xdma]) from [<c01463a4>] (vfs_write+0xbc/0x19c)
user.warn kernel: [33631.419658] [<c01463a4>] (vfs_write+0xbc/0x19c) from [<c0146a54>] (SyS_write+0x60/0x178)
user.warn kernel: [33631.433900] [<c0146a54>] (SyS_write+0x60/0x178) from [<c000ed00>] (ret_fast_syscall+0x0/0x30)

So it seems like great, nice new version, but is it any improvement? Maybe fix the old one rather than release a new one that has a whole new set of problems?

 

The driver is running on an nvidia tk1 core with linux 3.10.40 if that is of any help. I'm not sure I even compiled it correctly. The Makefile didn't work out of the box. In include/libxdma_api.h I had to add

#include <linux/slab.h>

Thanks

 

 

 

 

0 Kudos
bkuschak
Adventurer
Adventurer
10,007 Views
Registered: ‎10-01-2013

As was suggested in a previous post, I have started using the v2_xdma driver from here: https://gitlab.com/WZab/Artix-DMA1 

 

You might want to give it a try.

 

I'm still in the early days of debugging my FPGA design, so I have not heavily tested this driver under load.  And I should say that I have seen the occasional kernel oops, kernel BUG(), and some lockups.  There's definitely room for improvement.  But I am able to get many MB of DMA data flowing from the FPGA to my userspace app, which I was unable to do with the stock Xilinx driver.  So many thanks to Wojtek for posting his work on this driver and responding to my questions!

 

I'm running Debian kernel 4.9.0-6-amd64.  I also had it working with an earlier kernel version on another machine.

0 Kudos
dashxdr
Observer
Observer
10,000 Views
Registered: ‎04-27-2018

@bkuschak wrote:

As was suggested in a previous post, I have started using the v2_xdma driver from here: https://gitlab.com/WZab/Artix-DMA1 You might want to give it a try.

 

I got it to compile by fixing wz-xdma.c line 542

            if(kfifo_put(&ext->kfifo,&db_desc))

The '&' was missing.

I added my PCI id to xdma-core.c in the pci_ids[] near the top:

        { PCI_DEVICE(0x10ee, 0x8024), },

The device files appear but my userspace test program doesn't work. It tries to do a large write to

/dev/wz-xdma0_h2c_0 in one process and a read of

/dev/wz-xdma0_c2h_0 on another process.

 

I tried your tester2 program but it fails

Can't open /dev/wz-xdma0_user: No such file or directory

 

0 Kudos
bkuschak
Adventurer
Adventurer
9,985 Views
Registered: ‎10-01-2013

Let me be clear that this isn't my driver code.  I'm just a user.  I'm only using C2H channels, not H2C.  

You said the /dev/wz-xdma0_user device file appears but the open fails? 

 

What do you get from these commands:

dmesg

lsmod |grep xdma

ls -l /dev/wz-xdma*

sudo lspci -vvv

 

I think the wz-xdma0_user device node is used for mapping the BAR for the PCIe to AXILite interface.  If your FPGA design doesn't implement that interface, perhaps the driver doesn't create that device node.  

 

I would recommend trying to get small reads and writes to work independently before running large transactions simultaneously.  I found it useful to add a VIO module to observe signals and do some interactive control and debugging while running the DMA code.

0 Kudos
dashxdr
Observer
Observer
9,979 Views
Registered: ‎04-27-2018

@bkuschak wrote:

Let me be clear that this isn't my driver code.  I'm just a user.  I'm only using C2H channels, not H2C.  

You said the /dev/wz-xdma0_user device file appears but the open fails? 



The h2c,c2h device files appear, the module is loaded (I'm not passing any special options to the insmod) but file access to the h2c and c2h files always comes back with -1 when I try to read or write.

 

There is no /dev/wz-xdma0_user file at all.

This is the lspci device:

01:00.0 Class 0700: Device 10ee:8024

 

reg_rw works:

root@jetson-tk07:~/dma_driver/tests# ./reg_rw /dev/wz-xdma0_control 0x000 w
argc = 4
device: /dev/wz-xdma0_control
address: 0x00000000
access type: write
access width given.
access width: word (32-bits)
character device /dev/wz-xdma0_control opened.
Memory mapped at address 0xb6fc8000.
Read 32-bit value at address 0x00000000 (0xb6fc8000): 0x1fc08006
0 Kudos
bkuschak
Adventurer
Adventurer
9,975 Views
Registered: ‎10-01-2013

You don't read() or write() those files to send/receive data.  You have to use ioctl() and mmap(). 

Attached is a simplified example of what I did. 

0 Kudos
dashxdr
Observer
Observer
9,942 Views
Registered: ‎04-27-2018

@bkuschak wrote:

You don't read() or write() those files to send/receive data.  You have to use ioctl() and mmap(). 

Attached is a simplified example of what I did. 


I tried the program. TOT_BUF_LEN is too big for the architecture (32-bit address space), so I made it 65536. I had to comment out the open of the _user file, and then the program returns -1 on

ioctl(fd_data, IOCTL_XDMA_WZ_ALLOC_BUFFERS, 0);

 

-Dave

0 Kudos
bkuschak
Adventurer
Adventurer
9,909 Views
Registered: ‎10-01-2013

-1 is EPERM.  It's telling you that you don't have permission.  You must run it as root.

 

        /usr/include/asm/errno.h
        #define EPERM 1 /* Operation not permitted */

 

If you still have problems, run dmesg to see the output from the driver.  Look at the driver source code to see where those messages are printed.  If necessary, add more printk() messages to the code and recompile it.  That will help you work out why it is failing.  https://elinux.org/Debugging_by_printing

0 Kudos
dashxdr
Observer
Observer
9,901 Views
Registered: ‎04-27-2018

Actually that was helpful. The return value of ioctl is always -1 if it errors out, then you have to look at errno or do printf with %m in the string:

	ret = ioctl(fd_data, IOCTL_XDMA_WZ_ALLOC_BUFFERS, 0);
	printf("ret(0) ret = %d %m\n", ret);

This is what I got:

ret(0) ret = -1 Cannot allocate memory

 

I am running as root on the target.
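
The pattern described here (check for -1, then consult errno) can be wrapped in a small helper. This is a generic sketch and nothing in it is specific to the v2_xdma driver; `checked_ioctl` is a hypothetical name chosen for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Call ioctl() and, on failure, print the errno text to stderr.
 * Returns the ioctl() result on success, or a negative errno value
 * (e.g. -ENOMEM) on failure, so callers never have to interpret a
 * bare -1.
 */
int checked_ioctl(int fd, unsigned long request, void *arg, const char *what)
{
    int ret = ioctl(fd, request, arg);
    if (ret == -1) {
        int err = errno; /* save before any other call can overwrite it */
        fprintf(stderr, "%s failed: %s (errno %d)\n", what, strerror(err), err);
        return -err;
    }
    return ret;
}
```

A failed buffer allocation would then come back as -ENOMEM rather than -1, which points straight at the driver's allocation paths.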

0 Kudos
bkuschak
Adventurer
Adventurer
10,230 Views
Registered: ‎10-01-2013

If you look at ioctl_do_wz_alloc_buffers() you will see several places where it can return ENOMEM.   Most likely it is failing because it tries to allocate 4GB of buffer space.

 

Change these constants in wz-xdma-consts.h to make them suitable for your target:

 

//Size of a single DMA buffer (MUST BE A POWER OF 2!)
#define WZ_DMA_BUFLEN (4*1024*1024)
//Number of allocated DMA buffers (MUST BE A POWER OF 2!)
//Now we try to use the whole buffer set - 4GB!
#define WZ_DMA_NOFBUFS 1024
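
The arithmetic behind the 4GB figure: 4 MiB per buffer times 1024 buffers is 2^32 bytes, one more than the largest value a 32-bit address can hold. The sketch below computes the total in 64 bits and checks it against an address width. The two constants are copied from the header quoted above; the helper names `total_dma_bytes` and `fits_address_space` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted from wz-xdma-consts.h above. */
#define WZ_DMA_BUFLEN  (4 * 1024 * 1024)
#define WZ_DMA_NOFBUFS 1024

/* Total buffer space in bytes, computed in 64 bits to avoid overflow. */
uint64_t total_dma_bytes(uint64_t buflen, uint64_t nofbufs)
{
    return buflen * nofbufs;
}

/*
 * True if `total` bytes could be addressed at all with `addr_bits`-bit
 * pointers. This is only an upper bound: the kernel, the process image
 * and other mappings also consume address space.
 */
bool fits_address_space(uint64_t total, unsigned addr_bits)
{
    if (addr_bits >= 64)
        return true;
    return total < ((uint64_t)1 << addr_bits);
}
```

With the default constants the total is 4294967296 bytes, which fails the 32-bit check. Reducing the buffer count to 4 gives 16 MiB, which passes it.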

dashxdr
Observer
Observer
10,207 Views
Registered: ‎04-27-2018

Hey, that works now (thanks!). I had to set WZ_DMA_NOFBUFS to 4 (1 didn't work for some reason). Now the test program you provided is inside the while() loop waiting for something to come in from the other side.

 

The example program opens up /dev/wz-xdma0_c2h_0. Would I need to write a similar program to open /dev/wz-xdma0_h2c_0 to initiate a transfer?

 

The original Xilinx driver I've been trying to get working provides read + write functionality on the /dev/xdma0_c2h_0 and /dev/xdma0_h2c_0 files. The Xilinx XDMA engine is set up for loopback mode, so if I do a file write to /dev/xdma0_h2c_0 the data appears when I read back from /dev/xdma0_c2h_0.
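
For a loopback check like this, keep in mind that read() and write() are allowed to transfer fewer bytes than requested, so a test program should loop until the whole block has moved. The helpers below are a generic POSIX sketch, not part of either driver; `write_all` and `read_full` are hypothetical names. If there is no storage between H2C and C2H, the writer and reader presumably have to run concurrently (for example in two processes).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno describes the failure). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {       /* no progress: report instead of spinning */
            errno = EIO;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read until len bytes have arrived or end of file is reached.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;          /* end of file: return what we have */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A test would open the H2C node for writing and the C2H node for reading, push a known pattern through `write_all`, collect it with `read_full`, and compare the two buffers.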

 

It might be obvious what I need to do... I'll report in a bit...

Thanks again!

ETA: Hmmm, it's not clear how to initiate a transfer that this driver can receive. It's like maybe we'd need a matching ioctl for IOCTL_XDMA_WZ_PUTBUF and function...

 

 

 

 

0 Kudos
dashxdr
Observer
Observer
10,119 Views
Registered: ‎04-27-2018

OK it's clear the v2_xdma driver doesn't support a mechanism to WRITE data. So adding that would be an R&D effort.

 

The original Xilinx driver seems to be end of lifed. It works (somewhat) but suffers from the cache coherency problem.

 

The 20180420 Xilinx driver release is just dead on arrival as far as I can tell. There is no reason to assume Xilinx got it right this time either.

 

So: I'm going to pursue adding WRITE functionality to v2_xdma.

 

I am "Shocked...shocked!!!" At just how poor Xilinx is at this. Oh well.

 

venkata
Moderator
Moderator
10,114 Views
Registered: ‎02-16-2010
I would like to ask you a few questions about your observations with the v2_xdma driver. Can you please answer them?

Are you testing the v2_xdma driver without editing it?
Did the driver fail when tested with the IP example design?
What is the failure symptom?
On which Linux OS was it tested?
Which device is the XDMA design targeted to?
0 Kudos
dashxdr
Observer
Observer
10,092 Views
Registered: ‎04-27-2018

To clarify, the v2_xdma driver is from https://gitlab.com/WZab/Artix-DMA1/tree/master/v2_xdma/software. It's not even an official Xilinx release.

 

Whether the driver was used without editing: On the 32-bit ARM target the driver didn't work because it was trying to mmap 4 GB. bkuschak helped resolve that issue. I just reduced the number of buffers to 4: #define WZ_DMA_NOFBUFS 4

 

The driver doesn't provide a WRITE ioctl so I can't even test the driver in the first place. I felt some measure of progress just because the sample program bkuschak provided ran enough to get the part where it waits until some data is received. So I don't even know what "IP example design" refers to. My test setup has a xilinx FPGA dev board where there is an xdma core instantiated where the h2c and c2h are wired together. I'm trying to get a write to h2c to return the same data from a read from c2h.

 

The failure with the v2_xdma is that there is no way I can cause a write to h2c. The driver only provides read from c2h. So it looks like I have to figure out how to add the missing functionality to the v2_xdma driver. I did email the author.

 

It'd be nice to use a driver from Xilinx. But the driver I have that sort of works, derived from https://www.xilinx.com/Attachment/Xilinx_Answer_65444_Linux_Files.zip (with minor edits some other engineer did and I'm not sure of their history), has a problem. The problem with that driver is it is not written correctly, it doesn't properly allocate cache coherent pci memory. The target is an arm cpu. So although I can read and write 3M chunks of data, they randomly have bad 64-byte pieces where the L2 cpu cache wasn't flushed or whatever. So it's unusable. Reworking the driver would be a viable option, as some earlier fellow mentioned having already done (dwd_pete, page 1 of this topic). But he can't release his improvements to the driver because his company owns them. So... everyone has to reinvent the wheel I guess. Since Xilinx doesn't seem interested in fixing it...

 

I was prepared to start down that path. Then bkuschak pointed me to the v2_xdma driver. So I invested enough time in that driver to realize it has a missing piece. Meanwhile Xilinx released https://www.xilinx.com/Attachment/Xilinx_Answer_65444_Linux_Files_rel20180420.zip on April 20 or so, and what a wonderful world it would be if all our problems were solved. Except that driver just kernel panics; it seems to have its own set of fundamental flaws. The panic I found was "BUG: scheduling while atomic" with a stack trace. It looks as if the kernel driver tries to schedule() inside an interrupt service handler or some other atomic context... suggesting to me "OMG these guys blew it again". So if I am going to fix Xilinx's stuff in any case, maybe it's better to mess with v2_xdma, which at least has been used by someone.

 

Linux OS: I think it's Yocto based. The kernel is 3.10.40, the target is the NVIDIA TK1. The Xilinx XDMA device probably doesn't matter much. The setup I'm working with has a Xilinx Kintex UltraScale FPGA dev board. The TK1 has an x4 connection to the FPGA core.

 

I'm happy to run tests, answer questions, whatever, to assist in Xilinx getting that rel20180420 driver working. Regarding the v2_xdma my take is Xilinx is just noise -- as in "You guys can't even fix your own stuff, now you're going to get involved in the community's efforts to perfect their repaired version of your stuff."

 

 

 

 

 

 

 

dashxdr
Observer
Observer
10,030 Views
Registered: ‎04-27-2018

@venkata wrote:
I would like to ask you few questions related to your observations with v2_xdma driver. Can you please provide answers to them?


I wish I could give negative kudos. Ok so you asked a bunch of questions, I answered them... any hint you might actually do something for me / us now?

 

Jeez

 

0 Kudos
dashxdr
Observer
Observer
9,978 Views
Registered: ‎04-27-2018

I forked the v2_xdma files into a github repo here:

https://github.com/dashxdr/v2_xdma

 

I managed to get some H2C write functionality but the whole issue of 64 byte dropouts on the received C2H data hasn't been fixed. (Ugh!)

 

I'm kind of running out of ideas. My theory is the ARM cpu cache is the source of the trouble, the XDMA core is writing to memory but the cache is in conflict so some of the data is either lost or ends up in the wrong place.

 

I'm just not sure all the cache consistency code is correct.
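
One way to test the cache theory is to compare the received block against the data that was sent and see whether the mismatches fall on whole 64-byte chunks, which would match cache-line granularity. The sketch below assumes a 64-byte line size, a buffer that starts on a line boundary, and known expected data (as in a loopback test). `count_bad_lines` is a hypothetical helper, not part of any driver.

```c
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64 /* assumed cache-line size in bytes */

/*
 * Compare two buffers in LINE_SIZE-byte chunks (offsets are relative
 * to the start of the buffers). Returns the number of chunks that
 * differ. If first_bad is non-NULL and at least one chunk differs,
 * the byte offset of the first differing chunk is stored there.
 */
size_t count_bad_lines(const unsigned char *expected,
                       const unsigned char *received,
                       size_t len, size_t *first_bad)
{
    size_t bad = 0;
    for (size_t off = 0; off < len; off += LINE_SIZE) {
        size_t chunk = (len - off < LINE_SIZE) ? len - off : LINE_SIZE;
        if (memcmp(expected + off, received + off, chunk) != 0) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = off;
            bad++;
        }
    }
    return bad;
}
```

If the bad regions always cover complete chunks and never straddle a chunk boundary, stale or unflushed cache lines become a more plausible explanation than a fault in the DMA engine itself.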

 

0 Kudos
xiyuex
Xilinx Employee
Xilinx Employee
9,839 Views
Registered: ‎05-10-2018

Hi, dashxdr,

 

If I understand correctly, you are currently facing some coherency (not consistency) issue on your ARM-based system based on a driver which was branched out from the original Xilinx XDMA driver.

 

  1. Please note that v2_xdma (https://gitlab.com/WZab/Artix-DMA1) is a driver branch created by a third party and Xilinx does not provide direct support for it.
  2. I understand that you were using Xilinx's newly released XDMA driver on an ARM-based platform with an embedded OS. Please note that both the original XDMA driver and the newly released XDMA driver (April 20, 2018) support only x86 platforms. At the moment, we do not support ARM-based platforms.
  3. Most x86 platforms provide hardware-based coherency, where the interconnect between the LLC and main memory maintains coherence for users. On most ARM-based platforms, by contrast, users are responsible for maintaining cache coherency explicitly in their software.

 

dashxdr
Observer
Observer
9,819 Views
Registered: ‎04-27-2018

I think I just made a critical discovery. My modified v2_xdma driver seems to be working, but my v2.c test program was causing data corruption with the line data_buf[i] = 0xda (line 59). When I comment this out, the 4 MB data block is transferred reliably.

 

To the xilinx person -- thanks for your response. Coincidence that we both had something to say around the same time...

ETA: Responding to this: "Unlike x86, for most of ARM-based platforms, users are responsible for maintaining cache coherency in their software explicitly." The problem is the cache consistency stuff can't be done in userspace; it has to be done in the kernel driver. The 20180420 release also just flat out crashes on my setup. Xilinx ought to at least fix that...

 

0 Kudos
xiyuex
Xilinx Employee
Xilinx Employee
9,799 Views
Registered: ‎05-10-2018

Hi, dashxdr, could you provide more information about the system and OS you are using?

0 Kudos