05-07-2018 03:36 AM - edited 05-09-2018 07:07 AM
Hi
In my design, I'm using the DMA/Bridge Subsystem IP on a VC707 (Virtex-7 485T).
I'm using the device drivers from AR65444 on Ubuntu 16.04 with Vivado 2017.2.
Every now and then, I observe the following errors in dmesg:
[ 1503.268351] xdma_isr():(irq=31) <<<< INTERRUPT SERVICE ROUTINE
[ 1503.316473] xdma_isr():ch_irq = 0xffffffff
[ 1503.364590] xdma_isr():user_irq = 0xffffffff
[ 1503.364688] engine_status_read(): H2C0: Status of SG DMA H2C engine:
[ 1503.364688] engine_status_read(): H2C0: ioread32(0xffffb57541fc0040).
[ 1503.412820] engine_status_read(): H2C0: status = 0xffffffff: BUSY DESC_STOPPED DESC_COMPLETED ALIGN_MISMATCH MAGIC_STOPPED FETCH_STOPPED READ_ERROR DESC_ERROR IDLE_STOPPED
[ 1503.460966] kernel BUG at <driver-path>/Vivado/Xilinx_Answer_65444_Linux_Files/driver/xdma-core.c:1314!
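Every register in this log reads back as 0xffffffff. On PCIe, a read that gets no valid completion (the link went down, the endpoint was reset or reprogrammed, or it simply stopped answering) is typically returned to the CPU as all ones, so the status decode above, with every flag set at once, is most likely not a real engine state. A minimal sketch, assuming plain user-space C and hypothetical names (`classify_irq`, `ISR_*`; none of these come from the AR65444 driver), of the check an interrupt handler could make on `ch_irq` and `user_irq` before acting on them. It also covers the "nothing pending" case: on a shared legacy (INTx) line the handler is called for other devices' interrupts too, and Linux expects it to report IRQ_NONE then.

```c
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. They mirror the kernel's
 * IRQ_NONE / IRQ_HANDLED idea but are plain enum values here. */
enum isr_result { ISR_NOT_OURS = 0, ISR_HANDLED = 1, ISR_DEVICE_GONE = 2 };

/* Decide what an interrupt handler should do with the two request
 * registers it has just read from the device. */
static enum isr_result classify_irq(uint32_t ch_irq, uint32_t user_irq)
{
    /* All ones usually means the PCIe read got no valid completion:
     * the device is not answering, so do not walk the DMA engines or
     * their transfer lists based on these values. */
    if (ch_irq == 0xffffffffu || user_irq == 0xffffffffu)
        return ISR_DEVICE_GONE;

    /* Nothing pending: on a shared legacy line the interrupt most
     * likely came from another device. */
    if (ch_irq == 0 && user_irq == 0)
        return ISR_NOT_OURS;

    return ISR_HANDLED;
}
```

For example, `classify_irq(0x1, 0x0)` yields `ISR_HANDLED`, while the values from the log above, `classify_irq(0xffffffff, 0xffffffff)`, yield `ISR_DEVICE_GONE`. What a real driver should do in that case (disable the interrupt, mark the device dead, attempt recovery) is a design decision this sketch does not make.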
This error occurs randomly.
When it does occur, I have to shut down the PC, reprogram the bitstream, restart the host PC and try again.
Could anyone please tell me why this error occurs?
What can I do to prevent it?
Thank You
Jagannath
05-07-2018 03:57 PM
05-07-2018 10:57 PM - edited 05-09-2018 07:08 AM
Thank you for replying @venkata
I'm using the VC707 as a hardware accelerator over PCIe Gen2, transferring data between the host PC and BRAM/DDR3 memory.
My user-space C program uses the same code snippets as dma_to_device.c, dma_from_device.c and reg_rw.c to transfer data.
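For readers less familiar with those examples: dma_to_device.c writes host data to an AXI address by seeking the H2C character device to that address and then calling write(). A minimal sketch of that pattern, assuming a device node such as /dev/xdma0_h2c_0 and a hypothetical helper name `h2c_write` (not part of the driver package), with a retry loop for short writes:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Close fd without clobbering the errno of the failure that led here. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Write `size` bytes from `buf` to AXI address `addr` through an H2C
 * character device. The file offset selects the AXI address, so seek
 * first, then write. Returns 0 on success, -1 on error with errno set. */
static int h2c_write(const char *dev, uint64_t addr, const void *buf, size_t size)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return -1;

    if (lseek(fd, (off_t)addr, SEEK_SET) == (off_t)-1)
        return fail_close(fd);

    const char *p = buf;
    size_t left = size;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return fail_close(fd);
        }
        if (n == 0) {              /* no progress: treat as an I/O error */
            errno = EIO;
            return fail_close(fd);
        }
        p += n;                    /* write() also advanced the file offset */
        left -= (size_t)n;
    }
    return close(fd);
}
```

Usage would look like `h2c_write("/dev/xdma0_h2c_0", 0x0, buf, len)`, where the address and length are placeholders for values valid in a given design.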
Once the hardware accelerator completes execution, it raises a legacy interrupt using usr_irq_req.
The AR65444 device driver I use has extra code in xdma_isr() that sends a netlink unicast message to this user-space program so that it can collect the results from the VC707 DDR3 memory.
This C program (with a working hardware accelerator) runs 3-4 times without any problem. Subsequent runs get stuck in the initial stage of writing to the FPGA (similar to dma_to_device). The kernel freezes, and I have to shut down the PC, reprogram the bitstream and reboot (it works a few times, the kernel freezes, shut down, program the bitstream, reboot... repeat).
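Since the failing runs hang at the first write, one option for the user program is to probe a register before starting a transfer and stop early if it reads back as all ones. This does not fix the driver BUG; it only lets the application detect the bad state. A minimal sketch, assuming the register BAR is exposed as an mmap-able node (for example /dev/xdma0_user, which reg_rw.c maps) and that the design has some register known never to read 0xffffffff; the helper names and the probed offset are hypothetical:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read one aligned 32-bit register at byte `offset` from a BAR exposed
 * as a mmap-able file. Stores the value in *out and returns 0, or
 * returns -1 on error or on an out-of-range/unaligned offset. */
static int read_reg32(const char *dev, size_t map_size, size_t offset, uint32_t *out)
{
    if (offset % 4 != 0 || offset + sizeof(uint32_t) > map_size)
        return -1;

    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *base = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    *out = *(volatile uint32_t *)((char *)base + offset);
    munmap(base, map_size);
    return 0;
}

/* An all-ones read usually means the PCIe read got no valid completion. */
static int device_responding(uint32_t value)
{
    return value != 0xffffffffu;
}
```

The caller would read, say, a version or ID register of the user design and refuse to start the H2C write when `device_responding()` returns 0.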
Line 1314 of xdma-core.c, indicated in my original post, is "BUG_ON(!transfer);" in the function "struct xdma_transfer *engine_service_final_transfer()".
Thank you
Jagannath
05-09-2018 06:34 AM
05-16-2018 04:27 AM - edited 05-16-2018 04:28 AM
This issue is still open.
I'd like to point out that this error occurs when the legacy AR65444 driver is used in interrupt mode. I did not face this problem when I used the driver in polling mode.
Is there any difference in performance when the driver's mode of operation is changed?
When I used the 20 April 2018 release of the drivers (Xilinx_Answer_65444_Linux_Files_rel20180420.zip), I ran into a completely new problem: xdma_isr() executes continuously (as if in an infinite loop) even though the FPGA design is idle and is not raising a legacy interrupt through usr_irq_req. I have described this in another post (https://forums.xilinx.com/t5/PCI-Express/no-C2H-channels-enabled-AR65444/m-p/856048/highlight/true#M10874).
I look forward to hearing back from you.
Thank You
Jagannath
05-16-2018 12:18 PM
05-17-2018 06:18 AM - edited 05-17-2018 06:22 AM
Are your observations with the default driver (legacy or rel20180420)?
I have already answered this. The error occurred when the legacy AR65444 driver was used in interrupt mode; I did not face this problem in polling mode. I did not test further with the rel20180420 driver because I ran into a far more serious bug with xdma_isr() executing continuously, and I need xdma_isr() to execute only when my FPGA design raises the legacy interrupt. I have attached a log file in this post: https://forums.xilinx.com/t5/PCI-Express/no-C2H-channels-enabled-AR65444/m-p/856048/highlight/true#M10874
Can you give me the steps you follow to run the tests (a console log of the testing would help)?
As I mentioned in message 3 of this thread, my user-space C program uses the same code snippets as dma_to_device.c, dma_from_device.c and reg_rw.c to transfer data. Although I am not at liberty to share the complete C file, rest assured that the code used to allocate memory, map the device into memory and transfer data is taken verbatim from the legacy AR65444 driver files dma_to_device.c, dma_from_device.c and reg_rw.c. The AXI addresses used for the transfers are also valid for my design.
Regarding the sequence of operations in my C file:
After running successfully a few times, the program gets stuck. The dmesg logs are attached in message 4 of this thread. The driver becomes unusable, and I have to restart the host PC, reload the driver and start again.
I would be grateful if the cause of the following log could be determined:
[ 1503.268351] xdma_isr():(irq=31) <<<< INTERRUPT SERVICE ROUTINE
[ 1503.316473] xdma_isr():ch_irq = 0xffffffff
[ 1503.364590] xdma_isr():user_irq = 0xffffffff
On successful transfers, I observed ch_irq values of 1, 2, 4, etc., depending on the channel number and on whether the request was a read or a write. But when the value 0xffffffff appears, all the H2C and C2H engines stop.
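The healthy values 1, 2 and 4 each have exactly one bit set, i.e. one request line active. A small sketch that lists the set bits of a request register value makes the contrast visible: 0xffffffff claims all 32 lines at once, including bits beyond the channels a typical configuration has, which points to a failed read rather than 32 simultaneous interrupts. The helper name is hypothetical, and which channel each bit stands for depends on the core configuration, so the sketch only reports bit positions:

```c
#include <stddef.h>
#include <stdint.h>

/* List the bit positions set in a 32-bit request register value.
 * Writes up to `max` positions into `bits` (lowest first) and returns
 * the total number of set bits, which may exceed `max`. */
static size_t set_bits(uint32_t reg, unsigned *bits, size_t max)
{
    size_t count = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (reg & (UINT32_C(1) << i)) {
            if (count < max)
                bits[count] = i;
            count++;
        }
    }
    return count;
}
```

With this helper, `set_bits(4, b, 32)` returns 1 with `b[0] == 2`, while `set_bits(0xffffffff, b, 32)` returns 32.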
Why would "BUG_ON(!transfer);" in the function "struct xdma_transfer *engine_service_final_transfer()" in xdma-core.c fail?
Is the memory pointed to by transfer getting corrupted? When might that happen?
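One reading of the logs, not confirmed anywhere in this thread: the engine status read back as 0xffffffff, which includes the DESC_COMPLETED bit, so the driver may believe a transfer finished when none is queued. Taking the head of an empty transfer list then yields NULL, and BUG_ON(!transfer) halts the kernel without any memory corruption being involved. A minimal sketch of a defensive alternative, using simplified, hypothetical structures (not the AR65444 `xdma_transfer` or engine definitions) in user-space C:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for per-engine transfer bookkeeping. */
struct transfer {
    struct transfer *next;
    int id;
};

struct engine {
    const char *name;
    struct transfer *head;   /* queued transfers, oldest first */
};

/* Take the oldest queued transfer when the engine reports completion.
 * Instead of crashing on an empty queue (the BUG_ON case), report the
 * inconsistency and return NULL so the caller can stop the engine and
 * fail the request cleanly. */
static struct transfer *service_completed(struct engine *e)
{
    struct transfer *t = e->head;
    if (t == NULL) {
        fprintf(stderr, "%s: completion reported but no transfer queued\n",
                e->name);
        return NULL;
    }
    e->head = t->next;
    t->next = NULL;
    return t;
}
```

In kernel code, the same idea would typically use list_first_entry_or_null() on the engine's transfer list together with WARN_ON_ONCE() and an error return instead of BUG_ON(); whether that is appropriate inside the AR65444 driver's locking and state machine is an assumption this sketch does not check.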
02-15-2019 03:44 PM
Any progress on this issue?
09-08-2019 09:52 PM
Any progress on this issue?
09-11-2019 12:33 AM
Poll mode usually has higher performance.
If you still need interrupt mode, please try a different interrupt mode:
parm: interrupt_mode:0 - MSI-x , 1 - MSI, 2 - Legacy (uint)
You can use this command to force interrupt_mode to MSI (which must be enabled in the HW design) when installing the driver:
$sudo insmod xdma.ko interrupt_mode=1