Adventurer
2,361 Views
Registered: ‎08-10-2017

PCIe DMA (AR65444) driver error - help

Hi

In my design, I'm using the DMA/Bridge Subsystem IP with a VC707 (Virtex-7 485T).

I'm using the device drivers from AR65444 on Ubuntu 16.04, with Vivado 2017.2.

 

Every now and then, I observe the following errors in dmesg:

[ 1503.268351] xdma_isr():(irq=31) <<<< INTERRUPT SERVICE ROUTINE
[ 1503.316473] xdma_isr():ch_irq = 0xffffffff
[ 1503.364590] xdma_isr():user_irq = 0xffffffff
[ 1503.364688] engine_status_read(): H2C0: Status of SG DMA H2C engine:
[ 1503.364688] engine_status_read(): H2C0: ioread32(0xffffb57541fc0040).
[ 1503.412820] engine_status_read(): H2C0: status = 0xffffffff: BUSY DESC_STOPPED DESC_COMPLETED ALIGN_MISMATCH MAGIC_STOPPED FETCH_STOPPED READ_ERROR DESC_ERROR IDLE_STOPPED 
[ 1503.460966] kernel BUG at <driver-path>/Vivado/Xilinx_Answer_65444_Linux_Files/driver/xdma-core.c:1314!

 

This error occurs randomly.

When it does occur, I have to shut down the host PC, program the bitstream, restart the PC and try again.

 

Could anyone please tell me why this error occurs?
What can I do to prevent it?

 

 

Thank You

 

 

Jagannath

9 Replies
Moderator
2,323 Views
Registered: ‎02-16-2010

Re: PCIe DMA (AR65444) driver error - help

Do you know which test will trigger the error?
------------------------------------------------------------------------------
Don't forget to reply, give kudo and accept as solution
------------------------------------------------------------------------------
Adventurer
2,305 Views
Registered: ‎08-10-2017

Re: PCIe DMA (AR65444) driver error - help

Thank you for replying @venkata

 

I'm using the VC707 as a hardware accelerator, with PCIe Gen2 transferring data between the host PC and the BRAM/DDR3 memory.

 

My user-space C program uses the same code as dma_to_device.c, dma_from_device.c and reg_rw.c to transfer data.

Once the hardware accelerator completes execution, a legacy interrupt is generated using usr_irq_req.

The device driver from AR65444 has extra code in xdma_isr() that sends a netlink unicast message to the above user-space program, so that it can collect the results from the VC707 DDR3 memory.

 

This C program (with a working hardware accelerator) runs 3-4 times without any problem. Subsequent executions get stuck in the initial stage of writing to the FPGA (similar to dma_to_device). The kernel freezes, and I have to shut down the PC, program the bitstream and reboot (it works a few times, the kernel freezes, shut down, program the bitstream, reboot, and repeat).

 

Line 1314 of xdma-core.c, as indicated in my original post, is the line "BUG_ON(!transfer);" in the function "struct xdma_transfer *engine_service_final_transfer()".

 

 

 

Thank you

 

Jagannath

Adventurer
2,272 Views
Registered: ‎08-10-2017

Re: PCIe DMA (AR65444) driver error - help

@venkata

 

I'm attaching the error log that I copied from dmesg.

 

Please help me.

By the way, I'm using an AXI Interconnect between the PCIe/XDMA core and the BRAM/DDR3 memory.

Adventurer
2,212 Views
Registered: ‎08-10-2017

Re: PCIe DMA (AR65444) driver error - help

This issue is still open.

 

I'd like to point out that this error occurs when the legacy AR65444 driver is used in interrupt mode. I did not face this problem when I used the driver in polling mode.

Is there any difference in performance between the two modes of operation?

When I tried the 20 April 2018 driver release (Xilinx_Answer_65444_Linux_Files_rel20180420.zip), I faced a completely new problem: xdma_isr() executes continuously (as if in an infinite loop) even though the FPGA design is idle and is not generating a legacy interrupt using usr_irq_req. I have described this in another post (https://forums.xilinx.com/t5/PCI-Express/no-C2H-channels-enabled-AR65444/m-p/856048/highlight/true#M10874).

I look forward to hearing back from you.

 

 

Thank You

 

Jagannath

Moderator
2,199 Views
Registered: ‎02-16-2010

Re: PCIe DMA (AR65444) driver error - help

Hi Jagannath,

Are your observations with the default driver (legacy or rel20180420)?
Can you give me the steps you follow to run the tests (a console log of the testing would help)? I would like to check it myself as well.

Thanks,
Srinadh
Adventurer
2,183 Views
Registered: ‎08-10-2017

Re: PCIe DMA (AR65444) driver error - help

Are your observations with the default driver (legacy or rel20180420)?

I answered this already. This error occurred when the legacy AR65444 driver was used in interrupt mode. I did not face this problem when I used the driver in polling mode. I did not bother testing further with the rel20180420 driver because I encountered a far more serious bug with it: xdma_isr() executes continuously, whereas I need xdma_isr() to execute only when the legacy interrupt is raised by my FPGA design. I have attached a log file in this post: https://forums.xilinx.com/t5/PCI-Express/no-C2H-channels-enabled-AR65444/m-p/856048/highlight/true#M10874

 

Can you give me the steps you follow to do the tests (console log of the testing can help)?

I mentioned in message 3 of this thread that my user-space C program uses the same code as dma_to_device.c, dma_from_device.c and reg_rw.c for transferring data. Although I am not at liberty to share the complete C file, rest assured that the code used to allocate memory, map the device into memory, and transfer data is taken verbatim from the legacy AR65444 driver files dma_to_device.c, dma_from_device.c and reg_rw.c. The AXI addresses used to transfer data are also valid for my design.
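
For readers following along, the transfer pattern described here (seek to the target AXI address on a DMA character device, then write or read) can be sketched as below. This is a minimal sketch under stated assumptions, not the AR65444 source: the helper names xfer_write, xfer_read and xfer_roundtrip_ok are hypothetical, and the device node names mentioned in the comments are assumptions about a typical install.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * All file descriptors are already-open DMA nodes. Node names such as
 * /dev/xdma0_h2c_0 (host to card) and /dev/xdma0_c2h_0 (card to host)
 * are assumptions, not taken from this thread.
 */

/* Write len bytes from buf to device address addr. Returns 0, or -1 with errno set. */
int xfer_write(int fd, const void *buf, size_t len, off_t addr)
{
    const uint8_t *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        /* Re-seek for every chunk so a short write resumes at the right address. */
        if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
            return -1;
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted before any data moved: retry */
        if (n <= 0) {
            if (n == 0)
                errno = EIO;     /* no progress: report it instead of spinning */
            return -1;
        }
        p += n;
        addr += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read len bytes from device address addr into buf. Returns 0, or -1 with errno set. */
int xfer_read(int fd, void *buf, size_t len, off_t addr)
{
    uint8_t *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
            return -1;
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            if (n == 0)
                errno = EIO;     /* unexpected end of data */
            return -1;
        }
        p += n;
        addr += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Write buf, read it back from the same address, and compare.
 * Returns 1 on match, 0 on mismatch, -1 on I/O or allocation failure.
 */
int xfer_roundtrip_ok(int wr_fd, int rd_fd, const void *buf, size_t len, off_t addr)
{
    int rc = -1;
    uint8_t *back;

    assert(len > 0);
    back = malloc(len);
    if (back == NULL)
        return -1;
    if (xfer_write(wr_fd, buf, len, addr) == 0 &&
        xfer_read(rd_fd, back, len, addr) == 0)
        rc = (memcmp(buf, back, len) == 0);
    free(back);
    return rc;
}
```

A round trip like xfer_roundtrip_ok against BRAM or DDR3, run before the accelerator is started, is one way to confirm that the DMA path itself is healthy on a given boot.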


The sequence of operations in my C program is as follows:

  1. The program writes some data to DDR3 and BRAM memory.
  2. The program configures some registers using BAR0 (PCIe to AXI Lite). This triggers my FPGA design to start computation.
  3. The FPGA design generates a legacy interrupt using the usr_irq_req pin of the XDMA IP block when the computation is completed. The ISR xdma_isr() executes and sends a netlink unicast message to the C program indicating that results are ready to be fetched from DDR3 memory.
  4. The program reads results from DDR3 memory.
  5. Repeat steps (2), (3) and (4) for a fixed number of iterations.
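
Step 2 above (configuring registers through BAR0) relies on mapping the device into memory. A minimal sketch of that pattern follows, assuming the AXI-Lite BAR is exposed as a mappable node (a name such as /dev/xdma0_user is an assumption) and that the registers of interest fit in one 4 KiB window. The helper names and REG_MAP_SIZE are hypothetical and do not come from reg_rw.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Assumption: one 4 KiB window is enough for the registers being accessed. */
#define REG_MAP_SIZE 4096u

/* Map the first REG_MAP_SIZE bytes of an already-open node. Returns NULL on failure. */
volatile uint32_t *reg_map(int fd)
{
    void *base = mmap(NULL, REG_MAP_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    return base == MAP_FAILED ? NULL : (volatile uint32_t *)base;
}

/* Write a 32-bit value at a 4-byte-aligned byte offset inside the window. */
void reg_write32(volatile uint32_t *regs, size_t byte_off, uint32_t val)
{
    assert(regs != NULL);
    assert(byte_off % 4 == 0 && byte_off < REG_MAP_SIZE);
    regs[byte_off / 4] = val;
}

/* Read a 32-bit value from a 4-byte-aligned byte offset inside the window. */
uint32_t reg_read32(const volatile uint32_t *regs, size_t byte_off)
{
    assert(regs != NULL);
    assert(byte_off % 4 == 0 && byte_off < REG_MAP_SIZE);
    return regs[byte_off / 4];
}

/* Release the mapping created by reg_map. Returns 0 on success, -1 on failure. */
int reg_unmap(volatile uint32_t *regs)
{
    return munmap((void *)regs, REG_MAP_SIZE);
}
```

The volatile qualifier keeps the compiler from caching or reordering away the register accesses. Reading a register back after writing it is a cheap way to notice when the device has stopped responding before the next iteration starts.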

 

After running successfully a few times, the program gets stuck. The dmesg logs are attached in message 4 of this thread. The driver becomes unusable, and I have to restart the host PC and reload the driver before starting again.

I would be grateful if someone could determine the cause of the following log output.

[ 1503.268351] xdma_isr():(irq=31) <<<< INTERRUPT SERVICE ROUTINE
[ 1503.316473] xdma_isr():ch_irq = 0xffffffff
[ 1503.364590] xdma_isr():user_irq = 0xffffffff

On successful transfers, I observed ch_irq values of 1, 2, 4, etc., depending on the channel number or on the read/write requests. But the appearance of the value 0xffffffff causes all the H2C and C2H engines to stop.
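
A 32-bit read that returns all ones is commonly what a host sees when a PCIe read does not complete (for example, when the device has stopped responding), so decoding 0xffffffff as a set of real flags can be misleading. Whether that is what happens here is not established in this thread. The sketch below only illustrates telling the two cases apart; the helper names are hypothetical and are not part of the XDMA driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_ALL_ONES 0xffffffffu

/* True when the value is all ones and should not be decoded as flags. */
bool reg_value_suspect(uint32_t v)
{
    return v == REG_ALL_ONES;
}

/* Number of request bits set, or -1 when the value is all ones. */
int irq_pending_count(uint32_t v)
{
    int n = 0;

    if (reg_value_suspect(v))
        return -1;
    for (; v != 0; v &= v - 1)   /* clear the lowest set bit on each pass */
        n++;
    return n;
}
```

With the values reported above, 1, 2 and 4 each count as one pending request, while 0xffffffff is flagged as suspect instead of being treated as 32 simultaneous requests.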


Why would "BUG_ON(!transfer);" in the function "struct xdma_transfer *engine_service_final_transfer()" trigger in xdma-core.c?

Is the memory pointed to by the pointer transfer getting corrupted? When might that happen?

Contributor
1,073 Views
Registered: ‎12-12-2018

Re: PCIe DMA (AR65444) driver error - help

Any progress on this issue?

Observer athulya
406 Views
Registered: ‎08-13-2019

Re: PCIe DMA (AR65444) driver error - help

Any progress on this issue?

Xilinx Employee
373 Views
Registered: ‎08-02-2007

Re: PCIe DMA (AR65444) driver error - help

Poll mode usually has higher performance.

If you still need interrupt mode, please try another interrupt mode:

parm:           interrupt_mode:0 - MSI-x , 1 - MSI, 2 - Legacy (uint)

You can use this command to force interrupt_mode to MSI (which must be enabled in the hardware design) when loading the driver.

$ sudo insmod xdma.ko interrupt_mode=1

 
