Contributor
1,068 Views
Registered: ‎04-29-2020

U50 XDMA crash

Software: Vivado 2019.2

Hardware: Alveo U50

I am using the RTL flow to develop for the U50 card. The attached PDF file shows the block design.

First, I added the XDMA and HBM modules and used the XDMA driver from git. After loading the driver, I ran the script run_test.sh and all tests passed.

Second, I added a QSFP module to the block design and ran the script load_driver.sh. Linux crashed. For the dmesg output, see vmcore-dmesg.txt in the attached file.

Thanks! 

 

1 Solution

Accepted Solutions
Contributor
778 Views
Registered: ‎04-29-2020

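The cause is that reg_rw only mmaps 32 KB of memory, so the read at offset 0x40000 falls outside the mapped region and segfaults.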

8 Replies
Xilinx Employee
999 Views
Registered: ‎10-19-2015

Hi @liuyong 

The dmesg output shows some concerning errors:

[   83.604856] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2019.2.51
[   83.604859] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, sgdma_timeout: 10 sec.
[   83.604875] xdma:xdma_threads_create: xdma_threads_create
[   83.605594] xdma:xdma_device_open: xdma device 0000:b3:00.0, 0xffff88103e4ee000.
[   83.605725] xdma:map_single_bar: BAR0 at 0xe1200000 mapped at 0xffffc90008c00000, length=1048576(/1048576)
[   84.273299] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[   84.273332] xdma:map_single_bar: BAR1 at 0xe1300000 mapped at 0xffffc900076e0000, length=65536(/65536)
[   84.273334] xdma:map_bars: config bar 1, pos 1.
[   84.273335] xdma:identify_bars: 2 BARs: config 1, user 0, bypass -1.
[   84.273401] xdma:probe_one: 0000:b3:00.0 xdma0, pdev 0xffff88103e4ee000, xdev 0xffff8801664d6000, 0xffff8801664d4000, usr 16, ch 2,2.
[   84.273441] {1}[Hardware Error]: event severity: fatal
[   84.273455] {1}[Hardware Error]:  Error 0, type: fatal
[   84.273470] {1}[Hardware Error]:   section_type: PCIe error
[   84.273485] {1}[Hardware Error]:   port_type: 0, PCIe end point
[   84.273500] {1}[Hardware Error]:   version: 3.0
[   84.273513] {1}[Hardware Error]:   command: 0x0007, status: 0x0810
[   84.273529] {1}[Hardware Error]:   device_id: 0000:b3:00.0
[   84.273544] {1}[Hardware Error]:   slot: 3
[   84.273555] {1}[Hardware Error]:   secondary_bus: 0x00
[   84.273569] {1}[Hardware Error]:   vendor_id: 0x10ee, device_id: 0x903f
[   84.273586] {1}[Hardware Error]:   class_code: 010007
[   84.273600] Kernel panic - not syncing: Fatal hardware error!
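
The `command: 0x0007, status: 0x0810` line in the log above can be decoded against the standard PCI configuration-space Command and Status register layout. The following is a minimal Python sketch for doing that; the table and function names (`COMMAND_BITS`, `STATUS_BITS`, `decode_bits`) are illustrative and do not come from the XDMA driver or from this thread.

```python
# Bit names from the standard PCI configuration-space Command and Status
# registers. Only the commonly used bits are listed; any other set bit is
# reported as "bit N". All names here are illustrative helpers.
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_bits(value, names):
    """Return the names of all set bits in a 16-bit register, lowest bit first."""
    return [names.get(bit, f"bit {bit}") for bit in range(16) if value & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the [Hardware Error] record in the dmesg output.
    print("command 0x0007:", decode_bits(0x0007, COMMAND_BITS))
    print("status  0x0810:", decode_bits(0x0810, STATUS_BITS))
```

With these tables, status 0x0810 decodes to Capabilities List plus Signaled Target Abort. On PCIe, Signaled Target Abort is set when a function completes a request with Completer Abort status, so one possible reading is that the endpoint itself aborted an access. That is an interpretation of the register value, not something confirmed in this thread.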

Can you see that your card was linked up in lspci?

What is the RTL flow you are speaking of? Can you point me to documentation?

Do you have your vendor ID set correctly in the driver?

It seems like the BAR space is not correctly configured. You might want to use the Board Aware flow first for putting in the PCIe and XDMA cores.

Also, the QSFP modules don't work with AXI memory mapped IP, so you'll need to use QDMA to generate AXI streaming interfaces or design your own RTL to translate from memory mapped to streaming interfaces.

Do you have other cards connected to your server?

How much BAR space are you asking for? It looks like the kernel could not allocate contiguous BAR space for your card.

What other changes did you make between your working design and this design?

Regards,

M

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Contributor
953 Views
Registered: ‎04-29-2020

Can you see that your card was linked up in lspci?
-- The lspci log file is attached.
What is the RTL flow you are speaking of? Can you point me to documentation?
-- I mean the Vivado Design Flow.
Do you have your vendor ID set correctly in the driver?
-- Yes, the vendor ID is correct in the driver: before the QSFP module was added to the block design, all tests passed.
Also, the QSFP modules don't work with AXI memory mapped IP, so you'll need to use QDMA to generate AXI streaming interfaces or design your own RTL to translate from memory mapped to streaming interfaces.
-- The QSFP module configuration is attached in qsfp_cfg.zip.
Do you have other cards connected to your server?
-- No.
How much BAR space are you asking for? It looks like the kernel could not allocate contiguous BAR space for your card.
-- 1 MB.
What other changes did you make between your working design and this design?
-- I only added the QSFP module to the block design (.bd) file.

Contributor
952 Views
Registered: ‎04-29-2020
We use a third-party TOE IP core connected to the QSFP module.
Xilinx Employee
915 Views
Registered: ‎10-19-2015

Hi @liuyong 

It seems like the U50 drops off the PCIe bus. Can you show me the Vivado power estimate reports? 

How much of the FPGA logic are you using? 

Regards,

M

 

Contributor
891 Views
Registered: ‎04-29-2020

[Attached images: power-summary.png, utilization-summary.png]

The power report and block design are attached.

Thank you.

Contributor
866 Views
Registered: ‎04-29-2020

After I added the CMS module to the block design, the driver loads successfully. However, while reading the CMS over the XDMA AXI-Lite interface works, reading any other address ends in a segmentation fault (core dumped).

TIM截图20200507182722.png

Below is the log:

[root@redhat tools]# ./reg_rw /dev/xdma0_user 0
argc = 3
device: /dev/xdma0_user
address: 0x00000000
access type: read
access width: word (32-bits)
character device /dev/xdma0_user opened.
Memory mapped at address 0x7f6bae3f0000.
Read 32-bit value at address 0x00000000 (0x7f6bae3f0000): 0xb000f000
[root@redhat tools]#
[root@redhat tools]# ./reg_rw /dev/xdma0_user 0x40000
argc = 3
device: /dev/xdma0_user
address: 0x00040000
access type: read
access width: word (32-bits)
character device /dev/xdma0_user opened.
Memory mapped at address 0x7faecb4dd000.
Segmentation fault (core dumped)
 

Xilinx Employee
818 Views
Registered: ‎10-19-2015

Hi @liuyong 

I don't think 0x40000 is mapped to a BAR. What does lspci say?

Regards,

M

Contributor
779 Views
Registered: ‎04-29-2020

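The cause is that reg_rw only mmaps 32 KB of memory, so the read at offset 0x40000 falls outside the mapped region and segfaults.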
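
To illustrate the point: offset 0x40000 is 256 KB, which lies well past a 32 KB mapping, while offset 0 lies inside it. One way around this is to map a page-aligned window that starts near the target offset instead of a fixed window from offset 0. The following is a minimal Python sketch of that idea, not the actual reg_rw source. The helper names (`in_fixed_window`, `window_for`, `read32`) are hypothetical, and the demo reads from an ordinary file as a stand-in for the `/dev/xdma0_user` device node.

```python
import mmap
import os
import struct
import tempfile

MAP_SIZE = 32 * 1024  # fixed window size, per the 32 KB figure in this thread


def in_fixed_window(offset, width=4, map_size=MAP_SIZE):
    """True if a width-byte access at offset fits in a map of map_size from 0."""
    return offset >= 0 and offset + width <= map_size


def window_for(offset, width=4, granularity=mmap.ALLOCATIONGRANULARITY):
    """Return (base, delta, length) for an aligned window covering the access.

    base is the aligned file offset to pass to mmap, delta is the position of
    the target inside the window, and length is the number of bytes to map.
    """
    base = offset - (offset % granularity)
    delta = offset - base
    return base, delta, delta + width


def read32(path, offset):
    """Read one little-endian 32-bit word at offset through a small mmap window."""
    base, delta, length = window_for(offset)
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, length, access=mmap.ACCESS_READ, offset=base) as m:
            return struct.unpack_from("<I", m, delta)[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Stand-in for the device node: a file holding a known word at 0x40000.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * 0x40000 + struct.pack("<I", 0x12345678))
        path = f.name
    try:
        print("0x00000 inside 32 KB window:", in_fixed_window(0x00000))
        print("0x40000 inside 32 KB window:", in_fixed_window(0x40000))
        print("word at 0x40000: 0x%08x" % read32(path, 0x40000))
    finally:
        os.unlink(path)
```

Whether a given offset is also valid on the device side still depends on the BAR size and on what the design maps at that address, so the window calculation only removes the user-space limit.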
