01-25-2021 02:31 PM
Issues related to this have been asked about previously in the following links, but they did not solve my problem.
Sometimes the server reboots right after the bitstream is programmed. In other cases the launched run completes and produces results, but the server reboots a few seconds later. I have had this issue for a long time, and every time I work around it by setting a lower frequency in Vitis.
System and XRT information is as follows:

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    System Configuration
    OS name:      Linux
    Release:      4.15.0-132-generic
    Version:      #136-Ubuntu SMP Tue Jan 12 14:58:42 UTC 2021
    Machine:      x86_64
    Model:        PowerEdge T640
    CPU cores:    48
    Memory:       257637 MB
    Glibc:        2.27
    Distribution: Ubuntu 18.04.3 LTS
    Now:          Mon Jan 25 21:54:28 2021
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    XRT Information
    Version:      2.5.309
    Git Hash:     9a03790c11f066a5597b133db737cf4683ad84c8
    Git Branch:   2019.2_PU2
    Build Date:   2020-02-23 18:52:05
    XOCL:         2.5.309,9a03790c11f066a5597b133db737cf4683ad84c8
    XCLMGMT:      2.5.309,9a03790c11f066a5597b133db737cf4683ad84c8
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     0000:3b:00.1 xilinx_u280_xdma_201920_3(ID=0x5e278820) user(inst=128)

I checked for the AXI firewall trips mentioned in the threads above using LAPC, but xbutil status did not report any issues.
Any kind of help is appreciated! Thanks in advance.
01-28-2021 01:49 AM
01-31-2021 03:38 AM - edited 02-01-2021 12:59 AM
Thanks a lot for your suggestion.
I tried disabling fatal error reporting on the PCIe port that the U280 is connected to.
Now I no longer see the reboot after FPGA configuration, and the results match the golden reference.
However, it hangs when I launch the run again; basically, I can run only once after manually rebooting the server.
When I check /var/log/syslog, health_check reports odd temperatures such as -1 and says a hot reset is required.
All the temperature values in the xbutil query output appear as "NA".
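For reference, "disabling fatal error reporting" on a PCIe port commonly means clearing SERR# Enable (bit 8 of the Command register) and Fatal Error Reporting Enable (bit 2 of the PCIe Device Control register) on the port above the card. The post does not say exactly which bits were cleared, so treat this as an assumption. Below is a minimal sketch that walks the capability list of a port's config space and computes the new register values; the function names are hypothetical, and it only prints equivalent setpci commands, it writes nothing.

```python
import struct
from typing import List, Tuple

# Standard PCI configuration-space offsets and bits (PCI/PCIe specifications).
PCI_COMMAND = 0x04             # 16-bit Command register
PCI_STATUS = 0x06              # 16-bit Status register
PCI_STATUS_CAP_LIST = 1 << 4   # Status bit: capability list present
PCI_CAPABILITY_LIST = 0x34     # Pointer to the first capability
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID
PCI_COMMAND_SERR = 1 << 8      # Command bit: SERR# Enable
PCI_EXP_DEVCTL = 0x08          # Device Control, relative to the PCIe capability
PCI_EXP_DEVCTL_FERE = 1 << 2   # Device Control bit: Fatal Error Reporting Enable


def find_pcie_cap(cfg: bytes) -> int:
    """Walk the standard capability list and return the PCIe capability offset."""
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        raise ValueError("device has no capability list")
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while ptr and ptr not in seen:  # 'seen' guards against a looping list
        seen.add(ptr)
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    raise ValueError("PCI Express capability not found")


def without_fatal_reporting(cfg: bytes) -> Tuple[int, int, int]:
    """Return (pcie_cap_offset, command, devctl) with fatal reporting cleared."""
    cap = find_pcie_cap(cfg)
    (command,) = struct.unpack_from("<H", cfg, PCI_COMMAND)
    (devctl,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_DEVCTL)
    return cap, command & ~PCI_COMMAND_SERR, devctl & ~PCI_EXP_DEVCTL_FERE


def setpci_commands(bdf: str, cfg: bytes) -> List[str]:
    """Build (but do not run) setpci commands that write the new values."""
    cap, command, devctl = without_fatal_reporting(cfg)
    return [
        f"setpci -s {bdf} COMMAND={command:04x}",
        f"setpci -s {bdf} {cap + PCI_EXP_DEVCTL:x}.w={devctl:04x}",
    ]
```

The config bytes can be read as root from /sys/bus/pci/devices/<BDF>/config, where <BDF> is a placeholder for the port above the U280 (not the card itself); without root, sysfs returns only the first 64 bytes, which is not enough to reach the capability list.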
02-01-2021 02:24 PM
02-02-2021 01:22 AM
Hi @kkvasan ,
Typically, we need to disable error reporting before flashing a bitstream onto the full board and then rebooting.
However, the fact that disabling error reporting helped you is probably the first step toward tracking down the issue.
I suggest you try to catch which error the PCIe bus reports during flashing.
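One way to see which error was reported is to read the Advanced Error Reporting (AER) Uncorrectable Error Status register of the port above the card right after flashing (`lspci -vvv` shows the same bits on its `UESta:` line). Below is a minimal sketch, assuming the port implements AER and that the full 4 KiB config space is available (read as root from /sys/bus/pci/devices/<BDF>/config, with <BDF> a placeholder); the function names are hypothetical.

```python
import struct
from typing import List, Optional

PCI_EXT_CAP_START = 0x100      # extended capabilities begin here
PCI_EXT_CAP_ID_ERR = 0x0001    # Advanced Error Reporting (AER)
PCI_ERR_UNCOR_STATUS = 0x04    # Uncorrectable Error Status, relative to AER

# Uncorrectable Error Status bits as defined by the PCIe Base Specification.
UNCOR_ERRORS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def find_aer_cap(cfg: bytes) -> Optional[int]:
    """Walk the extended capability list and return the AER offset, if any."""
    off = PCI_EXT_CAP_START
    seen = set()
    while off and off not in seen and off + 8 <= len(cfg):
        seen.add(off)
        (header,) = struct.unpack_from("<I", cfg, off)
        if header in (0, 0xFFFFFFFF):  # no extended capabilities present
            return None
        if header & 0xFFFF == PCI_EXT_CAP_ID_ERR:
            return off
        off = (header >> 20) & 0xFFC  # next-capability pointer, bits 31:20
    return None


def uncorrectable_errors(cfg: bytes) -> List[str]:
    """Return the names of the uncorrectable errors currently flagged."""
    off = find_aer_cap(cfg)
    if off is None:
        raise ValueError("AER capability not found")
    (status,) = struct.unpack_from("<I", cfg, off + PCI_ERR_UNCOR_STATUS)
    return [name for bit, name in sorted(UNCOR_ERRORS.items())
            if status & (1 << bit)]
```

The status bits are sticky until cleared, so comparing a snapshot taken before flashing with one taken afterwards is probably the clearest way to attribute an error to the reprogramming step.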