UNKNwYSHSA
Visitor
1,506 Views
Registered: ‎10-31-2020

Validate U200 failed

I ran the command:

./host_setup.sh -v 2020.1

and the output was:

[Screenshot: output of ./host_setup.sh -v 2020.1]

Then I ran the command:

./xbutil validate

and the output was:

[Screenshot: output of ./xbutil validate]

The output of the command

./xbmgmt flash --scan

was:

[Screenshot: 780983005d4e14cad912af4e98e0ab6.jpg]

Please help me, thanks!

1 Solution

Accepted Solutions
JohnFedakIV
Moderator
895 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA ,

I spoke with the RMA team and, unfortunately, because the Alveo card was purchased from a third party and not from an authorized distributor, it can't be considered for an RMA. This is because, as you mentioned, it's not clear what happened to the card before.

If you want, you can still go through the RMA process as described in this AR (https://www.xilinx.com/support/answers/72533.html) to request from the team directly.

As I mentioned in an earlier post, the card is in a usable state. Given that it passed xbutil validate, you can download and run programs on the card. The one concern is that, with the SC not responding, it is not clear what will happen in an overcurrent or overtemperature condition. Because of this, I would recommend that the system it is installed in have more than enough airflow and, if you want to be extra cautious, that you monitor the temperature of the card to keep it within its limits.

Regards,
~John

----------------------------------------------------------------------------------
* Please don't forget to reply, kudo and accept as a solution! *

17 Replies
UNKNwYSHSA
Visitor
1,445 Views
Registered: ‎10-31-2020

I also read this post:

https://forums.xilinx.com/t5/Alveo-Accelerator-Cards/Incomplete-documentation-of-U200-UG1289/td-p/971539

The LEDs on my U200 look like this:

[Image: ce3f672ba96ba5f3f3053c0d936c6bf.jpg]

All of the LEDs are on.

Is there any problem with my card?

Thanks!

JohnFedakIV
Moderator
1,378 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA ,

Welcome to the forums!

Based on your message outputs, there is an SC issue. I can see the ERROR: SC is not ready before and after the shell is flashed. This looks similar to these posts https://forums.xilinx.com/t5/Alveo-Accelerator-Cards/SC-Unknown/td-p/1112108 and https://forums.xilinx.com/t5/Alveo-Accelerator-Cards/Error-xbutil-sc-is-not-ready-0x0/td-p/1152027.

The first thing I would try is a full power removal. The Satellite Controller (SC) chip is powered by PCI Express Auxiliary power, which remains active even after host shut-down for most systems. As Emery mentions in that 2nd post - the critical step is to make sure the power has been fully removed from the system (unplug the power cord or remove the card) for about 10 minutes to discharge any capacitors.

Regards,
~John

UNKNwYSHSA
Visitor
1,356 Views
Registered: ‎10-31-2020

I have already followed these posts many times, unplugging the power and waiting for 10 minutes, but the problem still exists. I turned off the power switch last night, and the problem is still there now.

JohnFedakIV
Moderator
1,293 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA,

In our experience with this issue so far, fully removing power from the card has resolved it, although in some cases it takes multiple attempts. Depending on how the power switch works, there might still be power on the PCIe Aux line. As a next step, I would remove the card from the system for about 10 minutes.

A quick way to check whether the card is running as expected is the flash --scan command: there should be no "ERROR: SC is not ready: 0x0" lines, and the SC version running on the FPGA should match the one in the platform installed on the machine:

 

sudo xbmgmt flash --scan
Card [0000:xx:00.0]
    Card type:		u200
    Flash type:		SPI
    Flashable partition running on FPGA:
        xilinx_u200_xdma_201830_2,[ID=0x5d1211e8],[SC=4.2.0]
    Flashable partitions installed in system:	
        xilinx_u200_xdma_201830_2,[ID=0x5d1211e8],[SC=4.2.0]
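
The two checks described above can be scripted. This is a minimal sketch, not an official tool: the function name check_flash_scan is made up for illustration, and it assumes the scan output keeps the "[SC=x.y.z]" and "SC is not ready" text shown in this thread, which may differ between XRT versions.

```shell
# check_flash_scan (hypothetical helper): reads `xbmgmt flash --scan` output
# on stdin and returns 0 only if the SC looks healthy.
check_flash_scan() {
    scan_output=$(cat)

    # Check 1: any "SC is not ready" line means the SC is not responding.
    if printf '%s\n' "$scan_output" | grep -q 'SC is not ready'; then
        echo "FAIL: SC is not ready"
        return 1
    fi

    # Check 2: collect the distinct SC=... values from the running and
    # installed partition lines; a healthy card reports exactly one value.
    sc_versions=$(printf '%s\n' "$scan_output" | grep -o 'SC=[^]]*' | sort -u)
    count=$(printf '%s\n' "$sc_versions" | grep -c 'SC=')

    case "$count" in
        0) echo "FAIL: no SC version found"; return 1 ;;
        1) echo "OK: $sc_versions"; return 0 ;;
        *) echo "FAIL: SC versions differ"; return 1 ;;
    esac
}
```

Typical use would be to pipe the real command into it: sudo xbmgmt flash --scan | check_flash_scan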

 

Regards,
~John

UNKNwYSHSA
Visitor
1,258 Views
Registered: ‎10-31-2020

@JohnFedakIV 

I tried these steps more than 10 times. During one attempt yesterday, the 12V pin of the PCIe interface broke and the board burned. Now Device Manager recognizes the board only when it is connected to auxiliary power; over the PCIe interface alone it is not recognized.

[Image: ec67bcb57e40861fa502279ead22a42.jpg]

How can I make a warranty claim? Thank you!

JohnFedakIV
Moderator
1,237 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA,

With the card in a confirmed non-functioning state, the last option is to follow the steps in the 72533 Answer Record (https://www.xilinx.com/support/answers/72533.html) and submit the application for an RMA.

The LED status is one of the points to confirm for the AR, so please note the status of the LEDs before and after the PCIe pin was broken.

Once done, this will be routed to the Alveo RMA team and they will contact you directly. 

Regards,
~John

UNKNwYSHSA
Visitor
1,209 Views
Registered: ‎10-31-2020

Current situation:

No aux power:
    1 power off:
        LEDs: RED off, BLUE off, ORANGE on, YELLOW on, GREEN on
    2 power on:
        LEDs: RED on, BLUE on, ORANGE on, YELLOW on, GREEN on
        lspci: no Xilinx device
        xbmgmt flash --scan: FPGA: none, System: none
        device manager: unknown device

With aux power:
    1 power off:
        LEDs: RED off, BLUE off, ORANGE on, YELLOW on, GREEN on
    2 power on:
        LEDs: RED off, BLUE on, ORANGE on, YELLOW on, GREEN on
        lspci: 2 Xilinx devices
        xbmgmt flash --scan: FPGA: SC Unknown, System: none
        device manager: xcu200_0

 

Without aux power:

[Image: no aux power, power off]

[Image: no aux power, power on]

[Image: no aux power, list device]

[Image: no aux power, device manager 11]

[Image: no aux power, device manager2]

With aux power:

[Image: aux power, power off]

[Image: aux power, power on]

[Image: aux power, list device]

[Image: aux power, device manager1]

[Image: aux power, device manager2]

JohnFedakIV
Moderator
1,180 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA ,

Thank you for the feedback on this. I'm glad to see that the card is still running with AUX power.

Given that the full power removal isn't helping with the SC, maybe there is another issue going on. Let's take a quick look at the system after running xbutil validate.

The XRT version currently running is a little old, so first let's uninstall XRT and then install the newer 2020.1 XRT version for your OS. It can be found on the U200 product page under Getting Started:

https://www.xilinx.com/products/boards-and-kits/alveo/u200.html#gettingStarted

Once the new XRT is installed, please run the $ xbutil validate command after sourcing xrt ($ source /opt/xilinx/xrt/setup.sh) and provide the output of the following:

1. $ dmesg (Please output to a txt file)

2. lspci for Xilinx devices, full command:

 

$ lspci -v -d 10ee:

 

3. $ xbutil list
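
The three outputs requested above can be gathered into text files in one pass. This is a rough sketch under a few assumptions: the function names and output file names are made up for illustration, XRT has already been sourced so that xbutil is on the PATH, and dmesg may need sudo on some systems.

```shell
# run_and_save (hypothetical helper): run a command and store its stdout and
# stderr in the given file, reporting whether the command succeeded.
run_and_save() {
    outfile=$1
    shift
    if "$@" > "$outfile" 2>&1; then
        echo "saved: $outfile"
    else
        echo "warning: '$*' failed; see $outfile"
    fi
}

# collect_alveo_diagnostics (hypothetical helper): write the three requested
# outputs into the given directory (default: ./alveo_diag).
collect_alveo_diagnostics() {
    outdir=${1:-alveo_diag}
    mkdir -p "$outdir" || return 1

    run_and_save "$outdir/dmesg.txt" dmesg                     # 1. kernel log
    run_and_save "$outdir/lspci_xilinx.txt" lspci -v -d 10ee:  # 2. Xilinx PCIe devices
    run_and_save "$outdir/xbutil_list.txt" xbutil list         # 3. cards seen by XRT
}
```

Running collect_alveo_diagnostics my_u200_logs after $ xbutil validate would leave three txt files in my_u200_logs that can be attached to a reply.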

Regards,
~John

UNKNwYSHSA
Visitor
1,141 Views
Registered: ‎10-31-2020

[Screenshot: sudo lspci -v -d 10ee:]

[Screenshot: xbutil list]

 

Thank you!

JohnFedakIV
Moderator
1,072 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA,

Thank you for providing this information.

I'm looking through the dmesg and the xclbin was loaded:

[ 2740.226977] xocl 0000:07:00.1: ffffa08f5995a0a0 xocl_read_axlf_helper: Loaded xclbin dfd5a66a-36aa-41c6-88bb-c85a86d15512
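
To find lines like this one in a long dmesg log, a small filter can help. This is only a sketch: the function name loaded_xclbins is made up, and it assumes the "Loaded xclbin <uuid>" wording shown in the line above, which may change between XRT versions.

```shell
# loaded_xclbins (hypothetical helper): read dmesg text on stdin and print
# the UUID from every "Loaded xclbin <uuid>" line, one per line.
loaded_xclbins() {
    sed -n 's/.*Loaded xclbin \([0-9a-fA-F-]*\).*/\1/p'
}
```

For example: dmesg | loaded_xclbins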

Can you provide the output of the xbutil validate? Are you still seeing the same issue as in the first post?

Please also provide the $ xbmgmt flash --scan output.

Regards,
~John

UNKNwYSHSA
Visitor
1,048 Views
Registered: ‎10-31-2020

@JohnFedakIV 

[Screenshot: xbmgmt flash --scan]

[Screenshot: xbutil validate]

 

Thank you!

JohnFedakIV
Moderator
1,026 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA,

Thank you for providing this. It's good to see that the card now validates successfully.

This leaves the SC not ready error. Before the issues seen in this thread, had the board been programmed and working as expected, or was this the first attempt to get the board up and running?

Can you also provide the output of $ sudo xbmgmt flash --scan --verbose?

Regards,
~John

UNKNwYSHSA
Visitor
1,005 Views
Registered: ‎10-31-2020

@JohnFedakIV 

[Screenshot: xbmgmt flash --scan --verbose]

I bought this card from someone else, so it is not clear what was done to it before.

Can we conclude that this card is damaged?

Thank you!

JohnFedakIV
Moderator
988 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA,

The card is just in an unusual state. More specifically, the Satellite Controller (SC), which monitors the power and temperature of the card, isn't responding as expected. This has been fixed in the past with a full power removal, waiting 10 minutes for the capacitors to discharge, and then repowering. However, you have done this many times without success.

The good news is that the card is passing xbutil validate, so it can load and run an xclbin. However, I would caution you against running your own program on the card because, with the SC in its current state, the card may not be protected from an overcurrent or overtemperature event.

Thank you for providing the output. Unfortunately, it looks like the verbose switch didn't give any additional information. Instead, can you provide a txt file of $ xbutil dump? I appreciate you providing all of this information to understand the current state of the card.

Regards,
~John

UNKNwYSHSA
Visitor
971 Views
Registered: ‎10-31-2020

@JohnFedakIV 

The attached txt file is the output of the command "sudo ./xbutil dump".

 

Thank you!

JohnFedakIV
Moderator
896 Views
Registered: ‎09-04-2020

Hi @UNKNwYSHSA ,

I spoke with the RMA team and, unfortunately, because the Alveo card was purchased from a third party and not from an authorized distributor, it can't be considered for an RMA. This is because, as you mentioned, it's not clear what happened to the card before.

If you want, you can still go through the RMA process as described in this AR (https://www.xilinx.com/support/answers/72533.html) to request from the team directly.

As I mentioned in an earlier post, the card is in a usable state. Given that it passed xbutil validate, you can download and run programs on the card. The one concern is that, with the SC not responding, it is not clear what will happen in an overcurrent or overtemperature condition. Because of this, I would recommend that the system it is installed in have more than enough airflow and, if you want to be extra cautious, that you monitor the temperature of the card to keep it within its limits.

Regards,
~John

UNKNwYSHSA
Visitor
852 Views
Registered: ‎10-31-2020

@JohnFedakIV 

 

Thank you!
