xjiang3
Observer
Registered: 12-30-2018

Server cannot connect Alveo U280

I programmed an Alveo U280 card following the instructions in the Alveo U280 Data Center Accelerator Card User Guide (UG1314), but I forgot step 3: "After programming has completed, disconnect the card in the hardware manager, and disconnect the USB cable from the Alveo accelerator card." After I cold rebooted the machine, "lspci" no longer shows the accelerator card, and Vivado Hardware Manager cannot detect the device. Because I cannot connect to the card, I cannot reflash the golden image. How can I fix this problem?
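
For readers who want to repeat the "lspci" check, here is a minimal Python sketch. It assumes a Linux host with lspci installed and relies on 10ee being the PCI vendor ID assigned to Xilinx. The function names are illustrative and are not part of any Xilinx tool.

```python
import subprocess
from typing import List

XILINX_VENDOR_ID = "10ee"  # PCI vendor ID assigned to Xilinx


def find_xilinx_devices(lspci_nn_output: str) -> List[str]:
    """Return the lines of `lspci -nn` output that list a Xilinx device."""
    tag = f"[{XILINX_VENDOR_ID}:"
    return [line for line in lspci_nn_output.splitlines() if tag in line.lower()]


def card_visible() -> bool:
    """Run `lspci -nn` on this host and report whether any Xilinx device shows up."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return bool(find_xilinx_devices(result.stdout))


if __name__ == "__main__":
    print("Xilinx device visible on PCIe:", card_visible())
```

An empty result only says that the card did not enumerate on the PCIe bus. It does not say why.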

Thanks!

JohnFedakIV
Moderator
Registered: 09-04-2020

Hi @xjiang3 ,

Leaving the cable in after programming can result in the card not coming up on next reboot. A couple questions:

  • Have you done a reboot with the cable unplugged?
  • Did you program the custom image with a separate machine or using the machine that the card was in?
  • Lastly, what is the status of the LEDs on the card (There are 5: red, blue, orange, yellow, green)?

Programming a custom image onto the card shouldn't affect the golden image, as the MCS file is placed at a 0x01002000 offset and the golden image address range is write protected (see the note below Table 5 in UG1314).

Regards,
~John

----------------------------------------------------------------------------------
* Please don't forget to reply, kudo and accept as a solution! *
xjiang3

Thanks for your reply!

1. First I cold rebooted the server and the Alveo U280 with the cable plugged in. Then I disconnected the cable and cold rebooted again. The server still cannot detect the card with "lspci".

2. I programmed the card from the machine that the card is in.

But I find that my problem shows exactly the same symptoms as https://forums.xilinx.com/t5/Alveo-Accelerator-Cards/U280-in-unrecoverable-state-after-flashing-MCS/td-p/1075114 

So, do you have any advice for solving this problem? Others say I may have to RMA the card. Although I cannot connect to the board in Vivado Hardware Manager, could I use openocd to write the MCS file of the golden image directly into the flash on the board?

Thanks @JohnFedakIV .

 

JohnFedakIV

Hi @xjiang3,

Thank you for the feedback and highlighting the similar post. A couple questions:

  • Is the LED behavior similar in that post? I am still interested in my question on the status of the 5 LEDs.
  • How did you address AR 72926?
  • Also with this, which version of Vivado are you using?

I do want to note that we don't recommend programming the card from the machine that it is installed in (mentioned on page 15 of UG1314).

Since the card isn't visible in Vivado HW Manager, you will most likely need to RMA the card. My hope is that we can find the issue that led to this, so that it doesn't happen again.

Regards,
~John

xjiang3

Hi, @JohnFedakIV 

Thanks for your reply.

1. The LED on my card is red during cold reboot. This is different from that post.

2. Because I cannot connect to the card from Vivado Hardware Manager, I have no way to program a new MCS file into the card. This answer (AR 72926) is useful for avoiding the problem, but not for fixing it.

3. The Vivado version is 2020.2.

I think Xilinx needs to highlight this problem in the manual. Anyway, thanks @JohnFedakIV .

JohnFedakIV

Hi @xjiang3 ,

Thank you for the feedback.

There are 5 LEDs on the card. If the blue LED isn't on and the red LED is, that indicates a different issue from the one in the referenced topic: there is a power issue on the card. A couple of questions to narrow down the possible causes:

  • Is the BIOS in Safety mode? This often shows up as fans running at full power, low resolution on video cards, and USB devices not working.
  • Do you have another machine/server to test the card in?
  • Our Alveo Card Debugging guide recommends (described here) 
    • Shut down the system
    • Pull power
    • Reseat the Alveo card
    • Reseat the server risers if applicable
    • Bring system back up
    • Check if the red LED is out

The issue described in AR 72926 has been addressed in newer versions of the U280 SC FW (4.3.10), provided the SC has been updated. It was also my understanding that Vivado 2020.2 raises a DRC error during routing if CATTRIP is not tied low. Can you confirm the D32 connection in your design?
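
As a rough illustration of the SC version check, here is a minimal Python sketch. It assumes that 4.3.10 (the version quoted above) is the first SC firmware release containing the fix, and that the SC version has already been read from the card as a dotted string. Both function names are hypothetical.

```python
from typing import Tuple

# Assumption: 4.3.10, the version quoted in this thread, is the first
# SC firmware release that contains the CATTRIP fix.
FIXED_SC_VERSION = (4, 3, 10)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '4.3.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def sc_has_cattrip_fix(sc_version: str) -> bool:
    """Return True if the reported SC version is at least FIXED_SC_VERSION."""
    return parse_version(sc_version) >= FIXED_SC_VERSION


if __name__ == "__main__":
    for version in ("4.2.12", "4.3.9", "4.3.10"):
        print(version, sc_has_cattrip_fix(version))
```

The comparison is done on integer tuples rather than strings, because a plain string comparison would rank "4.3.9" above "4.3.10".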

Regards,
~John

xjiang3

Hi @JohnFedakIV 

Thanks for your reply.

1. The BIOS is not in safe mode.

2. I tested the card in the formal machine, and I can see the LED show blue for about one second. Then the LED turns red. (This is the same as in the post.)

3. I find that only after "unplugging the system from external power" can I see the LED show blue during a cold reboot.

4. Yes, D32 is not tied low in my design. The MCS file of this design has now been loaded into the flash on the card, and it will be written to the FPGA on reboot.

Regards.

JohnFedakIV

Hi @xjiang3 ,

Thank you for the feedback.

For #2, my understanding is that the blue LED is on for a second and then is no longer seen, correct?
The referenced post indicates that the blue LED is on for one second, blinks quickly, and then repeats the process, which would appear as a blinking blue LED. For background, the blue LED indicates that the FPGA is programmed. A blinking blue LED (a symptom of the CATTRIP issue) means the FPGA is constantly trying to reprogram itself.

With the red LED on, there is a power delivery issue. If the BIOS is not in safety mode, you have gone through the process outlined above, and you have also tested the card in another machine, then unfortunately the power issue is most likely on the card. Please follow the RMA process outlined in AR 72533, and indicate that you have gone through the steps described in the Alveo Card Debugging guide.
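
To summarize the LED reasoning in this thread, here is a minimal Python sketch. It only encodes the red and blue LED cases discussed here and is not an official diagnostic table. The Blue enum and the triage function are illustrative names, and the choice to check the red LED first is an assumption.

```python
from enum import Enum


class Blue(Enum):
    OFF = "off"
    ON = "on"
    BLINKING = "blinking"


def triage(red_on: bool, blue: Blue) -> str:
    """Map the observed red/blue LED state to the diagnosis given in this thread."""
    if red_on:
        # Red LED on: power delivery issue, per the moderator's replies.
        return "power delivery issue: reseat the card, test in another machine, then consider RMA"
    if blue is Blue.BLINKING:
        # Blinking blue: symptom of the CATTRIP issue.
        return "CATTRIP symptom: FPGA keeps trying to reprogram itself"
    if blue is Blue.ON:
        return "FPGA is programmed"
    return "state not covered in this thread"


if __name__ == "__main__":
    # The state reported by the original poster: blue briefly, then red.
    print(triage(red_on=True, blue=Blue.OFF))
```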

With a new card, please use the XRT flow to update the SC on the card to the latest version. This will prevent the CATTRIP/D32 pin issue.

Regards,
~John


View solution in original post

xjiang3

Hi @JohnFedakIV 

Yes, the blue LED is on for a second and then is no longer seen.

But I wonder why there is a problem with power delivery. I don't think I have done anything wrong with the card.

Thanks very much!

 
