Contributor
458 Views
Registered: ‎05-04-2020

Alveo card status LEDs and temperature

I just received an Alveo U200 card and connected it to a PCIe slot and a PCIe 8-pin power cable. The blue, orange, and green LEDs are on, but I can't see any red LED. After a few minutes the board felt increasingly hot, so I turned it off. Is this normal? Why is it getting so hot? Shouldn't I see a red LED?

 

10 Replies
Xilinx Employee
375 Views
Registered: ‎06-13-2018

Re: Alveo card status LEDs and temperature

Hi @nzh :

Alveo cards are available in both active and passive cooling configurations.

The difference is the fan included on actively cooled cards. The active cooling configuration includes a heat sink and a fan enclosure cover to provide appropriate cooling. The passive cooling card is designed to be installed in a data center server, where controlled airflow provides direct cooling. Which one is yours (active or passive)?

The red LED is the power-good LED. What do you mean by "can't see any red LED"? Is the red LED on or off?

Thanks,

Priyanka

 

Contributor
364 Views
Registered: ‎05-04-2020

Re: Alveo card status LEDs and temperature

@panantra My Alveo card is passively cooled, but it gets hot without even being used. I am not running anything on it; I just turn on the computer, and by the time the server boots (about 3 minutes) the card feels pretty warm. I waited a couple more minutes without using it and it got even hotter.

I assume I can't see any red LED because it's off. I can see that the blue, green, orange, and yellow LEDs are on. They all turn on together as soon as the computer is turned on. Does that mean the power is not good? The card is connected to a PCIe 8-pin cable.

I feel that something is wrong with the card and its power management, since the "power good" LED is not on and the card gets hot even when it is not being used.

Xilinx Employee
359 Views
Registered: ‎06-13-2018

Re: Alveo card status LEDs and temperature

Hi @nzh :

As I mentioned earlier, Passive cards do not include a built-in fan and therefore require an external mechanism to ensure proper airflow for cooling.

Note that Passive cards should not be powered without a forced airflow mechanism in place.

Please refer to the operating conditions in DS962 (https://www.xilinx.com/support/documentation/data_sheets/ds962-u200-u250.pdf) for details.

Thanks,

Priyanka

-----------------------------------------------------------------------------------------------------
If you find any post has resolved your query, mark it as an accepted solution.

Contributor
349 Views
Registered: ‎05-04-2020

Re: Alveo card status LEDs and temperature

@panantra Thank you for your reply, but I don't think it has anything to do with the cooling system. I am not even using the board yet and it gets hot, and the "power good" LED did not turn on. It looks like something is wrong with the power management.

Contributor
275 Views
Registered: ‎05-04-2020

Re: Alveo card status LEDs and temperature

@panantra Should I see a red light illuminated on the back of the card when I turn on the computer? Exactly which LEDs should I see?

Xilinx Employee
256 Views
Registered: ‎10-19-2015

Re: Alveo card status LEDs and temperature

Hi @nzh 

Is this the first time you are powering on the FPGA? 

The blue light indicates that the FPGA is programmed.

The red light indicates power good.

Is there an inconsistency between the documentation and what you are seeing? 

The cards aren't meant to be debugged with the LEDs alone.

Can you use your host machine to debug? The lspci tool should tell you if the PCIe link is up. Once XRT is installed and the shell is loaded on the card, you can use xbutil to debug.

These are the two basic commands: 

$ lspci -vd 10ee:
$ sudo xbutil flash scan

You can also debug with the JTAG/USB cable.

Can you see the FPGA with the maintenance connector plugged in?

I see you are using "server" and "computer" interchangeably. What is the model of the server or desktop workstation you are using?

This passive card needs forced air regardless of whether it is being used. If it is on, it needs to be cooled.

Regards,

M

-------------------------------------------------------------------------
Don’t forget to reply, kudo, and accept as solution.
-------------------------------------------------------------------------
Contributor
240 Views
Registered: ‎05-04-2020

Re: Alveo card status LEDs and temperature

@mcertosi So the problem is that, according to the documentation, I should see a red LED illuminated at the back of the card to confirm that the supply voltage is sufficient, correct? But the red LED is off, so I think something is wrong and I don't know how to figure it out. I suspect there is something wrong with the board. I tried this on a couple of Dell PowerEdge servers and workstations, and the red LED didn't turn on in any of them.

When I run the commands, I get this:

root@argus:/home/argus#  xbutil flash scan
WARNING: The xbutil sub-command flash has been deprecated. Please use the xbmgmt utility with flash sub-command for equivalent functionality.

Card [0000:23:00.0]
    Card type:		u200
    Flash type:		SPI
    Flashable partition running on FPGA:
        xilinx_u200_xdma_201830_2,[ID=0x5d1211e8],[SC=4.2.0]
    Flashable partitions installed in system:	
        xilinx_u200_xdma_201830_2,[ID=0x5d1211e8],[SC=4.2.0]

root@argus:/home/argus# sudo lspci -vd 10ee:
23:00.0 Processing accelerators: Xilinx Corporation Device 5000
	Subsystem: Xilinx Corporation Device 000e
	Flags: bus master, fast devsel, latency 0
	Memory at da000000 (64-bit, prefetchable) [size=32M]
	Memory at dffc0000 (64-bit, prefetchable) [size=128K]
	Capabilities: [40] Power Management version 3
	Capabilities: [60] MSI-X: Enable+ Count=33 Masked-
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [1c0] #19
	Capabilities: [400] Access Control Services
	Capabilities: [410] #15
	Kernel driver in use: xclmgmt
	Kernel modules: xclmgmt

23:00.1 Processing accelerators: Xilinx Corporation Device 5001
	Subsystem: Xilinx Corporation Device 000e
	Flags: bus master, fast devsel, latency 0, IRQ 99
	Memory at dc000000 (64-bit, prefetchable) [size=32M]
	Memory at dfff0000 (64-bit, prefetchable) [size=64K]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Capabilities: [40] Power Management version 3
	Capabilities: [60] MSI-X: Enable+ Count=33 Masked-
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [400] Access Control Services
	Capabilities: [410] #15
	Kernel driver in use: xocl
	Kernel modules: xocl

Xilinx Employee
228 Views
Registered: ‎10-19-2015

Re: Alveo card status LEDs and temperature

Hi @nzh 

Based on the information you've provided, I do not see anything wrong with the board. While the LEDs being inconsistent with the documentation is frustrating, let's continue to debug and see if we can find anything else wrong with the card.

lspci shows the card is linked via PCIe, and the management and user physical functions are operational.

From xbutil flash scan we see the shell is programmed and the satellite controller is also communicating with XRT.

These are the main parts of the card. 
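For anyone who wants to script this check, here is a minimal Python sketch that pulls the bound kernel driver for each PCI function out of `lspci -vd 10ee:` text. The sample string is abridged from the output posted earlier in this thread. The function name `drivers_by_function` is made up for illustration, and the parsing assumes the lspci layout shown in this thread rather than any guaranteed format.

```python
import re

# Abridged from the `lspci -vd 10ee:` output posted in this thread.
SAMPLE = """\
23:00.0 Processing accelerators: Xilinx Corporation Device 5000
\tSubsystem: Xilinx Corporation Device 000e
\tKernel driver in use: xclmgmt
\tKernel modules: xclmgmt

23:00.1 Processing accelerators: Xilinx Corporation Device 5001
\tSubsystem: Xilinx Corporation Device 000e
\tKernel driver in use: xocl
\tKernel modules: xocl
"""


def drivers_by_function(lspci_text):
    """Map each PCI function address (e.g. '23:00.0') to its bound driver.

    Hypothetical helper: a function with no 'Kernel driver in use' line
    maps to None.
    """
    result = {}
    current = None
    for line in lspci_text.splitlines():
        # Device header lines start in column 0 with the bus address.
        header = re.match(r"^([0-9a-fA-F:]+\.[0-7])\s", line)
        if header:
            current = header.group(1)
            result[current] = None
            continue
        # Detail lines are indented under the most recent header.
        bound = re.match(r"^\s+Kernel driver in use:\s*(\S+)", line)
        if bound and current is not None:
            result[current] = bound.group(1)
    return result


if __name__ == "__main__":
    for address, driver in drivers_by_function(SAMPLE).items():
        print(address, driver or "no driver bound")
```

In the output from this thread, function 0 is bound to xclmgmt (management) and function 1 to xocl (user), which matches the description above.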

Next run xbutil validate to see if the DDR and DMA are working correctly. 
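If you save the validate output to a file, a short Python sketch like the one below can reduce it to one result per check. It assumes the `INFO: == ... PASSED` / `ERROR: == ... FAILED` line format printed by the XRT version used in this thread; other XRT releases may format the report differently. The sample log and the name `summarize` are illustrative only.

```python
import re

# Illustrative lines in the format printed by `xbutil validate` in this thread.
SAMPLE_LOG = """\
INFO: == Starting AUX power connector check:
INFO: == AUX power connector check PASSED
INFO: == Starting PCIE link check:
WARN: == PCIE link check PASSED with warning
INFO: == DMA test PASSED
ERROR: == device memory bandwidth test FAILED
"""

# "PASSED with warning" is listed first so it is not cut short to "PASSED".
RESULT_LINE = re.compile(
    r"^(?:INFO|WARN|ERROR): == (.+?) (PASSED with warning|PASSED|FAILED)\s*$"
)


def summarize(log_text):
    """Return (check name, outcome) pairs in the order they appear."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results


if __name__ == "__main__":
    for name, outcome in summarize(SAMPLE_LOG):
        print(f"{outcome:20s} {name}")
```

The "Starting ..." lines are skipped because they do not end in an outcome keyword.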

Regards,

M

Contributor
207 Views
Registered: ‎05-04-2020

Re: Alveo card status LEDs and temperature

@mcertosi This is what I get when I run validate:

 

 xbutil validate
INFO: Found 1 cards

INFO: Validating card[0]: xilinx_u200_xdma_201830_2
INFO: == Starting AUX power connector check: 
INFO: == AUX power connector check PASSED
INFO: == Starting PCIE link check: 
LINK ACTIVE, ATTENTION
Ensure Card is plugged in to Gen3x16, instead of Gen2x8
Lower performance may be experienced
WARN: == PCIE link check PASSED with warning
INFO: == Starting SC firmware version check: 
INFO: == SC firmware version check PASSED
INFO: == Starting verify kernel test: 
INFO: == verify kernel test PASSED
INFO: == Starting DMA test: 
Host -> PCIe -> FPGA write bandwidth = 2833.39 MB/s
Host <- PCIe <- FPGA read bandwidth = 2999.45 MB/s
INFO: == DMA test PASSED
INFO: == Starting device memory bandwidth test: 
Traceback (most recent call last):
  File "/opt/xilinx/xrt/test/23_bandwidth.py", line 1, in <module>
    import pyopencl as cl
  File "/usr/lib/python3/dist-packages/pyopencl/__init__.py", line 30, in <module>
    import pyopencl._cl as _cl
ModuleNotFoundError: No module named 'pyopencl._cl'

ERROR: == device memory bandwidth test FAILED
INFO: Card[0] failed to validate.

ERROR: Some cards failed to validate.

I also tried to install pyopencl, but it looks like I already have it:

root@argus:/home/argus# python -m pip install pyopencl
Requirement already satisfied: pyopencl in /usr/lib/python3/dist-packages (2015.1)

Xilinx Employee
154 Views
Registered: ‎10-19-2015

Re: Alveo card status LEDs and temperature

Hi @nzh 

Two things stand out to me. First, the card is not plugged into a Gen3x16 PCIe slot; xbutil validate reports a Gen2x8 link. This will limit bandwidth later.
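To put rough numbers on the link-width point, the Python sketch below computes the raw per-direction ceiling of a PCIe link from the signalling rate and line encoding (5 GT/s with 8b/10b for Gen2, 8 GT/s with 128b/130b for Gen3). These are upper bounds before packet and protocol overhead, so real DMA figures will be lower. The function name is made up for illustration.

```python
# Per PCIe generation: (transfer rate in GT/s, payload bits, encoded bits).
GENERATIONS = {
    2: (5.0, 8, 10),      # Gen2 uses 8b/10b encoding
    3: (8.0, 128, 130),   # Gen3 uses 128b/130b encoding
}


def theoretical_mb_per_s(gen, lanes):
    """Raw one-direction link ceiling in MB/s (1 MB = 10**6 bytes).

    Ignores TLP/DLLP protocol overhead, so achievable DMA rates are lower.
    """
    rate_gt, payload_bits, encoded_bits = GENERATIONS[gen]
    usable_bits_per_s = rate_gt * 1e9 * payload_bits / encoded_bits
    return usable_bits_per_s / 8 / 1e6 * lanes


if __name__ == "__main__":
    print("Gen2 x8 :", round(theoretical_mb_per_s(2, 8)), "MB/s")
    print("Gen3 x16:", round(theoretical_mb_per_s(3, 16)), "MB/s")
```

The Gen2 x8 ceiling works out to 4000 MB/s, so the roughly 2833 MB/s write and 2999 MB/s read figures in the DMA test are in a plausible range for that link. A Gen3 x16 slot raises the ceiling to about 15754 MB/s.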

Second, the Python problem. If you are using Ubuntu, installing pyopencl with apt doesn't seem to work right. This answer record describes how we recommend fixing the problem: https://www.xilinx.com/support/answers/73055.html

Unfortunately, your problem has a slightly different error signature than what is in the answer record. You should still try it, but it might not work. 
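Before and after trying the fix, it can help to confirm which interpreter is running and exactly which module fails to import. The Python sketch below is a generic diagnostic, not something taken from the answer record; `diagnose_import` is a hypothetical helper name.

```python
import importlib
import sys


def diagnose_import(module_name):
    """Try to import a module and report which name, if any, is missing."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found, which may be a
        # submodule of the one requested.
        return f"missing module: {exc.name}"
    except ImportError as exc:
        return f"import error: {exc}"
    return "ok"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("pyopencl:", diagnose_import("pyopencl"))
```

With the traceback posted above, this would be expected to report `pyopencl._cl` as the missing name. One plausible reading, which is an assumption rather than a confirmed diagnosis, is that the package directory is present but its compiled extension was not built for the interpreter that runs the test.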

There are some other forum posts about Python, for example https://forums.xilinx.com/t5/Alveo-Accelerator-Cards/failed-to-validate-U50DD/td-p/1067758

What Linux kernel are you using? 

Regards,

M
