Observer xela_lin
1,033 Views
Registered: 02-10-2009

Questions about DNNDK installation

Hi,

I am following the "deephi_dnndk_v2.06_beta" user guide to install DNNDK on my server, but I have run into a problem installing NCCL, as shown below. Does anyone have an idea how to fix it? Thanks.

1. Error with NCCL version 2.3.5-5

--------------------------------------------------------------------------------
alex@instance-1:~/nccl$ sudo make CUDA_HOME=/usr/local/cuda-9.1 install
make: *** No rule to make target 'install'. Stop.
alex@instance-1:~/nccl$
--------------------------------------------------------------------------------

2. Error with NCCL version 1.3.4-1

--------------------------------------------------------------------------------
alex@instance-1:~/nccl$ sudo make CUDA_HOME=/usr/local/cuda-9.1 install
Grabbing src/nccl.h > /home/alex/nccl/build/include/nccl.h
Compiling src/libwrap.cu > /home/alex/nccl/build/obj/libwrap.o
Compiling src/core.cu > /home/alex/nccl/build/obj/core.o
Compiling src/all_gather.cu > /home/alex/nccl/build/obj/all_gather.o
src/common_kernel.h(237): error: class "__half" has no member "x"
src/common_kernel.h(237): error: class "__half" has no member "x"
src/common_kernel.h(250): error: class "__half" has no member "x"
src/common_kernel.h(250): error: class "__half" has no member "x"
src/copy_kernel.h(28): error: class "__half" has no member "x"
src/copy_kernel.h(28): error: class "__half" has no member "x"
6 errors detected in the compilation of "/tmp/tmpxft_00000b0e_00000000-10_all_gather.compute_61.cpp1.ii".
Makefile:111: recipe for target '/home/alex/nccl/build/obj/all_gather.o' failed
make: *** [/home/alex/nccl/build/obj/all_gather.o] Error 1
alex@instance-1:~/nccl$
--------------------------------------------------------------------------------
0 Kudos
1 Solution

Accepted Solutions
Adventurer
954 Views
Registered: 10-24-2008

Re: Questions about DNNDK installation

@xela_lin @aluo To all other users who may encounter this: as of v2.0.7, DNNDK uses NCCL v1.3.5. This version of NCCL is not available on the NVIDIA website today. I have attached the NCCL source files to this post, for redistribution under the terms of the NVIDIA license agreement (https://github.com/NVIDIA/nccl/blob/master/LICENSE.txt).

NVIDIA CUDA Toolkit 8.0 is required.

--Quenton
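
Because the requirement above is specific to CUDA Toolkit 8.0, it may help to confirm which toolkit release `nvcc` reports before building NCCL. The following is only a minimal sketch and not part of the DNNDK documentation: the helper names `cuda_release` and `require_cuda_8` are made up for illustration, and it assumes that `nvcc` is on the PATH.

```shell
# Extract the release number (for example 8.0) from `nvcc --version`
# output supplied on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed only when the nvcc on PATH reports release 8.0.
require_cuda_8() {
    release=$(nvcc --version | cuda_release)
    if [ "$release" = "8.0" ]; then
        echo "CUDA $release found"
    else
        echo "CUDA 8.0 required, found: ${release:-none}" >&2
        return 1
    fi
}
```

Calling `require_cuda_8` prints the detected release, or an error on stderr if a different release (or no `nvcc`) is found.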

0 Kudos
10 Replies
Adventurer
998 Views
Registered: 10-24-2008

Re: Questions about DNNDK installation

@xela_lin Is your target Ubuntu 14.04, 16.04, or a different distribution? For the make install error, is there an "install:" rule in the Makefile, and are you in the correct directory when you run make install? Here is a potentially relevant discussion of the other error that you see: https://github.com/NVIDIA/nccl/pull/62

--Quenton

0 Kudos
Observer xela_lin
991 Views
Registered: 02-10-2009

Re: Questions about DNNDK installation

I am using Ubuntu 16.04 and CUDA 9.1 for this installation. When I apply "git rebase master" to get back to NCCL v2.3.5-5, there is no "install:" rule in the Makefile, so I followed "README.md" to install NCCL, which succeeded. However, I get a "libnccl.so.1" not found error when I execute "decent" after installing DNNDK. I guess that DNNDK needs an older NCCL version.

0 Kudos
Adventurer
982 Views
Registered: 10-24-2008

Re: Questions about DNNDK installation

@xela_lin Yes, I believe that you are correct. One of our experts has also encountered a case where the dependencies required by DECENT are met only when libnccl.so.1 is available. From this, I would conclude that you want to install from the v1.3.4 tag.

Reviewing the Makefile, it appears that an install rule is included:

https://github.com/NVIDIA/nccl/blob/v1.3.4-1/Makefile

Does combining the master rebase with the v1.3.4-1 tag resolve the problem?

--Quenton

0 Kudos
Observer xela_lin
976 Views
Registered: 02-10-2009

Re: Questions about DNNDK installation

@qhall Thanks for your response. Sorry, I am not familiar with git and Makefiles. Would you please give me an example of how to combine the master rebase with the v1.3.4-1 tag? Thanks.

0 Kudos
Adventurer
967 Views
Registered: 10-24-2008

Re: Questions about DNNDK installation

@xela_lin It is perhaps I who should apologize, as my post may be generating more confusion.

What I am suggesting is that, based on the information I have, I agree that we need to focus on installing v1.3.4. However, it appears that you tested this previously and encountered error messages, including:

src/common_kernel.h(237): error: class "__half" has no member "x"

It appears to me that the others who encountered this same error message did so because they had migrated to the newer kernel and a newer version of CUDA, which is apparently why they were successful in using "git rebase master" to rebase NCCL. That was this post: https://stackoverflow.com/questions/12469855/git-rebasing-to-a-particular-tag

I am afraid that, for the moment, this is all the information that I can offer. However, I will escalate this thread internally to the compiler team.

--Quenton

0 Kudos
Xilinx Employee
931 Views
Registered: 02-18-2013

Re: Questions about DNNDK installation

@xela_lin, please get the source code of NCCL 1.3.4 from https://github.com/NVIDIA/nccl/releases and install it on your host PC.
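
One way to follow this suggestion with git rather than a release archive is sketched below. It is an untested outline under stated assumptions: the function names, the `nccl-v1` directory, and the `/usr/local/cuda-8.0` path are placeholders chosen for illustration, and the `make` invocations simply mirror the ones used earlier in this thread. Building against CUDA 9.1 produced the `__half` errors quoted above, so the sketch defaults to a CUDA 8.0 location.

```shell
NCCL_TAG="v1.3.4-1"                            # tag referenced in this thread
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-8.0}"  # assumed install path, adjust locally

# Succeed only if the given directory contains an executable nvcc.
cuda_home_ok() {
    [ -x "$1/bin/nvcc" ]
}

# Clone NCCL, check out the 1.x tag, then build and install it.
# The body runs in a subshell so the caller's working directory is unchanged.
build_nccl_v1() (
    cuda_home_ok "$CUDA_HOME" || { echo "nvcc not found under $CUDA_HOME" >&2; exit 1; }
    git clone https://github.com/NVIDIA/nccl.git nccl-v1 &&
    cd nccl-v1 &&
    git checkout "$NCCL_TAG" &&
    make CUDA_HOME="$CUDA_HOME" &&
    sudo make CUDA_HOME="$CUDA_HOME" install
)
```

Running `build_nccl_v1` stops before cloning anything if no `nvcc` is found under `CUDA_HOME`, which avoids a long build against the wrong toolkit.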

0 Kudos
Observer xela_lin
922 Views
Registered: 02-10-2009

Re: Questions about DNNDK installation

@aluo I downloaded NCCL v1.3.4 from the NVIDIA GitHub and rebuilt it, but I got the same 'class "__half" has no member "x"' errors that I mentioned in my earlier post. I have now fixed this by using the NCCL source provided by @qhall, which works. With DNNDK installed, I executed "decent" and got the error below. I think that CUDA 8.0 is needed, so I will reinstall with CUDA 8.0.

decent: error while loading shared libraries: libcudart.so.8.0: cannot open shared object file: No such file or directory
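
Errors like this one and the earlier "libnccl.so.1" report come from the dynamic loader, which typically names only the first library it cannot find. `ldd` can list every unresolved library at once. A minimal sketch follows; the helper names are made up for illustration, and it assumes that `decent` is on the PATH.

```shell
# Print the names of shared libraries that the loader cannot resolve.
# Reads `ldd` output on stdin, so it also works on saved output.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# List unresolved libraries for a binary found on PATH (default: decent).
check_binary() {
    bin_path=$(command -v "${1:-decent}") || {
        echo "not on PATH: ${1:-decent}" >&2
        return 1
    }
    ldd "$bin_path" | missing_libs
}
```

Running `check_binary` with no arguments inspects `decent`; an empty result means every shared library it links against was resolved.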

0 Kudos
Xilinx Employee
912 Views
Registered: 02-18-2013

Re: Questions about DNNDK installation

@xela_lin Good to know. Please try CUDA 8.0 to see if it works.

0 Kudos