natheesan
Participant
1,033 Views
Registered: ‎07-30-2018

Couldn't import torch

Hi, @jasonwu 

When I try to import torch in the vitis-ai-pytorch environment, I am getting an "Illegal instruction (core dumped)" error message.

torch.png

How can I fix this error?
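
One way to narrow this down, sketched under the assumption that the crash happens during the import itself, is to run the import in a child Python process and inspect its exit status, so the crash does not take down the interactive session. The helper names `describe_exit` and `try_import` are made up for this illustration. On POSIX systems, `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def try_import(module_name):
    """Import module_name in a fresh interpreter and return the exit code."""
    result = subprocess.run([sys.executable, "-c", "import " + module_name])
    return result.returncode


# Example call for the module discussed in this thread:
#   print(describe_exit(try_import("torch")))
```

If the status names SIGILL, the interpreter hit a CPU instruction the processor could not execute, which is a different class of problem from a missing package or a Python-level exception.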

Thank you.

Natheesan

12 Replies
jasonwu
Moderator
980 Views
Registered: ‎03-27-2013

Hi @natheesan ,

 

It seems that the "vitis-ai-pytorch" environment can only be used in the GPU Docker image.

I am afraid my GPU machine is tied up with a long training run. I will try to borrow a GPU card and test this on my side.

Best Regards,
Jason
-----------------------------------------------------------------------------------------------
Please mark the Answer as "Accept as solution" if the information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
-----------------------------------------------------------------------------------------------
natheesan
Participant
944 Views
Registered: ‎07-30-2018

Hi @jasonwu ,

Thanks for the response.
I have used PyTorch (1.4.0) in a conda environment on my PC and have never run into this "Illegal instruction (core dumped)" error there.

I only face this issue in your Vitis-AI-PyTorch environment.

Do you know anyone else who could respond to this issue quickly? A fix is needed urgently.

Thanks
Natheesan

jasonwu
Moderator
930 Views
Registered: ‎03-27-2013

Hi @natheesan ,

 

You may contact your Xilinx FAE to check whether they have bandwidth for quick support.

As far as I know, several of my colleagues are on long vacation, which may be why we can't respond quickly.

Best Regards,
Jason
philreiter
Visitor
907 Views
Registered: ‎10-03-2019

Thank you for this info, @jasonwu . A separate forum thread is also exploring the "illegal instruction" error arising in the vai-q-pytorch environment running on the provided Docker container: https://forums.xilinx.com/t5/AI-and-Vitis-AI/Pytorch-Illegal-instruction-core-dumped/td-p/1134124

@natheesan, please keep us posted on any feedback you get from the Xilinx FAE, as I also need a quick solution on my end. Thanks!

jasonwu
Moderator
883 Views
Registered: ‎03-27-2013

Hi @natheesan , @philreiter ,

 

I just finished testing the torch import on my side. It works, except that installing the Nvidia-440 driver broke my desktop display.

Please check the test log below:

wuxian@wuxian-ubuntu1804-sw:/workspace$ conda activate vitis-ai-pytorch
(vitis-ai-pytorch) wuxian@wuxian-ubuntu1804-sw:/workspace$ python3
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> exit()
(vitis-ai-pytorch) wuxian@wuxian-ubuntu1804-sw:/workspace$ python
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> exit()
(vitis-ai-pytorch) wuxian@wuxian-ubuntu1804-sw:/workspace$ pip3 list
Package       Version
------------- ------------------------
certifi       2020.6.20
cffi          1.14.0
mkl-fft       1.1.0
mkl-random    1.1.1
mkl-service   2.3.0
numpy         1.17.2
olefile       0.46
Pillow        7.2.0
pip           20.1.1
protobuf      3.11.4
pybind11      2.5.0
pycparser     2.20
pytorch-nndct 0.1.0-a5f1f45-torch1.1.0
scipy         1.3.1
setuptools    49.2.0.post20200714
six           1.15.0
torch         1.1.0
torchvision   0.3.0
tqdm          4.47.0
wheel         0.34.2
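
To compare another container against the listing above, one option is to save the output of `pip3 list` from each environment and diff the two. This is a minimal sketch that assumes the two-column "Package / Version" layout shown above; the function names `parse_pip_list` and `diff_versions` are hypothetical.

```python
def parse_pip_list(text):
    """Parse two-column `pip list` output into a {package: version} dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0].lower()] = parts[1]
    return versions


def diff_versions(reference, candidate):
    """Return {package: (reference_version, candidate_version)} for mismatches."""
    names = set(reference) | set(candidate)
    return {
        name: (reference.get(name), candidate.get(name))
        for name in sorted(names)
        if reference.get(name) != candidate.get(name)
    }
```

A package missing from one side shows up with `None` in that position, which makes a differing torch or pytorch-nndct build easy to spot.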

I am using the VAI 1.2 tag to run the test

wuxian@wuxian-ubuntu1804-sw:~/wu_software/Vitis-AI$ git branch
* (HEAD detached at v1.2)
  master

And the docker image is generated via:

cd ./docker
./docker_build_gpu.sh
Best Regards,
Jason
philreiter
Visitor
866 Views
Registered: ‎10-03-2019

Thank you for testing this on your end, @jasonwu. I was also able to successfully run import torch previously, so potentially the "illegal instruction" issue encountered with resnet18_quant.py quantization is separate.

Regarding running the vai-q-pytorch container with GPU support, ensure that you're using the vitis-ai-gpu Docker environment (the instructions mention vitis-ai

./docker_run.sh xilinx/vitis-ai-gpu:latest

Also, to indicate how many GPUs are used by Docker, add

--gpus=all \

to the docker_run.sh script (replace "all" with the number of GPUs you want allocated, or leave it as-is to give Docker access to all GPUs):

elif [[ $IMAGE_NAME == *"gpu"* ]]; then
  docker run \
    $docker_devices \
    -v /opt/xilinx/dsa:/opt/xilinx/dsa \
    -v /opt/xilinx/overlaybins:/opt/xilinx/overlaybins \
    -e USER=$user -e UID=$uid -e GID=$gid \
    -v $HERE:/workspace \
    -v /dev/shm:/dev/shm \
    -w /workspace \
    -it \
    --rm \
    --runtime=nvidia \
    --network=host \
    --gpus=all \
    $IMAGE_NAME \
    bash
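
Once the container is started with GPU access, a quick check from inside it can confirm that PyTorch actually sees a GPU. This is a sketch only: `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch calls, while the `cuda_status` helper is a name invented here.

```python
import importlib.util


def cuda_status():
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check above can run without it

    if torch.cuda.is_available():
        return "CUDA available, %d device(s) visible" % torch.cuda.device_count()
    return "torch imported, but no CUDA device is visible"


print(cuda_status())
```

If no device is visible, the `--gpus` and `--runtime` options in the run script are the first things to recheck.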

@jasonwu, regarding the "illegal instruction" matter that @masa and I are facing, are you able to successfully run resnet18_quant.py from /workspace/Vitis-AI-Quantizer/vai_q_pytorch/example?

python resnet18_quant.py --quant_mode 1 --subset_len 200

Your help on this thread would be greatly appreciated: https://forums.xilinx.com/t5/AI-and-Vitis-AI/Pytorch-Illegal-instruction-core-dumped/td-p/1134124

natheesan
Participant
824 Views
Registered: ‎07-30-2018

Hi @jasonwu 

Thanks for your help. I followed the same steps you did, but I am still getting the error.

I do not get the error when I use the source-code version of vai_q_pytorch (https://github.com/Xilinx/Vitis-AI/tree/master/Vitis-AI-Quantizer/vai_q_pytorch). I only face this issue when I use the Docker environment.

I am not sure where the actual error is located.

ter.png

Thanks,
Natheesan

philreiter
Visitor
803 Views
Registered: ‎10-03-2019

Thank you, @natheesan , I also only see the "illegal instruction" error within the docker environment.

I am attempting to build and run vai_q_pytorch outside of Docker and have followed the README instructions to recreate the environment. Unfortunately, running "import pytorch_nndct" from the instructions results in a syntax error inside pytorch-nndct:

File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_nndct/apis/quant_api.py", line 78 

    GLOBAL_MAP.set_map(NNDCT_KEYS.QUANT_MODE, quant_mode) 

             ^ 

SyntaxError: invalid syntax 

The docker environment uses a custom pytorch-nndct version, 0.1.0-a5f1f45-torch1.1.0 .

@jasonwu, do you know how this version of pytorch-nndct can be retrieved/reproduced from outside the Vitis AI docker env? Thanks!

 

philreiter
Visitor
768 Views
Registered: ‎10-03-2019

Update: There is indeed a typo in the pytorch_nndct/apis/quant_api.py file shipped with the latest release (1.2.82):

NndctScreenLogger().info(f('GLOBAL_MAP set_map quant_mode')

With this incorrect '(' removed, it is possible to perform quantization (resnet18_quant.py --quant_mode 1) outside the Docker container. However, Xmodel generation fails (resnet18_quant.py --quant_mode 2)  as the XIR package cannot be found. From this thread, https://forums.xilinx.com/t5/AI-and-Vitis-AI/Where-is-vai-c-xir-command/td-p/1129301, it seems that, as XIR is not publicly released, it can only be accessed through the Docker environment.
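
On the typo itself: the traceback earlier in this thread points at the GLOBAL_MAP.set_map line rather than at the logger line because an unclosed '(' makes Python continue the expression onto the following line. The sketch below compiles a two-line excerpt with and without the stray parenthesis. The excerpt is reconstructed from the lines quoted in this thread, and it is an assumption that the logger call sits directly above the set_map call in the real file. The reported line number and message can differ between Python versions.

```python
# Two-line excerpt reconstructed from the lines quoted in this thread.
# Assumption: the logger call sits directly above the GLOBAL_MAP.set_map call.
BROKEN = (
    "NndctScreenLogger().info(f('GLOBAL_MAP set_map quant_mode')\n"
    "GLOBAL_MAP.set_map(NNDCT_KEYS.QUANT_MODE, quant_mode)\n"
)
# Dropping the stray '(' after the f prefix turns f('...') back into f'...'.
FIXED = BROKEN.replace("f('", "f'", 1)


def syntax_error_of(source):
    """Compile source without running it; return the SyntaxError or None."""
    try:
        compile(source, "<quant_api excerpt>", "exec")
    except SyntaxError as err:
        return err
    return None


for label, source in (("broken", BROKEN), ("fixed", FIXED)):
    err = syntax_error_of(source)
    print(label, "->", "compiles" if err is None else "SyntaxError at line %s" % err.lineno)
```

Because `compile` only parses the text, the undefined names in the excerpt do not matter here.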

So the attempt to build vai_q_pytorch locally has come back to resolving the "illegal instruction" error encountered in the Docker env: https://forums.xilinx.com/t5/AI-and-Vitis-AI/Pytorch-Illegal-instruction-core-dumped/m-p/1134124

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007f8325a31cdb in mkldnn::impl::scales_t::set(int, int, float const*) () from /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.6/site-packages/torch/lib/libcaffe2.so
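
SIGILL means the CPU was asked to execute an instruction it does not implement. One possible cause, not confirmed anywhere in this thread, is that the prebuilt libcaffe2.so in the image uses SIMD extensions (for example AVX2) that the host CPU lacks. Below is a minimal sketch for listing what an x86 Linux host advertises, assuming /proc/cpuinfo is readable; the helper names and the choice of flags to check are illustrative only.

```python
import os


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report_simd(flags, wanted=("sse4_1", "sse4_2", "avx", "avx2", "avx512f")):
    """Map each SIMD flag of interest to whether the CPU advertises it."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        print(report_simd(parse_cpu_flags(handle.read())))
```

Running this on both a machine where the import works and one where it crashes would show whether the two CPUs differ in the extensions they support.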

jasonwu
Moderator
662 Views
Registered: ‎03-27-2013

Hi @natheesan ,

 

As I said before, the machine I was using ran into a display card driver problem.

So I installed a fresh Ubuntu 18.04 and started from a clean system.

I installed the GPU-related components by following:

https://www.tensorflow.org/install/gpu

Then I installed the Docker environment (I used the VAI 1.1 guide because it is more detailed):

https://github.com/Xilinx/Vitis-AI/blob/v1.1/doc/install_docker/README.md

Then I followed the flow on the VAI 1.2 tag to download and install the image. I still can't reproduce your issue.

And @philreiter ,

 

I am afraid I haven't tried the ResNet example on PyTorch yet, as it needs image data for further testing.

I will run the test on my side once the image data is ready.

Best Regards,
Jason
philreiter
Visitor
620 Views
Registered: ‎10-03-2019

I understand, @jasonwu. I am also investigating this matter on a Vitis AI GitHub issues thread: https://github.com/Xilinx/Vitis-AI/issues/135

jasonwu
Moderator
608 Views
Registered: ‎03-27-2013

Hi @philreiter ,

 

Thanks for your update. As an update on my side: I still can't find the proper dataset, so I am using just 100 validation images, and I can't reproduce the issue:

(vitis-ai-pytorch) wuxian@wuxian-Ubuntu1804:/workspace/Vitis-AI-Quantizer/vai_q_pytorch/example$ python resnet18_quant.py --quant_mode 1 --subset_len 100

[NNDCT_NOTE]: Loading NNDCT kernels...
-------- Start resnet18 test 

[NNDCT_NOTE]: Quantization calibration process start up...

[NNDCT_NOTE]: =>Parsing ResNet...

[NNDCT_NOTE]: =>Quantizable module is generated.(quantize_result/ResNet.py)
100%|#############################################| 4/4 [00:01<00:00,  2.54it/s]
loss: 0.687199
top-1 / top-5 accuracy: 0 / 0

[NNDCT_NOTE]: =>Exporting quant config.(quantize_result/quant_info.json)
-------- End of resnet18 test 
(vitis-ai-pytorch) wuxian@wuxian-Ubuntu1804:/workspace/Vitis-AI-Quantizer/vai_q_pytorch/example$
Best Regards,
Jason