XilinxFan123
Visitor
Registered: 05-17-2021

Usage of Vitis Accelerated Libraries like DSP libraries or BLAS libraries


Hi Xilinx Support, 

I have installed Vitis 2020.2 on my CentOS 7.9 machine. I have not connected any hardware and am using the emulator for debugging.

I am using the Vitis IDE. I followed the XRT installation steps at the link below. However, when I step into the fft_top function, execution jumps into the exit code and gets stuck in an infinite loop. Attached is a debug screenshot of the project taken just before it enters the fft_top function.

https://www.xilinx.com/html_docs/xilinx2020_2/vitis_doc/acceleration_installation.html#vhc1571429852245

Please advise how I can use the DSP and BLAS libraries.

Other information is below:

I took the code from the GitHub repository:

https://github.com/Xilinx/Vitis_Libraries.git

Under "Setting up the Environment to Run the Vitis Software Platform", I cannot find the setup.sh file, so I cannot perform the following step:

source /opt/xilinx/xrt/setup.sh

Thanks.

[Image: fft_top call goes into exit code]

kmorris
Xilinx Employee
Registered: 01-11-2011

CentOS 7.9 hasn't been tested with or supported by the 2020.2 tools, so the behavior could be due to that incompatibility.

If /opt/xilinx/xrt/setup.sh does not exist, XRT probably hasn't been installed properly. I suggest re-installing XRT per the instructions; however, the missing file could also be caused by the unsupported OS. The installation log files may show an error.

As for the DSP and BLAS libraries, you can look at the example designs, which are available in the IDE as templates, to see how they are implemented and used. When you create a new project, one of the final pages is the template page. At the bottom there is a "Vitis IDE Libraries" button; clicking it lets you download the library examples and then choose one of them to use.

-------------------------------------------------------------------------
Please don’t forget to reply, kudo, and accept as solution!
-------------------------------------------------------------------------
XilinxFan123
Visitor
Registered: 05-17-2021

Hi KMorris / Xilinx Support,

Thanks for your reply.

I have updated my OS to Linux Ubuntu 18.04.5 LTS.

To test the DSP libraries, I am referring to https://xilinx.github.io/Vitis_Libraries/dsp/2020.2/user_guide/L1.html

I cloned the Vitis Libraries repository as mentioned earlier. When I run the following command, the output is fine:

$> make run XPART='xczu9eg-ffvb1156-2-e' CSIM=1 CSYNTH=0 COSIM=0

(xczu9eg-ffvb1156-2-e is the part number of the FPGA on the ZCU102 board.)

However, I am facing two issues:

1. If I set any element of inData[][] to a value greater than 1 and print it, the printed value is the negative of the value I set. Hence the input to fft_top is also the negative of what was intended. (Note: a negative value stays negative.)

2. Also, if I modify the input array to hold different values, for example inData[2][0] = 1 and inData[0][0] = 2 with all other elements of inData[][] = 0, the output is as follows, which is incorrect:

===============================================================
--Output Step fuction:
(-1,0)
(-3,0)
(-1,0)
(-3,0)
(-1.29291,-0.707092)
(-2.70709,0.707092)
(-1.29291,-0.707092)
(-2.70709,0.707092)
(-2,-1)
(-2,1)
(-2,-1)
(-2,1)
(-2.70709,-0.707092)
(-1.29291,0.707092)
(-2.70709,-0.707092)
(-1.29291,0.707092)
===============================================================

Please advise on both points so that I can use the DSP library in an application.

Awaiting your reply.

Regards.
kmorris
Xilinx Employee
Registered: 01-11-2011

In the example, inData uses the following data type:

typedef std::complex<ap_fixed<IN_WL, IN_IL> > T_in;

This is defined in data_path.hpp. IN_WL and IN_IL are 16 and 2 respectively, meaning the fixed-point value is 16 bits wide with 2 integer bits (including the sign bit), so the representable range is [-2, 2). A value of 2 lies just outside that range and wraps around, so increasing IN_IL (to 4, for example) allows larger values to be used. The behavior of fixed-point values is described here.
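To see why a value of 2 comes back negative, here is a minimal Python sketch that models assigning a number to a signed ap_fixed<16, 2>. It assumes the default AP_TRN quantization and AP_WRAP overflow modes; the helper name to_ap_fixed is hypothetical and is not part of the HLS library.

```python
import math

def to_ap_fixed(value, wl=16, il=2):
    """Hypothetical helper: model a signed ap_fixed<wl, il> assignment,
    assuming the default AP_TRN quantization and AP_WRAP overflow modes."""
    frac_bits = wl - il
    raw = math.floor(value * (1 << frac_bits))  # AP_TRN: truncate toward -infinity
    raw &= (1 << wl) - 1                        # AP_WRAP: keep only the low wl bits
    if raw >= 1 << (wl - 1):                    # reinterpret as two's complement
        raw -= 1 << wl
    return raw / (1 << frac_bits)

# ap_fixed<16, 2> covers [-2, 2 - 2**-14]
for v in (1.5, 2.0, 3.0, -1.0):
    print(v, "->", to_ap_fixed(v))
print(2.0, "->", to_ap_fixed(2.0, il=4), "with IN_IL = 4")
```

Under these assumptions 2.0 comes back as -2.0 (the symptom in point 1), 3.0 wraps to -1.0, and widening the integer part to 4 bits keeps 2.0 intact.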

XilinxFan123
Visitor
Registered: 05-17-2021

Hi KMorris / Xilinx Support,

Thanks for the reply. It has helped me get a better understanding about how to use the Vitis DSP library.

I would like some clarification on the ordering of the input data in inData[][]. The way the input data must be provided is not straightforward (at least, it is not like Matlab).

Using the 1Dfloat example code, the input and output data are as follows:

===============================================================
--Input Impulse:
1 - j*-0
2 - j*-0
3 - j*-0
4 - j*-0
5 - j*-0
6 - j*-0
7 - j*-0
8 - j*-0
9 - j*-0
10 - j*-0
11 - j*-0
12 - j*-0
13 - j*-0
14 - j*-0
15 - j*-0
16 - j*-0
===============================================================
===============================================================
--Output Step fuction:
136 - j*-0
-32 + j*32
-32 - j*-0
-32 - j*32
-2 + j*10.0547
-2 + j*1.33636
-2 - j*0.397825
-2 - j*2.99321
-2 + j*4.82843
-2 + j*0.828427
-2 - j*0.828427
-2 - j*4.82843
-2 + j*2.99321
-2 + j*0.397825
-2 - j*1.33636
-2 - j*10.0547
===============================================================

But when the same input values are given to the Matlab fft function:

fft([1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]).'

the results are:

ans =

1.0e+02 *

1.3600
-0.0800 + 0.4022i
-0.0800 + 0.1931i
-0.0800 + 0.1197i
-0.0800 + 0.0800i
-0.0800 + 0.0535i
-0.0800 + 0.0331i
-0.0800 + 0.0159i
-0.0800
-0.0800 - 0.0159i
-0.0800 - 0.0331i
-0.0800 - 0.0535i
-0.0800 - 0.0800i
-0.0800 - 0.1197i
-0.0800 - 0.1931i
-0.0800 - 0.4022i

>>

This means the ordering of the input data in the Xilinx library is not the same as in Matlab. Also, the input array in the Xilinx example is 2-dimensional, which seems odd since this is supposed to be a 1-dimensional example.

Please clarify regarding these points.

XilinxFan123
Visitor
Registered: 05-17-2021

I see there is a parameter of type fft_output_order_enum in the ssr_fft_default_params structure in data_path.hpp. This needs to be set to SSR_FFT_DIGIT_REVERSED_TRANSPOSED.

However, to match the Matlab values, I still have to transpose the input data. So if in Matlab I give the values

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16],

then to get the same output from the Xilinx fft_top function, I need to feed the data as:

===============================================================
--Input Impulse:
1 - j*-0
5 - j*-0
9 - j*-0
13 - j*-0
2 - j*-0
6 - j*-0
10 - j*-0
14 - j*-0
3 - j*-0
7 - j*-0
11 - j*-0
15 - j*-0
4 - j*-0
8 - j*-0
12 - j*-0
16 - j*-0
===============================================================

Please let me know if I have missed something. Are you seeing the same thing?

kmorris
Xilinx Employee
Registered: 01-11-2011

The input array is 2-dimensional so that the FPGA can process several inputs in a single clock cycle; that is what the SSR parameter controls. In the example, SSR is set to 4, so the FFT processes 4 inputs at a time. The data should be ordered as follows:

Input Vector:

ABCDEFGHIJKLMNOP

Input array inData[SSR][FFT_LEN / SSR], written as inData[row][col]:

         col 0   col 1   col 2   col 3
row 0    A       E       I       M
row 1    B       F       J       N
row 2    C       G       K       O
row 3    D       H       L       P

The output follows the same order. Your adjusted input shows the same behavior: you are taking every 4th input to fill each row of the array. The options you are changing are described in more detail in the documentation here. Hopefully this explains how the data should be entered.
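The ordering in the table above amounts to inData[r][c] = x[c * SSR + r]. Here is a minimal pure-Python sketch of that layout, assuming (as stated above) that the output uses the same layout as the input. The helpers to_ssr_layout, from_ssr_layout and dft are hypothetical; dft is a naive reference transform with Matlab's sign convention, not the library's fft_top.

```python
import cmath

def to_ssr_layout(x, ssr):
    """Pack a 1-D vector into a [ssr][len(x) // ssr] array with arr[r][c] = x[c * ssr + r]."""
    cols = len(x) // ssr
    return [[x[c * ssr + r] for c in range(cols)] for r in range(ssr)]

def from_ssr_layout(arr):
    """Inverse of to_ssr_layout: walk the columns in order to recover the 1-D vector."""
    ssr, cols = len(arr), len(arr[0])
    return [arr[r][c] for c in range(cols) for r in range(ssr)]

def dft(x):
    """Naive O(N^2) DFT with Matlab's fft sign convention: X[k] = sum x[t] e^(-j2*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

SSR = 4

# Matlab vector [1 2 ... 16] (A..P) packed into inData[SSR][FFT_LEN / SSR]
x = list(range(1, 17))
in_data = to_ssr_layout(x, SSR)
for row in in_data:
    print(row)  # rows: [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15], [4, 8, 12, 16]

# If the output uses the same layout, reading it column by column gives Matlab's order.
out_data = to_ssr_layout(dft(from_ssr_layout(in_data)), SSR)
X = from_ssr_layout(out_data)
print(X[0], X[4])  # approximately 136 and -8+8j, as in the Matlab listing

# Point 2 from earlier in the thread: inData[0][0] = 2 (which wraps to -2 in
# ap_fixed<16, 2>) and inData[2][0] = 1, all other elements 0.
p2 = [[0j] * (16 // SSR) for _ in range(SSR)]
p2[0][0], p2[2][0] = -2, 1
out2 = to_ssr_layout(dft(from_ssr_layout(p2)), SSR)
for row in out2:
    print([complex(round(v.real, 4), round(v.imag, 4)) for v in row])
```

Printed row by row, out2 follows the same pattern as the listing in point 2 (first row -1, -3, -1, -3; second row -1.2929-0.7071j, -2.7071+0.7071j, and so on), up to fixed-point rounding. This suggests that output was consistent with the wrap-around from point 1 and the 2-D ordering.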


(Accepted solution)

mwickert
Observer
Registered: 03-10-2021

I followed @XilinxFan123's original post in early June, but only recently started testing the SSR FFT on a Zynq UltraScale+ ZCU106 board. My experiments use FFT_LEN = 16 and SSR = 2. I start with the simple "input impulse" test described in the documentation examples. I am using float to avoid scaling issues for now.

The parameter overrides I load are:
struct fftParams : ssr_fft_default_params {
    static const int N = FFT_LEN;
    static const int R = SSR;
    static const fft_output_order_enum output_data_order = SSR_FFT_NATURAL;
    static const transform_direction_enum transform_direction = FORWARD_TRANSFORM;
};

I am using a 1D array for input and output via OpenCL, but I understand the values should be multiplexed and demultiplexed to represent 2 inputs/outputs. To be clear: with FFT_LEN = 16 and SSR = 2, it seems like I am actually computing two 8-point FFTs? That is how it appears from the examples, based on the number of elements in the 2D arrays.

Test 1: I set x[0] = {1,0} (imaginary part 0) in my 16-element complex array and get the following 16-element complex output:
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
dtype=float32) # Note: this is the output read from a file by Python into a NumPy array, with the imaginary part set to zero. I believe these results agree with the example provided.

Test 2: Next I set x[0] = {0,1} (real part 0, imaginary part 1) in my 16-element complex array and get the following 16-element complex output:
array([0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j,
0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j, 0.+2.j],
dtype=complex64)

The Test 1 and Test 2 results are consistent with each other, but they make no sense if this is really a two-channel FFT. With only a single nonzero input sample, only one channel's output should be active! If Test 1 matched my expectations, I should be seeing:
array([1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.])
because I have not given any input to channel 2. What is going on?

I have tried other simple experiments to figure out what is really happening, but have not found a clear pattern. In a sense I have a "black box" that takes inputs and produces outputs, but what is it doing inside? The behavior is FFT-like.

Thanks for any help you can provide.

p27803
Contributor
Registered: 03-31-2017

SSR does not mean it is a two-channel FFT. It means the core performs a single (one-channel) FFT, but takes in (and returns) two samples at a time (SSR samples at a time in general).
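The difference between "one FFT fed SSR samples per clock" and "SSR separate FFTs" can be illustrated with a minimal pure-Python model of Test 1 (FFT_LEN = 16, SSR = 2). This is a sketch only: dft is a hypothetical naive reference transform, not the library's fft_top, and the lane split assumes the inData[r][c] = x[c * SSR + r] ordering described earlier in the thread.

```python
import cmath

def dft(x):
    """Naive O(N^2) reference DFT, X[k] = sum x[t] e^(-j2*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, SSR = 16, 2
x = [0j] * N
x[0] = 1 + 0j  # unit impulse, as in Test 1

# Demultiplex into SSR lanes: lane r carries samples r, r + SSR, r + 2*SSR, ...
lanes = [x[r::SSR] for r in range(SSR)]
print([len(lane) for lane in lanes])  # [8, 8]: 8 clock cycles of 2 samples each

# SSR = 2 is still ONE 16-point transform over the whole frame.
X = dft(x)
print([round(v.real, 6) for v in X])  # sixteen 1.0 values: an impulse has a flat spectrum

# What two independent 8-point FFTs (one per lane), interleaved, would give instead.
two_channel = [v for pair in zip(dft(lanes[0]), dft(lanes[1])) for v in pair]
print([round(v.real, 6) for v in two_channel])  # alternating 1s and 0s
```

The all-ones result of the single 16-point transform matches Test 1, while the alternating pattern only appears if the two lanes are treated as separate channels.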

mwickert
Observer
Registered: 03-10-2021

@p27803 this is helpful information. The FFT is a frame-based operator, so computing an N-point FFT requires all N inputs before any output can be produced. When using an HLS dataflow, I assume this helps reduce latency? Can you explain how this plays out in practice?

I will have to re-run some of my earlier experiments and confirm that properly multiplexing/demultiplexing the inputs and outputs produces the expected results.

Thanks

p27803
Contributor
Registered: 03-31-2017

Either reduced latency, or processing data at a sample rate higher than the processing clock frequency (hence Super Sample Rate, SSR).
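As a rough worked relation (an illustration, not a figure taken from the library documentation): if the core accepts SSR samples on every clock cycle, the sustained sample rate and the time needed to stream one N-point frame in are

$$
f_s = \mathrm{SSR} \cdot f_{\mathrm{clk}}, \qquad T_{\mathrm{frame}} = \frac{N}{\mathrm{SSR} \cdot f_{\mathrm{clk}}}
$$

For example, with hypothetical values SSR = 4 and f_clk = 500 MHz, f_s = 2 GSa/s. With N = 16 and SSR = 2 (the configuration used in the tests above), one frame enters in 8 clock cycles instead of the 16 a one-sample-per-clock core would need.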

mwickert
Observer
Registered: 03-10-2021

@p27803 thank you for setting me straight. I can now pass a time-domain frame of complex data from the PS to the PL, compute the FFT in the PL, return the result to the PS, and make sense of it. I now have dataflow issues to work through, in particular an asymmetrical flow of one N-point frame in and M N-point frames returned. I guess I need to start another thread.
