remy561 (Observer, registered 06-20-2018)

vload vs pointer mapping

Hi all,

I'm creating a kernel that loads four pixels for bilinear interpolation. These pixels are currently not buffered, so they are read directly from global memory. To load two neighbouring pixels into a single float2 variable, I tried both vload2 and a pointer cast ("pointer mapping"):

__attribute__((always_inline))
float2 load_pixel(const int col, const int row, constant float *projection)
{
    // Pointer cast: reads both neighbouring floats as one float2
    float2 pixel = *(constant float2 *) &projection[row * IMG_WIDTH + col];
    // vload2 variant: float2 pixel = vload2(0, &projection[row * IMG_WIDTH + col]);

    return pixel;
}

Both variants produce correct output in software emulation. The difference shows up in the estimation report: with the pointer cast, the number of carried dependencies and the estimated latency are half those of the vload2 version. In other words, vload2 is reported as two gmem reads, while the pointer cast loads both floats in a single memory access.

Is this the expected behaviour of vload2, or am I doing something wrong here?
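A host-side C model (not OpenCL) of what the two variants read may help separate semantics from scheduling: both return projection[i] and projection[i + 1], so the difference in the report is only in how many memory accesses the tool generates. float2_t, load_elementwise, load_wide and float2_aligned are hypothetical names used for illustration. One caveat worth knowing: per the OpenCL C specification, vload2 only needs the address to be float-aligned, whereas a load through a float2 pointer must be 8-byte aligned, so the cast is formally undefined for odd indices even if emulation happens to produce the right values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for OpenCL's float2: two packed floats, 8 bytes. */
typedef struct { float x, y; } float2_t;

/* Mimics vload2(0, &p[i]): reads p[i] and p[i + 1] as two scalar accesses. */
static float2_t load_elementwise(const float *p, size_t i) {
    float2_t r = { p[i], p[i + 1] };
    return r;
}

/* Mimics *(float2 *)&p[i]: one 8-byte access. memcpy keeps this valid C for
   any i; the OpenCL pointer cast has no such escape hatch and needs &p[i]
   to be 8-byte aligned. */
static float2_t load_wide(const float *p, size_t i) {
    float2_t r;
    assert(sizeof r == 2 * sizeof(float));
    memcpy(&r, &p[i], sizeof r);
    return r;
}

/* Assuming p itself is 8-byte aligned, &p[i] is 8-byte aligned only for even i. */
static int float2_aligned(size_t i) { return i % 2 == 0; }
```

For an 8-float array {0, ..., 7}, both load_elementwise(img, 6) and load_wide(img, 6) return {6, 7}; only index 6 of the pair 5/6 passes float2_aligned.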

brucey (Xilinx Employee, registered 03-24-2010)

Re: vload vs pointer mapping

vload2 seems to be behaving correctly: it takes two memory reads.

Could you attach the whole file so we can investigate it?

Regards,
brucey
remy561 (Observer, registered 06-20-2018)

Re: vload vs pointer mapping

Hey Brucey,

Since the two neighbouring floats are loaded from global memory, shouldn't the compiler turn them into a single burst load of twice the size, i.e. one access?

I attached the work-in-progress kernel for verification.

Some background information:
I'm learning SDAccel by porting the RabbitCT algorithm to FPGA. RabbitCT reconstructs a 3D voxel volume: each voxel is backprojected onto the input image, and the image is bilinearly interpolated at the point where that backprojection intersects it. The interpolated value is then normalized and added to the voxel's accumulated output value.

As the bottleneck lies in loading the four input pixels, I'm currently working on pre-loading the input data, which is not present in the given code.
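For reference, here is a plain-C sketch of the bilinear interpolation step, assuming a row-major image, non-negative sample coordinates and no border handling (RabbitCT's own coordinate and border conventions are not reproduced). The four pixels come as two neighbouring pairs, one per row, which is why a 2-float load per row is attractive. The function name bilinear and its parameters are hypothetical.

```c
#include <assert.h>

/* Bilinear interpolation at (u, v) in a row-major width x height image.
   Assumes 0 <= u < width - 1 and 0 <= v < height - 1 (no border handling). */
static float bilinear(const float *img, int width, int height, float u, float v) {
    assert(u >= 0.0f && v >= 0.0f);
    int col = (int)u;           /* truncation equals floor for non-negative u */
    int row = (int)v;
    assert(col + 1 < width && row + 1 < height);
    float fx = u - (float)col;  /* horizontal weight */
    float fy = v - (float)row;  /* vertical weight */

    /* One neighbouring pair per row: these correspond to the two
       2-float loads, load_pixel(col, row) and load_pixel(col, row + 1). */
    const float *top = &img[row * width + col];
    const float *bot = &img[(row + 1) * width + col];

    float t = (1.0f - fx) * top[0] + fx * top[1];
    float b = (1.0f - fx) * bot[0] + fx * bot[1];
    return (1.0f - fy) * t + fy * b;
}
```

For a 2x2 image {0, 1, 2, 3}, sampling at (0.5, 0.5) weights all four pixels equally and gives 1.5.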

brucey (Xilinx Employee, registered 03-24-2010), accepted solution

Re: vload vs pointer mapping

Hello, 

When the index is aligned to two floats (i.e. even), you could try "pixel = vload2((row * IMG_WIDTH + col) / 2, projection);".

Otherwise, please use "pixel = *(constant float2 *) &projection[row * IMG_WIDTH + col];". Written this way, the tool seems to combine the two floats into a single 8-byte-wide access.

Regards,
brucey
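A small C model of vload2's addressing shows why the suggested rewrite is limited to even indices. In OpenCL, vload2(offset, p) reads p[2*offset] and p[2*offset + 1], so vload2(idx / 2, p) reads the pair starting at idx only when idx is even; for odd idx the integer division silently steps back one float. vload2_emul and load_pair_even are hypothetical helpers, not OpenCL built-ins.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } float2_t;  /* stand-in for OpenCL float2 */

/* Same addressing as OpenCL vload2(offset, p): p[2*offset], p[2*offset + 1]. */
static float2_t vload2_emul(size_t offset, const float *p) {
    float2_t r = { p[offset * 2], p[offset * 2 + 1] };
    return r;
}

/* The suggested form vload2(idx / 2, p): equal to the pair at idx only when
   idx is even, so this checked version rejects odd indices. */
static float2_t load_pair_even(const float *p, size_t idx) {
    assert(idx % 2 == 0);
    return vload2_emul(idx / 2, p);
}
```

With idx = 5, vload2_emul(5 / 2, img) returns img[4] and img[5] rather than img[5] and img[6].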

remy561 (Observer, registered 06-20-2018)

Re: vload vs pointer mapping

Hey Brucey,

My input is not aligned to two floats, because I need to load pairs of neighbouring values from arbitrary positions in the array.

Additionally, the tool no longer combines the two loads once the input is not a constant buffer anymore. My current version of the code uses a private preload cache, and without partitioning the two neighbouring values are loaded separately. With cyclic partitioning, however, the dependencies between the neighbouring pixels are removed :)

Thanks for the help!

Kind regards,
Remy
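The effect of cyclic partitioning on the neighbouring-pixel reads can be sketched as an index mapping. With a factor of 2 (assumed here), element i goes to bank i % 2 at address i / 2, so i and i + 1 always live in different banks and each read can be served by its own memory, which is consistent with the dependency between neighbouring pixels disappearing. In an SDAccel OpenCL kernel the partitioning itself is requested with the xcl_array_partition attribute; this C model only mimics the resulting layout, and FACTOR, BANK_DEPTH and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FACTOR 2      /* cyclic partition factor (assumed) */
#define BANK_DEPTH 8  /* hypothetical words per bank: 16 floats in total */

typedef struct { float bank[FACTOR][BANK_DEPTH]; } cyclic_buf_t;

static size_t bank_of(size_t i) { return i % FACTOR; }  /* which bank */
static size_t slot_of(size_t i) { return i / FACTOR; }  /* address inside it */

/* Scatter a linear array into the banks the way a cyclic partition does. */
static void cyclic_fill(cyclic_buf_t *b, const float *src, size_t n) {
    assert(n <= (size_t)FACTOR * BANK_DEPTH);
    for (size_t i = 0; i < n; ++i)
        b->bank[bank_of(i)][slot_of(i)] = src[i];
}

/* Neighbours i and i + 1 always come from different banks. */
static void cyclic_read_pair(const cyclic_buf_t *b, size_t i, float *lo, float *hi) {
    assert(i + 1 < (size_t)FACTOR * BANK_DEPTH);
    assert(bank_of(i) != bank_of(i + 1));
    *lo = b->bank[bank_of(i)][slot_of(i)];
    *hi = b->bank[bank_of(i + 1)][slot_of(i + 1)];
}
```

After filling from {10, 11, 12, 13, 14, 15}, reading the pair at index 3 takes 13 from bank 1 and 14 from bank 0.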
