Visitor
497 Views
Registered: ‎06-13-2017

## Reciprocal Square Root, HLS cannot infer when to use it.

Hi all,

I am trying to implement a double-precision reciprocal square root directly as:

#pragma HLS RESOURCE variable=result core=DRSqrt

result = (1.0/sqrt( X ));

I would expect HLS to instantiate one of the many efficient algorithms that compute this function directly (usually faster and cheaper than the square root itself).
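As an illustration of what a direct algorithm can look like, here is a minimal software sketch of Newton-Raphson iteration on 1/sqrt(x). It is not what HLS or any Xilinx core implements; the function name `rsqrt_newton`, the constant seed, and the iteration count are assumptions chosen for clarity. The point is only that the refinement loop needs multiplies and a subtraction, with no divider and no square root.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of any Xilinx library): approximates
// 1/sqrt(x) with Newton-Raphson steps  y <- y * (1.5 - 0.5 * m * y * y).
// The loop body uses only multiplies and a subtraction.
double rsqrt_newton(double x, int iterations = 8) {
    assert(x > 0.0 && std::isfinite(x));

    // Range reduction: write x = m * 2^e with an even exponent e, so that
    // 1/sqrt(x) = (1/sqrt(m)) * 2^(-e/2) and m lies in [0.5, 2).
    int e = 0;
    double m = std::frexp(x, &e);   // m in [0.5, 1)
    if (e % 2 != 0) {               // make the exponent even
        m *= 2.0;                   // m is now in [1, 2)
        e -= 1;
    }

    // Crude constant seed (an assumption for this sketch). It lies inside
    // the convergence region for every m in [0.5, 2).
    double y = 1.0;
    for (int i = 0; i < iterations; ++i) {
        y = y * (1.5 - 0.5 * m * y * y);
    }

    // Undo the range reduction; e is even, so -e / 2 is exact.
    return std::ldexp(y, -e / 2);
}
```

A hardware implementation would typically replace the constant seed with a small lookup table, which cuts the number of iterations needed; that detail is left out here.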

Instead, HLS instantiates a normal SQRT module plus a divider (not even a reciprocal unit).

Clearly, this implementation is suboptimal and I don't know if:

- This is the only way HLS can deal with reciprocal square root; or

- I am not coding the operation in the right way

Can anyone shed some light on this issue?

Roberto

1 Solution

Accepted Solutions
Scholar
494 Views
Registered: ‎04-26-2015

## Re: Reciprocal Square Root, HLS cannot infer when to use it.

Try using the rsqrt function in the HLS Math library. If you tell HLS to take the square root and then take the reciprocal (as you have done), it's going to do exactly that - it won't try to optimize that process.
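A rough sketch of how the call could be wrapped so the same source also builds outside the HLS tool. Only `hls::rsqrt` and `hls_math.h` are taken from this thread; the wrapper name `recip_sqrt` and the macro `HAVE_HLS_MATH` are made up for the example, and the fallback branch assumes a compiler that supports `__has_include`.

```cpp
#include <cassert>
#include <cmath>

// Detect the HLS math header when it is on the include path.
// HAVE_HLS_MATH is a hypothetical macro name used only in this sketch.
#if defined(__has_include)
#  if __has_include("hls_math.h")
#    include "hls_math.h"
#    define HAVE_HLS_MATH 1
#  endif
#endif

// Hypothetical wrapper: uses hls::rsqrt when the HLS math library is
// available, and plain 1/sqrt(x) otherwise (e.g. in a desktop build).
double recip_sqrt(double x) {
    assert(x > 0.0);
#ifdef HAVE_HLS_MATH
    return hls::rsqrt(x);
#else
    return 1.0 / std::sqrt(x);
#endif
}
```

Whether the tool maps `hls::rsqrt` to a dedicated core or still builds it from a square root and a divider depends on the tool version and data type, so the resource report is worth checking after synthesis.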

2 Replies
Visitor
466 Views
Registered: ‎06-13-2017

## Re: Reciprocal Square Root, HLS cannot infer when to use it.

Thanks for your reply. I am now able to use double rsqrt(double):

#include "hls_math.h"

#pragma HLS RESOURCE variable=result core=DRSqrt

result = hls::rsqrt( X );

But the synthesized hardware is still based on inverting the square root.

Targeting 100 MHz, rsqrt takes 2628 FF plus 5709 LUT, compared to the 920 FF plus 2094 LUT required by plain sqrt.

The extra cost is due to implementing a double-precision division.

I've tried several times and I cannot find a reason for this behavior.