Explorer dima2882

FDiv floating point divide produces garbage results in HLS

Hi all,

I ran into an issue with single-precision floating-point division in HLS 2016.x (I tried both 2016.2 and 2016.4). I used pragmas to map the division onto an FDiv core with 1 cycle of latency and 1 cycle per iteration, and compared the synthesized result with that of the regular Vivado Floating-Point IP core configured for the same performance.
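
For reference, a minimal sketch of what such a setup might look like in Vivado HLS 2016.x C++. It assumes the RESOURCE pragma's `core=FDiv` and `latency=` options (check UG902 for your release) and reads "1 cycle per iteration" as a pipeline initiation interval of 1; the function name `fdiv_top` is hypothetical. A regular C++ compiler ignores the `#pragma HLS` lines, so the same function also runs in a C testbench.

```cpp
#include <cassert>

// Hypothetical HLS top-level function: divide a by b in single precision.
float fdiv_top(float a, float b) {
    // Assumption: "1 cycle per iteration" means a pipelined II of 1.
#pragma HLS PIPELINE II=1
    float q = a / b;
    // Bind the division result to the FDiv core and request 1 cycle of latency,
    // matching the configuration described in the post.
#pragma HLS RESOURCE variable=q core=FDiv latency=1
    return q;
}
```

The `latency` option is only a request to the scheduler; whether the resulting logic fits in one clock period is a separate question.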

In the Analysis view in HLS I do see that the computation is performed in 1 clock cycle, but the timing summary is an order of magnitude off the requirement. The synthesized HLS logic shows why: it's a looooong chain of muxes and shifters, 267 levels of logic! The Vivado IP has 3 levels or fewer. No wonder the HLS result couldn't meet timing.

Is there a way to force HLS to use the right IP core for the division, or is this really the expected performance? I can attach an example, but it's really just a basic division.

Accepted Solutions
Teacher muzaffer

@dima2882 you are basing your conclusions on an inadequately constrained design. Add a set of input and output registers so that no I/O connects directly to the FP IP, and implement your design again.
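
The same idea can be sketched on the HLS side. This minimal example assumes the `register` option of the INTERFACE pragma as described in UG902 (verify the exact syntax and supported modes for your release) and registers every scalar port, so no top-level I/O feeds or is fed by the divider directly; `fdiv_reg_io` is a hypothetical name. In a hand-built Vivado design, the equivalent is an RTL wrapper with flip-flops on the divider's inputs and outputs.

```cpp
#include <cassert>

// Hypothetical HLS top-level function with registered I/O around the divider.
void fdiv_reg_io(float a, float b, float *q) {
    // Register the inputs so the divider is not driven straight from ports.
#pragma HLS INTERFACE ap_none register port=a
#pragma HLS INTERFACE ap_none register port=b
    // Register the output (with a valid strobe) so the divider does not drive a port.
#pragma HLS INTERFACE ap_vld register port=q
    float t = a / b;
#pragma HLS RESOURCE variable=t core=FDiv
    *q = t;
}
```

With the ports registered, the timing report covers the register-to-register path through the divider, which is the path that actually has to meet the clock.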

- Please mark the Answer as "Accept as solution" if information provided is helpful.
Give Kudos to a post which you think is helpful and reply oriented.

13 Replies
Teacher muzaffer

@dima2882

>> FDiv core with 1 cycle of latency and 1 cycle of iteration

Did you really simulate the Vivado FP IP core and verify its performance for these numbers? I suspect you did not. What you probably got is an iterative divider that retires 1-2 bits per cycle for some number of cycles. Dividers are complicated, and I'd love to see the magical Xilinx IP that can do a single-precision floating-point divide in one cycle with 3 levels of logic. What you got from HLS is closer to the truth, although the number of levels seems a bit excessive.
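
To illustrate the point about iterative dividers, here is a minimal sketch of a textbook radix-2 restoring divider for the 24-bit mantissas of two normalized fp32 values (sign, exponent, and rounding omitted). It is not necessarily the algorithm the Xilinx IP or HLS uses, and `mantissa_div` is a hypothetical name. Each loop iteration retires one quotient bit with a compare and a subtract, so an iterative implementation spends about one cycle per bit, while a single-cycle version has to chain all of those stages combinationally.

```cpp
#include <cassert>
#include <cstdint>

struct DivResult {
    uint32_t quotient;  // floor(n * 2^24 / d): 1 integer bit + 24 fraction bits
    int steps;          // one quotient bit per step ("cycle" in an iterative divider)
};

// Radix-2 restoring division of normalized mantissas n, d in [2^23, 2^24),
// i.e. 1.f with 23 fraction bits, as in IEEE 754 binary32.
DivResult mantissa_div(uint32_t n, uint32_t d) {
    assert(n >= (1u << 23) && n < (1u << 24));
    assert(d >= (1u << 23) && d < (1u << 24));
    uint32_t rem = n;  // partial remainder; stays below 2 * d at each compare
    uint32_t q = 0;
    int steps = 0;
    for (int i = 0; i < 25; ++i) {
        q <<= 1;
        if (rem >= d) {  // trial subtraction succeeds: this quotient bit is 1
            rem -= d;
            q |= 1u;
        }
        rem <<= 1;       // bring in the next (zero) dividend bit
        ++steps;
    }
    return {q, steps};
}
```

The trip count of 25 is the number of compare/subtract stages a fully unrolled, single-cycle divider would need to chain together.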

Explorer dima2882

@muzaffer wrote:
Did you really simulate the Vivado FP IP core and verify its performance for these numbers? I suspect you did not. What you probably got is an iterative divider that retires 1-2 bits per cycle for some number of cycles. Dividers are complicated, and I'd love to see the magical Xilinx IP that can do a single-precision floating-point divide in one cycle with 3 levels of logic. What you got from HLS is closer to the truth, although the number of levels seems a bit excessive.

Fair enough: I wrote a testbench and ran a simulation. I divided 7034.5429 by 589.2358, and it took one clock cycle to execute. I'm happy to provide the simulation. I created the IEEE 754 32-bit test vectors from decimal floats with the excellent float-to-binary converter available here: http://www.binaryconvert.com/result_float.html
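
If you'd rather generate such test vectors in code than with an online converter, a minimal C++ sketch is to copy the float's storage into a 32-bit integer. This assumes `float` is IEEE 754 binary32, which holds on mainstream compilers; `float_bits` and `print_vector` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Return the IEEE 754 binary32 bit pattern of f (assumes float is 32-bit IEEE).
uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined way to reinterpret the bytes
    return u;
}

// Print one test vector as decimal and as hex, e.g. for a testbench input file.
void print_vector(const char *name, float f) {
    std::printf("%s = %.4f -> 0x%08X\n", name, static_cast<double>(f),
                static_cast<unsigned>(float_bits(f)));
}
```

Calling `print_vector("dividend", 7034.5429f)` and `print_vector("divisor", 589.2358f)` prints the vectors for the division above.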

I tried other numbers and also got single-cycle execution times.

My contention that HLS is producing garbage results still stands.

Teacher muzaffer

@dima2882 What about implementation? What do your area and timing numbers look like?

Explorer dima2882

I just did an implementation. Keep in mind that the FP divider was the only thing in the design and on the chip.

The clock was set to 200 MHz, and utilization was tiny: 748 LUTs and 69 FFs. That's not much different from the HLS utilization, but the topology is of course very different.

Things seem to have worked quite well with the FP core, but not with HLS.

Teacher muzaffer

@dima2882 Can you show a timing report that includes the fp_div? I did an implementation with my brand-spanking-new 2016.4, and my implementation area and timing reports don't show the FP block at all, but the synthesis timing report fails with -52 ns and 218 levels of logic. I'm curious whether this is specific to 2016.4. I'll try with 2015.4 too.
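
A rough reading of those numbers, assuming the same 5 ns (200 MHz) constraint as the original post and ignoring setup time and clock skew: the failing path needs about 57 ns, or roughly a quarter of a nanosecond per logic level.

```latex
\begin{aligned}
T_{\text{clk}} &= \frac{1}{200\,\text{MHz}} = 5\,\text{ns} \\
t_{\text{path}} &\approx T_{\text{clk}} - t_{\text{slack}} = 5\,\text{ns} - (-52\,\text{ns}) = 57\,\text{ns} \\
f_{\max} &\approx \frac{1}{57\,\text{ns}} \approx 17.5\,\text{MHz} \\
\frac{t_{\text{path}}}{218\ \text{levels}} &\approx 0.26\,\text{ns per level}
\end{aligned}
```

That is in the same ballpark as the ~50 ns single-cycle estimate given later in the thread.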

Explorer dima2882

I'm using 2016.4 as well. The floating-point core is definitely in there. The timing report is attached.

Teacher muzaffer

@dima2882 Then you will have to do a timing back-annotated gate-level simulation. I still maintain that it's not possible to build a single-cycle fp32 divider at 5 ns with 3 levels of logic in an FPGA.

Explorer dima2882

In the timing report, I see that the TDATA ports are properly covered by the 5 ns constraint, but the TVALID signals are not and remain unconstrained. That is where the long carry chains are.

My project is attached; the simulation is in there and shows all the single-cycle goodness.

Scholar jprice

@muzaffer is certainly right: there is simply no way you have a floating-point core that small with a latency of one cycle at 200 MHz. Look at the core generator's datasheet to get a feel for the various trade-offs you can make.

Teacher muzaffer

@jprice Actually, around 800 LUTs is the right size for a 32-bit fdiv. The problem is timing: with proper constraints, a single-cycle divider should come in at around 50 ns in most recent Xilinx chips.

Explorer dima2882

Finally we're getting somewhere...

Putting FFs around the Vivado IP core I/O does indeed make it fail timing, as @muzaffer predicted. Brilliant how the core will let you build a physically unrealizable system, although I suppose that's my fault for specifying something it couldn't do. I changed it to 20 cycles of latency and 1 cycle per iteration, and that made it pass timing. Going to try the same thing with the HLS variant and see what happens.
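
For comparison, a quick calculation for the configuration that passed, assuming the 200 MHz (5 ns) clock was kept and using the ~50 ns single-cycle estimate from earlier in the thread: 20 cycles of latency with 1 cycle per operation means a longer wait for each result, but roughly ten times the throughput of a single-cycle divider.

```latex
\begin{aligned}
t_{\text{latency}} &= 20 \times 5\,\text{ns} = 100\,\text{ns} \\
\text{throughput}_{\text{pipelined}} &= \frac{1\ \text{result}}{5\,\text{ns}} = 200 \times 10^{6}\ \text{results/s} \\
\text{throughput}_{\text{single-cycle}} &\approx \frac{1\ \text{result}}{50\,\text{ns}} = 20 \times 10^{6}\ \text{results/s}
\end{aligned}
```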

Scholar jprice

I objected more to the complete lack of flip-flops and the 1-cycle latency :). 800 LUTs seems reasonable, depending on how they're configured.