Why HLS is not implementing a pure-combinatorial fixed-point multiplication?

cerilet

Explorer

09-13-2016 06:36 AM

4,982 Views

Registered:
08-26-2014

Hello,

I am comparing a simple multiplication in floating-point and fixed-point formats. The code is as follows; I only change the *in_t* and *out_t* types to either a fixed-point or a floating-point format:

    void simple_operations(const in_t mat[2], out_t invOut[2]) {
        invOut[0] = mat[0] * mat[1];
    }
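The post does not show how *in_t* and *out_t* were defined. In Vivado HLS they would typically be `float`/`double` or an `ap_fixed<W,I>` type, but that is an assumption. As a rough, tool-independent sketch, the hypothetical `Fixed32` type below (Q16.16, not from the original post) shows what a 32-bit fixed-point multiply has to compute: a full-width integer product followed by a rescale. The function is templated here only so both formats can be tried side by side.

```cpp
#include <cstdint>

// Hypothetical Q16.16 fixed-point type: 32 bits total, 16 fractional bits.
// Stands in for whatever fixed-point type the original poster used.
struct Fixed32 {
    int32_t raw;  // stored value = real value * 2^16

    static Fixed32 from_double(double v) {
        return Fixed32{static_cast<int32_t>(v * 65536.0)};
    }
    double to_double() const { return raw / 65536.0; }
};

// A 32x32 fixed-point multiply needs the full 64-bit product before the
// result is shifted back to 16 fractional bits (truncating, no rounding).
inline Fixed32 operator*(Fixed32 a, Fixed32 b) {
    int64_t wide = static_cast<int64_t>(a.raw) * static_cast<int64_t>(b.raw);
    return Fixed32{static_cast<int32_t>(wide >> 16)};
}

// Same body as the function in the post, templated on the two types.
template <typename in_t, typename out_t>
void simple_operations(const in_t mat[2], out_t invOut[2]) {
    invOut[0] = mat[0] * mat[1];
}
```

Calling `simple_operations` with two `Fixed32` values or with two `float` values exercises the same source line, which is the comparison being made in the post.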

I am a bit confused because, when using the same number of bits in both formats (32 and 64 bits), both fixed-point versions have more latency than their floating-point counterparts. See the picture below:

Does anyone know why this is happening? With a purely combinatorial approach, the result should be available in zero or one clock cycles. Apparently this is not the case, and I don't know why.

Many thanks,

Cerilet

1 Solution

Accepted Solutions

austin

Scholar

09-13-2016 08:00 AM

9,273 Views

Registered:
02-27-2008

c,

One stage of logic, with a register, may operate in some families in as little as 1.5 nanoseconds. But a full multiply is not going to fit in one stage (not unless you have a 128-bit lookup table, 64 bits wide).

The tools try their best to use the resources effectively. You can ask for less delay, and see what happens.

Also, a 64-bit floating-point multiply is in no way equivalent to a 64-bit fixed-point multiply. Different animals.
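One way to see the difference: an IEEE-754 multiply only multiplies the significands (24 bits for `float`, 53 bits for `double`) and adds the exponents, while a fixed-point multiply of the same total width multiplies all 32 or 64 bits. The sketch below counts single-bit partial products in a schoolbook multiplier as a crude size measure. The helper name `partial_product_bits` is hypothetical, and the count says nothing about how the tools actually map the operation onto DSP blocks or which latencies they report.

```cpp
#include <limits>

// Number of single-bit partial products in a schoolbook w x w multiplier.
constexpr int partial_product_bits(int w) { return w * w; }

// Significand widths, including the implicit leading 1, on IEEE-754 platforms.
constexpr int kFloatSignificand  = std::numeric_limits<float>::digits;
constexpr int kDoubleSignificand = std::numeric_limits<double>::digits;

// Integer multiplier size inside each kind of 32-bit and 64-bit multiply.
constexpr int kFloat32Work = partial_product_bits(kFloatSignificand);
constexpr int kFixed32Work = partial_product_bits(32);
constexpr int kFloat64Work = partial_product_bits(kDoubleSignificand);
constexpr int kFixed64Work = partial_product_bits(64);
```

Under this measure the integer multiplier inside the fixed-point operation is larger than the one inside the floating-point operation of the same width, although the floating-point core also needs exponent handling, normalization and rounding logic.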

Latency increases as pipeline registers are required to meet timing.

Austin Lesea

Principal Engineer

Xilinx San Jose

2 Replies

cerilet

Explorer

09-14-2016 03:34 AM

4,920 Views

Registered:
08-26-2014

Re: Why HLS is not implementing a pure-combinatorial fixed-point multiplication?

Thanks for your answer, Austin.

I had actually thought about that, and after I increased the clock period, the compiler managed to implement the multiplication with 0 latency. Here are the results:

I know the difference between fixed- and floating-point variables, but in my study, of which a matrix multiplication is only a small part, I want to point out the main strengths and weaknesses of both implementations. I was actually expecting faster execution times with fixed-point variables. Big surprise!
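The relation between clock period and latency seen in this thread can be approximated with a very simple model. This is only a sketch under stated assumptions: the combinational delay of the multiplier is a single made-up number, the path is cut into stages that each fit in one clock period, and effects such as clock uncertainty and register setup time are ignored. The function names are hypothetical and both arguments are assumed to be positive.

```cpp
#include <cmath>

// Hypothetical model: a combinational path of `path_ns` is cut into stages
// so that each stage fits within one clock period.
int pipeline_stages(double path_ns, double clock_period_ns) {
    return static_cast<int>(std::ceil(path_ns / clock_period_ns));
}

// Latency in cycles: one register boundary between consecutive stages,
// so a path that fits in a single period has latency 0.
int latency_cycles(double path_ns, double clock_period_ns) {
    return pipeline_stages(path_ns, clock_period_ns) - 1;
}
```

With an assumed 12 ns multiplier path, `latency_cycles(12.0, 5.0)` gives 2, while `latency_cycles(12.0, 20.0)` gives 0, which matches the qualitative behaviour reported above: lengthening the clock period removes the pipeline registers.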

Thanks again,

Cerilet