Adventurer
231 Views
Registered: ‎02-08-2018

Switch from floating point to fixed point increases latency and resource utilization

I have heard that fixed point arithmetic requires fewer clock cycles than floating point arithmetic, so I decided to change a part of my program from floating point to 32-bit fixed point arithmetic.  However, the results are the opposite of what I would expect.  Resource utilization increases substantially, and latency also increases.  I isolated a part of the code, which performs an arctangent estimation, shown below. 

Here are the results when I am using floating point arithmetic:

latency = 29 --> 16534 max       BRAM = 2         DSP = 8            FF = 1745          LUT = 2687

Here are the results when I switch from float to ap_fixed<32,10>

latency = 150 --> 74150 max    BRAM = 2          DSP = 68         FF = 8399          LUT = 8473

 

Here is the code

void arcTangent(AXI_STREAM32 &input1, AXI_STREAM32 &input2, AXI_STREAM32 &output)
{
    int32SdCh input_datum;
    ap_fixed<32,10> x[MAX_NUM_VALUES], y[MAX_NUM_VALUES];
    int i, j;
    bool end_loop = false;
    for (i = 0; i < MAX_NUM_VALUES; ++i)        // input data from AXI stream
    {
        input_datum = input1.read();
        union {float a; uint32_t b;} input_value;
        input_value.b = input_datum.data;
        x[i] = ap_fixed<32,10>(input_value.a);
        end_loop = input_datum.last == 1;

        input_datum = input2.read();
        input_value.b = input_datum.data;
        y[i] = ap_fixed<32,10>(input_value.a);
        end_loop = end_loop || (input_datum.last == 1);
        if (end_loop)
            break;
    }

    int num_values = i + 1;

    int32SdCh output_datum;
    output_datum.last = 0;
    for (i = 0; i < num_values; ++i)
    {
        union {float a; uint32_t b;} output_value;
        output_value.a = atan_function(y[i], x[i]);
        output_datum.data = output_value.b;
        output_datum.last = (i+1) >= num_values;
        output.write(output_datum);
    }
}

ap_fixed<32,10> fixed_abs(ap_fixed<32,10> num)                      // fixed point absolute value
{
    if (num >= 0)
        return num;
    else
        return num * -1;
}

float atan_function(ap_fixed<32,10> y, ap_fixed<32,10> x)         // arc tangent function definition
{
    ap_fixed<32,10> conv = 0.28125;
    if (fixed_abs(x) > fixed_abs(y))
    {
        if (x > 0) // 1st and 8th octant
        {
            conv = x*y/(x*x+conv*y*y);
        }
        else // 4th and 5th octant
        {
            ap_fixed<32,10> estimate = PI + x*y/(x*x+conv*y*y);
            if (estimate > PI)
                conv = estimate - 2*PI;
            else
                conv = estimate;
        }
    }
    else if (y > 0) // 2nd and 3rd octant
    {
        conv = PI/2 - x*y/(y*y+conv*x*x);
    }
    else // 6th and 7th octant
    {
        conv = -PI/2 - x*y/(y*y+conv*x*x);
    }
    return conv.to_float();
}


Accepted Solutions
Xilinx Employee
211 Views
Registered: ‎09-05-2018

Re: Switch from floating point to fixed point increases latency and resource utilization

@agailey,

Fixed point is a tool to reduce resource utilization, not a guarantee. In a 32 bit floating point multiplication, the exponent fields are simply added and only the 24 bit significands (23 stored bits plus the implicit leading 1) need to be multiplied, which is actually computationally simpler than doing a 32 bit fixed point multiplication.
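
To make that decomposition concrete, here is a minimal sketch in plain C++ (no HLS headers). It is only an illustration of the idea, not how the floating point core is actually implemented; the names Parts, decompose and multiply_via_parts are made up for this example, and it assumes finite inputs.

```cpp
#include <cmath>
#include <cstdint>

// value == significand * 2^exponent (the sign is carried by the significand).
// Hypothetical illustration type, not part of any Xilinx library.
struct Parts {
    int64_t significand;  // at most 24 bits of magnitude for a float
    int exponent;
};

// Split a finite float into an integer significand and a power of two.
Parts decompose(float v) {
    int e = 0;
    float m = std::frexp(v, &e);  // v == m * 2^e with 0.5 <= |m| < 1 (or m == 0)
    // Scaling by 2^24 gives an exact integer because a float has 24 significand bits.
    int64_t sig = static_cast<int64_t>(std::ldexp(m, 24));
    return {sig, e - 24};
}

// Multiply as described above: one integer multiply of the significands
// and one integer addition of the exponents. Rounding the 48 bit product
// back to 24 bits is not modelled; the exact product is returned as a double.
double multiply_via_parts(float a, float b) {
    Parts pa = decompose(a);
    Parts pb = decompose(b);
    int64_t sig = pa.significand * pb.significand;  // up to 48 bits
    int exp = pa.exponent + pb.exponent;            // exponent addition
    return std::ldexp(static_cast<double>(sig), exp);
}
```

For example, multiply_via_parts(1.5f, 2.25f) multiplies two 24 bit integers and adds two small exponents, whereas a 32 bit fixed point multiply needs a full 32 x 32 bit multiplier.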

The utility of fixed point is the ability to utilize the exact number of bits you need. The ap_fixed<32,10> can precisely represent numbers from -512 up to just under +512, to the nearest 2.4*10^-7 (2^-22). This is actually more precision than single precision floating point, which can only represent numbers with 6 or fewer decimal digits without loss of precision, which a fixed point number needs about 20 bits to represent precisely. The output of your arctangent estimate lies between -pi and pi, so a 3 bit signed number is sufficient for the portion above the binary point. So without any loss of precision, you should be able to use ap_fixed<23,3>.
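
The range and step size of a given format can be checked with a few lines of plain C++. This is just a sketch: FixedFormat is a made-up helper for this post, and it assumes the usual ap_fixed<W,I> convention of W total bits and I integer bits including the sign.

```cpp
#include <cmath>

// Describes a signed fixed point format in the style of ap_fixed<W, I>.
// Hypothetical helper for illustration; it does not use the ap_fixed headers.
struct FixedFormat {
    int total_bits;    // W
    int integer_bits;  // I, sign bit included

    int fractional_bits() const { return total_bits - integer_bits; }
    // Smallest step between representable values: 2^-(W - I).
    double resolution() const { return std::ldexp(1.0, -fractional_bits()); }
    // Most negative representable value: -2^(I - 1).
    double min_value() const { return -std::ldexp(1.0, integer_bits - 1); }
    // Most positive representable value: 2^(I - 1) minus one step.
    double max_value() const { return std::ldexp(1.0, integer_bits - 1) - resolution(); }
};
```

With FixedFormat{32, 10} this gives a range starting at -512 and a step of 2^-22, about 2.4*10^-7. With FixedFormat{23, 3} the range is -4 to just under 4, which covers -pi to pi, and the step is 2^-20, slightly below 10^-6.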

But also, this is dependent on the system's specifications. If the inputs don't actually have 6 decimal digits of accuracy, or your output doesn't need that much accuracy, you can reduce the number of bits even further.

Other things to consider for optimization:
1) Iterators ( i and j in the example ) should be an ap_uint<> type where the number of bits is just large enough to represent MAX_NUM_VALUES accurately.
2) Two 8 bit numbers technically need 9 bits to hold the result of an addition without loss of accuracy and 16 bits to hold the result of a multiplication without loss of accuracy, unless you know something more about the operands, like if conv = 0.28125.
3) The "hls_math.h" has an abs() function for floating point variables which may be more efficient than conditionally multiplying by -1.
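
The bit counts in points 1) and 2) can be worked out mechanically. The sketch below is plain C++ with made-up helper names (bits_for_unsigned, fractional_bits_needed), and it assumes unsigned 8 bit operands for the addition and multiplication case.

```cpp
#include <cmath>
#include <cstdint>

// Number of bits needed to hold every unsigned value in 0..max_value.
int bits_for_unsigned(uint64_t max_value) {
    int bits = 1;
    while (max_value >>= 1) ++bits;
    return bits;
}

// Fewest fractional bits that represent `value` exactly,
// or -1 if more than `limit` fractional bits would be needed.
int fractional_bits_needed(double value, int limit = 32) {
    for (int n = 0; n <= limit; ++n) {
        double scaled = std::ldexp(value, n);  // value * 2^n
        if (scaled == std::floor(scaled)) return n;
    }
    return -1;
}
```

bits_for_unsigned(255 + 255) is 9 and bits_for_unsigned(255 * 255) is 16, matching point 2). For a loop counter that must be able to reach MAX_NUM_VALUES, bits_for_unsigned(MAX_NUM_VALUES) gives the width; with a hypothetical MAX_NUM_VALUES of 1024 that is 11 bits. The constant 0.28125 is 9/32, so fractional_bits_needed(0.28125) is 5, far fewer than the 22 fractional bits of ap_fixed<32,10>.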

I know you asked a general question, but in this specific case of arctangent, you might also consider checking out the "hls_math.h" library's implementation of atan2() for floating and fixed point numbers. The HLS Math Library is documented on page 239 of UG902.
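
Before choosing bit widths or switching to a library atan2(), it can also help to measure how accurate the posted estimate is in the first place. The following is a rough sketch in plain C++ that transcribes the octant logic from the question into double precision and compares it against std::atan2. The names atan2_estimate and max_estimate_error are made up here, and the sketch assumes the input is never (0, 0), where the estimate divides zero by zero.

```cpp
#include <algorithm>
#include <cmath>

// Double precision transcription of the octant based estimate from the question.
// Assumes (x, y) != (0, 0).
double atan2_estimate(double y, double x) {
    const double kPi = 3.14159265358979323846;
    const double k = 0.28125;
    if (std::fabs(x) > std::fabs(y)) {
        double r = x * y / (x * x + k * y * y);
        if (x > 0) return r;        // 1st and 8th octant
        double estimate = kPi + r;  // 4th and 5th octant
        return estimate > kPi ? estimate - 2 * kPi : estimate;
    }
    double r = x * y / (y * y + k * x * x);
    return (y > 0 ? kPi / 2 : -kPi / 2) - r;  // 2nd/3rd or 6th/7th octant
}

// Largest absolute difference from std::atan2 over `samples` points
// on the unit circle (samples must be at least 2).
double max_estimate_error(int samples) {
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        double t = -3.1 + 6.2 * i / (samples - 1);  // angles from -3.1 to 3.1 rad
        double y = std::sin(t), x = std::cos(t);
        worst = std::max(worst, std::fabs(atan2_estimate(y, x) - std::atan2(y, x)));
    }
    return worst;
}
```

If the measured error turns out to be on the order of a few milliradians, as I would expect for this kind of estimate, then most of the 20 fractional bits discussed above are carrying approximation error rather than information, which is another argument for narrower types.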

Okay, I hope that helps and that I'm not being too longwinded.

Nicholas Moellers

Xilinx Worldwide Technical Support
Adventurer
181 Views
Registered: ‎02-08-2018

Re: Switch from floating point to fixed point increases latency and resource utilization

@nmoeller  Thank you for the information.  It provides some explanation as to why there would be an increase in resources and latency when doing multiplication with fixed point instead of floating point.

That being said, if floating point multiplication is computationally simpler than fixed point multiplication (and also division presumably), then when is it advantageous to use fixed point?  For addition and subtraction?

According to what I read, floating point is more for specialized computationally intensive algorithms while fixed point is better for large general-purpose applications.  What are your thoughts?

Xilinx Employee
170 Views
Registered: ‎09-05-2018

Re: Switch from floating point to fixed point increases latency and resource utilization

@agailey,

I'm glad it was helpful.

All things being equal, fixed point addition and subtraction are simpler. With floating point numbers, the significand of one operand must be shifted and its exponent adjusted until the exponents are equal. Then a regular addition is performed, and the result is normalized and rounded again.
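
Those alignment steps can be sketched with ordinary integers. This is a simplified model in plain C++, not the actual floating point adder: the names Scaled, to_scaled and add_via_parts are made up, bits shifted out during alignment are simply dropped, and the final normalize-and-round step is not modelled, so it only matches real float addition when no bits are lost.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// value == significand * 2^exponent; hypothetical illustration type.
struct Scaled {
    int64_t significand;
    int exponent;
};

// Split a finite float into a 24 bit integer significand and a power of two.
Scaled to_scaled(float v) {
    int e = 0;
    float m = std::frexp(v, &e);  // v == m * 2^e with 0.5 <= |m| < 1 (or m == 0)
    return {static_cast<int64_t>(std::ldexp(m, 24)), e - 24};
}

// Add two floats by aligning exponents first, then doing an integer addition.
double add_via_parts(float a, float b) {
    if (a == 0.0f) return b;  // nothing to align against
    if (b == 0.0f) return a;
    Scaled big = to_scaled(a), small = to_scaled(b);
    if (big.exponent < small.exponent) std::swap(big, small);
    // Step 1: shift the operand with the smaller exponent until exponents match.
    int shift = big.exponent - small.exponent;
    int64_t aligned = shift < 63 ? small.significand / (int64_t{1} << shift) : 0;
    // Step 2: a regular integer addition on the aligned significands.
    int64_t sum = big.significand + aligned;
    // Step 3 (normalize and round back to 24 bits) is left out of this model.
    return std::ldexp(static_cast<double>(sum), big.exponent);
}
```

A fixed point addition only needs step 2, because both operands already share the same binary point. The shifter and the normalization logic are the extra cost of the floating point version.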

But I think the main takeaway should be that if you are reducing every bit everywhere you can, the fixed point type will be more efficient. In my post above, I discussed how the fixed point type in your example can be reduced to a 23 bit type without any loss of precision, which should actually provide a more efficient multiplication and division than 32 bit floating point.

I think it's less about which application and more about whether the designer has time to analyze the code and minimize the number of bits in all the fixed point types as much as possible within the design parameters. There's also something to be said for the case where your specifications require a floating point input and output and your design is small: it may be more effort to convert to fixed and back than is gained from using the smaller types.

Nicholas Moellers

Xilinx Worldwide Technical Support
Contributor
164 Views
Registered: ‎03-13-2017

Re: Switch from floating point to fixed point increases latency and resource utilization

I fully agree about everything but the following (a detail, minor vs the discussion ;) )

@nmoeller wrote:

...
Other things to consider for optimization: 1) Iterators ( i and j in the example ) should be an ap_uint<> type where the number of bits is just large enough to represent MAX_NUM_VALUES accurately.
... 

In my experience, Xilinx HLS is able to choose the right size of the RTL signal for any iterator fully specified at synthesis time, i.e. with 'MAX_NUM_VALUES' defined as a fixed constant. I verified this many times in the past by inspecting the synthesized RTL, and it is now one of my 'golden rules' when working with HLS, never broken so far.
If I'm wrong, please let me know.

 
