Visitor yangchen4623
Registered: ‎10-08-2019

How to accelerate the floating point implementation for cubic root on VU9P

Greetings!

I'm trying to implement a cube root function for floating-point numbers, which is not available in the IP catalog. I basically followed the Newton-Raphson method as proposed in the post here.
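
For reference, this is the Newton-Raphson update for $f(x) = x^3 - a$, which is what the stage comments in the attached module appear to compute. The symbols $a$ (the raw input) and $x_k$ (the current guess) are notation introduced here, not taken from the linked post:

$$x_{k+1} = x_k - \frac{x_k^3 - a}{3x_k^2} = \frac{2}{3}x_k + \frac{a}{3x_k^2}$$

The first term corresponds to part_1_out and the second to part_2_out in the module below.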

I used the single-precision FP IP on the VU9P to implement the FP add (14 cc latency), multiply (9 cc latency) and divide (31 cc latency) operations. However, because of the data dependencies between them, these operations cannot be executed in parallel. My Verilog code is attached.

Based on my simulation, even with a proper table-lookup implementation for the initial guess, it still needs a few (two or more) iterations to get a relatively accurate value, which takes a few hundred cycles to finish.
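
The dependence of the iteration count on the quality of the initial guess can be explored with a small software model. The following is only a sketch, not the attached hardware: it assumes every operation is rounded to single precision (emulated here with `struct`), the helper names `f32`, `cbrt_newton_step` and `iterations_needed` are hypothetical, and the guess values in the demo are arbitrary stand-ins for whatever a lookup table would provide.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


TWO_THIRDS = f32(2.0 / 3.0)


def cbrt_newton_step(a, x):
    """One update x' = (2/3)*x + a / (3*x^2), rounded after every operation."""
    part_1 = f32(TWO_THIRDS * x)            # multiply
    part_2_tmp_1 = f32(x * x)               # multiply
    part_2_tmp_2 = f32(part_2_tmp_1 * 3.0)  # multiply
    part_2 = f32(a / part_2_tmp_2)          # divide
    return f32(part_1 + part_2)             # add


def iterations_needed(a, guess, rel_tol=1e-6, max_iter=20):
    """Count updates until the relative error is within rel_tol, else None."""
    exact = a ** (1.0 / 3.0)  # double-precision reference, positive a only
    x = f32(guess)
    for k in range(max_iter + 1):
        if abs(x - exact) <= rel_tol * exact:
            return k
        x = cbrt_newton_step(f32(a), x)
    return None


if __name__ == "__main__":
    # Arbitrary example guesses for a = 27 (true cube root is 3).
    for guess in (1.0, 2.0, 2.9):
        print(guess, iterations_needed(27.0, guess))
```

Because the error roughly squares on each update once the guess is close, a more accurate starting value reduces the number of passes through the mul/div/add chain, which is where the cycle count comes from.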

In comparison, IPs such as FP exponential (32 cc latency), FP log (31 cc) and FP square root (31 cc) return much better accuracy with shorter latency. Can anyone shed some light on how those IPs calculate their values? What kind of algorithm do they use? Is there a better way to accelerate my cube root implementation?

Thank you!

 

module cbrt_inner_loop_single
#(
    parameter DATA_WIDTH = 32
)
(
    input clk,
    input rstn,
    input [DATA_WIDTH-1:0] in_cbrt_raw,
    input in_cbrt_raw_valid,
    input [DATA_WIDTH-1:0] in_cbrt_guess,
    input in_cbrt_guess_valid,
    output [DATA_WIDTH-1:0] out_cbrt_raw,
    output out_cbrt_raw_valid
);
    // Stage 1: part_1_out = 2/3 * in_cbrt_guess
    // Latency: 9 cycles
    FP_Single_Mul FP_Part_1(
    ....
    );

    // Delay part_1_out from FP_Part_1 by 9+31 = 40 cycles
    // Wait for FP_Part_2_2 (9 cycles) and FP_Part_2_3 (31 cycles) to finish
    delay_register delay_part_1
    (
    ....
    );

    // Stage 1: part_2_tmp_1 = in_cbrt_guess ^ 2
    // Latency: 9 cycles
    FP_Single_Mul FP_Part_2_1(
        ...
    );

    // Stage 2: part_2_tmp_2 = part_2_tmp_1 * 3
    // Latency: 9 cycles
    FP_Single_Mul FP_Part_2_2(
        ...
    );

    // Delay cbrt_raw from input by 9*2 = 18 cycles
    // Wait for Part_2_tmp_1 and Part_2_tmp_2 to finish
    delay_register delay_cbrt_raw
    (
        ...
    );

    // Stage 3: part_2_out = in_cbrt_raw / part_2_tmp_2
    // Latency: 31 cycles
    FP_Single_Div FP_Part_2_3(
        ...
    );

    // Stage 4: out_cbrt_out = part_1_out + part_2_out
    // Latency: 14 cycles
    FP_Single_Add FP_Part_3(
       ...
    );

    // Delay register: propagate the cbrt raw input to the output, as the next stage's input
    // Latency: module latency 63 cycles
    delay_register delay_cbrt_raw_passthrough
    (
        ...
    );
    
endmodule 
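
The delay values in the comments above follow from the quoted IP latencies, and the arithmetic can be checked with a short script. This is only a bookkeeping sketch: the function names are hypothetical, and it assumes iterations are chained back to back with no extra cycles for the table lookup or for registers between stages.

```python
# Latencies quoted in the post (clock cycles).
MUL_CC, DIV_CC, ADD_CC = 9, 31, 14


def part_2_ready(mul=MUL_CC, div=DIV_CC):
    """Cycle at which part_2_out is valid: guess^2, then *3, then divide."""
    return mul + mul + div


def part_1_delay(mul=MUL_CC, div=DIV_CC):
    """Cycles part_1_out must be held so it lines up with part_2_out."""
    return part_2_ready(mul, div) - mul


def iteration_latency(mul=MUL_CC, div=DIV_CC, add=ADD_CC):
    """Latency of one Newton-Raphson update through the module."""
    return part_2_ready(mul, div) + add


def total_latency(iterations, mul=MUL_CC, div=DIV_CC, add=ADD_CC):
    """Latency of several updates chained one after another."""
    return iterations * iteration_latency(mul, div, add)


if __name__ == "__main__":
    print("part_1 delay:", part_1_delay())
    print("one iteration:", iteration_latency())
    for n in (2, 3, 4):
        print(n, "iterations:", total_latency(n))
```

Under these assumptions the divide accounts for roughly half of each iteration, so the total grows linearly with the number of updates.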

 


Accepted Solutions
Scholar drjohnsmith
Registered: ‎07-09-2009

Re: How to accelerate the floating point implementation for cubic root on VU9P

Is Newton an efficient way of computing a cube root?
The book "Hacker's Delight" seems to indicate it is not:
https://doc.lagout.org/security/Hackers%20Delight.pdf

This looks interesting:
https://docplayer.net/25431850-Fpga-implementation-of-a-binary32-floating-point-cube-root.html
<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>