Visitor (registered 08-15-2019)

timing closure problem

I'm working on a Vivado project and trying to meet timing. I've read UG949 and UG906, but I didn't find the answer. Can you help me with this? The timing report looks like this:

[Screenshot: timing report (2019-08-16_204712.png)]

I double-clicked Path 1 and it showed the detailed report for that path:

[Screenshot: detailed report for Path 1 (2019-08-16_112348.png)]

I think the data path is too long, so I added a register in the path, but it doesn't improve the timing at all.

The relevant files are uploaded below.

Please tell me what the root cause of the timing failure is and how to resolve it.
Visitor (registered 04-02-2019)

What does the RTL look like? That is a lot of LUTs to go through in one path; it looks like you might need a few registers in there to meet timing.
Visitor (registered 08-15-2019)

The relevant part is this:

//partial_product1(initdata, init, clock, load, dataout);
partial_product1 cs1(lambda1, init, clock, load, cs1_out);
partial_product2 cs2(lambda2, init, clock, load, cs2_out);
partial_product3 cs3(lambda3, init, clock, load, cs3_out);
partial_product4 cs4(lambda4, init, clock, load, cs4_out);
partial_product5 cs5(lambda5, init, clock, load, cs5_out);
partial_product6 cs6(lambda6, init, clock, load, cs6_out);
partial_product7 cs7(lambda7, init, clock, load, cs7_out);
partial_product8 cs8(lambda8, init, clock, load, cs8_out);
assign odd_sum = cs1_out^cs3_out^cs5_out^cs7_out;
assign even_sum = lambda0^cs2_out^cs4_out^cs6_out^cs8_out;

partial_product1 tfn1(b1, init, clock, load, tfn1_out);
partial_product2 tfn2(b2, init, clock, load, tfn2_out);
partial_product3 tfn3(b3, init, clock, load, tfn3_out);
partial_product4 tfn4(b4, init, clock, load, tfn4_out);
partial_product5 tfn5(b5, init, clock, load, tfn5_out);
partial_product6 tfn6(b6, init, clock, load, tfn6_out);
partial_product7 tfn7(b7, init, clock, load, tfn7_out);
assign b_sum = b0^tfn1_out^tfn2_out^tfn3_out^tfn4_out^tfn5_out^tfn6_out^tfn7_out;

assign lambdaval = odd_sum ^ even_sum;
assign zerodetect = ~((lambdaval[0]|lambdaval[1]) | (lambdaval[2]|lambdaval[3]) |
                      (lambdaval[4]|lambdaval[5]) | (lambdaval[6]|lambdaval[7]));

gfmul8 mult1(.in1(b_sum), .in2(odd_sum), .out(errorvalue_tmp1));
rs_inv invers(errorvalue_tmp1, errorvalue_tmp2);
gfmul8 mult2(.in1(errorvalue_tmp2), .in2(Z), .out(errorvalue_tmp3));

always@(errorvalue_tmp3 or zerodetect)
begin
    errorvalue_tmp4[0] <= errorvalue_tmp3[0] & zerodetect;
    errorvalue_tmp4[1] <= errorvalue_tmp3[1] & zerodetect;
    errorvalue_tmp4[2] <= errorvalue_tmp3[2] & zerodetect;
    errorvalue_tmp4[3] <= errorvalue_tmp3[3] & zerodetect;
    errorvalue_tmp4[4] <= errorvalue_tmp3[4] & zerodetect;
    errorvalue_tmp4[5] <= errorvalue_tmp3[5] & zerodetect;
    errorvalue_tmp4[6] <= errorvalue_tmp3[6] & zerodetect;
    errorvalue_tmp4[7] <= errorvalue_tmp3[7] & zerodetect;
end


register8_wl errorreg(errorvalue_tmp4, errorvalue, clock, en_outfifo);

For the rest of the code, see the files I uploaded.

Visitor (registered 04-02-2019)

It looks like you have quite a bit of logic before the errorvalue_tmp signals: they combine lambdaval (odd_sum ^ even_sum) and zerodetect, which brings together multibit logic from all your cs blocks.
errorvalue_tmp4 is also combinational, so there's an additional logic layer after errorvalue_tmp3. I don't see how to meet timing without registering more of the intermediate signals and restructuring the design as needed to do that.
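As a sketch of that last combinational layer (assuming 8-bit lambdaval and errorvalue_tmp3, as the bit indices in the posted code suggest), the OR tree for zerodetect and the per-bit AND can be written with a reduction NOR and a replication. The module name mask_stage is hypothetical. The compact form behaves the same and is still purely combinational, so by itself it does not remove the logic layer from the path; only a register does that.

```verilog
// Hypothetical module: compact, behaviour-equivalent form of the
// zerodetect / errorvalue_tmp4 logic from the posted code.
// Assumption: 8-bit widths, matching the bit indices used in the thread.
module mask_stage (
    input  wire [7:0] lambdaval,
    input  wire [7:0] errorvalue_tmp3,
    output wire       zerodetect,
    output wire [7:0] errorvalue_tmp4
);
    // Reduction NOR: 1 only when every bit of lambdaval is 0,
    // same result as ~(lambdaval[0] | ... | lambdaval[7]).
    assign zerodetect = ~|lambdaval;

    // Replicate zerodetect to 8 bits and AND bitwise,
    // same result as the eight per-bit assignments in the always block.
    assign errorvalue_tmp4 = errorvalue_tmp3 & {8{zerodetect}};
endmodule
```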
Teacher (registered 07-09-2009)

Look at the equations that produce odd_sum and even_sum.

These are huge, and you then XOR (^) them together, which makes the logic even bigger.

Register odd_sum and even_sum, and you will reduce the number of terms a lot. Register all the partial_product outputs as well, and you get even fewer terms.
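A minimal sketch of that advice, assuming 8-bit signals named as in the posted code and a hypothetical wrapper module sum_pipe: the first clocked stage registers the partial_product outputs, the second registers the two XOR sums, so each register-to-register path holds only a small XOR per bit. lambda0 gets one extra register so it stays aligned with the registered partial products.

```verilog
// Hypothetical sketch: two pipeline stages in front of odd_sum / even_sum.
// Assumptions: 8-bit widths and the signal names from the posted code.
module sum_pipe (
    input  wire       clock,
    input  wire [7:0] lambda0,
    input  wire [7:0] cs1_out, cs2_out, cs3_out, cs4_out,
    input  wire [7:0] cs5_out, cs6_out, cs7_out, cs8_out,
    output reg  [7:0] odd_sum,
    output reg  [7:0] even_sum
);
    reg [7:0] cs1_r, cs2_r, cs3_r, cs4_r, cs5_r, cs6_r, cs7_r, cs8_r;
    reg [7:0] lambda0_r;   // delayed so it lines up with the cs*_r values

    // Stage 1: register every partial product (and lambda0).
    always @(posedge clock) begin
        cs1_r <= cs1_out; cs2_r <= cs2_out; cs3_r <= cs3_out; cs4_r <= cs4_out;
        cs5_r <= cs5_out; cs6_r <= cs6_out; cs7_r <= cs7_out; cs8_r <= cs8_out;
        lambda0_r <= lambda0;
    end

    // Stage 2: register the sums; results appear two clocks after the inputs.
    always @(posedge clock) begin
        odd_sum  <= cs1_r ^ cs3_r ^ cs5_r ^ cs7_r;
        even_sum <= lambda0_r ^ cs2_r ^ cs4_r ^ cs6_r ^ cs8_r;
    end
endmodule
```

The sums now arrive two clock cycles later than before, so anything combined with them further down the chain (for example b_sum in mult1, and the en_outfifo enable on the output register) would likely need matching delay registers to stay aligned. That re-alignment is part of the restructuring this change requires.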

 

Remember that the inherent single-gate width in an FPGA is 4 or 6 inputs. Beyond that, the gate has to be built from cascaded gates (called LUTs), and this cascading adds delay in each of the LUTs and in the interconnect between them.

What registering does is trade gate width for time, which is a standard trade-off in logic design.
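To illustrate the width-for-time trade, here is a hypothetical 36-input XOR (xor36_pipe is an illustrative name, and a device with 6-input LUTs is assumed). Computed in one cycle it typically needs two cascaded LUT levels plus the routing between them; split as below, each clock cycle should need only one LUT level, at the cost of one extra cycle of latency.

```verilog
// Hypothetical example: a 36-input XOR split so that each pipeline stage
// needs only one level of 6-input LUTs (assumes a LUT6-based device).
module xor36_pipe (
    input  wire        clock,
    input  wire [35:0] terms,
    output reg         result
);
    reg [5:0] partial;   // one registered 6-input XOR per group of 6 terms
    integer i;

    // Stage 1: six independent 6-input XORs, each should fit one LUT6.
    always @(posedge clock)
        for (i = 0; i < 6; i = i + 1)
            partial[i] <= ^terms[i*6 +: 6];

    // Stage 2: XOR the six registered partials, again one LUT6.
    always @(posedge clock)
        result <= ^partial;
endmodule
```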

 

Visitor (registered 08-15-2019)

I changed the code like this:

[Screenshot: modified code (2019-08-16_215519.png)]

but it doesn't improve the timing at all!

Shouldn't I register it this way?

Visitor (registered 04-02-2019)

Registering signals means putting them into flip-flops, which means a structure like this:

always@(posedge clk) begin
  odd_sum <= cs1_out.....
end

What you did was put your logic into combinational always blocks, which do not create flip-flops. I'd recommend reviewing how clocked logic is written in Verilog (e.g. https://www.chipverify.com/verilog/verilog-always-block), and perhaps the relevant Vivado user guide and the Vivado language templates.
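To make the difference concrete, here is a hypothetical module (comb_vs_reg is an illustrative name; 8-bit widths and the thread's signal names are assumed) that computes the same odd_sum twice. The always @(*) version infers no flip-flops and leaves the timing path as long as before; the always @(posedge clock) version infers one flip-flop per bit and splits the path at that point.

```verilog
// Hypothetical module contrasting the two kinds of always block.
// Assumptions: 8-bit widths, signal names taken from the thread.
module comb_vs_reg (
    input  wire       clock,
    input  wire [7:0] cs1_out, cs3_out, cs5_out, cs7_out,
    output reg  [7:0] odd_sum_comb,  // combinational: no flip-flop inferred
    output reg  [7:0] odd_sum_reg    // registered: one flip-flop per bit
);
    // Combinational: re-evaluated whenever an input changes. Synthesis
    // produces only LUTs here, so the timing path is not shortened.
    always @(*)
        odd_sum_comb = cs1_out ^ cs3_out ^ cs5_out ^ cs7_out;

    // Clocked: the value is captured on the rising edge, which ends
    // one timing path and starts a new one at this register.
    always @(posedge clock)
        odd_sum_reg <= cs1_out ^ cs3_out ^ cs5_out ^ cs7_out;
endmodule
```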

Visitor (registered 08-15-2019)

Thank you very much! It works.