anton_v (Visitor, registered 01-28-2021)

Path multiplication in timing report

Hi,

I want to understand one issue with the timing analysis tool in Vivado 2019.2. I constrained an SDR interface with an 8-bit data bus; the connection to one data register looks like this:

anton_v_0-1611852112518.png

It is one physical path, and I know that Vivado analyzes every path at the Fast and Slow corners with Max and Min sub-models; in particular, it uses the Max models in setup slack calculations and the Min models in hold slack calculations.

So why do I get four identical paths for each corner when I run the report_timing command?

anton_v_1-1611852511573.png

anton_v_4-1611852611472.png

The report command I used looks like this:

report_timing -from [get_clocks sdr_in_clk] -to [get_clocks sdr_in_clk] -delay_type min_max -max_paths 100 -nworst 100 -sort_by group -input_pins -routable_nets -name timing_1

Is it possible that I used the report command incorrectly, or is this a feature of the Vivado engine?

How can I skip these extra paths?

In addition, I have attached the whole project.

avrumw (Expert, registered 01-23-2009)

So first, you are correct about there being four combinations, but I want to be clear as to what these are:

  • Setup check at slow process corner
  • Setup check at fast process corner
  • Hold check at slow process corner
  • Hold check at fast process corner

As for "min" and "max", we need to be careful. Sometimes the tool uses the words "delay type max" to mean a setup check and "delay type min" to mean a hold check - that is what -delay_type min_max is saying - do both setup and hold checks.

This is not to be confused with "min" and "max" for a delay component; for every delay there will be four possible delay corners:

  • [SLOW_MAX] - the true slowest the cell can be
    • Used for SCD and DPD for setup checks at slow process corner
    • Used for DCD for hold checks at the slow process corner
  • [SLOW_MIN] - the fastest delay through a cell that can exist on a die with at least one cell at [SLOW_MAX]
    • Used for DCD for setup checks at the slow process corner
    • Used for SCD and DPD for hold checks at the slow process corner
  • [FAST_MAX] - the slowest delay through a cell that can exist on a die with at least one cell at [FAST_MIN]
    • Used for SCD and DPD for setup checks at fast process corner
    • Used for DCD for hold checks at the fast process corner
  • [FAST_MIN] - the true fastest the cell can be 
    • Used for DCD for setup checks at the fast process corner
    • Used for SCD and DPD for hold checks at the fast process corner

I use the following acronyms:

  • SCD: Source Clock Delay
    • From the clock attachment point (the create_clock) to the startpoint of the path (a clocked element or an input port)
    • For an input port this would normally be 0
  • DPD: Data Path Delay
    • From the clock pin of the clocked element that is the startpoint to the data pin of the clocked element that is the endpoint
      • The startpoint can be an input port
      • The endpoint can be an output port
  • DCD: Destination Clock Delay
    • From the clock attachment point (the create_clock) to the destination clocked element (including the clk->data setup/hold requirement)
    • For an output this would normally be 0

 

So that accounts for four reports on the path.

There is an additional dimension - the "rising" and "falling". This is for the source (in this case your input) making a 0->1 transition (r) and a 1->0 transition (f). In all current Xilinx FPGAs these are always identical since the libraries are symmetrical, but the timing engine allows for asymmetric cell delays. So this accounts for a factor of 2, making 8 checks on each path (4 setup and 4 hold).

Now you have 8 ports (dq[7:0]), so this would multiply this by 8, resulting in 32 setup and 32 hold checks. 

But you are seeing 64 each.

So I don't know where the last factor of two is coming from.
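As a rough tally of the factors listed above, here is a small plain-Tcl sketch. It runs in any tclsh and does not touch the design; the variable names are only illustrative.

```tcl
# Factors named in the explanation above
set checks   2   ;# setup and hold
set corners  2   ;# slow and fast process corner
set edges    2   ;# rising (r) and falling (f) transition at the input
set ports    8   ;# dq[7:0]

set per_port [expr {$checks * $corners * $edges}]    ;# 8 checks per port
set total    [expr {$per_port * $ports}]             ;# 64 checks overall
set expected_per_check [expr {$total / $checks}]     ;# 32 setup, 32 hold

set observed_per_check 64                            ;# what the report shows
puts "expected per check type: $expected_per_check"
puts "unexplained factor: [expr {$observed_per_check / $expected_per_check}]"
```

The last line prints an unexplained factor of 2, which is the doubling still to be accounted for.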

This is an SDR clock, and it appears that you have only one clock (verify that all of your paths have sdr_in_clk as both the source and destination clock - the ones that you show do). If you were to accidentally have a second clock defined on the same port with the -add defined, you would end up with twice as many reports, but you would see this in the path summary - some paths would not end at sdr_in_clk, but at a clock with a different name.

Another option is that you have two sets of set_input_delay commands, with (at least) the second one having the -add_delay option. This would actually create two startpoints for each physical path; if the two set_input_delay commands had the same -clock, it would be very hard (maybe even impossible) to tell them apart from the timing report. You should review your XDC files, but you can also use the constraint window to see all the constraints applied to the design - it is possible that one set of constraints is coming in from a different file (maybe a scoped constraint file associated with the input capture mechanism, depending on how it was created).
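Both checks can be sketched from the Vivado Tcl console. This is only a sketch under assumptions: it assumes the clock port is named ck, and all_constraints.xdc is a placeholder file name.

```tcl
# List every clock attached to the clock port; more than one name here
# would point to a second create_clock (for example one using -add).
set clks [get_clocks -of_objects [get_ports ck]]
puts "clocks on ck ([llength $clks]): $clks"

# Summarize all clocks known to the timing engine.
report_clocks

# Dump every constraint currently in memory, whatever file it came from,
# then search the result for repeated set_input_delay lines on dq*.
write_xdc -force all_constraints.xdc
```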

Avrum 

anton_v (Visitor, registered 01-28-2021) - accepted solution

Thanks for the reply, Avrum!

Yes, in my first post I assumed (though maybe I did not express it clearly) that Vivado timing analysis works the way you described.

My XDC is correct; I didn't accidentally duplicate the clock or the set_input_delay command. It looks like this:

# Create sdr input clock
create_clock -name sdr_in_clk -period 10.000 -waveform {0.000 5.000} [get_ports ck]

# Set variables
set period 10.000
set data_in_dly_max 0.500
set data_in_dly_min 0.250
set clk_in_dly_max 0.400
set clk_in_dly_min 0.200
set tco_out_max 0.300
set tco_out_min -0.300
set input_delay_max [expr $data_in_dly_max - $clk_in_dly_min + $tco_out_max - $period/2]
set input_delay_min [expr $data_in_dly_min - $clk_in_dly_max + $tco_out_min - $period/2]

# Set input delay
set_input_delay -clock sdr_in_clk -max $input_delay_max [get_ports {dq*}]
set_input_delay -clock sdr_in_clk -min $input_delay_min [get_ports {dq*}]

# Set multicycle: setup=1->0, hold=0->-1
set_multicycle_path -end -setup 0 -from [get_ports {dq*}]

So there are no mistakes here.
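For reference, the two expressions above can be evaluated in a plain tclsh session to see the numbers that end up in set_input_delay. This is just the arithmetic from the XDC repeated with braced expr calls; nothing here is Vivado-specific.

```tcl
# Same variables as in the XDC above
set period          10.000
set data_in_dly_max  0.500
set data_in_dly_min  0.250
set clk_in_dly_max   0.400
set clk_in_dly_min   0.200
set tco_out_max      0.300
set tco_out_min     -0.300

# Same formulas, with the expressions braced
set input_delay_max [expr {$data_in_dly_max - $clk_in_dly_min + $tco_out_max - $period/2}]
set input_delay_min [expr {$data_in_dly_min - $clk_in_dly_max + $tco_out_min - $period/2}]

puts [format "max = %.3f ns, min = %.3f ns" $input_delay_max $input_delay_min]
# prints: max = -4.400 ns, min = -5.450 ns
```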

And the report_timing command I used selects only sdr_in_clk as both the source and the destination clock. So it can't be another clock or anything like that.

I also know about the rising and falling data transition variations. When a data bit propagates through the path, it can change 0->1 (r) or 1->0 (f) on the same logic elements, and the logic element delay may differ between the two cases. But I didn't know that Xilinx uses symmetrical logic element delays for (r) and (f) transitions!

And you are right to ask why the paths are identical and doubled (see the pictures of two of the same setup paths at the SLOW_MAX corner for the dq[0] data port).

2021-01-29_14-46-53.png

2021-01-29_14-50-34.png

As you can see, Vivado definitely uses both (r) and (f) data transition variations, but not for the ZHOLD delay block!

So we have finally found the answer to my question and understand why it produces four identical paths for one setup check and corner. In fact, the four paths are not identical; they are all different! Vivado combines the (r) and (f) data transition parameters for the logic elements on the path.
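A hedged way to look at one transition at a time is to use the edge-specific start options of report_timing. The sketch below assumes the dq[0] port from this thread; the -name values are arbitrary labels, and it has not been run against this project.

```tcl
# Setup paths launched by a rising (r) transition on the input port only
report_timing -rise_from [get_ports {dq[0]}] -delay_type max \
    -nworst 100 -max_paths 100 -input_pins -name dq0_rise

# Same query for a falling (f) transition on the input port
report_timing -fall_from [get_ports {dq[0]}] -delay_type max \
    -nworst 100 -max_paths 100 -input_pins -name dq0_fall
```

If the paths left in each report differ only in the (r)/(f) marks on downstream cells, that is consistent with the explanation above.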

I think it is mind-blowing :) If it's hard to follow, please look at the pictures below. They are the reports for the other two of the four "identical" paths.

2021-01-29_15-03-36.png

2021-01-29_15-06-37.png

And a picture of these four paths, which have the same corner (SLOW_MAX), the same relation (setup), the same source and destination clocks (sdr_in_clk), and the same start and end points.

So thanks once more to Avrum: I hadn't noticed the (r) and (f) transition annotations in the path reports, and his answer made me analyze the paths more attentively. That is how I noticed the differences in the other two timing paths.

I'm curious: how can Xilinx use symmetrical (r) and (f) delays in an FPGA? I've heard that when the temperature changes, the (f) and (r) delays shift in opposite directions. I'm not an expert on this topic, but I'm curious :) How can that be? Where can I read about it?

And why did Vivado use the (f) delay on the ZHOLD delay while using the (r) delay for the upstream elements? A delay can't invert the data transition; it is a buffer. It has no logic, unless the ZHOLD delay can perform signal inversion.

avrumw (Expert, registered 01-23-2009)

"I'm curious: how can Xilinx use symmetrical (r) and (f) delays in an FPGA? I've heard that when the temperature changes, the (f) and (r) delays shift in opposite directions. I'm not an expert on this topic, but I'm curious :) How can that be? Where can I read about it?"

I don't think there is any specific information on this, but I can speculate...

First, let's realize that FPGAs are not ASICs. The way things work in ASICs is that each cell's timing arc (the timing characterization from an input pin to an output pin of the cell) is either inverting or non-inverting. For example on an AND gate, all input->output paths are non-inverting - if a rising transition on the input causes a transition on the output it will be a rising transition. For a NAND gate, all arcs are inverting; if a rising transition on an input causes a transition on the output, it will be a falling transition. This can be done for most cells, with the notable exception of the XOR/XNOR gate where the rising transition can cause either a rising or falling transition (based on the other inputs).

However, the main element of combinatorial logic in an FPGA is a LUT. The inverting/non-inverting characteristics of a timing arc are not determined by the LUT itself, but by the programming (configuration) of the LUT. Furthermore, a LUT6 can be programmed to have a number of "ambiguous" inversion arcs - functions similar to an XOR gate or built around a similar structure.

So I suspect that the solution to this problem was to simply not care. When they characterized the LUTs, for a given timing arc, they characterized all combinations of inverting and non-inverting functions for that arc, and simply used the most pessimistic numbers (fastest for a [*_MIN] corner, slowest for a [*_MAX] corner) for both the rising and falling arcs. So it isn't intrinsic in the cells themselves - I am sure the cells aren't actually symmetric - but just in how Xilinx chose to extract the characterization data that forms the basis for static timing analysis (STA).

"And why did Vivado use the (f) delay on the ZHOLD delay while using the (r) delay for the upstream elements? A delay can't invert the data transition; it is a buffer. It has no logic, unless the ZHOLD delay can perform signal inversion."

The ZHOLD "cell" is probably one of the most poorly documented cells in the library - it is really hard to pin down what it actually is, and whether it shares logic with other functionality (like the IDELAY). So I can only speculate here again...

If the ZHOLD has an optional inversion path (maybe the optional inversion on the IBUFG output or the IDELAY input is actually in the "ZHOLD" cell), then this cell would have "reconvergent fanout"; the input would fan out to two internal path portions; one with a direct connection to a MUX and one with an inverting connection to the other input of the same MUX. Theoretically, this would lead to a doubling of a static timing path going through this cell; looking at the paths from your FPGA input to the capture flip-flop, there would now be two paths; one going through the inverting path portion in the ZHOLD and one going through the non-inverting path portion in the ZHOLD. These would be two separate static timing paths with the same startpoint and endpoint, and hence you would get separate timing reports for each of them. This is consistent with what you are seeing - with one of the paths being inverting through the ZHOLD and the other being non-inverting.

The problem with this is that if this is the root cause of the doubling you are seeing, it should happen almost everywhere in the FPGA. Many cells have optional inverting paths - if this path doubling occurred everywhere there was an optional inverter, then we would see an incredible proliferation of paths in a design. This would/could even occur for every LUT that implements an ambiguous polarity function (like an XOR) - technically there would be two paths through the LUT from a given input pin.

So I would just chalk it up to a harmless inconsistency in the tool. If the ZHOLD is the only cell with this behavior then the impact on static timing analysis (in terms of performance) won't be significant. And it certainly isn't incorrect from an STA point of view - the timing numbers for both paths are the same...

Avrum

anton_v (Visitor, registered 01-28-2021)

Interesting thoughts, and I suppose they are very close to the truth.

The path multiplication also shows up on other paths in the FPGA logic. For example, a path with two logic elements between registers unfolds into 8 paths (see the picture below), covering all combinations of possible (r) and (f) transitions in the two logic elements (LUT and CARRY) plus the source register (FDCE).

2021-01-29_20-41-47.png
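The count of 8 is simply every (r)/(f) choice at each of the three elements. A minimal plain-Tcl sketch of that enumeration follows; it runs in any tclsh, uses only the cell type names mentioned above, and does not query the design.

```tcl
# Each element's output transition can be rising (r) or falling (f)
set elements {FDCE LUT CARRY}

# Start with one empty combination and extend it element by element
set combos {{}}
foreach el $elements {
    set next {}
    foreach c $combos {
        foreach edge {r f} {
            lappend next [concat $c [list "${el}($edge)"]]
        }
    }
    set combos $next
}

puts "[llength $combos] combinations"
foreach c $combos {
    puts [join $c " -> "]
}
```

The first line printed is "8 combinations", followed by one line per combination, starting with FDCE(r) -> LUT(r) -> CARRY(r).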

Thus we see that Vivado considers all possible path combinations of (r) and (f) delays, which seems right because this path has two XORs. For ZHOLD, I assume that either Vivado has special timing behavior or ZHOLD really can invert data. In the picture below you can see our troublesome path from the dq[0] port; there are no other connections to the inverted MUX inputs. Another possibility is that this is a Device View display problem in Vivado.

2021-01-29_20-50-54.png

Anyway, I think we can stop at this point.

Thanks for the interesting discussion and the informative response!

Maybe this topic will help someone one day. I'm closing the topic.
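Coming back to the original question of how to skip the extra paths: one option, offered as a sketch and not verified against this project, is to stop asking for many paths per endpoint. In the original command, -nworst 100 allows up to 100 paths to the same endpoint, which is what exposes the (r)/(f) variants. The -nworst and -unique_pins options are described in the Vivado Tcl command reference; the -name labels below are arbitrary.

```tcl
# Option 1: keep only the single worst path per endpoint
report_timing -from [get_clocks sdr_in_clk] -to [get_clocks sdr_in_clk] \
    -delay_type min_max -max_paths 100 -nworst 1 \
    -sort_by group -input_pins -routable_nets -name timing_worst

# Option 2: still allow several paths per endpoint, but report at most
# one path for each unique set of pins
report_timing -from [get_clocks sdr_in_clk] -to [get_clocks sdr_in_clk] \
    -delay_type min_max -max_paths 100 -nworst 100 -unique_pins \
    -sort_by group -input_pins -routable_nets -name timing_unique
```

Since the (r) and (f) variants carry the same delay numbers in this thread, dropping them from the report should not hide a worse slack, but that is worth confirming on the actual design.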
