04-04-2018 08:17 AM
I have an Artix 7 FPGA design with custom logic that generates a clock signal from some flip-flop data. This recovered clock is used to sample data sent to the FPGA. The path I'm describing (with the BEL info) looks like this:
FDRE/Q pin ---> LUT2 ---> FDCE/C pin
The LUT2 output is the recovered clock, and it is used to sample data on the FDCE primitive. I need to find the path delay from FDRE/Q to FDCE/C. I guess the "report_timing" command may not be useful in this case, as it analyzes paths from flip-flop to flip-flop. I would appreciate it if someone could point me in the right direction. Is "get_net_delays" useful in this case?
04-04-2018 09:11 AM
DON'T DO THIS!
This kind of structure violates all recommended good coding practices, and should be avoided at all costs.
- you shouldn't generate a "clock" using any means other than dedicated clock logic
- you absolutely shouldn't generate a "sensitive" signal (i.e. a clock or asynchronous preset/clear) from a LUT
- there is no guarantee that LUT outputs are glitchless...
- the timing of a solution like this will vary from run to run
- as written, the resulting clock is a local clock (local clocks are also discouraged), and hence will have large clock skew
- (and probably a whole bunch of other reasons)
There is almost always a "better" way of doing what you are trying to do that doesn't use asynchronous design practices.
But - if you insist on doing it this way...
The question is (generally) not how the path you outlined behaves, but how the larger system behaves - specifically how this generated clock interacts with other clocks in the design. And, even though this coding is discouraged, Vivado can properly model this from the static timing point of view.
The output of the LUT2 (ick!) is a generated clock, and hence can be described as such with the create_generated_clock command:
create_generated_clock -name my_icky_clock -source [get_pins FDRE/C] <relationship> [get_pins LUT2/O]
This will model a clock that is generated on the output of the LUT, using a propagation delay that includes everything upstream of it: the clock delay of the clock driving the first FDRE, that FDRE's clock->q, and the routing to and from the LUT - including any possible skew due to the icky clock being a local clock.
The only thing that needs to be determined is the <relationship>. If the generated clock is effectively the same frequency as the input clock, then the <relationship> is "-divide_by 1". You haven't described exactly what this LUT/FF arrangement does, so we can't tell what the <relationship> is...
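As a rough illustration, here is what the constraint might look like once the relationship is known. The instance names (src_ff_reg, rec_clk_lut) and the clock name rec_clk are made up for this sketch, and "-divide_by 1" is only an assumption that the recovered clock runs at the same frequency as the clock driving the FDRE; substitute the real relationship for your circuit.

```tcl
# Hypothetical instance names: src_ff_reg is the FDRE, rec_clk_lut is the LUT2.
# -divide_by 1 is an ASSUMPTION (recovered clock at the same frequency as the
# clock driving the FDRE); replace it with the real relationship.
create_generated_clock -name rec_clk \
    -source [get_pins src_ff_reg/C] \
    -divide_by 1 \
    [get_pins rec_clk_lut/O]

# List the clocks known to the timer, to confirm that rec_clk exists and is
# derived from the expected master clock.
report_clocks
```

Note that the generated clock is placed on the LUT's O output pin and sourced from the C (clock) pin of the upstream flip-flop, so the flip-flop's clock->q and the LUT delay both become part of the clock's propagation path.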
With the clock properly defined, the tool will verify all setup and hold checks between flip-flops clocked on this icky domain and any other (less icky) domain - which is ultimately what you want. If the tool says these paths pass, then the design meets timing...
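A minimal sketch of how those cross-domain checks could be inspected, assuming the generated clock is named rec_clk and the primary clock is named base_clk (both names are placeholders, not taken from the design):

```tcl
# Hypothetical clock names: base_clk is the primary clock, rec_clk is the
# generated clock on the LUT2 output.

# Summary of which clock pairs have timed paths between them.
report_clock_interaction

# Worst setup and hold paths in each direction across the domain crossing.
report_timing -from [get_clocks rec_clk]  -to [get_clocks base_clk] \
    -delay_type min_max -max_paths 10
report_timing -from [get_clocks base_clk] -to [get_clocks rec_clk] \
    -delay_type min_max -max_paths 10
```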
But again - you are far better off finding a way to do this without a structure like this.
04-04-2018 09:48 PM
I agree with your point about this being a non-standard practice. But this scenario necessitates the use of a combinatorial arrangement to generate the clock. And yes, this is a local clock which feeds a few sequential elements. Hence a clock buffer (to remove skew) is not used, and the logic has been constrained to a small p-block region.
Also, the recovered clock has been constrained. Instead of using "create_generated_clock", I have used the "create_clock" constraint with the frequency of the recovered clock (this is a fixed value). You mentioned LUT glitching, which is a known hazard. To eliminate that and make the compilations robust, I need to match the routing lengths or path delays to and from the LUT2. Does that make sense?
That's one reason I wanted to check the delay from FDRE/Q -> LUT2 -> FDCE/C. If I use "get_net_delays", the LUT delay is not considered. Will "report_timing -to ***/FDCE/C" give an estimate of the recovered clock path delay? Sorry that I can't share more information, as it is proprietary.
Thanks for your thoughts.
04-05-2018 07:31 AM
If I use "get_net_delays", the LUT delay is not considered
There is an equivalent command to get the timing arc through the LUT (get_timing_arcs), but this is probably not the way to go. You should focus on path delays...
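For completeness, a sketch of how the net delays and the LUT timing arcs could be dumped for inspection. The instance names (src_ff_reg, rec_clk_lut) are hypothetical, and the results are most meaningful on a routed design. report_property is used here so that no particular property names have to be assumed.

```tcl
# Hypothetical instance names: src_ff_reg is the FDRE, rec_clk_lut is the LUT2.
set lut_in_net  [get_nets -of_objects [get_pins src_ff_reg/Q]]
set lut_out_net [get_nets -of_objects [get_pins rec_clk_lut/O]]

# Routing delay only: one net-delay object per load pin of each net.
foreach nd [get_net_delays -of_objects $lut_in_net] {
    report_property $nd
}
foreach nd [get_net_delays -of_objects $lut_out_net] {
    report_property $nd
}

# Cell delay only: the timing arcs through the LUT2 (one per input).
foreach arc [get_timing_arcs -of_objects [get_cells rec_clk_lut]] {
    report_property $arc
}
```

Adding these numbers up by hand is error prone, which is one more reason to prefer a path-based report over this approach.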
I have used the "create_clock" constraint with the frequency of the recovered clock(this is a fixed value).
This is not a good idea. Done this way, there are two problems:
- you have no mechanism for checking the timing between the recovered clock and the base clock
  - I presume that at some point you move data from this recovered clock back to the base clock
- you prevent any path tracing through the generation of the recovered clock
If you use create_generated_clock as I suggest, then you can address both of these problems.
For the first issue, the generated clock will allow complete tracing of any path between the recovered clock domain and the base clock domain. This analysis will include a full analysis of the skew between the two clock domains. If you use a primary clock, then this won't be done - it will be up to you to ensure that this clock crossing can be done "in time".
For the second issue, with the create_generated_clock, any timing path that starts or ends at a flip-flop in the recovered domain will result in a timing analysis of the complete clock path - this will include the LUT2 and the routing of your generation circuit. If you query both setup and hold checks to flip-flops on this domain, at both the fast and the slow process corner, then you will get the timings of this path.
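One possible way to run those four queries is sketched below. The endpoint name dst_ff_reg is hypothetical, and the sketch assumes that temporarily disabling one corner with config_timing_corners is acceptable in your flow; both corners are re-enabled at the end. Check the options against the Tcl command reference for your Vivado version.

```tcl
# Hypothetical endpoint: the D pin of a flip-flop clocked by the recovered clock.
set dst [get_pins dst_ff_reg/D]

# Analyze one process corner at a time by disabling the other one.
foreach corner {Slow Fast} other {Fast Slow} {
    config_timing_corners -corner $corner -delay_type min_max
    config_timing_corners -corner $other  -delay_type none

    # full_clock_expanded prints the complete source and destination clock
    # paths, including the upstream flip-flop, the LUT2 and the routing.
    report_timing -to $dst -setup -path_type full_clock_expanded
    report_timing -to $dst -hold  -path_type full_clock_expanded
}

# Restore the analysis of both corners.
config_timing_corners -corner Slow -delay_type min_max
config_timing_corners -corner Fast -delay_type min_max
```

The clock path section of each report then shows the delay through the generation circuit for that check and corner.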
However, this will not be perfect, since you have two paths in your clock propagation path - one through the I0 input of the LUT2 and one through the I1. These will have two different timings. I don't think you will be able to get both at the same process conditions.
When you do a setup check, the source clock delay uses the "longest" clock delay. When you do a hold check, the source clock delay uses the "shortest" clock delay (the destination delay does the opposite in both cases). In a normal analysis, there is only one clock path, so the same cells are used for both analyses. However, in the "longest" delay, the analysis uses the [*_MAX] process corner, whereas the "shortest" uses the [*_MIN]. Thus there are effectively four corners [SLOW_MAX] [SLOW_MIN] [FAST_MAX] [FAST_MIN].
- [SLOW_MAX] - the true slowest PVT corner - this is the slowest any cell can be
- [FAST_MIN] - the true fastest PVT corner - this is the fastest any cell can be
- [SLOW_MIN] - this is the fastest a cell can be on a die that also has at least one cell at [SLOW_MAX]
  - this is due to "on chip variation" - not all cells on the die will be at exactly the same process corner
- [FAST_MAX] - this is the slowest a cell can be on a die that also has at least one cell at [FAST_MIN]
  - also due to on chip variation
In your case, this analysis will also be messed up by your two clock paths
- the "longest" paths will show (only) the longer of the two paths at the [*_MAX] process corner
- the "shortest" paths will show (only) the shorter of the two paths at the [*_MIN] process corner
Thus you won't be able to get the shorter path at the [*_MAX] corner or the longer path at the [*_MIN] corner.