ms4323
Visitor
Registered: 07-03-2018

Measuring the timing bottleneck of an isolated component without considering the input/output delays

I am working on a project that involves generating VHDL from Haskell. I want to measure the timing bottleneck of individual blocks as if they were part of a larger design. That means I don't want the I/O to slow down the design: the component under test is assumed to connect to another portion of the FPGA fabric, not to output pads. For example, I am comparing two generated components, but in both of them the critical path goes through an OBUF. Is setting the path from the last register to the OBUF as a false path the best option here?
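
For reference, the false-path constraint being asked about would look roughly like this in XDC. This is only a sketch and assumes the component's outputs are top-level ports of the design under test. Note that it simply removes those paths from analysis rather than modelling the downstream flip-flop a real design would have, and the I/O buffers are still inserted.

```tcl
# XDC sketch of the false-path approach being asked about.
# Assumption: the component's outputs are top-level ports, so in a normal
# (in-context) run each one is driven through an OBUF.
# Every timing path that ends at an output port is excluded from analysis,
# so register-to-OBUF paths can no longer be reported as the critical path.
set_false_path -to [all_outputs]
```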

Accepted Solutions
avrumw
Expert
Registered: 01-23-2009

The "best" way to do this is with out-of-context synthesis - this does pretty much exactly what you are asking for - synthesizes the module alone as if it is part of a larger design - so it doesn't insert I/O or clock buffers in the design.

Unfortunately, it is not that easy to use - it is most easily done in non-project mode; the support in project mode is pretty limited. But it's not that hard...

You can take a look at the example in the Design Flows Overview (UG892), chapter 4 - there is a "Non-Project Mode Tcl Script Example" that gives you the basics of putting together a non-project script. The key difference is in the synth_design command - if you use

synth_design -top <top_module_name> -part <part_name> -mode out_of_context

then the synthesis (and if you want, the rest of the design flow) is done in out_of_context mode. This is a fully synthesized design that is intended to be "dropped in" to a larger design. You can do all the normal analysis of this design - all the capabilities of the GUI can be used (you can use start_gui in non-project mode to open the GUI, or you can write a .dcp with write_checkpoint and open it in the GUI for analysis, or you can even do the complete flow in the Tcl console of the GUI either by cutting and pasting the commands or by sourcing a script). You can even place and route the design (at least in 7 series - the "full out_of_context" mode has been deprecated for UltraScale/UltraScale+/Versal, but may still be usable for analysis purposes - I haven't tried).
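
A minimal non-project script along these lines might look like the following. The file names, the top-level name my_block and the part number are hypothetical placeholders; only the synth_design -mode out_of_context line comes from the description above.

```tcl
# Minimal non-project, out-of-context synthesis flow (sketch).
# Hypothetical names: the generated VHDL is ./src/my_block.vhd, the top
# entity is my_block, the constraints are in ./my_block_ooc.xdc, and the
# target part is xc7a35tcpg236-1. Replace them with your own.
set top  my_block
set part xc7a35tcpg236-1

read_vhdl ./src/my_block.vhd
read_xdc  ./my_block_ooc.xdc

# -mode out_of_context: no I/O buffers or clock buffers are inserted.
synth_design -top $top -part $part -mode out_of_context

# Save the synthesized design so it can be reopened in the GUI later.
write_checkpoint -force ./${top}_synth_ooc.dcp

# Post-synthesis timing and resource reports.
report_timing_summary -file ./${top}_timing_synth.rpt
report_utilization    -file ./${top}_util_synth.rpt
```

Assuming the script is saved as ooc_synth.tcl, it can be run without the GUI with `vivado -mode batch -source ooc_synth.tcl`.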

You will need to write constraints for your design - but this is generally as simple as a "create_clock" for the input clock, a default "set_input_delay" to a small number (maybe 1ns?) for all the inputs (on the assumption that they will come from other flip-flops in the full design) and a set_output_delay of the majority of the clock period (assuming they come directly from flip-flops) for all outputs.
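
As a sketch, the constraints described above could be written as an XDC file like this. The clock port name clk, the 4 ns (250 MHz) target period and the 1 ns / 3 ns delay values are assumptions chosen for illustration; to compare Fmax between blocks you would adjust the period and look at the resulting slack.

```tcl
# my_block_ooc.xdc - out-of-context constraints (sketch).
# Assumptions: a single clock port named clk, a 4 ns target period, and all
# other inputs/outputs connect to flip-flops in the surrounding design.
create_clock -name clk -period 4.000 [get_ports clk]

# Inputs arrive shortly after the clock edge (clock-to-out of an upstream FF).
set_input_delay -clock clk 1.000 [get_ports -filter {DIRECTION == IN && NAME != clk}]

# Assume 3 ns of the 4 ns period is used by logic outside the block, so only
# paths that go (almost) directly from a flip-flop to an output meet timing.
set_output_delay -clock clk 3.000 [all_outputs]
```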

You also may want an HD.CLK_SRC constraint if you are planning to try place and route (see "Out-of-Context Design Constraints" in the Hierarchical Design user guide, UG905). Again, I don't know if this works on UltraScale/UltraScale+/Versal - I have only used it under 7 series.
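
If you do want to try place and route (7 series), a sketch of continuing the same non-project session after synth_design might look like this. The clock port name clk and the BUFGCTRL_X0Y0 location are assumptions; any BUFG site in the chosen part should do.

```tcl
# Optional out-of-context implementation (sketch, 7 series only as far as
# the caveats above go). Assumption: the clock port is named clk.
# HD.CLK_SRC tells implementation where the clock buffer driving this port
# would sit in the full design.
set_property HD.CLK_SRC BUFGCTRL_X0Y0 [get_ports clk]

opt_design
place_design
route_design

write_checkpoint -force ./my_block_routed_ooc.dcp
report_timing_summary -file ./my_block_timing_routed.rpt
```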

Avrum

 

bruce_karaffa
Scholar
Registered: 06-21-2017

Are you trying to find the minimum latency of each block or the maximum clock speed?  For latency, I would suggest simulation.  Maximum clock speed is more difficult since it is layout driven and will change as you add more blocks to the design.

ms4323
Visitor
Registered: 07-03-2018

I'm aware of that. I'm synthesizing the individual components and measuring the maximum clock frequency, but I don't want the outputs to become the bottleneck of my individual designs, since I'm comparing them "as if" they were in the middle of a larger design. Is setting the paths from the registers to the OBUFs as false paths a correct way to handle this case?

ms4323
Visitor
Registered: 07-03-2018

Thank you, that was pretty much what I needed. I have one more concern. I'm also trying to compare the resource utilization and clock frequency of my individual components to the corresponding Xilinx IP cores. Is the out-of-context IP generation setting suitable to use when synthesizing with "-mode out_of_context"? (In other words, should I avoid generating IPs with the global setting, since I'm synthesizing all of the designs with the out_of_context flag?)

Thanks!

avrumw
Expert
Registered: 01-23-2009

The -mode out_of_context option is, in fact, exactly what is used for synthesizing IP by default - synthesizing IP is its primary purpose.

So if you have a project with IP where the synthesis is not set to "GLOBAL" (the default is out-of-context), then the mini-project it creates for the IP (you can see it in the "runs" tab of the GUI) is a full out-of-context run for synthesizing the IP. I think (but am not sure) that you can even open this run and inspect the synthesized results of this out-of-context run without looking at it in your top level project.
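
As far as I know, the per-IP choice between "Global" and out-of-context in project mode is stored as the GENERATE_SYNTH_CHECKPOINT property of the IP's .xci file. A sketch of checking and setting it from the Tcl console follows; my_ip is a hypothetical IP name and my_ip_synth_1 is the assumed name of its run.

```tcl
# Sketch: inspect/force out-of-context synthesis for one IP in a project.
# Hypothetical IP name: my_ip (file my_ip.xci).
set xci [get_files my_ip.xci]

# true  = "Out of context per IP" (own OOC synthesis run, the default)
# false = "Global" (synthesized together with the top level)
puts [get_property GENERATE_SYNTH_CHECKPOINT $xci]
set_property GENERATE_SYNTH_CHECKPOINT true $xci

# Generate output products, then create and launch the IP's OOC run
# (create_ip_run is only needed if the run does not exist yet).
generate_target all $xci
create_ip_run $xci
launch_runs my_ip_synth_1
wait_on_run my_ip_synth_1
```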

So out_of_context mode can be used inside a project. The only limitation is that it can only be set on a sub-module of the top - it cannot be set on the top level design in the project. For any IP this is clearly the case (the top level of your design is never an IP). 

So if you have a design that includes Xilinx IPs as well as your own modules, you can right click on your module and select "out-of-context" - it will create an out-of-context run for this module which is essentially identical to the out-of-context run for IP.
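
The right-click "out-of-context" action on your own module should correspond roughly to creating a block fileset for it. A sketch, assuming a non-top module named my_block and assuming Vivado names the resulting run my_block_synth_1:

```tcl
# Sketch: Tcl counterpart of marking a (non-top) module as out-of-context.
# Hypothetical module name: my_block.
create_fileset -blockset -define_from my_block my_block

# Assumed run name for the new out-of-context synthesis run.
launch_runs my_block_synth_1
wait_on_run my_block_synth_1
```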

But, in project mode (at least in the past) you couldn't do out-of-context implementation - only synthesis...

Avrum

ms4323
Visitor
Registered: 07-03-2018

I'm back at the OOC IP synthesis, and I have one last question. As you may recall, I have my own generated VHDL modules whose maximum frequency I measure individually with the -mode out_of_context flag. I want to compare their frequency and resource utilization after synthesis with Xilinx IPs "in isolation". If I generate an IP in OOC mode and my project doesn't contain any RTL, the synthesis run looks for a top module because, as you said, the IP is never the top level of the design. So here is my question: how can I measure the maximum frequency of a single generated IP so that I can compare it to my VHDL modules?

avrumw
Expert
Registered: 01-23-2009

What you are trying to do is not "normal"... But there is probably a way to hack it.

First, you will have to generate a dummy project. In this project, you will have to create the IP you want from the IP catalog. You will need to instantiate it in a dummy top level of your project. I suspect that the project has to be "real enough" - at least legal enough for the tool to see it as syntactically correct - the simplest thing to do would be to instantiate the IP and port out the pins of the IP to the ports of your design.
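
A Tcl sketch of setting up such a dummy project is below. All names are hypothetical: the part, the project directory, and the IP, which here is the Adder/Subtracter (c_addsub) used purely as an example; I have left out -version on the assumption that create_ip then picks the catalog's current version. The dummy VHDL top level (dummy_top.vhd) that instantiates my_ip and ports its pins out has to be written separately.

```tcl
# Sketch: dummy project that only exists to get an OOC run for one IP.
# Hypothetical names: project dummy_ip, part xc7a35tcpg236-1, IP my_ip.
create_project dummy_ip ./dummy_ip -part xc7a35tcpg236-1

# Example IP from the catalog; configure it afterwards with
# set_property CONFIG.<param> <value> [get_ips my_ip] as needed.
create_ip -name c_addsub -vendor xilinx.com -library ip -module_name my_ip

# Hand-written dummy top level that instantiates my_ip.
add_files ./dummy_top.vhd
set_property top dummy_top [current_fileset]
```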

Now you should try and synthesize your design. When you do this, the first thing Vivado will do is generate the output products of the IP and synthesize it out-of-context. To do this it will generate an "Out-of-Context Module Run" for the synthesis of the IP - you will eventually be able to see this in the "Design Runs" tab in the bottom pane.
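
From the Tcl console this step might look like the following sketch; my_ip_synth_1 is the assumed name of the IP's out-of-context run.

```tcl
# Launch top-level synthesis; Vivado runs the IP's OOC run first.
launch_runs synth_1
wait_on_run synth_1

# The top-level result does not matter; check the IP run's status instead.
puts [get_property STATUS [get_runs my_ip_synth_1]]
```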

Once the OOC run is complete it will then attempt to synthesize your top level design - whether it succeeds or not is irrelevant.

Now the run is synthesized. Unfortunately there is no way to "open" this run within the project scope. So what you will need to do is go find the synthesized .dcp so that you can look at it. It will be located in the directory <project>/<project>.runs/<ip_name>. There you will find the <ip_name>.dcp.

This .dcp is more or less stand-alone; it can be opened in Vivado. You can simply double click on the .dcp and it will launch Vivado and open the IP (you may want to copy it elsewhere if you want to preserve the project). The .dcp will be opened in the GUI but in non-project mode - only the design itself will exist (no project), and it will be in memory.
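
The same steps can be sketched in Tcl, reading the run directory from the run object instead of hard-coding the path. The run name my_ip_synth_1 and the checkpoint name my_ip.dcp are assumptions.

```tcl
# Sketch: copy the IP's OOC checkpoint out of the dummy project, then open
# it in non-project mode. Assumed run name: my_ip_synth_1.
set run_dir [get_property DIRECTORY [get_runs my_ip_synth_1]]
file copy -force [file join $run_dir my_ip.dcp] ./my_ip_ooc.dcp

# Close the project first so the checkpoint is the only design in memory.
close_project
open_checkpoint ./my_ip_ooc.dcp
```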

From here you can do what you want with it. It will automatically have all the internal exceptions required by the IP (if there are any), but it will not have any external constraints. You will probably want to do create_clock commands for the clock port (or ports) and set_input_delay/set_output_delay for the I/O ports. You may also want to set the HD.CLK_SRC attribute on the clock ports (choose any BUFG location in the device that you specified when you created the dummy project - BUFGCTRL_X0Y0 is as good as any).

Now you can run any of the analysis commands - either through the Tcl console or through the regular menu commands (like Reports -> Timing -> Report Timing Summary).
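
With the checkpoint open, the constraints and reports described above might be entered as in this sketch. The port name clk, the 4 ns period and the delay values are assumptions; use the IP's real clock ports and a period close to what you want to test.

```tcl
# Sketch: external constraints for the opened IP checkpoint.
# Assumptions: one clock port named clk, 4 ns period, 1 ns / 3 ns I/O delays.
create_clock -name clk -period 4.000 [get_ports clk]
set_input_delay  -clock clk 1.000 [get_ports -filter {DIRECTION == IN && NAME != clk}]
set_output_delay -clock clk 3.000 [all_outputs]

# Only needed if you intend to place and route the IP out of context.
set_property HD.CLK_SRC BUFGCTRL_X0Y0 [get_ports clk]

# Equivalent of Reports -> Timing -> Report Timing Summary, plus utilization.
report_timing_summary -max_paths 10
report_utilization
```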

The IP was probably synthesized at a "modest" clock frequency, so it may not be optimized for the highest speed. You can see the constraints it was synthesized with in the dummy project under <project>.srcs/sources_1/ip/<ip_name> - the file <ip_name>.xdc (and others - I have seen <ip_name>_clocks.xdc) contains all the internal constraints, and the file <ip_name>_ooc.xdc will have the clocks that were used during OOC synthesis.
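
To see those constraint files without browsing the directory tree, a sketch from the dummy project's Tcl console is below; the IP name my_ip is hypothetical, and it assumes the IP's sub-files can be queried with get_files -of_objects on the .xci.

```tcl
# Sketch: print every XDC file that belongs to the IP (hypothetical my_ip),
# including <ip_name>_ooc.xdc with the clocks used for OOC synthesis.
foreach f [get_files -of_objects [get_files my_ip.xci] -filter {FILE_TYPE == XDC}] {
    puts "== $f =="
    set fh [open $f r]
    puts [read $fh]
    close $fh
}
```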

If you wanted to synthesize it with faster constraints (to see how fast it can go), that would be difficult. You would somehow have to get the run in the dummy project to generate the targets (you might be able to do this with 'generate_target synthesis [get_ips <ip_name>]'), then go edit the _ooc.xdc, then try to resynthesize your design (which would launch the OOC synthesis run), but this might not work - we are pretty seriously hacking the dependency management system of the project...
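
The hack might be attempted roughly as in this sketch. The IP and run names are hypothetical, and, as noted above, Vivado may regenerate the IP's output products and overwrite the edited file, so this is not guaranteed to work.

```tcl
# Sketch of the (fragile) re-synthesis hack. Hypothetical names: my_ip,
# my_ip_synth_1.
generate_target synthesis [get_ips my_ip]

# At this point, edit my_ip_ooc.xdc by hand (e.g. tighten the
# create_clock period), then re-run the IP's OOC synthesis.
reset_run my_ip_synth_1
launch_runs my_ip_synth_1
wait_on_run my_ip_synth_1
```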

Avrum
