Observer
2,201 Views
Registered: ‎02-27-2017

How to synthesize/implement a part of the design (out of context)?

Hi community!

 

Currently I am working on a part of a larger group design which has to operate at 200MHz and some parts at 400MHz.

I have been searching extensively but can't seem to find a proper answer on how to synthesize and implement a part of a design, and how to set up the timing constraints in that case.

 

Currently I set the synthesis run to "out_of_context" mode. I have two synchronous clocks where one runs at twice the frequency of the other (200 and 400MHz).

I would appreciate some feedback on this setup and on what would be better practice. The design is for illustrative purposes only and will not go into production, so it is not the end of the world if not all settings are production grade.

 

I did try several runs with this setup and started out with clock constraints of 100 and 200MHz. There is not much slack left to push the design to the desired operating frequency. Before drastically rearranging the design, is it possible to try out various implementation strategies? For example, I saw several high fan-out signals (of 512). What are preferred strategies to use at this point, or what kind of general workflow is preferred?

 

Any help would be very much appreciated.

 

6 Replies
Moderator
2,187 Views
Registered: ‎09-15-2016

Re: How to synthesize/implement a part of the design (out of context)?

Hi @ymulder,

 

>>For example, I saw several high fan-out signals

Just another suggestion: you can run phys_opt_design during the implementation phase to replicate timing-critical high-fanout nets. If you want to replicate a specific set of nets, you can name them with the -force_replication_on_nets switch.
 
For details, refer to UG904 (link).
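
A minimal sketch of how this could look in an implementation script, assuming the design has already been placed. The net names `ctrl_valid` and `addr_sel` are hypothetical stand-ins for the high-fanout nets in the actual design:

```tcl
# List the highest-fanout nets to find replication candidates
report_high_fanout_nets -max_nets 20

# Let phys_opt_design choose which timing-critical nets to optimize
phys_opt_design

# Force replication of specific nets (hypothetical net names)
phys_opt_design -force_replication_on_nets [get_nets {ctrl_valid addr_sel}]
```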

 

Regards,
Prathik

Guide
2,186 Views
Registered: ‎01-23-2009

Re: How to synthesize/implement a part of the design (out of context)?

So, out-of-context is the way to go. In project mode through the GUI, you can do out-of-context synthesis. But in non-project mode you can do a complete out-of-context synthesis, place and route; going all the way through the flow gives you a better idea of how the design performs.
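
As a rough sketch, a non-project out-of-context run could look like the script below. The source paths, the top module name `my_block`, the constraints file, and the part number are all assumptions for illustration and need to be replaced with the real ones:

```tcl
# Hypothetical sources, constraints file, top module, and part
read_verilog [glob ./src/*.v]
read_xdc ./constraints/ooc.xdc

# Out-of-context synthesis: no I/O buffers are inserted on the module ports
synth_design -top my_block -part xc7k325tffg900-2 -mode out_of_context

# Continue through the whole implementation flow
opt_design
place_design
phys_opt_design
route_design

# Report the timing of the routed module
report_timing_summary -file ./my_block_timing.rpt
```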

 

For OOC constraints, you generally constrain it as if it were a chip; you constrain all the clocks entering the block (with create_clock) and constrain all the I/O (with set_input_delay/set_output_delay). Since all clocks in Vivado are related by default, you don't need to do anything "fancy" to get the tools to understand that they are related (I presume in the "real" design, they will come from the same MMCM/PLL).

 

To make sure the tools do analysis with clock skew, you need some additional "HD" (hierarchical design) constraints. The most important one is HD.CLK_SRC; through this constraint you tell the tools what resource is driving the clock. If it is expected that the two clocks are on BUFGs, then simply choose any two BUFGs as the HD.CLK_SRC.

 

Next, you mention that there is "not much slack left...". You have to realize that the tools never work to maximize slack; their goal is to get the design to meet timing and no more. So even if the design could run at 300 and 600MHz, since you only are asking for 200 and 400, the tools will only try and attain these frequencies (and then work to minimize the area used while still meeting the constraints). So if you want to know how much margin there is, you need to over-constrain the design - NOTE: this is the only place (for analyzing the inherent timing of an implementation) where you should over-constrain a design...

 

With the design over-constrained and with all OOC constraints, you can get an idea of the inherent performance of the design. You can even play with different directives and optional steps like phys_opt_design.
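
One possible way to run such an experiment is sketched below. It assumes clock ports named `clk1x` and `clk2x`; the tightened periods (250 and 500 MHz, keeping the 2:1 ratio) and the choice of directives are arbitrary assumptions, not recommendations from this thread:

```tcl
# Experimental over-constraint: 250/500 MHz instead of the real 200/400 MHz target
create_clock -period 4.000 -name clk1x [get_ports clk1x]
create_clock -period 2.000 -name clk2x [get_ports clk2x]

# In the implementation script, try different directives and optional steps
place_design    -directive Explore
phys_opt_design -directive AggressiveExplore
route_design    -directive Explore
report_timing_summary
```

If the over-constrained run reports a negative WNS, a rough estimate of the achievable period is the constrained period minus the WNS; for example, a 4.000 ns constraint with a WNS of -0.300 ns suggests roughly 4.300 ns.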

 

But, be aware, that an OOC module will always be the "best case"; when integrated in a real design - particularly a design that uses a significant portion of the FPGA resources - the performance of your module will be at best the same as the OOC design, but may be fairly significantly worse...

 

Avrum

Observer
2,147 Views
Registered: ‎02-27-2017

Re: How to synthesize/implement a part of the design (out of context)?

As always, thank you very much for your extensive reply @avrumw!

 

Currently I use the following constraints for the two clocks to operate at 200 and 400MHz respectively:

create_clock -period 5.000 -name clk1x -waveform {0.000 2.500} [get_ports clk1x]
create_clock -period 2.500 -name clk2x -waveform {0.000 1.250} [get_ports clk2x]

 

I am unsure on how to constrain all of the I/O signals of the design. What are typical numbers to use here when these I/O signals will be connected to other modules in the future?

 

As you suggested, I tried to add (to the same constraints file) the additional "HD" constraint for each clock without any luck. I tried the following:

set_property HD.CLK_SRC BUFGCTRL_X2Y4 [get_ports clk1x]
set_property HD.CLK_SRC BUFGCTRL_X1Y3 [get_ports clk2x]

The X and Y suffix numbers are based on the device view after implementation (not sure if that is good practice).

However, the compiler shows a critical warning for these commands:

[Common 17-69] Command failed: CLK Source BUFGCTRL_X2Y4 cannot be found in device.

What is the best way to determine which BUFG to use in this case (for both clocks)?

 

Your explanation about the operation of the tool for optimisation is very much appreciated. This confirms what I expected after running another implementation run with increased clock constraints.

 

Thanks again for all the help!

Guide
2,129 Views
Registered: ‎01-23-2009

Re: How to synthesize/implement a part of the design (out of context)?

>>I am unsure on how to constrain all of the I/O signals of the design. What are typical numbers to use here when these I/O signals will be connected to other modules in the future?

 

If the inputs to this module come directly from flip-flops outside this module (or flip-flops on the outputs of the module driving this module), then there should be only a single clock-to-q delay and a net delay. While the net delay can be pretty much anything depending on routing, it is probably reasonable to allow for 1ns (or maybe a bit more) for these delays. The min is probably irrelevant, but a fraction of the max seems reasonable. This would be done with a conventional set_input_delay:

 

set_input_delay -clock [get_clocks clk1x] 1      [get_ports <name_of_port>]

set_input_delay -clock [get_clocks clk1x] 0.25 -min [get_ports <name_of_port>]

 

(or clk2x if the input is driven by clk2x)
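
For the outputs, the set_output_delay mentioned earlier in the thread could follow the same pattern. This is only a sketch: the 1 ns and 0.25 ns values simply mirror the input numbers above and are an assumption, not figures given for outputs in this thread:

```tcl
# Assumed values: same budget as the inputs; adjust to the downstream module
set_output_delay -clock [get_clocks clk1x] -max 1    [get_ports <name_of_port>]
set_output_delay -clock [get_clocks clk1x] -min 0.25 [get_ports <name_of_port>]
```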

 

>>What is the best way to determine which BUFG to use in this case (for both clocks)?

 

All BUFGs are pretty much identical in timing, so it doesn't matter which ones you use. Since both clocks come from the same MMCM, they will likely be on adjacent BUFGs in the same half. However, you have to choose ones that exist. In all 7-series designs there are 32 BUFGs numbered from X0Y0 to X1Y15; there are no X2* BUFGs.

 

Given this, I would just use X0Y0 and X0Y1.
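
Applied to the constraints posted earlier, that suggestion would look roughly like the lines below. The `get_sites` query is an added, optional check to confirm which BUFGCTRL site names actually exist in the opened device before picking two of them:

```tcl
# Optional: list the BUFGCTRL sites present in the target device
get_sites BUFGCTRL_*

# Tell the OOC timing analysis which clock buffers drive the two clock ports
set_property HD.CLK_SRC BUFGCTRL_X0Y0 [get_ports clk1x]
set_property HD.CLK_SRC BUFGCTRL_X0Y1 [get_ports clk2x]
```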

 

Avrum

Observer
2,113 Views
Registered: ‎02-27-2017

Re: How to synthesize/implement a part of the design (out of context)?

Thanks a lot @avrumw! It all seems to work now. I used your suggested min and max delays of 0.25 and 1.0 respectively.

 

However, I am running into another problem. Currently, I am implementing only a part of my design, roughly one fourth.

Before I added the input and output constraints, the design would implement in less than 15 minutes. Now it takes 5 hours. Both runs were done using the performance explore strategy. Also, the design now fails timing by 1.6ns, whereas before it passed, although it was very close. The clock constraints are still 200 and 400MHz.

Are the input and output signal constraints too high? Currently, each input or output goes straight into, or comes straight from, a register.

 

Would it be possible to speed up the implementation phase? Also, since this is not a production class design but rather a proof of concept, would it be possible to ease up on the input and output signal constraints?

 

Any thoughts would be very much appreciated :)

Guide
2,100 Views
Registered: ‎01-23-2009

Re: How to synthesize/implement a part of the design (out of context)?

Without seeing the failing paths, I can't speculate as to why it is taking so much longer to implement.

 

That being said, if what you say is true, and all the outputs come directly from FFs and all the inputs go directly to FFs, then the input and output constraints are irrelevant, and you can remove them.

 

In many designs, the inputs do not have FFs, and the input ports are part of a potentially critical path. Without a set_input_delay, these are not timed, and hence not having the set_input_delay can give you misleading information. In the extreme, imagine a portion of a design that has flip-flops only on its outputs and is a pure pipeline stage (i.e., there are no paths from FF to FF, only paths from input to FF). Without input constraints, you can set your clock to any frequency and it will still pass.

 

But since your design has FFs on the inputs, the set_input_delay is only doing a check on the path from the port to the immediately connected FF. This is basically meaningless - it can't be the critical path (since there is no logic on it). The real paths are, by definition, now FF to FF, and hence specifying the clocks is sufficient.

 

Avrum
