Explorer
5,484 Views
Registered: ‎10-07-2016

Timing issue due to badly routed signal

Hello,

After implementing my design with Vivado 2017.2, I get an intra-clock path timing violation, as shown below:

 

Timing_Issue.png

As you can see, the source and destination clocks are the same, so this is an intra-clock path. You can also see that the slack is negative at -1.367ns, which is a problem. The requirement of 6.7ns is not met, since the Total Delay is 7.806ns. The main problem seems to be the Net Delay, which is extremely high at 7.504ns, although there is only 1 logic level!
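As a quick check of these numbers: the requirement minus the total delay alone does not reproduce the reported slack. A minimal sketch in plain Tcl, using only the three values quoted from the report; attributing the remainder to clock skew and clock uncertainty is an assumption, since those rows of the report are not quoted here.

```tcl
# Values quoted from the timing report (all in ns).
set requirement    6.700
set total_delay    7.806
set reported_slack -1.367

# Slack if only the data path delay counted against the requirement.
set data_only_slack [expr {$requirement - $total_delay}]

# What is left over must come from the other terms of the setup check
# (assumed here: clock skew and clock uncertainty).
set remainder [expr {$reported_slack - $data_only_slack}]

puts [format "requirement - total delay  = %.3f ns" $data_only_slack]
puts [format "remaining contribution     = %.3f ns" $remainder]
```

This prints -1.106 ns and -0.261 ns, so roughly 0.26ns of the violation would come from terms other than the data path itself.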

When I highlight the path in the schematic, you can indeed see that there is only one logic level between the source and destination registers:

 

Schematic.png

 

The question is, why do we get such big net delays?

 

When looking at the path report, you can see that the net delay between the LUT5 and the destination register is 6.934ns, which is huge.

 

Timing_Path.png

When I make this net visible in the floorplan, I can see a very strange routing.

 

Floorplan.png

 

Can anybody tell me why such a routing can happen, and how I can fix this?

 

Best regards

Steffen

17 Replies
Moderator
5,471 Views
Registered: ‎09-15-2016

Re: Timing issue due to badly routed signal

Hi @stgateizo

 

I am not sure why the net delay is so large, as I don't know your design. But can you try locking that particular LUT5 to SLICE_X7Y139 (if there are sufficient sequential sites)? Run the command below in the Tcl console:

set_property LOC SLICE_X7Y139 [get_cells <cell_name>]

 

After this, run the implementation again and see if it helps to meet timing.

 

Regards

Rohit

----------------------------------------------------------------------------------------------
Kindly note- Please mark the Answer as "Accept as solution" if information provided is helpful.

Give Kudos to a post which you think is helpful and reply oriented.
----------------------------------------------------------------------------------------------

Explorer
5,458 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hi thakurr,

when I enter the following tcl command:

 

set_property LOC SLICE_X7Y139 [get_cells {design_1_i/DVI/VIDEO/VSPL_0/U0/axis_in_tdata_1[95]_i_1}]

 

I get the following error message:

 

ERROR: [Vivado 12-2285] Cannot set LOC property of instance 'design_1_i/DVI/VIDEO/VSPL_0/U0/axis_in_tdata_1[95]_i_1', Instance design_1_i/DVI/VIDEO/VSPL_0/U0/axis_in_tdata_1[95]_i_1 can not be placed in D5LUT of site SLICE_X7Y139 because the bel is occupied by design_1_i/DVI/VIDEO/axis_reg_tmon/inst/axisc_register_slice_0/m_axis_tdata[79]_INST_0(port:). This could be caused by bel constraint conflict
Resolution: When using BEL constraints, ensure the BEL constraints are defined before the LOC constraints to avoid conflicts at a given site.

 

So it seems that it is not possible, since this slice is already occupied?
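One way to confirm this, as a minimal sketch: with the placed or routed design open, list the cells currently sitting in that site together with the BEL each one occupies. Only Vivado query commands are used here, so nothing in the design is modified.

```tcl
# Assumes the placed or routed design is open in Vivado.
set site [get_sites SLICE_X7Y139]

# Print each cell placed in this site and the BEL it sits on.
foreach cell [get_cells -of_objects $site] {
    puts "[get_property BEL $cell]  [get_property NAME $cell]"
}
```

If the BEL named in the error message shows up with another cell, the LOC cannot simply be applied on top of the existing placement. Supplying the constraint from an XDC file for a fresh implementation run lets the placer take it into account from the start.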

 

Regards

Steffen

Voyager
5,428 Views
Registered: ‎06-24-2013

Re: Timing issue due to badly routed signal

Hey @stgateizo

 

This kind of 'insane routing' usually only happens when there is routing congestion in the area where the to-be-connected primitives are. Note that you can easily verify this with the CLB Metrics View.

 

There are several options you have here:

  • Adjust the implementation settings (add some routing directives) to force Vivado to try harder
  • Break the routing down into more than one step and route critical nets first
  • Modify the placement (either via directives or via pblocks)
  • Use a larger FPGA with more routing resources 8-)
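As a rough sketch of the first and third options in non-project mode, assuming a synthesized design is already open: the directive names below are ones documented for these commands in the Vivado Implementation guide (UG904). Which of them helps, if any, depends on the design, and availability should be checked for the Vivado version in use.

```tcl
# Logic optimization before placement.
opt_design

# Placement directive aimed at spreading logic out of congested areas.
place_design -directive AltSpreadLogic_high

# Optional physical optimization on the placed design.
phys_opt_design -directive AggressiveExplore

# Let the router spend more effort on the critical paths.
route_design -directive Explore

# Check whether the failing path is gone.
report_timing_summary -file post_route_timing.rpt
```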

Hope this helps,

Herbert

-------------- Yes, I do this for fun!
Moderator
5,418 Views
Registered: ‎09-15-2016

Re: Timing issue due to badly routed signal

Hi @stgateizo

 

As @hpoetzl correctly suggested, this bad routing happens when there is routing congestion in that specific region. Try strategies such as Performance_Explore and the various others mentioned in Appendix C of the document linked below:

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug901-vivado-synthesis.pdf

 

Also to route the critical nets first,  you can run the below commands after place_design:

route_design -nets [get_nets xxx]

route_design
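In project mode, a strategy can also be selected on the implementation run from the Tcl console. A minimal sketch, assuming the run has the default name impl_1 (an assumption; get_runs lists the actual names):

```tcl
# Select the implementation strategy for the run.
set_property strategy Performance_Explore [get_runs impl_1]

# Discard the previous results and rerun implementation up to routing.
reset_run impl_1
launch_runs impl_1 -to_step route_design
wait_on_run impl_1

# Open the result and check timing.
open_run impl_1
report_timing_summary
```

For a congestion problem, one of the congestion-oriented strategies such as Congestion_SpreadLogic_high may be worth trying in place of Performance_Explore.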

 

Regards

Rohit

Explorer
5,414 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello thakurr,

One more thing: I have added the set_property command to an XDC constraints file and implemented the design again. Now the timing is met!

 

LUT5 is now located in SLICE_X7Y139, but the target register is now no longer in the same slice, for whatever reason...

 

When I remove the XDC file with the set_property LOC constraint and implement the design again, the timing issue is also present again. At least the implementation is reproducible...

 

So what does this tell us now?

 

I have to leave the office now. I'll follow up on this post next Monday.

 

Nice weekend & best regards

Steffen

Historian
5,388 Views
Registered: ‎01-23-2009

Re: Timing issue due to badly routed signal

First, this has nothing to do with placement, the three cells involved are all quite close to each other. So there is no "real" reason to try and LOC any of the three cells.

 

The problem is clearly the routing. As others have pointed out, this is probably due to congestion - the tools cannot find a "clearer" path from the source to the destination. This is probably made worse by the fact that you are near the configuration controller, which is a large "hole" in the user logic of the FPGA. It is worth pointing out that the fanout of the LUT5 is quite high - at 106; while this isn't an insanely high fanout, this is probably contributing to the congestion problem, and hence the bad route.
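The fanout can be checked directly in the open design. A minimal sketch: the cell name is the LUT5 instance quoted earlier in the thread, and O is the output pin of a LUT5. The last command is only one possible follow-up and is left commented out.

```tcl
# List the highest-fanout nets together with their timing slack.
report_high_fanout_nets -timing -max_nets 20

# Look up the net driven by the LUT5 output pin.
set lut_out [get_pins {design_1_i/DVI/VIDEO/VSPL_0/U0/axis_in_tdata_1[95]_i_1/O}]
set net     [get_nets -of_objects $lut_out]

# FLAT_PIN_COUNT counts all pins on the net, including the driver.
puts "pins on net: [get_property FLAT_PIN_COUNT $net]"

# Possible follow-up after place_design: ask for the driver to be replicated.
# phys_opt_design -force_replication_on_nets $net
```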

 

The fact that the LOC constraint has an effect on this path is purely coincidental. The placement and routing process is "chaotic" - any change to the initial conditions (no matter how small) can have a radical change on the final result - this is particularly true as you start "stressing" the tool (i.e. with high congestion). By placing the LOC, you changed the initial condition. If this makes the design meet timing then great (and there is a possibility this can happen), but most likely, since the underlying problem is congestion, it just changed the failing path from this one to  another one for the same reason.

 

Trying different strategies - specifically ones to deal with congestion - may help (as others have mentioned). Otherwise you will have to start looking at your RTL to find ways of simplifying it. It may also be possible (but I have rarely had this work) by trying to floorplan the design - you have lots of empty space at the bottom of the FPGA - maybe if you move some non-critical stuff there, you will reduce the congestion around the top.
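For reference, moving a non-critical block into the empty area would be done with a pblock. A minimal sketch in XDC/Tcl; the pblock name, the cell path and the slice range are all hypothetical placeholders and have to be replaced with values from the actual design and device.

```tcl
# Hypothetical pblock for a block that is not timing critical.
create_pblock pblock_noncritical

# Hypothetical hierarchical cell; pick a real non-critical module here.
add_cells_to_pblock [get_pblocks pblock_noncritical] \
    [get_cells design_1_i/some_noncritical_block]

# Hypothetical slice range in the empty lower part of the device.
resize_pblock [get_pblocks pblock_noncritical] -add {SLICE_X0Y0:SLICE_X49Y49}
```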

 

The last thing I will point out is that there is something odd about your clocks. The clock appears to start at the output of an MMCM - this implies that there is a "create_clock" on the output of the MMCM. This is not recommended. Under most circumstances, "create_clock" commands should only be placed on the ports that bring clocks into the FPGA (and the TXOUTCLK of a GTX/H/...). Creating a primary clock inside the FPGA prevents the tool from seeing the entire clock propagation path, which prevents it from seeing certain relationships and analyzing clock skew. While this doesn't matter for an intra-clock path it does matter for inter-clock paths. Furthermore, when done this way, the tools cannot properly account for the jitter of the MMCM, which matters for all paths...
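To see where the timing clocks are actually defined, the following report commands can be run on the open synthesized or implemented design. This is only a sketch of how to inspect the situation; it changes nothing.

```tcl
# Lists all timing clocks, separating primary clocks from generated ones.
report_clocks

# Shows the clock networks, including clock sources without a constraint.
report_clock_networks

# Flags, among other things, registers without a clock and unconstrained endpoints.
check_timing
```

A clock on an MMCM output that shows up as a primary clock rather than a generated one would point to the "create_clock" situation described above.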

 

Avrum

Advisor evgenis1
5,374 Views
Registered: ‎12-03-2007

Re: Timing issue due to badly routed signal

Hi @stgateizo ,

 

>> Can anybody tell me why such a routing can happen, and how I can fix this?

 

Nobody has mentioned another reason why this can happen, although in my experience it is most often the root cause of the problem.

 

The "prog_full" source register drives multiple modules across the design, including the streaming AXI destination bus "axis_in_tdata*".

Those source and destination modules have to be placed far from each other for various reasons, not necessarily because of congestion: for example, proximity to an IO pin, PCIe core, transceiver, or a specific RAM or DSP block.

 

So I can suggest a couple of ways to fix this timing issue:

1. Floorplan the source and destination modules close together.

2. Add a pipeline stage to "prog_full". Although this changes functionality and would probably require some redesign and simulation, it could be the most effective way to fix the issue.

 

Thanks,

Evgeni

Explorer
5,325 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello Avrum,

thank you for your detailed help. Please see my answers below:

 

>> First, this has nothing to do with placement, the three cells involved are all quite close to each other. So there is no "real" reason to try and LOC any of the three cells.

I think the intention of the LOC constraint suggested by thakurr was just to see whether the placement of the LUT5 could be improved by packing it into the same slice as the target register.

 

>> The problem is clearly the routing. As others have pointed out, this is probably due to congestion - the tools cannot find a "clearer" path from the source to the destination. This is probably made worse by the fact that you are near the configuration controller, which is a large "hole" in the user logic of the FPGA.

I agree. 

 

>> It is worth pointing out that the fanout of the LUT5 is quite high - at 106; while this isn't an insanely high fanout, this is probably contributing to the congestion problem, and hence the bad route.

Yes, the fanout of the LUT5 is quite high. All targets are close around the LUT5, so it may be that I run into a congestion situation there.

 

>> The fact that the LOC constraint has an effect on this path is purely coincidental. The placement and routing process is "chaotic" - any change to the initial conditions (no matter how small) can have a radical change on the final result - this is particularly true as you start "stressing" the tool (i.e. with high congestion). By placing the LOC, you changed the initial condition. If this makes the design meet timing then great (and there is a possibility this can happen), but most likely, since the underlying problem is congestion, it just changed the failing path from this one to another one for the same reason.

Okay.

 

>> Trying different strategies - specifically ones to deal with congestion - may help (as others have mentioned). Otherwise you will have to start looking at your RTL to find ways of simplifying it. It may also be possible (but I have rarely had this work) by trying to floorplan the design - you have lots of empty space at the bottom of the FPGA - maybe if you move some non-critical stuff there, you will reduce the congestion around the top.

I have had bad experiences with floorplanning in the past. In most cases, the timing got even worse. I think it only makes sense if you know exactly what you are doing...

 

>> The last thing I will point out is that there is something odd about your clocks. The clock appears to start at the output of an MMCM - this implies that there is a "create_clock" on the output of the MMCM. This is not recommended. Under most circumstances, "create_clock" commands should only be placed on the ports that bring clocks into the FPGA (and the TXOUTCLK of a GTX/H/...). Creating a primary clock inside the FPGA prevents the tool from seeing the entire clock propagation path, which prevents it from seeing certain relationships and analyzing clock skew. While this doesn't matter for an intra-clock path it does matter for inter-clock paths. Furthermore, when done this way, the tools cannot properly account for the jitter of the MMCM, which matters for all paths...

This is an interesting hint. Below you can see the circuit, which is used to generate the axis_clk signal:

axis_clk.PNG

The clock signal axis_clk is generated by mmcm_1, which in turn is fed by the output of a differential input buffer of type IBUFDS. The input clock source type of mmcm_1 is set to "Single ended clock capable pin", as you can see below.

mmcm_1.PNG

 

The reason I used a separate IBUFDS in front of mmcm_1 is that I also need the input clock signal CLK_IN1_D for some non-timing-critical parts, which must stay active when mmcm_1 is in the power-down state.

But the most important question here is whether CLK_IN1_D is constrained or not, and I can tell you that it is not. Maybe this has to do with the IBUFDS, which was added some time ago. I think that before I added the IBUFDS, the XDC file generated automatically by the Clocking Wizard was sufficient. But when adding the IBUFDS, I forgot to add a clock constraint for the clock input CLK_IN1_D.

Avrum, can you confirm if my thoughts are right?

Best regards

Steffen

Explorer
5,318 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello hpoetzl,

thank you for this hint. I have been working with Vivado for a long time, but never noticed the Metrics view. Can you tell me how to enable this view?

 

Kind regards

Steffen

Explorer
4,855 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello thakurr,

regarding your last post:

 

>> As @hpoetzl correctly suggested, this bad routing happens when there is routing congestion in that specific region. Try strategies such as Performance_Explore and the various others mentioned in Appendix C of the document linked below:

>> https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug901-vivado-synthesis.pdf

 

I first want to find out whether there is routing congestion at all. Do you know of any Xilinx document that describes how to detect routing congestion?

 

>> Also to route the critical nets first, you can run the below commands after place_design:

>> route_design -nets [get_nets xxx]

>> route_design

Okay, this is one thing I could try. But to be honest, I don't like such constraints during the development phase of a hardware design, since they quickly become obsolete when you change the block diagram. This is something I will try at a later stage, when I finalize the design. For now I will check whether I can improve the timing by changing implementation strategies.

 

Best regards

Steffen

Voyager
4,855 Views
Registered: ‎06-24-2013

Re: Timing issue due to badly routed signal

@stgateizo

 

>> Can you tell me how to enable this view?

 

Sure, when you are on the 'Device' view, just select 'Metrics' from the 'Window' menu.

There, right-click on the appropriate metric and select 'Show'.

 

Best,

Herbert

Moderator
4,849 Views
Registered: ‎09-15-2016

Re: Timing issue due to badly routed signal

Hi @stgateizo

 

>> I first want to find out whether there is routing congestion at all. Do you know of any Xilinx document that describes how to detect routing congestion?

 

You can run the report_design_analysis command on the implemented design to check this. Refer to the link below, page 1138, for more details:

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug835-vivado-tcl-commands.pdf
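A minimal sketch of how the command might be used for this question, assuming the routed design is open; the report file names are arbitrary.

```tcl
# Congestion report: where the placer and router saw congested regions.
report_design_analysis -congestion -file congestion.rpt

# Characteristics of the worst timing paths (logic levels, fanout, net delay share).
report_design_analysis -timing -file timing_analysis.rpt
```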

 

Hope this helps.

 

Regards

Rohit

Explorer
4,845 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello evgenis1,

the source and destination registers are placed relatively close together. I would tend more towards a congestion issue, but I need to verify this as soon as I know how to check it...

 

On the other hand, prog_full is provided by a FIFO that was generated by the FIFO Generator. Maybe its location is tied relatively closely to a BRAM, which would mean that the placement is limited... I'm not an expert in this; it's only an assumption.

 

Since I like to fix issues at their root cause, I will first try your second suggestion and put an additional register stage into the prog_full signal. I will keep you informed.

 

Best regards

Steffen

Historian
4,827 Views
Registered: ‎01-23-2009

Re: Timing issue due to badly routed signal

>> Below you can see the circuit, which is used to generate the axis_clk signal:

 

So, there are some things to consider with your clocking scheme.

 

The first is what you plan to do with "clk". As implemented, there is no obvious clock buffer on this signal... If you are planning on using it for clocking, you need a BUFG on it.

 

The next is the input to the first MMCM. Since you are generating this from the clocking wizard, you should select "No Buffer" for the CLKIN1 input. Selecting "Single ended clock capable pin" implies it comes directly from a pin with no buffer, so the clock wizard inserts an IBUF. This is structurally incorrect since you cannot cascade two IBUFs (the IBUFDS you manually instantiated and the IBUF in the clock wizard), so would likely have caused a failure - I am not sure how the tools resolved this (but I am not as familiar with IP integrator...)

 

Next, I see that you are using two MMCMs in series - with the output of one MMCM driving the input of the second one. This is generally not the best approach - ideally, you should have only one MMCM which generates all the frequencies you need. So, unless there is a "good" reason to do this as two separate MMCMs (which is legal, but not preferred), you should use only one. Assuming you do use two, though, you want to make sure that the connection is "direct"; the output of the first MMCM clk_out1 should not go through the BUFG to get to the second one. But the clk_out1 is also used for other logic, which does require a BUFG. Again, the clocking wizard makes this difficult - you will probably have to take the BUFG out of the first MMCM and manually insert the BUFG.

 

Next, as designed, all these clocks are going to end up being unrelated to each other. Because of all the clock buffers and MMCMs, there will be no known phase relationships between any clocks except cpu_clk and axis_clk, which come out of the same MMCM (assuming they both use the same clock buffers - probably BUFGs).

 

Lastly, I can see how this clock structure can mess up constraints. The clocking wizard will try and apply constraints to the "port" driving the CLKIN1 of both MMCMs. In this case, you don't want that. If possible, I would disable the constraints coming from both clocking wizards (you can disable the XDC files in the IP window), and put a single create_clock on the input of the FPGA, the P side of CLK_IN1_D.
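Such a constraint might look like the following XDC line. This is only a sketch: the port name CLK_IN1_D_clk_p and the clock name are assumptions about how the differential interface is named at the top level, and the 10.000ns period is a placeholder, since the input frequency is not stated in the thread.

```tcl
# Primary clock on the P side of the differential input pair.
# Port name, clock name and period are placeholders for the real values.
create_clock -name clk_in -period 10.000 [get_ports CLK_IN1_D_clk_p]
```

With the primary clock defined at the port, Vivado derives the clocks on the MMCM outputs automatically as generated clocks, so they need no create_clock of their own.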

 

Avrum

Explorer
4,795 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello Avrum,

thank you very much for this really helpful information. Please see my comments below:

 

>> The first is what you plan to do with "clk". As implemented, there is no obvious clock buffer on this signal... If you are planning on using it for clocking, you need a BUFG on it.

 

The clk signal is indeed driving a counter, which shall be active, even when mmcm_1 is powered down. So you are right, I need a BUFG here!

 

>> The next is the input to the first MMCM. Since you are generating this from the clocking wizard, you should select "No Buffer" for the CLKIN1 input. Selecting "Single ended clock capable pin" implies it comes directly from a pin with no buffer, so the clock wizard inserts an IBUF. This is structurally incorrect since you cannot cascade two IBUFs (the IBUFDS you manually instantiated and the IBUF in the clock wizard), so would likely have caused a failure - I am not sure how the tools resolved this (but I am not as familiar with IP integrator...)

Okay, this is the first time I have understood the difference between "No Buffer" and "Single ended clock capable pin". I will change it accordingly.

 

>> Next, I see that you are using two MMCMs in series - with the output of one MMCM driving the input of the second one. This is generally not the best approach - ideally, you should have only one MMCM which generates all the frequencies you need. So, unless there is a "good" reason to do this as two separate MMCMs (which is legal, but not preferred), you should use only one. Assuming you do use two, though, you want to make sure that the connection is "direct"; the output of the first MMCM clk_out1 should not go through the BUFG to get to the second one. But the clk_out1 is also used for other logic, which does require a BUFG. Again, the clocking wizard makes this difficult - you will probably have to take the BUFG out of the first MMCM and manually insert the BUFG.

The reason I used 2 MMCMs is the limited frequency accuracy when generating several clock signals from one MMCM. I need to generate 3 clocks: axis_clk = 150MHz, cpu_clk = 100MHz, and video_clk, which must have a frequency very close to 130.08MHz.

With 2 MMCMs as depicted in the figure above, I achieve 130.078MHz, which is very close to 130.08MHz. When I use just 1 MMCM, the frequency for video_clk cannot be met exactly. Therefore I decided to use 2 MMCMs and put them in series. But I was not aware that it is not allowed to connect the clock input of an MMCM to the output of a BUFG that is used in the first MMCM.

So what is your suggestion in this case, in order to meet the clock accuracy for video_clk?

 

>> Next, as designed, all these clocks are going to end up being unrelated to each other. Because of all the clock buffers and MMCMs, there will be no known phase relationships between any clocks except cpu_clk and axis_clk, which come out of the same MMCM (assuming they both use the same clock buffers - probably BUFGs).

From my point of view, the phase relationship between these signals is not relevant. The clocks axis_clk, video_clk, and cpu_clk are treated as asynchronous. For clock domain crossings I use synchronization stages. But there is one important question: do I need to constrain these clocks as asynchronous clock groups, or are they already constrained as asynchronous by the Clocking Wizard by default?

 

>> Lastly, I can see how this clock structure can mess up constraints. The clocking wizard will try and apply constraints to the "port" driving the CLKIN1 of both MMCMs. In this case, you don't want that. If possible, I would disable the constraints coming from both clocking wizards (you can disable the XDC files in the IP window), and put a single create_clock on the input of the FPGA, the P side of CLK_IN1_D.

This is of course a problem which needs to be fixed. Until now I didn't know that you can disable the XDC files of IP cores, but you never stop learning.

The question is why Vivado does not give corresponding error messages. These are simple rules, which could be checked upfront. Instead you get hundreds of warning messages, of which 99% are of no interest or are caused by Vivado issues themselves. Further, you can find hundreds of pages of information about clock routing, but not a cheat sheet with the most important clock routing rules. By now there are so many different possible scenarios for building up a clock tree that it is really easy to do things wrong. And when you make your own IP cores with the OOC flow, constraining becomes even more difficult.

 

Thank you very much & best regards

Steffen

Explorer
4,786 Views
Registered: ‎10-07-2016

Re: Timing issue due to badly routed signal

Hello avrumw,

I have changed now the clock circuit as depicted below.

clk.PNG

The clock input type of MMCM_0 is now set to "No Buffer". Additionally, I removed the second MMCM; I can also live with 131.25MHz for the video clock. Further, I have added a BUFG for the clock signal "clk[0:0]".

 

I hope the circuit is now correct?

 

Am I right that I now need a clock period constraint for CLK_IN1_D?

Do I need to constrain all clocks as asynchronous clock groups?

 

Best regards

Steffen

Historian
4,762 Views
Registered: ‎01-23-2009

Re: Timing issue due to badly routed signal

>> I hope the circuit is now correct?

 

The clock structure looks reasonable, but it is hard to tell since all the details are really hidden in the "Utility Buffer" and "Clocking Wizard" modules.

 

>> Am I right that I now need a clock period constraint for CLK_IN1_D?

 

Yes. It should be applied to the P side of the differential pair.

 

>> Do I need to constrain all clocks as asynchronous clock groups?

 

You need to be careful. If there are clock crossing paths between the domains, then some kind of timing exception is needed. It is unlikely, though, that a set_clock_groups is the right exception. Take a look at this post (and the posts referenced within) on constraining clock crossing paths.
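As an illustration only, since the linked post is not reproduced here: one commonly used alternative to set_clock_groups is a set_max_delay -datapath_only exception per crossing direction, which removes the clock relationship from the analysis but still bounds the data path delay. The clock names and the 6.667ns value below are hypothetical placeholders; the real names can be taken from report_clocks.

```tcl
# See which clock pairs actually have paths between them, and how they are constrained.
report_clock_interaction

# Check the structure of the clock domain crossings (synchronizers and so on).
report_cdc

# Hypothetical exception for one crossing direction; clock names and value are placeholders.
set_max_delay -datapath_only -from [get_clocks axis_clk] -to [get_clocks video_clk] 6.667
```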

 

Avrum
