Registered: 01-22-2015

implementation helps(?) FPGA I/O pass STA

UG904 says that “Vivado implementation includes all steps necessary to place and route the netlist onto device resources, within the … timing constraints of the design.” So, we can say that implementation sometimes helps the design pass static timing analysis (STA).

For a source-synchronous DDR output* from the FPGA that uses ODDR primitives to forward clock and data, can implementation help these off-FPGA paths pass STA? Or can implementation only report whether these fixed(?) paths pass or fail STA?

* Please assume that proper constraints (e.g. create_generated_clock, set_output_delay) have been written.

Scholar austin
Registered: 02-27-2008

m,

 

The constraints are only met when the bitstream is created. At that point (which requires that there were no errors), the timing report is a valid record of the results of applying the constraints.

 

If constrained properly, what you see is what you get (we can only sell what we guarantee).

 

At any point prior to that, there are no guarantees.

 

To put it differently, implementation (to a bitstream) provides a valid timing report.

Austin Lesea
Principal Engineer
Xilinx San Jose

Registered: 01-22-2015

Austin - thanks for your reply. 

 

I am confident that the timing report is valid. 

 

However, for my DDR output example, I was wondering if implementation is able to do things that help the path pass timing analysis?   For example, can implementation adjust delay between output-data-ODDR and the FPGA-port to help the DDR output path pass timing analysis?

Scholar austin
Registered: 02-27-2008

m,

 

Unless a specific feature such as ODELAY is used, the only things implementation can change to meet timing are the slight differences in routing and placement delays, nothing more. The ODELAY gives users control of adjustable, fine-resolution delay taps, so you can place the delay exactly where you want it (it is not chosen by implementation).

 

 

Austin Lesea
Principal Engineer
Xilinx San Jose

Guide avrumw
Registered: 01-23-2009

However, for my DDR output example, I was wondering if implementation is able to do things that help the path pass timing analysis?   For example, can implementation adjust delay between output-data-ODDR and the FPGA-port to help the DDR output path pass timing analysis?

 

No.

 

All the resources in these paths are fixed; their locations are determined by the placement of the I/Os and the packing of the ODDR into the IOB (and presumably the proper use of a clock network). Implementation cannot alter any of these.

 

Furthermore, the Xilinx tools do not attempt to adjust any resource based on I/O constraints. Specifically, even if there is an ODELAY in the path, the tools will not "automatically" change the tap settings of the ODELAY - it is up to the user to choose the tap settings. Similarly, they will not attempt to change the phase of a clock coming from the MMCM.

 

So, for an output DDR interface (or any I/O interface implemented in IOB logic) the constraints only serve as a mechanism of validating the interface.
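
As a rough illustration of what "validating the interface" looks like in an XDC file, here is a minimal sketch of setup/hold-based constraints for a center-aligned, source-synchronous DDR output. Everything named here is an assumption for illustration only: the ports tx_clk_out and tx_data[*], the instance clk_fwd_oddr, and the receiver setup/hold numbers are hypothetical, and board trace delays are assumed to be matched between clock and data.

```tcl
# Hypothetical names: clk_fwd_oddr is the ODDR that forwards the clock,
# tx_clk_out is the forwarded-clock port, tx_data[*] are the data ports.
create_generated_clock -name tx_fwd_clk \
    -source [get_pins clk_fwd_oddr/C] \
    -divide_by 1 \
    [get_ports tx_clk_out]

# Placeholder receiver requirements in ns (take these from the receiver datasheet).
set tsu 0.500
set th  0.400

# Constrain the data against both edges of the forwarded clock (DDR).
set_output_delay -clock tx_fwd_clk -max $tsu           [get_ports {tx_data[*]}]
set_output_delay -clock tx_fwd_clk -min [expr {-$th}]  [get_ports {tx_data[*]}]
set_output_delay -clock tx_fwd_clk -max $tsu           [get_ports {tx_data[*]}] -clock_fall -add_delay
set_output_delay -clock tx_fwd_clk -min [expr {-$th}]  [get_ports {tx_data[*]}] -clock_fall -add_delay
```

With these in place, the timing report shows setup and hold slack on the output paths, but, as described above, nothing in the IOB path moves in response to them.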

 

Avrum

Registered: 01-22-2015

Austin & Avrum – thanks for your replies!  

 

-continuing with the DDR output interface example.  Suppose I have done the design correctly, written the proper constraints, run implementation, and the interface is working! 

 

Then, if I erase the constraints from the XDC-file and rerun implementation, can implementation “break” the interface?   For example, without the constraints, could implementation do something silly like utilize an ODDR located far away from the FPGA port/pin?

Moderator
Registered: 01-16-2013

Hi,

Why do you want to do this? Is it just an experiment?

For inputs and outputs, the primitives are locked to the I/O pins.
So, for an interface like this, the tool should not change anything, as Avrum already mentioned.

If you do the same thing for internal logic, then placement and routing will definitely change.

NOTE: Constraints are a must for every design.

Thanks,
Yash

Registered: 01-22-2015

Yashp – thanks for your reply!  

 

     Why you want to do this? Is it just course of experiment?
The thing is, I have a DDR output interface that is working without any constraints. So, my original thinking was that I’d better understand the relationship between implementation and timing analysis before I start writing constraints (and perhaps break the interface). However, from replies to this post, I now understand that writing constraints only permits STA to be run and will not cause the interface to be modified.


    NOTE: Constraints are must for every design.
My inner-heretic wants to know, “If implementation cannot use the constraints to improve operation (thru improved layout) of the interface then why bother to write the constraints?”

 

Why ask why? Well,

  1. This interface seems easy to design (i.e. use ODDR outputs for data and clock and ensure that either the FPGA or the receiving device phase-shifts the latch clock by approximately 90 degrees into the center of the data “eye”).
  2. It is easy to draw little timing diagrams that show setup and hold (often well documented) for the receiving device and that seem to quickly tell us whether the interface has enough margin to work.
  3. It is work (forgive my laziness) to write constraints, which often seem to have many uncertainties (my justification for laziness). For example, do we really know the other factors (e.g. board trace delay, uncertainty in the 90-degree latch-clock shift, etc.) that go into writing proper constraints?

Guide avrumw
Registered: 01-23-2009

Then, if I erase the constraints from the XDC-file and rerun implementation, can implementation “break” the interface?   For example, without the constraints, could implementation do something silly like utilize an ODDR located far away from the FPGA port/pin?

 

Again, no.

 

The ODDR is physically located in the IOB with the OBUF. There are only two legal outputs of the ODDR - directly to the OBUF or to the ODELAY (which, in turn, can only go to the OBUF). So there is no way for the ODDR to end up anywhere other than in the IOB associated with the pin location.

 

My inner-heretic wants to know, “If implementation cannot use the constraints to improve operation (thru improved layout) of the interface then why bother to write the constraints?”

 

So, first, to be clear (for posterity - I am sure you understand this) - what we are discussing here is purely for interfaces that have all fixed physical resources (dedicated clock buffers, IOB flip-flops or IDDR/ODDR or ISERDES/OSERDES, with or without IDELAY/ODELAY). For these paths, the constraints cannot modify the timing of the path, but simply indicate whether they pass or fail. For all other paths, constraints have a huge impact on implementation (or stated differently, the synthesis, placer and router can produce radically different results based on timing constraints).

 

The answer for I/O interfaces with fixed resources is that without the constraints the interface isn't timed. If you don't time the interface, you have no way of knowing the timing of the part of the interface inside the FPGA.

 

So for an ODDR based output interface, there is little to "know" - but still something. For example, using a 90 degree phase shifted clock, the shift is not exactly 90 degrees; there is some variation between the outputs of the MMCM. Similarly, jitter and internal clock skew will play a role in "exactly how close to 90 degrees apart the timing will be". The timing analysis (with constraints) will take all of this into account when giving you timing reports on the interface. That being said, Vivado is currently pretty pessimistic in this analysis... (See this post on output analysis pessimism).

 

For input interfaces, though, there is a lot to know. The precise setup/hold requirements for the capture flip-flop with a given I/O standard and clock structure can only be determined by timing analysis through the tool. Without constraints, you cannot do this timing analysis, and hence you cannot know the setup/hold margins of the interface. Furthermore, without this analysis you can't "tune" the interface (using either an IDELAY or a clock phase shift) to move the sampling to the optimal point (with the largest setup and hold margins).
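
To make the input case concrete, here is a minimal sketch of constraints for a center-aligned DDR input, followed by a user-chosen delay tap. All names (rx_clk_in, rx_data[*], rx_idelay[*]) and all numbers are hypothetical placeholders, and the last line assumes a 7 series IDELAYE2 in fixed mode; the tap value is something the user picks after reading the setup/hold slack, not something the tools derive.

```tcl
# Hypothetical names and placeholder numbers throughout.
set rx_period 5.000
# Time the data is valid before and after each clock edge at the FPGA pins (ns).
set dv_before 0.800
set dv_after  0.800

create_clock -name rx_clk -period $rx_period [get_ports rx_clk_in]

# Describe the data window relative to both clock edges (DDR).
set_input_delay -clock rx_clk -max [expr {$rx_period/2 - $dv_before}] [get_ports {rx_data[*]}]
set_input_delay -clock rx_clk -min $dv_after                          [get_ports {rx_data[*]}]
set_input_delay -clock rx_clk -max [expr {$rx_period/2 - $dv_before}] [get_ports {rx_data[*]}] -clock_fall -add_delay
set_input_delay -clock rx_clk -min $dv_after                          [get_ports {rx_data[*]}] -clock_fall -add_delay

# Tap setting chosen by the user to balance setup and hold slack
# (assumes IDELAYE2 cells; the value 12 is only an example).
set_property IDELAY_VALUE 12 [get_cells {rx_idelay[*]}]
```

The usual loop is: run implementation, read the setup and hold slack for these ports, adjust the tap (or the clock phase), and rerun until the two margins are roughly balanced.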

 

Finally, proper constraints give a "pass/fail" result during the implementation process. So, while the timing will not vary from run to run, the constraints verify that nothing has changed. For example, if you modify your RTL in a way that accidentally pulls the flip-flop out of the IOB, the timing analysis will likely fail. Without constraints, you may not notice (although you should always review the IOB report to ensure that all I/Os that are supposed to have IOB flip-flops do actually end up with IOB flip-flops).
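
One way to make an accidental un-packing more visible, offered as a sketch rather than something proposed in this thread, is to request IOB packing explicitly. The port names below are hypothetical, and this applies to plain IOB flip-flops (IDDR/ODDR can only live in the IOB anyway).

```tcl
# Hypothetical port names. Ask the tools to place the registers connected
# to these ports in the IOB; if that cannot be honored, Vivado is expected
# to report it (a critical warning, as far as I know) instead of silently
# using a fabric flip-flop.
set_property IOB TRUE [get_ports {tx_data[*]}]
set_property IOB TRUE [get_ports {rx_data[*]}]
```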

 

Avrum

Scholar brimdavis
Registered: 04-26-2012

@avrumw "what we are discussing here is purely for interfaces that have all fixed physical resources (dedicated clock buffers, IOB flip-flops or IDDR/ODDR or ISERDES/OSERDES, with or without IDELAY/ODELAY). For these paths, the constraints cannot modify the timing of the path, but simply indicate whether they pass or fail."

 

I would add that one key difference in the I/O clocking structures of the newer UltraScale parts is that the I/O clock network is no longer on a fixed, known, and predictable clock tree, as it was with the BUFIO/BUFR structures of the older families.

The actual I/O timing in UltraScale now depends on where the darn tools decide to place the clock root, and over how many clock regions they decide to spread any logic attached to that clock.

In the UltraScale devices, I/O-timing-constraint-driven generation of the I/O clocking tree would be quite useful, but it does not seem to be implemented in the current tools, at least as far as I could figure out. When I tightened the I/O constraints, the I/O timing results just got more unpredictable.

The only way I've found to get predictable UltraScale timing for the equivalent of BUFIO=>I/O DDR, BUFR=>FABRIC is to manually force all of the following:

 - LOC the input clock buffer to the clock region with the I/O DDR flops in question

 - Force the clock root into that same clock region with USER_CLOCK_ROOT

 - Create a PBLOCK to force anything on that same clock net into that same clock region
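
The three steps above might look roughly like this in XDC. This is only a sketch under assumptions: the cell rx_clk_bufg, the net rx_clk, the hierarchy rx_if_inst, and the clock region X0Y2 are all hypothetical, and using the CLOCK_REGION property to pin the buffer is one possible reading of "LOC the input clock buffer to the clock region".

```tcl
# Hypothetical names: rx_clk_bufg (clock buffer cell), rx_clk (clock net),
# rx_if_inst (hierarchy holding the logic on that clock), X0Y2 (clock region
# that contains the I/O DDR flops).

# 1. Keep the clock buffer in the clock region of the I/O flops.
set_property CLOCK_REGION X0Y2 [get_cells rx_clk_bufg]

# 2. Force the clock root into that same clock region.
set_property USER_CLOCK_ROOT X0Y2 [get_nets rx_clk]

# 3. Confine everything on that clock net to the same clock region.
create_pblock pblock_rx_if
resize_pblock [get_pblocks pblock_rx_if] -add {CLOCKREGION_X0Y2:CLOCKREGION_X0Y2}
add_cells_to_pblock [get_pblocks pblock_rx_if] [get_cells rx_if_inst]
```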

 

-Brian

Guide avrumw
Registered: 01-23-2009

@brimdavis,

 

I am not an expert in the UltraScale clocking structure (is anyone?), but...

 

The location of the CLOCK_ROOT in UltraScale definitely has a significant impact on fabric timing. But, I think there is also a dedicated clock root within the I/O column from the BUFG to the clocked I/O resources, and there is even a dedicated connection from the switch matrix from one I/O bank to the one above and below. These are all still part of the "global clock network" (in that they are driven by the BUFG), but I am not certain if they do (or they don't) go to the CLOCK_ROOT and back to the I/O column. In some of my early investigations, it was my impression that the clock loads in the I/O columns were always fed "directly" whereas loads in the fabric went through the CLOCK_ROOT.

 

This gives both the best and worst of both worlds. If this is true, then the I/O timing in UltraScale is quite similar to the BUFIO/BUFR timing in 7 series - it is fixed and low latency. However, this would introduce significant clock skew between the loads in the I/O column and the loads in the fabric. The tools would have to deal with this skew by fixing significant hold issues (which it can do in most cases) and setup requirement reduction.

 

So, what you say may or may not be true - I am just not sure. Let me know if you have any definitive information either way (like a timing report showing the CLOCK_ROOT or not)...

 

Avrum

Scholar brimdavis
Registered: 04-26-2012

@avrumw " it was my impression that the clock loads in the I/O columns were always fed "directly" whereas loads in the fabric went through the CLOCK_ROOT."

 

I didn't see evidence of that in the timing reports of my various attempts, but I might have missed something in the construction or constraining of things- I didn't look in depth at the resulting clock tree after implementation, just the I/O timing report numbers.

 

Not meaning to hijack the thread, just wanted to point out that Ultrascale I/O clocking seems to behave differently (although this is more of a problem for input setup/hold timing than for outputs using clock forwarding, where output data/clock skew is typically more important than absolute delay getting onto the clock network).

 

> Let me know if you have any definitive information either way (like a timing report showing the CLOCK_ROOT or not)...

 

I don't have an example at hand; I'll start a new thread if I find time to put together a simple I/O testcase.

 

Roughly from memory:

  - I was using Ultrascale 'primitive component' mode I/O, not the 'native' mode with bit-slice strobe support.

  - the input clock was on a GC pin in the same bank as the data

  - I tried both with independent BUFG's (one for I/O clocking, one for fabric clock), and with a single BUFG

       IBUFDS => BUFG => IDDR CLK

                    => BUFG => FABRIC CLK

  - fabric logic loads were minimal ( some registers, then into a small clock-domain-crossing fifo )

 

    EDIT2: fix mis-remembered timing numbers (-7ns I/O setup was a different problem in that design):

  - without the forced clk tree placement, setup times roughly -1 ns, with 3-4 ns data window, varying between banks

  - after bludgeoning the clk tree placement, setup times of -0.3 ns, with 2 ns data window, consistent across all banks

  - EDIT: note by 'across banks', I mean comparing different groups of input clk/data, each group within a single bank

 

-Brian

Scholar brimdavis
Registered: 04-26-2012

markg@prosensing.com "The thing is, I have a DDR output interface that is working without any constraints. So, my original thinking was that I’d better understand the relationship between implementation and timing analysis before I start writing constraints"

 

Do you know about the report_datasheet command?

This will give you a datasheet-style setup/hold and clock-to-output report for all I/O in an implemented design.

( but I don't know if there's a way to get forwarded-clock-to-data-skew-across-bus numbers out of this report)

Looks like there's an option for reporting skew, see [1] below.

 

Constraints are a good thing for reporting pass/fail on I/O timing, but I usually start out in a new design with just the clocks constrained, then use report_datasheet to inspect I/O timing.
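
A minimal sketch of that clocks-only starting point, assuming a hypothetical clock port sys_clk_in and a placeholder period; the output file name is also just an example.

```tcl
# In the XDC file: constrain only the clock (hypothetical port, placeholder period).
create_clock -name sys_clk -period 5.000 [get_ports sys_clk_in]

# Later, in the Tcl console with the implemented design open:
# write the datasheet-style setup/hold and clock-to-output numbers to a file.
report_datasheet -file io_datasheet.txt
```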

 

-Brian

 

[1] snippet from help report_datasheet :

 

  The following example reports the datasheet with the skew calculation for
  two groups of ports, with the first port of each group providing the
  reference for the skew calculation for that group. In this example, CLK0OUT
  is the forwarded clock for DATA0-3 and CLK1OUT is the forwarded clock for
  DATA4-7:

    report_datasheet -file ds.txt -group [get_ports \
       {CLK0OUT DATA0 DATA1 DATA2 DATA3}] \
       -group [get_ports {CLK1OUT DATA4 DATA5 DATA6 DATA7}]

Registered: 01-22-2015

Avrum – I’m ready to prepay an order for your book on this stuff, which you must write someday.

 

…without the constraints, the interface isn't timed. If you don't time the interface, you have no way of knowing the timing of the part of the interface inside the FPGA. … So for an ODDR based output interface, there is little to "know" ... For input interfaces, though, there is a lot to know.

 

I see now that my radical ideas were based on a pretty simple interface. -very grateful for your explanation of why we should always do timing analysis on these interfaces.

 

-----

 

Brian – thanks for thoughts on the report_datasheet command, which will help me get at the “lot to know”
