Contributor
Registered: 04-18-2016

Source synchronous edge aligned DDR input constrain failed.

Hi,

I have constrained a source-synchronous, edge-aligned DDR interface by following this post:

https://forums.xilinx.com/t5/Timing-Analysis/How-to-constraint-Same-Edge-capture-edge-aligned-DDR-input/m-p/646009#M8411

However, timing between virt_clk and clkout0 still cannot be met.

My system has a DDR interface with a 3.90625ns (256MHz) clock period, 0.1ns clock/data skew, and a 50% duty cycle. An MMCM has been inserted between the clock input port and the IDDR cells to provide a 90 degree phase shift.

The timing constraints and report are below.

 

# Clock arriving at the FPGA clock input pin
create_clock -period 3.906 -name CLK_P -waveform {0.000 1.953} [get_ports CLK_P]
# Virtual clock (no source object) used as the reference for the input delays
create_clock -period 3.906 -name virt_clk -waveform {0 1.953}

# Setup: move the capture edge back one cycle and keep only rise->rise and fall->fall
set_multicycle_path 0 -from [get_clocks virt_clk] -to [get_clocks CLK_P]
set_false_path -setup -rise_from [get_clocks virt_clk] -fall_to [get_clocks CLK_P]
set_false_path -setup -fall_from [get_clocks virt_clk] -rise_to [get_clocks CLK_P]

# Hold: keep only rise->fall and fall->rise
set_multicycle_path -1 -hold -from [get_clocks virt_clk] -to [get_clocks CLK_P]
set_false_path -hold -rise_from [get_clocks virt_clk] -rise_to [get_clocks CLK_P]
set_false_path -hold -fall_from [get_clocks virt_clk] -fall_to [get_clocks CLK_P]

# Clock latency settings, currently commented out
#set_clock_latency -source -fall -min -0.5 [get_clocks {virt_clk CLK_P}]
#set_clock_latency -source -fall -max 0.5 [get_clocks {virt_clk CLK_P}]

set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay -0.1 [get_ports {I_DA_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay 0.1 [get_ports {I_DA_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -min -add_delay -0.1 [get_ports {I_DA_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay 0.1 [get_ports {I_DA_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay -0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay 0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -min -add_delay -0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay 0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay -0.1 [get_ports {I_DB_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay 0.1 [get_ports {I_DB_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -min -add_delay -0.1 [get_ports {I_DB_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay 0.1 [get_ports {I_DB_N[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay -0.1 [get_ports {I_DB_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay 0.1 [get_ports {I_DB_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -min -add_delay -0.1 [get_ports {I_DB_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay 0.1 [get_ports {I_DB_P[*]}]


timing.png

Guide avrumw
Registered: 01-23-2009

The timing constraints from the post you referenced (which you are using as a basis) were defined on the assumption that the clock propagates to the IDDR. This would be done either using a BUFIO or a BUFG directly (with or without an IDELAY in the path). The moment you put an MMCM with a positive phase shift in the loop, everything changes...

The referenced post walks through the FOUR paths involved in the original set of constraints with the IDDR. It also discusses the launch and capture edges. From that post:

Now, let's look at the "rule" for multiple clocks in Vivado. To determine which edge is the capture edge, Vivado looks at the un-propagated clocks (thus fpga_clk and virt_clk). If data is launched on one edge, it is by definition captured on the un-propagated clock edge that follows it in time.

But the question in this case is "what is the destination clock".

By default (this can now be changed; see the end of this post), when a clock goes to an MMCM, the MMCM generates a new clock. In the case of your MMCM, this clock has (using the timing of the other post) a rising edge at 2.5ns, a falling edge at 7.5ns, and the next rising edge at 12.5ns.

So even though the new clock only "starts" at the output of the MMCM, it is still this "new" clock that determines source and capture edges. So, for the four timing paths in the original post (a, b, c, and d) the relationships are different (I am doing these in a different order, so be sure to note which is a, b, c and d).

c) rising -> rising: The rising edge of virt_clk is at 0. However, the "next" rising edge of the "new" clock is at time 2.5 - this is radically different from the case where the BUFIO was used (which had the capture clock at 10ns, which then got propagated to 12.5).

d) falling -> falling: falling edge of virt_clk at time 5 to falling edge of "new" clock at 7.5

a) rising -> falling: rising edge of virt_clk at time 0 to falling edge of "new" clock at 7.5

b) falling -> rising: falling edge of virt_clk at time 5 to rising edge of "new" clock at 12.5

This is now the opposite of the case in the other post - paths c and d are the correct timing relationships as is - without any multicycle path constraints. Paths a and b are wrong, but are harmless since they are longer.

Because of this, the constraints will be different (by default - again, there is another option at the end).

Given this, the easiest solution is to remove all the set_multicycle_path and set_false_path commands so carefully crafted in the other post. They are incorrect here because of this phase shift in the "new" clock (which changes the edge relationships). So with your set_input_delay commands and without all the rest, the timing is "correct".
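
As a rough sketch of this first option, using the period and 0.1ns skew from the original question, the constraint set would shrink to something like the following. The clock and port names are the original poster's, only one data bus is shown, and the exact values should be checked against the driving device.

```tcl
# Sketch only: constraints for the MMCM case with the default WAVEFORM phase shift.
create_clock -period 3.906 -name CLK_P -waveform {0.000 1.953} [get_ports CLK_P]
create_clock -period 3.906 -name virt_clk -waveform {0.000 1.953}

# Data launched by the rising edge of the external clock
set_input_delay -clock [get_clocks virt_clk] -min -add_delay -0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay 0.1 [get_ports {I_DA_P[*]}]
# Data launched by the falling edge of the external clock
set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay -0.1 [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay 0.1 [get_ports {I_DA_P[*]}]

# No set_multicycle_path or set_false_path between virt_clk and the MMCM output clock.
```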

The other option is new to 2016.3. The behavior I describe about "new" clock, which is a phase delay (also called a waveform delay) was the default in Vivado when a clock is phase shifted in an MMCM. This behavior is clearly different than (say) an equivalent (2.5ns) delay implemented in an IDELAY, which is a propagation delay (also called clock latency).

New to 2016.3, you have the ability to change the behavior of the timing analysis treatment of a phase shift from the MMCM/PLL. The default for all architectures except UltraScale+ is as I describe above, which is called a WAVEFORM delay.

However, in 2016.3, this behavior can be changed from a WAVEFORM delay to a LATENCY. This is done by setting the property

set_property PHASESHIFT_MODE LATENCY [get_cells <instance_name_of_MMCM>]

With this property set (and currently it must be set in the XDC, you cannot set it in the instantiation of the MMCM), the timing behavior of this phase shift is analyzed as a propagation delay - the same as the propagation delay through an IDELAY. So with this option set

  • the edges of the "new" clock remain at 0, 5, and 10
    • this means that the launch and capture edges revert to the cases described in the other post.
  • the propagation delay through the MMCM is increased by 2.5ns (which makes it similar to the delay through an IDELAY)

This makes the MMCM delay behave the same as an IDELAY delay, and hence the constraints you have (which were written for a propagation delay) are correct. So a second option for you is to keep the constraints as you have them, but change the PHASESHIFT_MODE of the MMCM to LATENCY (I would think the first solution is easier and cleaner).

It is very important to note that the default for the PHASESHIFT_MODE is WAVEFORM for the 7 series and UltraScale devices, but LATENCY for the UltraScale+ devices. This can be changed (either way) using the XDC setting, but if you don't override them, the default behavior is opposite in UltraScale+ than in previous architectures.
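
To see which mode a given MMCM is being analyzed with before overriding it, something like the following could be run from the Vivado Tcl console. This is a sketch: it assumes the property can be read back with get_property, and the instance name mmcm_inst is hypothetical.

```tcl
# Hypothetical instance name; replace with the MMCM cell in your design.
set mmcm [get_cells mmcm_inst]

# Read back the current analysis mode (WAVEFORM or LATENCY)
puts "PHASESHIFT_MODE = [get_property PHASESHIFT_MODE $mmcm]"

# The override itself belongs in the XDC, as described above:
# set_property PHASESHIFT_MODE LATENCY [get_cells mmcm_inst]
```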

[EDIT:

WARNING! If you have two different outputs of the same MMCM with different phase shift values (with paths between them), you must NOT use LATENCY mode. In latency mode, the edge determination for paths between two clocks with different phase shift values is incorrect, and hence will result in incorrect timing analysis.

As an example, if you have two clock outputs of the same MMCM with a 90 degree phase shift, the "correct" requirement for a path that starts at CLK0 and ends at CLK90 is 1/4 of a clock period - and this is what the tools will determine in WAVEFORM mode.

However, in LATENCY mode, the tools will determine that the requirement is one full clock, and will have 1/4 extra clock time as propagation on the CLK90, thus effectively resulting in 1.25 clock periods as the allowable propagation delay on this path. This is incorrect.

]
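
To put numbers on the warning above, here is the same CLK0 -> CLK90 arithmetic for the 3.906ns period from the original question. This is plain Tcl (it runs in tclsh or the Vivado Tcl console); the variable names are only illustrative.

```tcl
set period 3.906                    ;# ns, from the original question
set shift  [expr {$period / 4.0}]   ;# 90 degree phase shift on CLK90

# WAVEFORM mode: the CLK90 edges move, so the requirement is 1/4 of a period
set req_waveform $shift

# LATENCY mode: the requirement is a full period, and the shift is added again
# as clock propagation delay on the capture side
set allowed_latency [expr {$period + $shift}]

puts [format "WAVEFORM requirement:        %.4f ns" $req_waveform]
puts [format "LATENCY allowed propagation: %.4f ns" $allowed_latency]
```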

Avrum

 

Contributor
Registered: 04-18-2016

Hi avrumw,

Thank you for your reply; I have learned more about timing constraints from your post. I still have some questions.

1) I am not clear about the definition of 'set_input_delay -min' and 'set_input_delay -max'.

2) The Constraints Wizard in Vivado offers two modes for a source-synchronous edge-aligned DDR interface: Edge MMCM and Edge Direct. For the same skew values, the rise/fall max/min values differ between the two. The difference between the two modes is the position of the rise data and the fall data. But how do I define the rise data and the fall data? In my opinion, if the clock and data delays on the PCB are the same, the data after the rising clock edge is always the 'rise data', and likewise for the 'fall data'.
edge.png

Guide avrumw
Registered: 01-23-2009

1) I am not clear about the definition of 'set_input_delay -min' and 'set_input_delay -max'

The set_input_delay -min and -max are pretty simple:

   - the -min defines the earliest time after a clock edge at which the data can begin to change

   - the -max defines the latest time after the clock edge at which the data can complete its change

 

This timing should be extracted from the driving device. Based on your original post, you specified them as -0.5 and +0.5. This is fairly typical of a device outputting a source synchronous interface - it can be specified as "clock/data skew" of 0.5ns.

 

So, with the numbers you showed, the data can begin its change no earlier than 0.5ns before the edge of the clock and will complete its change no later than 0.5ns after the edge of the clock. Using the wizard's definition, all 4 of the "skew_??e" are 0.5ns.
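
Expressed as constraints, and assuming the 0.5ns skew figure used in this reply, that would look roughly like the sketch below. The variable name skew is introduced here only for illustration; the clock and port names are from the original post, and only the rising-edge data of one bus is shown.

```tcl
set skew 0.5   ;# ns, clock/data skew of the driving device (assumed value)

# -min: earliest the data can start changing relative to the clock edge
set_input_delay -clock [get_clocks virt_clk] -min [expr {-$skew}] [get_ports {I_DA_P[*]}]
# -max: latest the data can finish changing relative to the clock edge
set_input_delay -clock [get_clocks virt_clk] -max $skew [get_ports {I_DA_P[*]}]
```

The falling-edge data would get the same pair again with -clock_fall and -add_delay.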

 

2) But how can I define the rise data and the fall data?

 

Go back to the original post you referenced. It talks about what I refer to as "the cheater way". Using this mechanism you lie to the tool as to what edge causes which data window so that when it chooses the launch and capture edges, they line up with what is actually required.

 

For the "Edge Direct", they are adding period/2 to all the edges - this is the "cheater way". This defines the data windows as not being the one immediately after the edge, but the one 1/2 clock cycle further forward than that - again, cheating. It is lying to the tool as to which is the "rise data" and which is the "fall data" (so you are right to be confused - that is why I don't like the "cheating way").

 

So, for the Edge MMCM, the delays are correct - in your case -0.5 and +0.5 for min and max for both rise and fall. Used alone (no set_multicycle_path or set_false_path, and you don't even need a virtual clock), these are the correct constraints.

For Edge Direct, as I said in the referenced post, you can do it the "correct way", which is what you had - it is correct for Edge Direct (but unfortunately not correct for Edge MMCM). This has the same -0.5 and +0.5 delays that you had originally, but fixes the launch/capture edges with the set_multicycle_path and set_false_path.

The constraint wizard, though, has you do the "cheating way" for Edge Direct by adding 1/2 clock period to everything - thus your set_input_delay -min and -max become PERIOD/2-0.5 and PERIOD/2+0.5 (again with no set_false_path or set_multicycle_path).
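
For comparison, a minimal sketch of the wizard-style "cheating way" for Edge Direct, with the half-period offset computed explicitly. The period is from the original question and the 0.5ns skew is the figure used in this reply; the variable names are illustrative, and the clock and port names are from the original post.

```tcl
set period 3.906   ;# ns
set skew   0.5     ;# ns, assumed

# Shift both ends of the data window forward by half a period
set dmin [expr {$period / 2.0 - $skew}]
set dmax [expr {$period / 2.0 + $skew}]

set_input_delay -clock [get_clocks virt_clk] -min -add_delay $dmin [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -max -add_delay $dmax [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -min -add_delay $dmin [get_ports {I_DA_P[*]}]
set_input_delay -clock [get_clocks virt_clk] -clock_fall -max -add_delay $dmax [get_ports {I_DA_P[*]}]
```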

 

Avrum

Contributor
Registered: 04-18-2016

Thanks, avrumw. I now understand the post I referenced more clearly.

I have changed my design to use a BUFR with an IDELAY in the clock path.

But after implementation, I get a large clock skew that comes from the destination clock delay (with both the cheating way and the correct way). I find it hard to believe this is real, given that the clock period is only 3.906ns.

What has happened?

clk_skew.png

Guide avrumw
Registered: 01-23-2009

You haven't given me enough of the timing path to figure out what is going wrong. I need to see the complete path. Most likely there is something architecturally incorrect with your clock structure, but I can't tell without the full timing report.
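
One way to produce such a report from the Vivado Tcl console is sketched below. The port pattern is taken from the original post, and the option choices are only a suggestion.

```tcl
# Report the worst setup and hold paths from one data bus, with the source and
# destination clock paths fully expanded so the clock insertion delay is visible.
report_timing -from [get_ports {I_DA_P[*]}] -delay_type min_max \
    -max_paths 1 -path_type full_clock_expanded
```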

 

Avrum

Contributor
Registered: 04-18-2016

Hi avrumw,

Here is the schematic and the timing report of the DDR interface after implementation.

There are four data buses, named I_DA, I_DB, Q_DA and Q_DB. Each bus is 8 bits wide.

The clock is delayed by an IDELAY2. Since one BUFR cannot drive more than 50 loads (this design has up to 64), a BUFMR has been inserted after the IDELAY2 cell to drive two BUFRs.

sch_and_timing.png

Guide avrumw (accepted solution)
Registered: 01-23-2009

From the timing report, this report (and the failure) look real. The tool is telling you that the clock insertion via the BUFMR and BUFR is immense - over 5ns. A small part of this is due to the IDELAY on the clock input, a lot more is due to the delays of the BUFMR and BUFR themselves, and even more is due to the routing between the two. I don't think there is anything that you can do about these delays (they are part of the architecture).

 

Given this, I doubt this clocking scheme is viable - the BUFMR timing is pretty bad. With 5ns of clock insertion, none of which is PVT compensated, I suspect it is impossible to capture a data stream at this clock rate.

 

I am not sure why you switched from the MMCM based mechanism. Given that you have loads distributed among different I/O banks, it will probably be better than the BUFMR... I don't know, though, if it will be "good enough" for this clock rate on your device, but at this point it is your only other choice.

 

By the way, to forward a clock out of the FPGA, do not connect the clock directly to the OBUF (CLK0) - use an ODDR for clock forwarding.

Contributor
Registered: 04-18-2016

This module is only the interface to the ADC, which is one part of the design. Because of the various clocks in the system, I wanted to move the MMCM out of this interface module and into the top module, so I switched from the MMCM mechanism to the BUFR mechanism.

I have now retried the MMCM mechanism with a 90 degree phase shift, but the result is also disappointing. Some paths fail between the launch edge at 0 ns and the capture edge at 0.976 ns (90 degrees), and some paths fail between the launch edge at 1.953 ns and the capture edge at 2.929 ns (270 degrees). It seems that the clock delay is not large enough. But when I increase the phase shift, I get the same results.

shift90.png
shift90_2.png

Guide avrumw
Registered: 01-23-2009

But when I increase the phase shift, I get the same results.

 

This doesn't make sense. Increasing the phase shift will decrease the magnitude of the setup failure.

 

Of course, the question is "what does it do to the hold time". You need to look at both.

 

If the sum of the setup slack and the hold slack is positive, then the interface is viable - you can adjust the phase of the MMCM to find the proper value that gets both of them to be positive.

 

If the sum of the setup slack and hold slack is negative, then it is not.

 

Your bit period (unit interval) is 1.95ns, and you lose 0.2ns for the clock/data skew, resulting in a data eye of 1.75ns. This is pretty small. You don't tell us what device you are using, nor what speed grade, but even looking at a faster speedgrade of a Virtex-7, this looks like it is too small. ChipSync clocking (using just the BUFR) might have been able to capture something this fast, but the fact that you are on multiple banks (and hence need the BUFMR) pretty much killed that approach.
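
The viability check and the data eye arithmetic above can be written out as a few lines of plain Tcl. This is only a sketch: the proc name and the sample slack values are made up for illustration, and real slack values would come from the setup and hold timing reports.

```tcl
# Returns 1 when some phase setting could make both slacks positive.
proc interface_viable {setup_slack hold_slack} {
    # Shifting the capture clock trades setup slack against hold slack,
    # so the sum of the two decides whether a workable phase exists.
    return [expr {($setup_slack + $hold_slack) > 0}]
}

set ui  [expr {3.906 / 2.0}]   ;# bit period (unit interval), ns
set eye [expr {$ui - 0.2}]     ;# minus the +/-0.1 ns clock/data skew
puts [format "UI = %.3f ns, data eye = %.3f ns" $ui $eye]

puts [interface_viable -0.30 0.50]   ;# sum is positive
puts [interface_viable -0.40 0.25]   ;# sum is negative
```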

 

So, in the end, this probably cannot be done statically (at least not unless you can get all the signals in the same bank and use just the BUFR) - you will need some kind of dynamic calibration to make this interface work...

 

Avrum

Contributor
Registered: 04-18-2016

  If the sum of the setup slack and hold slack is negative, then it is not.

As you said: the sum of the setup slack and the hold slack is negative, so it seems timing closure cannot be achieved.