Scholar ronnywebers
Registered: ‎10-10-2014

constraining Center-Aligned Dual Data Rate Source Synchronous Inputs (i.e. RGMII phy rx side)

Hello,

 

I'm trying to set up the correct constraints for connecting my Ethernet PHY to the GMII-to-RGMII IP block.

 

This interface can be considered more generally as a case of "Center-Aligned Dual Data Rate Source Synchronous Inputs". It should be noted though that there are IDELAY units on each of the 4 data lines. I've included a part of the implemented schematic to understand the interface better.

 

I've added a screenshot of the PHY timing, and I deliberately changed the ts and th times slightly so that they differ from each other, to better understand how to calculate the -min and -max values in the set_input_delay constraints.

 

I've read through the answers in this excellent post, but it still isn't clear to me how to calculate the -min and -max values based on the values in the datasheet of the PHY (and how to correctly relate these to the set_multicycle_path constraints).

 

I've extracted only the relevant constraints for the rx side using the Tcl command 'write_xdc -exclude_physical test.xdc', and checked the compile order using 'report_compile_order -constraints'.

 

I've included all my questions inline with the constraints in comments, the major one being how to exactly calculate the -min and -max value from the phy's datasheet.

 

####################################################################################
# Constraints from file : 'timing.xdc' (Compile order : NORMAL)
####################################################################################

# create the physical clock entering the FPGA on the RGMII_rxc
create_clock -period 8.000 -name RGMII_rxc -waveform {0.000 4.000} [get_ports RGMII_rxc]

####################################################################################
# Constraints from file : 'design_1_gmii_to_rgmii_0_0_clocks.xdc' (Compile order : NORMAL)
####################################################################################

# create the corresponding virtual clock in the external phy
#
# Q1 : looking at diagrams in tutorials explaining 'data arrival time', at the source FF there's :
#
# - a Tclk1 path : the delay from the (virtual) clock source, to the clock input of the source FF
# - a Tco : the delay to clock the data through the source FF
# - the Tdata path : from the Q of the source FF to the D of the destination FF
# 
# then the tutorials show a calculation : Data arrival time = launch edge + Tclk + Tco + Tdata
#
# -> in this case with an off-the-shelf phy, I have no info on the 'internal workings of the phy', so I have no numbers on
# the Tclk1, Tco and (full) Tdata path. I only have the timing diagram from the phy's datasheet (see included image), so
# is my understanding correct that the datasheet timing actually represent/replaces these 3 paths, and that is why the
# virtual clock needs to be identical to the clock entering the FPGA (and not some 90 degree, time-shifted version of it,
# because the phy does output a shifted version of the clock it recovered from the incoming ethernet signal). 
# So in case of setting up input timing, we always need to use a virtual clock, identical to the clock entering the FPGA,
# and the set_input_delay constraints should take care of the rest, matching the timing with the datasheet?

create_clock -period 8.000 -name design_1_gmii_to_rgmii_0_0_rgmii_rx_clk

# Q2 : these are constraints that come with the IP itself, because they are needed during development and debug of the IP,
# guess these must be overwritten in a 'late' xdc file, to adapt to the specific phy used on the board ?
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -max -1.500 [get_ports [list RGMII_rx_ctl {RGMII_rd[0]} {RGMII_rd[1]} {RGMII_rd[2]} {RGMII_rd[3]}]]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -min -2.800 [get_ports [list RGMII_rx_ctl {RGMII_rd[0]} {RGMII_rd[1]} {RGMII_rd[2]} {RGMII_rd[3]}]]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -clock_fall -max -add_delay -1.500 [get_ports [list RGMII_rx_ctl {RGMII_rd[0]} {RGMII_rd[1]} {RGMII_rd[2]} {RGMII_rd[3]}]]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -clock_fall -min -add_delay -2.800 [get_ports [list RGMII_rx_ctl {RGMII_rd[0]} {RGMII_rd[1]} {RGMII_rd[2]} {RGMII_rd[3]}]]

# this sets false paths, as explained by @avrumw in the referenced post
set_false_path -setup -rise_from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -fall_to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]]
set_false_path -setup -fall_from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -rise_to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]]
set_false_path -hold -rise_from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -rise_to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]]
set_false_path -hold -fall_from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -fall_to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]]

# Q3 : (correct?) this tells Vivado not to use the default setup check (next rising edge), but to use the same rising edge
# for capture as the one that launches the data
set_multicycle_path -setup -from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]] 0
set_multicycle_path -hold -from [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -to [get_clocks -include_generated_clocks -of [get_ports RGMII_rxc]] -1

####################################################################################
# Constraints from file : 'timing_mode1_late.xdc' (Compile order : LATE)
####################################################################################

# Q4 : here we must 'override' the timing to the correct ones as given by the phy's datasheet - but how to calculate these?
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -max ????? [get_ports {RGMII_rd[*] RGMII_rx_ctl}]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -min ????? [get_ports {RGMII_rd[*] RGMII_rx_ctl}]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -clock_fall -max -add_delay ????? [get_ports {RGMII_rd[*] RGMII_rx_ctl}]
set_input_delay -clock [get_clocks design_1_gmii_to_rgmii_0_0_rgmii_rx_clk] -clock_fall -min -add_delay ????? [get_ports {RGMII_rd[*] RGMII_rx_ctl}]

# To Adjust GMII Rx Input Setup/Hold Timing, modify the IDELAY Tap values
#
# Q5 : is this the way to get these values correct  : start 'in the middle' at '16', run implementation, verify if timing is met.
# If not -> adjust values, re-run implementation, and check if improved
# -> repeat until setup & hold are met?
#
# Q6 : why can we 'ignore' these delay values in the calculation of -min & -max in the set_input_delay, is this because it's
# considered as part of the timing paths like a regular net, and the tool just infers the 'net' delay inserted by the IDELAY? 
set_property IDELAY_VALUE 16 [get_cells -hier -filter {name =~ *design_1_gmii_to_rgmii_0_0_core/*delay_rgmii_rx_ctl}]
set_property IDELAY_VALUE 16 [get_cells -hier -filter {name =~ *design_1_gmii_to_rgmii_0_0_core/*delay_rgmii_rxd*}]

# connects the IDELAYs to the same IDELAYCTRL
set_property IODELAY_GROUP gpr1 [get_cells -hier -filter {name =~ *design_1_gmii_to_rgmii_0_0_core/*delay_rgmii_rx_ctl}]
set_property IODELAY_GROUP gpr1 [get_cells -hier -filter {name =~ *design_1_gmii_to_rgmii_0_0_core/*delay_rgmii_rxd*}]
set_property IODELAY_GROUP gpr1 [get_cells -hier -filter {name =~ *i_design_1_gmii_to_rgmii_0_0_idelayctrl}]

 

As a side question, I'm also wondering why the line from the BUFR output on the rgmii_rxc clock  to the IDDR and BUFG clock input is 'dashed' 

 

 

rmii phy rx timing.jpg
BD.jpg
IP config.jpg
Schematic 1.jpg
schematic2.jpg

Accepted Solutions
Historian
Registered: ‎01-23-2009

Re: constraining Center-Aligned Dual Data Rate Source Synchronous Inputs (i.e. RGMII phy rx side)

Many of the concepts involved in analyzing this interface will be similar to the post you referenced, as well as to the information given for constraining an edge aligned interface, described in this post.

 

First, let's define the windows. Assume that the rising edge of the clock occurs at t=0.0, the falling edge at t=4.0, and the next rising edge at t=8.0. I will define 4 windows. No single set of constraints needs all 4, but together they give us enough information:

  - Window -1: Centered around the falling edge before the first rising edge (at t=-4.0)

     - starts at t=-5.3, ends at t=-2.9

  - Window 0: Centered around the first rising edge (at t=0.0)

     - starts at t=-1.3, ends at t=1.1

  - Window 1: Centered around the first falling edge (at t=4.0)

     - starts at t=2.7, ends at t=5.1

  - Window 2: Centered around the second rising edge (at t=8)

     - starts at t=6.7, ends at t=9.1
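The four windows above follow from the PHY's (fictitious) setup time of 1.3 ns and hold time of 1.1 ns around each clock edge, with edges every 4.0 ns. A minimal sketch of that arithmetic, in integer picoseconds to avoid rounding noise; the function name `valid_windows` is made up for illustration and is not part of any Xilinx tool:

```python
def valid_windows(t_setup_ps, t_hold_ps, half_period_ps, edge_indices):
    """Return {edge index: (start_ps, end_ps)} of the data-valid window around each clock edge.

    Edge 0 is the rising edge at t=0, edge 1 the falling edge at t=half period, and so on.
    """
    windows = {}
    for k in edge_indices:
        edge_ps = k * half_period_ps
        # Data is valid from t_setup before the edge until t_hold after it.
        windows[k] = (edge_ps - t_setup_ps, edge_ps + t_hold_ps)
    return windows


# Values from this thread: setup 1.3 ns, hold 1.1 ns, 8 ns clock period (4 ns between edges).
windows = valid_windows(1300, 1100, 4000, range(-1, 3))
for k, (start_ps, end_ps) in windows.items():
    print(f"Window {k:2d}: starts at t={start_ps / 1000:+.1f}, ends at t={end_ps / 1000:+.1f}")
```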

 

Now things get complicated. The question is similar to the one in my referenced post: which edge launches the window, and which edge captures it? By default, the edge that captures a window is the edge after the one that launches it. So, using the normal rules:

  - a window defined wrt. the rising edge of clk at t=0.0 will be captured by the falling edge of clock at t=4.0

  - a window defined wrt. the falling edge of clk at t=4.0 will be captured by the rising edge of clock at t=8.0

 

Now, what do we want for this interface? There are really two choices

  1) Capture each window with the clock edge around which it is centered

      - capture window 0 with the rising edge of the clock at t=0.0

      - capture window 1 with the falling edge of clock at t=4.0

  2) Capture each window with the next edge

      - capture window 0 with the falling edge of clock at t=4.0

      - capture window 1 with the rising edge of clock at t=8.0

 

This is not necessarily an obvious choice. If we look (say) at BUFR clocking (which is what is specified here), the required setup/hold of the FPGA (measured at the pins of the FPGA) is Tpscs/Tphcs (you can find this in the appropriate device datasheet). For all 7 series devices, Tpscs is slightly negative and Tphcs is fairly large (for an A7 in the -1 speed grade it is -0.38/1.70). This means that the required setup/hold window of the FPGA occurs after the edge of the clock; the data must be valid from t=0.38 to t=1.70.

 

So, if we were to capture window 0 with the rising edge of clock at t=0, we would need to delay the data so that the window we do have [-1.3, 1.1] covers the required window of [0.38, 1.70]. We would do this with IDELAYs with a handful of taps of delay. This is consistent with the interface defined by the IP core, which has IDELAY only on the data. So, I will assume this relationship (#1 above) and not define the other one (#2 above).
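To make "delay the data so the windows line up" concrete, here is a rough sketch of the range of data delays for which the provided window fully covers the required one. It only compares the two windows at the FPGA pins and ignores clock skew, jitter and anything else the timing tool accounts for; the function name `delay_range_ps` is hypothetical:

```python
def delay_range_ps(provided, required):
    """Return (min_delay, max_delay) in ps that makes `provided` fully cover `required`.

    Both windows are (start_ps, end_ps) tuples relative to the capturing clock edge.
    """
    prov_start, prov_end = provided
    req_start, req_end = required
    min_delay = req_end - prov_end      # the provided window must last until the required end
    max_delay = req_start - prov_start  # the provided window must still start before the required start
    if min_delay > max_delay:
        raise ValueError("provided window is shorter than the required window")
    return min_delay, max_delay


provided_ps = (-1300, 1100)  # window 0 from the PHY (setup 1.3 ns, hold 1.1 ns)
required_ps = (380, 1700)    # FPGA requirement quoted above (-0.38 / 1.70)
lo_ps, hi_ps = delay_range_ps(provided_ps, required_ps)
print(f"data delay must be between {lo_ps} ps and {hi_ps} ps")
```

With the numbers used here, any data delay between roughly 0.6 ns and 1.68 ns would work on paper, which is why a handful of IDELAY taps is enough.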

 

Now, how do we define the windows?

 

Again, if we want window 0 to be captured with the rising edge of the clock at t=0.0, we have two choices

  a) the cheating way

       - if we want the rising edge of clock to capture window 0, we have to define it with respect to the preceding edge

           - the falling edge of clock at time t=-4.0

  b) the correct way

      - we define the window with respect to the rising edge at t=0.0, and change the edge relationship

           - this uses the set_multicycle_path 0, set_multicycle_path -hold -1, and all the set_false_paths described in my referenced post

           - these commands are all included by the IP constraint file design_1_gmii_to_rgmii_0_0_clocks.xdc

           - this also needs a virtual clock which  is defined in the constraint file as design_1_gmii_to_rgmii_0_0_rgmii_rx_clk 

              -  (it is a virtual clock since it is not attached to any ports).

 

So let's do b) first

 

To define window 0 with respect to the clock at t=0.0, we know

   - the edge that starts this window can be anywhere from t=-2.9 to t=-1.3

      - the end of window -1 to the start of window 0

 

So the constraints are

 

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk -1.3 -max [get_ports {RGMII_rx_ctl RGMII_rd[*]}]

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk -2.9 -min [get_ports {RGMII_rx_ctl RGMII_rd[*]}]

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk -1.3 -max [get_ports {RGMII_rx_ctl RGMII_rd[*]}] -clock_fall -add_delay

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk -2.9 -min [get_ports {RGMII_rx_ctl RGMII_rd[*]}] -clock_fall -add_delay

 

These are essentially identical to the ones in design_1_gmii_to_rgmii_0_0_clocks.xdc, with the numbers changed to match the -1.3/1.1 SU/H window provided by your (fictitious) RGMII PHY.

 

So the above is the answer to Q4.
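The -max and -min numbers used for option b can be derived from the datasheet setup/hold values alone. The sketch below assumes the edge relationship set by the IP's set_multicycle_path and set_false_path commands (launch and capture on the same edge) and uses a made-up function name:

```python
def input_delays_option_b(t_setup_ps, t_hold_ps, half_period_ps):
    """Return (-max, -min) set_input_delay values in ns for same-edge capture.

    -max is the start of the window around the launch edge (the latest point at
    which the data becomes valid); -min is the end of the previous window (the
    earliest point at which the data can start changing).
    """
    max_ps = -t_setup_ps                 # start of window 0
    min_ps = t_hold_ps - half_period_ps  # end of window -1
    return max_ps / 1000, min_ps / 1000


max_ns, min_ns = input_delays_option_b(1300, 1100, 4000)
print(f"-max {max_ns}  -min {min_ns}")
```

The same pair of values applies to the -clock_fall constraints, because the window around the falling edge has the same shape as the one around the rising edge.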

 

Using option a (the cheating way - I don't think this makes sense to do - I am only describing it for completeness) is actually really complicated here. First, we would have to find a way to disable the design_1_gmii_to_rgmii_0_0_clocks.xdc file. While we can "override" the set_multicycle_path commands (and put them back to 1 for setup and 0 for hold), we cannot "un-false_path" the false paths. So once this file is read, we have to stick with option b.

 

To use a, we would have to disable the IP's XDC file (which can be done in project mode by disabling the ENABLED property of the file), and then use different set_input_delay commands. For this option, we would need to define window 1 with respect to the rising edge of the clock at t=0.0 (so it is captured by the falling edge of clock at t=4.0), and window 2 with respect to the falling edge of the clock at t=4.0 (relative to that edge).

 

So the min would be the end of window 0 (at t=1.1) and the max would be the beginning of window 1 (at t=2.7).

 

So the constraints would be:

 

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk 2.7 -max [get_ports {RGMII_rx_ctl RGMII_rd[*]}]

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk 1.1 -min [get_ports {RGMII_rx_ctl RGMII_rd[*]}]

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk 2.7 -max [get_ports {RGMII_rx_ctl RGMII_rd[*]}] -clock_fall -add_delay

set_input_delay -clock design_1_gmii_to_rgmii_0_0_rgmii_rx_clk 1.1 -min [get_ports {RGMII_rx_ctl RGMII_rd[*]}] -clock_fall -add_delay
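For comparison, the option a numbers come from the same datasheet values, just referenced to the preceding clock edge. A small sketch under that assumption (default edge relationship, no multicycle or false-path constraints; the function name is hypothetical):

```python
def input_delays_option_a(t_setup_ps, t_hold_ps, half_period_ps):
    """Return (-max, -min) set_input_delay values in ns when the next edge captures the data.

    -max is the start of the next window, -min is the end of the current window,
    both measured from the edge the delay is referenced to.
    """
    max_ps = half_period_ps - t_setup_ps  # start of window 1, seen from the edge at t=0
    min_ps = t_hold_ps                    # end of window 0
    return max_ps / 1000, min_ps / 1000


max_ns, min_ns = input_delays_option_a(1300, 1100, 4000)
print(f"-max {max_ns}  -min {min_ns}")
```

Both values are exactly one half period (4.0 ns) larger than the option b values, which is the sense in which the two options describe the same data referenced to different edges.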

 

I think it's pretty clear, though, that the preferred option is option b - leave the set_multicycle_path and set_false_path commands in place and use the first set of constraints.

 

Finishing your questions:

 

Q1: The nomenclature described in the constraint window doesn't apply. It assumes the external device gives you a min and max clock-to-output time. This PHY (as is common in center-aligned interfaces) gives you a "provided setup and hold time" - these are actually converses of each other. The solution is to draw out the waveforms and infer the timings as I did above.

 

Q2: These are "default" constraints. They are actually partly correct since they set up the edge relationships properly (the set_multicycle_path and set_false_path constraints). Only the actual numbers are incorrect, and they need to be overridden in your "late" constraints.

 

Q3: Correct

 

Q4: Answered above

 

Q5: Not really - you should be able to get "close" using calculations

 

The required window is [0.38,1.70], and the provided window is [-1.3,1.1]. So you need to delay the window by at least 1.70-1.10ns=600ps. At 78ps/tap (assuming a 200MHz reference clock on the IDELAYCTRL), this will require 8 taps. However, you want to "center your margin". Your provided window is 2.4ns long, and your required window is 1.32ns, so the entire system will have 1.08ns of margin. Ideally you want to share this margin between setup and hold, so you want to add another 540ps of delay for a total of 1140ps of delay, which results in a tap setting of 15.
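The tap estimate above is easy to redo for other numbers. A minimal sketch of the same arithmetic, using the 78 ps/tap figure quoted in this post; the function name and the returned dictionary keys are my own, and the result is only a starting point to be checked against the tool's report:

```python
import math

TAP_PS = 78  # per-tap delay quoted in this post for a 200 MHz IDELAYCTRL reference clock


def idelay_taps(provided, required, tap_ps=TAP_PS):
    """Estimate IDELAY taps from (start_ps, end_ps) windows at the FPGA pins."""
    prov_start, prov_end = provided
    req_start, req_end = required
    min_delay = req_end - prov_end                            # delay needed just to meet hold
    margin = (prov_end - prov_start) - (req_end - req_start)  # total slack in the system
    centered_delay = min_delay + margin / 2                   # split the slack between setup and hold
    return {
        "min_taps": math.ceil(min_delay / tap_ps),
        "margin_ps": margin,
        "centered_delay_ps": centered_delay,
        "centered_taps": round(centered_delay / tap_ps),
    }


result = idelay_taps(provided=(-1300, 1100), required=(380, 1700))
print(result)
```

Changing the `required` tuple is all that is needed to try a different device or speed grade.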

 

Now, these numbers are estimates only. It is a good idea to plug them into the tool, do (at least) synthesis, and then do a report_datasheet command. The tools will actually tell you the real numbers, and then you can adjust your tap settings accordingly.

 

Q6:

 

The constraints specify the interface timing at the pins - the timing provided by the system outside the FPGA. However, the IDELAY changes the behavior of the inside of the FPGA - it adds delay between internal elements. We are specifically adjusting these so that the FPGA can capture the timing defined by the set_input_delay -min and -max.

 

Now, a couple of things to consider...

 

IDELAYs add jitter to data. In HIGH_PERFORMANCE_MODE=FALSE (the default), this is +/-9ps per tap. With 15 taps, this is +/-135ps, so your margin will be reduced by 270ps. In HIGH_PERFORMANCE_MODE=TRUE, this is reduced to +/-5ps per tap, which is +/-75ps or 150ps total.
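The jitter penalty can be budgeted the same way. This sketch just multiplies out the per-tap figures quoted above and subtracts them from the 1.08 ns margin estimated under Q5; the function name is hypothetical and the per-tap numbers are the ones from this post, not values I have checked against a datasheet:

```python
def margin_after_jitter(margin_ps, taps, jitter_per_tap_ps):
    """Return (peak_jitter_ps, total_jitter_ps, remaining_margin_ps) for a given tap setting."""
    peak = taps * jitter_per_tap_ps  # +/- jitter added by the delay line
    total = 2 * peak                 # it eats into both the setup and the hold side
    return peak, total, margin_ps - total


MARGIN_PS = 1080  # 2.4 ns provided window minus 1.32 ns required window
low_power = margin_after_jitter(MARGIN_PS, taps=15, jitter_per_tap_ps=9)  # HIGH_PERFORMANCE_MODE=FALSE
high_perf = margin_after_jitter(MARGIN_PS, taps=15, jitter_per_tap_ps=5)  # HIGH_PERFORMANCE_MODE=TRUE
print(low_power, high_perf)
```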

 

Delaying a clock through an IDELAY, though, does not incur a penalty. For this reason, it is generally better to delay clocks rather than data. This is more easily done in edge-aligned interfaces (instead of center-aligned interfaces) - so if your PHY has the option of either, it's generally better to use edge aligned...

 

Avrum

Scholar ronnywebers
Registered: ‎10-10-2014

Re: constraining Center-Aligned Dual Data Rate Source Synchronous Inputs (i.e. RGMII phy rx side)

Hello @avrumw, thanks a lot for the really great answer, I'm starting to really understand these constraints thanks to you. A few things :

 

1) you're indeed right that the timings from the phy I gave you are not the real ones, as both tsetup and thold are actually 1.2ns (symmetrical) (it's a Marvell phy like the one on the Zedboard), but I modified these values on purpose to make sure I'd understand where tsetup and thold would go in the calculations. But I couldn't fool you apparently :-)

 

2) regarding the A7 -1 speed grade, I tried to find these in the datasheet, and found the values in the first screenshot below (table 47) of -0.38 / 1.76  -> are these the correct ones (you used -0.38 / 1.70, so I might be looking in the wrong place? not sure if BUFIO corresponds to the BUFR the IP block is using?). As a side question : does the 'sample window' in table 48 relate to table 47? 

 

These 'required' setup/hold times at the FPGA pins that you explained, together with drawing the diagrams, were key for me to understand the whole constraint stuff - when just looking at the timing analyser results, I was not really getting it.

 

3) I never really understood the -min and -max options and the set_multicycle_path constraints, but looking at the diagrams on paper it becomes clearer.

 

In the meantime I received an explanation for the -min -max calculation from Xilinx support as follows :

 

(assuming the set_multicycle_path with setup 0 and hold -1, and using my imaginary phy timings just to understand this better)

 

The first 2 timing constraints create a data window relative to the rising edge at t=0ns, which

 

a) starts 1.3ns before the rising edge at t=0ns (because of the setup 0), hence -1.3 -max

b) lasts 1.1ns after the rising edge, but since hold is analyzed on the previous edge (because of hold -1), we have to subtract 4ns, which gives 1.1 - 4 = -2.9ns -min

 

The other 2 timing constraints create a similar data window, but relative to the falling edge at t=4ns
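As a quick cross-check, the window-based reasoning from the accepted answer and the edge-based reasoning from support can be written side by side. This is only a sketch with made-up function names, in picoseconds, assuming the setup 0 / hold -1 multicycle relationship described above:

```python
def delays_from_windows(t_setup_ps, t_hold_ps, half_period_ps):
    """Window view: -max is the start of window 0, -min is the end of window -1."""
    window_0 = (-t_setup_ps, t_hold_ps)
    window_m1 = (-half_period_ps - t_setup_ps, -half_period_ps + t_hold_ps)
    return window_0[0], window_m1[1]


def delays_from_edges(t_setup_ps, t_hold_ps, half_period_ps):
    """Edge view: setup is checked at the launch edge, hold at the previous edge."""
    max_ps = -t_setup_ps                 # data valid t_setup before the edge (setup 0)
    min_ps = t_hold_ps - half_period_ps  # hold is analyzed one edge earlier (hold -1)
    return max_ps, min_ps


imaginary_phy = (1300, 1100, 4000)  # the modified timings used in this thread
real_phy = (1200, 1200, 4000)       # the symmetrical 1.2 ns timings mentioned in 1)
print(delays_from_windows(*imaginary_phy), delays_from_edges(*imaginary_phy))
print(delays_from_windows(*real_phy), delays_from_edges(*real_phy))
```

Both views give -1.3/-2.9 ns for the imaginary timings and -1.2/-2.8 ns for the symmetrical 1.2 ns case.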

 

I think I can interpret the set_multicycle_path like this : 'window -1' being ended by the previous clock edge (hence hold -1) and 'window 0' being initiated by the edge at t=0ns (hence setup 0)

 

but drawing and looking at the diagrams is easier to follow I must admit :-)

 

4) I am using a Zynq 7Z020 -1 device, I attached the timing diagrams (table 82) for this device too -> these are -0.38 / 1.86 if I'm correct

 

I'll adjust my constraints first and post an updated version, along with the drawn diagrams hereafter (I still don't get timing closure with your new constraints btw), but I'll first straighten out the aforementioned things. Or maybe it's better to accept your answer here, and start a new post and not contaminate this one further?

Tpscs and Tphcs - Artix.jpg
Tpscs and Tphcs - Zynq.jpg
Historian
Registered: ‎01-23-2009

Re: constraining Center-Aligned Dual Data Rate Source Synchronous Inputs (i.e. RGMII phy rx side)

"and found the values in the first screenshot below (table 47) of -0.38 / 1.76 -> are these the correct ones (you used -0.38 / 1.70,)"

 

Yes, this is the right place - I must have mis-typed them; -0.38/1.76 for a window of 1.38.

 

"not sure if BUFIO corresponds to the BUFR the IP block is using?)"

 

Not exactly, but they are close. In any case, all these numbers are not to be considered the last word - the tool is the last word, and all timing needs to be checked through the tool.

 

"As a side question : does the 'sample window' in table 48 relate to table 47?"

 

The sample window is the smallest data window that can be captured with "perfect" dynamic calibration. We know that across PVT the required static window is 1.38. The actual window for any particular device is actually much smaller, but we don't know where it is within this window of 1.38. If you use dynamic calibration to find the "perfect" point to sample, then given all the internal uncertainties (tap granularity, phase error, clock skew), as long as your window is 0.70 (using an MMCM) or 0.46 (using a BUFIO and IDELAY) then the "perfect" dynamic capture mechanism can capture the data.

 

The actual construction of this "perfect" dynamic capture mechanism isn't explicitly stated...

 

"In the mean time I received an explanation for the -min -max calculation from Xilinx support as folows :"

 

The two explanations are essentially the same.

 

"I am using a Zynq 7Z020 -1 device, I attached the timing diagrams (table 82) for this device too -> these are -0.38 / 1.86 if I'm correct?"

 

Yes.

 

Avrum

 
