
Constraining a center-aligned source-synchronous DDR signals when input clock has huge path delay

Voyager
Posts: 326
Registered: ‎08-07-2014


Hi,

 

I have a design with two RGMII interfaces connecting to external PHYs. I use Vivado 2017.2 and am having difficulty getting timing to pass in a proper way (I can get the xdc to pass implementation, but I am sure gmii_rx_data will be sampled incorrectly in real hardware). The device is a 7 series FPGA. The 2 ns delay is added by the PHYs to rgmii_rx*, so the data is stable when rgmii_rx_clk changes.

 

One major constraint is that my rgmii_0_rx_clk and rgmii_1_rx_clk are fed in via non-clock-capable input pins (this cannot be changed; I must make the design work with the large net delay).

 

Following is an EXCERPT from my xdc (only the most relevant ones are shown for simplicity).

 

create_generated_clock -name clk_125M [get_pins *mmcm_inst_111_to_125/inst/mmcm_adv_inst/CLKOUT0]

# Slot i/f RGMII clocks
create_clock -period   8.000  -name s0_rgmii_0_rx_clk    [ get_ports slot_d1[0][7] ]  
create_clock -period   8.000  -name s0_rgmii_1_rx_clk    [ get_ports slot_d2[0][1] ]

# Disabling the inter-clock analysis
set_clock_groups -name async_clocks -asynchronous \
    -group [get_clocks -include_generated_clocks s0_rgmii_0_rx_clk      ] \
    -group [get_clocks -include_generated_clocks s0_rgmii_1_rx_clk      ] \
    -group [get_clocks                           clk_125M               ] \
    -group [get_clocks                           clk_15M625             ]

# Power-down
set_false_path -to [get_ports slot_d0[0][2]]
set_false_path -to [get_ports slot_d3[0][6]]

# PHY reset
set_false_path -to [get_ports slot_d1[0][2]]
set_false_path -to [get_ports slot_d2[0][6]]

create_generated_clock -name s0_rgmii_0_tx_clk -multiply_by 1 -divide_by 1 -add -master_clock [get_clocks clk_125M] -source \
                       [get_pins slot[0].if_inst/front_end_inst/dual_rgmii.rgmii_bridge_gen[0].rgmii_bridge_inst/gmii2rgmii_trispeed_inst/rgmii_txc_out/C] \
                       [get_ports slot_d0[0][1]]                       
create_generated_clock -name s0_rgmii_1_tx_clk -multiply_by 1 -divide_by 1 -add -master_clock [get_clocks clk_125M] -source \
                       [get_pins slot[0].if_inst/front_end_inst/dual_rgmii.rgmii_bridge_gen[1].rgmii_bridge_inst/gmii2rgmii_trispeed_inst/rgmii_txc_out/C] \
                       [get_ports slot_d3[0][7]] 
 
## RGMII i/p delays 

# Specify input delay values (taken from gmii2rgmii xilinx IP xdc) 
set S0_RGMII_0_MIN_INPUT_DELAY -2.80
set S0_RGMII_1_MIN_INPUT_DELAY -2.80

set S0_RGMII_0_MAX_INPUT_DELAY -1.50
set S0_RGMII_1_MAX_INPUT_DELAY -1.50 

# Specify RGMII_0 input delays
# Format: set_input_delay -clock [get_clocks <clk_name>] -min <value> [get_ports {rgmii_rdx0...rgmii_rdx3 rgmii_rx_ctl}]

set_input_delay -clock [get_clocks s0_rgmii_0_rx_clk ] -min $S0_RGMII_0_MIN_INPUT_DELAY  [get_ports \
                        {slot_d0[0][6] slot_d0[0][7] slot_d1[0][5] slot_d1[0][6] slot_d0[0][5]}] 
                        
set_input_delay -clock [get_clocks s0_rgmii_0_rx_clk ] -max $S0_RGMII_0_MAX_INPUT_DELAY  [get_ports \
                        {slot_d0[0][6] slot_d0[0][7] slot_d1[0][5] slot_d1[0][6] slot_d0[0][5]}]                        
          
set_input_delay -clock [get_clocks s0_rgmii_0_rx_clk ] -min $S0_RGMII_0_MIN_INPUT_DELAY  [get_ports \
                        {slot_d0[0][6] slot_d0[0][7] slot_d1[0][5] slot_d1[0][6] slot_d0[0][5]}] -clock_fall -add_delay
                                                                                                    
set_input_delay -clock [get_clocks s0_rgmii_0_rx_clk ] -max $S0_RGMII_0_MAX_INPUT_DELAY  [get_ports \
                        {slot_d0[0][6] slot_d0[0][7] slot_d1[0][5] slot_d1[0][6] slot_d0[0][5]}] -clock_fall -add_delay                                                                                                   

# Specify RGMII_1 input delays           

set_input_delay -clock [get_clocks s0_rgmii_1_rx_clk ] -min $S0_RGMII_1_MIN_INPUT_DELAY  [get_ports \
                        {slot_d3[0][5] slot_d3[0][4] slot_d2[0][3] slot_d2[0][2] slot_d3[0][3]}]
                        
set_input_delay -clock [get_clocks s0_rgmii_1_rx_clk ] -max $S0_RGMII_1_MAX_INPUT_DELAY  [get_ports \
                        {slot_d3[0][5] slot_d3[0][4] slot_d2[0][3] slot_d2[0][2] slot_d3[0][3]}]                        

set_input_delay -clock [get_clocks s0_rgmii_1_rx_clk ] -min $S0_RGMII_1_MIN_INPUT_DELAY  [get_ports \
                        {slot_d3[0][5] slot_d3[0][4] slot_d2[0][3] slot_d2[0][2] slot_d3[0][3]}] -clock_fall -add_delay
                        
set_input_delay -clock [get_clocks s0_rgmii_1_rx_clk ] -max $S0_RGMII_1_MAX_INPUT_DELAY  [get_ports \
                        {slot_d3[0][5] slot_d3[0][4] slot_d2[0][3] slot_d2[0][2] slot_d3[0][3]}] -clock_fall -add_delay

## RGMII o/p delays   
                     
# Specify output delay values 
set S0_RGMII_0_MIN_OUTPUT_DELAY 1.20
set S0_RGMII_1_MIN_OUTPUT_DELAY 1.20

set S0_RGMII_0_MAX_OUTPUT_DELAY 3.40
set S0_RGMII_1_MAX_OUTPUT_DELAY 3.43 
 
# Specify RGMII_0 output delays 

set_output_delay -clock [get_clocks s0_rgmii_0_tx_clk ] -min $S0_RGMII_0_MIN_OUTPUT_DELAY [get_ports \
                         {slot_d1[0][3] slot_d1[0][1] slot_d0[0][4] slot_d0[0][3] slot_d1[0][4]}]

set_output_delay -clock [get_clocks s0_rgmii_0_tx_clk ] -max $S0_RGMII_0_MAX_OUTPUT_DELAY [get_ports \
                         {slot_d1[0][3] slot_d1[0][1] slot_d0[0][4] slot_d0[0][3] slot_d1[0][4]}]
                         
set_output_delay -clock [get_clocks s0_rgmii_0_tx_clk ] -min $S0_RGMII_0_MIN_OUTPUT_DELAY [get_ports \
                         {slot_d1[0][3] slot_d1[0][1] slot_d0[0][4] slot_d0[0][3] slot_d1[0][4]}] -clock_fall -add_delay

set_output_delay -clock [get_clocks s0_rgmii_0_tx_clk ] -max $S0_RGMII_0_MAX_OUTPUT_DELAY [get_ports \
                         {slot_d1[0][3] slot_d1[0][1] slot_d0[0][4] slot_d0[0][3] slot_d1[0][4]}] -clock_fall -add_delay                                                                                                              

# Specify RGMII_1 output delays

set_output_delay -clock [get_clocks s0_rgmii_1_tx_clk ] -min $S0_RGMII_1_MIN_OUTPUT_DELAY [get_ports \
                         {slot_d2[0][5] slot_d2[0][7] slot_d3[0][1] slot_d3[0][2] slot_d2[0][4]}]
                         
set_output_delay -clock [get_clocks s0_rgmii_1_tx_clk ] -max $S0_RGMII_1_MAX_OUTPUT_DELAY [get_ports \
                         {slot_d2[0][5] slot_d2[0][7] slot_d3[0][1] slot_d3[0][2] slot_d2[0][4]}]                         

set_output_delay -clock [get_clocks s0_rgmii_1_tx_clk ] -min $S0_RGMII_1_MIN_OUTPUT_DELAY [get_ports \
                         {slot_d2[0][5] slot_d2[0][7] slot_d3[0][1] slot_d3[0][2] slot_d2[0][4]}] -clock_fall -add_delay             
   
set_output_delay -clock [get_clocks s0_rgmii_1_tx_clk ] -max $S0_RGMII_1_MAX_OUTPUT_DELAY [get_ports \
                         {slot_d2[0][5] slot_d2[0][7] slot_d3[0][1] slot_d3[0][2] slot_d2[0][4]}] -clock_fall -add_delay

I made my initial xdc for the RGMII rx data, shown above, from the constraints given in the documentation (GMII to RGMII v4.0, PG160, November 18, 2015). The min/max values of -2.80 and -1.50 come from that source.

 Inside the gmii2rgmii module, I have made sure this is done. The IDDRs are clocked with rgmii_rxc_int.

 

    bufg_gmii_rx_clk: BUFG
    port map(
        I => rgmii_rxc_i,    
        O => rgmii_rxc_int   
    );

 

I am getting hold violations because of the large routing delay of rgmii_*_rx_clk. For simplicity, I show the violations only for rgmii_0_rx_clk. Please see the attached screenshots.

rgmii0_clk_summary.jpg

Detailed:

hold_path.jpg
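
For readers without the screenshots, here is a rough sketch of how the same input paths could be listed as text in the Vivado Tcl console. It assumes the implemented design is open. The variable name rgmii0_rx_ports is only an illustration, and the port list is copied from the RGMII_0 set_input_delay commands above.

```tcl
# Data/control input ports of RGMII_0, copied from the set_input_delay list above.
# The variable name is illustrative only.
set rgmii0_rx_ports [get_ports {slot_d0[0][6] slot_d0[0][7] slot_d1[0][5] slot_d1[0][6] slot_d0[0][5]}]

# Hold (min-delay) analysis: worst path per endpoint, with the clock path
# expanded so the net delay from the non-clock-capable pin is visible.
report_timing -from $rgmii0_rx_ports -delay_type min \
    -max_paths 10 -nworst 1 -path_type full_clock_expanded

# Setup (max-delay) analysis of the same input paths, for comparison.
report_timing -from $rgmii0_rx_ports -delay_type max \
    -max_paths 10 -nworst 1 -path_type full_clock_expanded
```

The expanded clock path in each report should make it easier to see how much of the destination clock delay comes from general routing into the BUFG.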

 

After reading the following posts,

https://forums.xilinx.com/t5/Timing-Analysis/How-to-constraint-Same-Edge-capture-edge-aligned-DDR-input/m-p/646009#M8411
https://forums.xilinx.com/t5/Timing-Analysis/constraining-Center-Aligned-Dual-Data-Rate-Source-Synchronous/m-p/673216#M9220
https://forums.xilinx.com/t5/Timing-Analysis/Input-clock-IDDR/td-p/735891

 

I have made some drawings based on the above screen-shot reports, and have the following conclusion (hope I am correct):

Because of the net delay, the rising edge of the launch clock rgmii_rx_clk shifts from 0.0 ns to 7.939 ns by the time it reaches the IDDR. This is almost 8 ns, which would ideally be the next rising edge of the launch clock. But inside the FPGA, at the IDDR input, 7.939 ns is really the first rising edge, corresponding to the 0.0 ns rising edge of the launch clock.

So I think I must tell the tool to treat the rising edge of rgmii_rx_clk at 7.939 ns as the first capture edge.

 

Do I do this with the set_multicycle_path command?

Is this the correct approach, or can it be done in a better way?

--------------------------------------------------------------------------------------------------------
Being a non-Xilinx member, giving out "Kudos" or marking my posts as "Accept as solution" would trigger frequent and better future answers.
--------------------------------------------------------------------------------------------------------
Instructor
Posts: 3,631
Registered: ‎01-23-2009

Re: Constraining a center-aligned source-synchronous DDR signals when input clock has huge path delay

I haven't gone through all your constraints - take a closer look at the posts you referenced.

 

From what little I looked at, you have a window from -1.8ns to 1.1ns - this is "centered" around the rising edge of the clock. If you want to capture this with the rising edge of the clock then you will need the set_multicycle_path 0 and all the associated exception commands ( set_multicycle_path -hold -1, set_false_path between rising and falling clocks for setup, between same edges for hold).
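
A minimal sketch of that set of exceptions for the RGMII_0 interface follows. It assumes the input delays are referenced to a virtual clock, here called s0_rgmii_0_rx_clk_virt. That name is hypothetical and does not appear in the xdc above. It is used so that the exceptions apply only to the input paths and not to register-to-register paths inside the s0_rgmii_0_rx_clk domain. Treat it as a starting point under those assumptions, not a verified constraint set for this board.

```tcl
# ASSUMPTION: a virtual clock models the RGMII clock as seen at the PHY.
# The name s0_rgmii_0_rx_clk_virt is hypothetical; the set_input_delay
# commands would have to reference it (-clock) for these exceptions to apply.
create_clock -period 8.000 -name s0_rgmii_0_rx_clk_virt

# Capture with the same clock edge that launched the data (setup moved back
# by one period), and move the hold check accordingly.
set_multicycle_path 0  -setup -from [get_clocks s0_rgmii_0_rx_clk_virt] -to [get_clocks s0_rgmii_0_rx_clk]
set_multicycle_path -1 -hold  -from [get_clocks s0_rgmii_0_rx_clk_virt] -to [get_clocks s0_rgmii_0_rx_clk]

# Setup: ignore opposite-edge relationships (rise -> fall, fall -> rise).
set_false_path -setup -rise_from [get_clocks s0_rgmii_0_rx_clk_virt] -fall_to [get_clocks s0_rgmii_0_rx_clk]
set_false_path -setup -fall_from [get_clocks s0_rgmii_0_rx_clk_virt] -rise_to [get_clocks s0_rgmii_0_rx_clk]

# Hold: ignore same-edge relationships (rise -> rise, fall -> fall).
set_false_path -hold -rise_from [get_clocks s0_rgmii_0_rx_clk_virt] -rise_to [get_clocks s0_rgmii_0_rx_clk]
set_false_path -hold -fall_from [get_clocks s0_rgmii_0_rx_clk_virt] -fall_to [get_clocks s0_rgmii_0_rx_clk]
```

Whatever form the exceptions take, check the resulting setup and hold relationships in the timing report before trusting them.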

 

But...

 

You are trying to capture a 2.9ns data valid window with a non-clock capable I/O driving a BUFG. I highly doubt that this is a viable clocking structure. Since this isn't on a clock capable I/O you cannot use the BUFIO/BUFR combination, and the BUFG alone is really slow. You don't tell us what actual device (Artix/Kintex/Virtex) or what speed grade, but even on a Kintex-7 in a -2 part (which is middle of the road in terms of performance) the datasheet is indicating that you need windows that are around 3.3ns wide with this clocking scheme (you only have 2.9ns) - and that is without all the extra uncertainty of the fact that you are not on a clock capable I/O.

 

At very least you need to try and use an MMCM to deskew the input clock. Again, this will be using non-dedicated routing to the MMCM, but it will be better than the BUFG directly. Even with this, there is no guarantee that this can be made to work reliably (you will have to try it and get the tool to tell you if it will work).

 

You will also almost certainly need some mechanism of adjusting the delay to match what the system needs - this can be done either with IDELAYs or with the phase shift of the MMCM - neither of those are currently in your system.

 

Avrum

Voyager
Posts: 326
Registered: ‎08-07-2014

Re: Constraining a center-aligned source-synchronous DDR signals when input clock has huge path delay


Hi avrumw,

 

I am using A7 -1, so it is -0.38/1.76 (https://forums.xilinx.com/t5/Timing-Analysis/constraining-Center-Aligned-Dual-Data-Rate-Source-Synchronous/m-p/673216#M9220).

 

I haven't tried set_multicycle_path yet; I will.

 

"At very least you need to try and use an MMCM to deskew the input clock."

Thanks for this tip.

 

"You will also almost certainly need some mechanism of adjusting the delay to match what the system needs - this can be done either with IDELAYs or with the phase shift of the MMCM - neither of those are currently in your system."

I know using IDELAYs is one way. Will also try it.

 

This is what I have also tried in parallel:

I used a large value in set_input_delay for the input data paths (min/max delay values shown below) to compensate for the clock path delay. With this I obtained a timing-clean bitstream. Using an ILA core on the gmii_rx paths, I saw that rgmii_0_rx works (55 55 55 55 55 55 55 d5 da d1 d2 .....) and rgmii_1_rx fails (the preamble itself was wrong). I didn't post those results. I think rgmii_rx_clk was sampling the wrong data window for the rgmii_1 set. Hence I wanted to do it the right way from scratch.

 

# Specify input delay values 
set S0_RGMII_0_MIN_INPUT_DELAY 6.76
set S0_RGMII_1_MIN_INPUT_DELAY 6.76

set S0_RGMII_0_MAX_INPUT_DELAY 6.51
set S0_RGMII_1_MAX_INPUT_DELAY 6.51