Registered: 10-16-2018

Constraining outputs: SDR interface, DDR data pin, DDR clock pin


After hours of research in these forums and (mostly) Xilinx documents, I have come to the conclusion that either my UltraScale+ design and constraints are correct and my required timing simply cannot be met, or I have chosen the wrong approach to (at least) constraining the required outputs.

Here are some basic points:

  • I want to drive the SDR input of a DAC with clock and data generated within the FPGA (source synchronous). A data output rate of 200 MSa/s must be achieved.
  • DAC conditions:
    • Setup time: 2.0 ns
    • Hold time: 1.5 ns
    • The clock trace delay exceeds the data trace delay by 0.4 ns
  • FPGA (XCZU4) conditions:
    • The output data vector is connected to the DAC via HD bank pins (max. 250 MSa/s)
      • Data is generated internally at 200 MHz (just a simple counter in the example design). A derived 100 MHz clock drives the data out through DDR primitives, because the tools reject a 200 MHz SDR output: the minimum SDR output period for HD pins is 8 ns.
    • The clock is connected to the DAC via an HP bank LVDS pin pair (much higher rates possible)
      • The 100 MHz FPGA input clock is generated by a very-low-jitter clock generator
      • The FPGA input clock is received via HDGC pins and drives an internal MMCM (no dedicated route from the pins) to produce the clocks required for the outputs
      • A DDR primitive (ODDRE1) is used to optimize the clock output properties. It is driven by a phase-shifted 200 MHz clock from the same MMCM as the data.
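As a rough sanity check, the valid-data window at the DAC can be worked out from the numbers above alone. This sketch treats the DAC input as SDR and ignores the FPGA entirely; P (sample period), t_d (time at which a new sample becomes valid at the DAC pins) and t_c (the capturing clock edge at the DAC) are symbols introduced only for this illustration.

```latex
\begin{aligned}
P &= \frac{1}{200\,\mathrm{MSa/s}} = 5.0\,\mathrm{ns}
  \qquad (\text{DDR at } 100\,\mathrm{MHz}: 2 \times 100\,\mathrm{MHz} = 200\,\mathrm{MSa/s}) \\
t_d + t_{SU} &\le t_c \le t_d + P - t_H \\
P - t_{SU} - t_H &= 5.0 - 2.0 - 1.5 = 1.5\,\mathrm{ns}
\end{aligned}
```

Under these assumptions, everything on the FPGA side (clock-to-out differences between the HD data pins and the HP clock pins, jitter and clock uncertainty, process/voltage/temperature spread, and any deviation from the 0.4 ns trace offset) has to fit into roughly 1.5 ns.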

I have tried to produce a minimal design containing only the required parts, so here it is:

Design schematic:



Main design component:


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library UNISIM;
use UNISIM.vcomponents.all;  -- ODDRE1, OBUFDS

entity Design is
    generic (
        g_dataWordLength    : natural := 1
    );
    port (
        i_inputClockP       : in  std_logic;
        i_inputClockN       : in  std_logic;

        o_outputClockP      : out std_logic;
        o_outputClockN      : out std_logic;

        o_data              : out std_logic_vector(g_dataWordLength - 1 downto 0)
    );
end Design;

architecture Behavioral of Design is

    subtype data is unsigned(g_dataWordLength - 1 downto 0);
    type dataArray is array(natural range <>) of data;

    signal w_processingClock                : std_logic;  -- 200 MHz
    signal w_processingClockHalf            : std_logic;  -- 100 MHz, clocks the data ODDRE1s
    signal w_processingClockPhaseShifted    : std_logic;  -- 200 MHz, phase shifted, clocks the clock ODDRE1

    signal w_outputClockDDR                 : std_logic;

    signal r_dataCurrent                    : data := (others => '0');
    signal r_dataEvenBuffered               : data;

    signal r_dataEvenOdd                    : dataArray(0 to 1);
    signal r_dataEvenNotOdd                 : std_logic := '1';

begin

    ClockGeneration : entity work.ClockGeneration
        port map(
            i_inputClockP               => i_inputClockP,
            i_inputClockN               => i_inputClockN,
            i_reset                     => '0',--i_resetClocks,
            o_resetDone                 => open,--o_resetClocksDone,
            o_processingClock           => w_processingClock,
            o_processingClockHalf       => w_processingClockHalf,
            o_processClockPhaseShifted  => w_processingClockPhaseShifted
        );

    -- Test data: simple counter running at 200 MHz
    GenerateData : process (w_processingClock) is
    begin
        if rising_edge(w_processingClock) then
            r_dataCurrent <= r_dataCurrent + 1;
        end if;
    end process;

    -- Collect two consecutive 200 MHz samples into a pair that the
    -- 100 MHz ODDRE1 outputs as D1 (rising edge) and D2 (falling edge)
    DataToEvenOdd : process (w_processingClock) is
    begin
        if rising_edge(w_processingClock) then
            if r_dataEvenNotOdd = '1' then
                r_dataEvenBuffered  <= r_dataCurrent;
            else
                r_dataEvenOdd(0) <= r_dataEvenBuffered;
                r_dataEvenOdd(1) <= r_dataCurrent;
            end if;
            r_dataEvenNotOdd <= not r_dataEvenNotOdd;
        end if;
    end process;

    -- One ODDRE1 per data bit (HD bank), clocked at 100 MHz -> 200 MSa/s
    DataVectorOutputDDR : for i in r_dataEvenOdd(0)'range generate
        DataOutputDDR : ODDRE1
            generic map(
                IS_C_INVERTED  => '0',
                IS_D1_INVERTED => '0',
                IS_D2_INVERTED => '0',
                SIM_DEVICE     => "ULTRASCALE_PLUS",
                SRVAL          => '0'
            )
            port map(
                Q  => o_data(i),
                C  => w_processingClockHalf,
                D1 => r_dataEvenOdd(0)(i),
                D2 => r_dataEvenOdd(1)(i),
                SR => '0'
            );
    end generate;

    -- Forwarded clock: D1 = '1', D2 = '0' reproduces the phase-shifted 200 MHz clock
    ClockOutputDDR : ODDRE1
        generic map(
            IS_C_INVERTED  => '0',
            IS_D1_INVERTED => '0',
            IS_D2_INVERTED => '0',
            SIM_DEVICE     => "ULTRASCALE_PLUS",
            SRVAL          => '0'
        )
        port map(
            Q  => w_outputClockDDR,
            C  => w_processingClockPhaseShifted,
            D1 => '1',
            D2 => '0',
            SR => '0'
        );

    -- LVDS output on the HP bank
    ClockOutputBuffer : OBUFDS
        generic map(
            IOSTANDARD => "DEFAULT",
            SLEW       => "FAST")
        port map(
            O  => o_outputClockP,
            OB => o_outputClockN,
            I  => w_outputClockDDR
        );

end Behavioral;



ClockGeneration component:


library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library UNISIM;
use UNISIM.vcomponents.all;  -- IBUFDS, BUFGCE

entity ClockGeneration is
    port (
        i_inputClockP                   : in  std_logic;
        i_inputClockN                   : in  std_logic;

        i_reset                         : in  std_logic;
        o_resetDone                     : out std_logic;

        o_processingClock               : out std_logic;
        o_processingClockHalf           : out std_logic;
        o_processClockPhaseShifted      : out std_logic
    );
end entity;

architecture Behavior of ClockGeneration is

    signal w_clockInToBuffer        : std_logic;
    signal w_clockInFromBuffer      : std_logic;
    signal w_clockGeneratorFeedback : std_logic;

begin

        -- 100 MHz differential input clock (HDGC pins)
        InputBuffer : IBUFDS
            generic map (
                DIFF_TERM       => false,
                IBUF_LOW_PWR    => false,
                IOSTANDARD      => "DEFAULT")
            port map (
                O               => w_clockInToBuffer,
                I               => i_inputClockP,
                IB              => i_inputClockN
            );

        ClockBuffer : BUFGCE
            generic map(
                CE_TYPE         => "SYNC",
                IS_CE_INVERTED  => '0',
                IS_I_INVERTED   => '0',
                SIM_DEVICE      => "ULTRASCALE_PLUS",
                STARTUP_SYNC    => "FALSE"
            )
            port map (
                O               => w_clockInFromBuffer,
                CE              => '1',
                I               => w_clockInToBuffer
            );

        -- MMCM IP: clk_out1 = 200 MHz, clk_out2 = 200 MHz phase shifted, clk_out3 = 100 MHz
        ClockGenerator : entity work.ClockGenerator
            port map(
                clk_out1  => o_processingClock,
                clk_out2  => o_processClockPhaseShifted,
                clk_out3  => o_processingClockHalf,
                powerDown => i_reset,
                locked    => o_resetDone,
                clockIn   => w_clockInFromBuffer
            );

end architecture;



The ClockGenerator (MMCM) IP is configured with Phase Shift Mode set to WAVEFORM (and matched routing/CLOCK_DELAY_GROUP).


Constraints file including package pins etc.:

I have constrained the output delays based on the Source Synchronous > Setup/Hold Based > SDR, Rising Edge example, because even though I am using DDR primitives, the interface still behaves as SDR from the DAC's perspective.


set_property PACKAGE_PIN D15 [get_ports i_inputClockP];
set_property PACKAGE_PIN D14 [get_ports i_inputClockN];
set_property IOSTANDARD LVPECL [get_ports i_inputClock*];

create_clock -name inputClock -period 10.0 [get_ports i_inputClockP];

set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets ClockGeneration/ClockGenerator/inst/clockIn_ClockGenerator]

set_property IOSTANDARD LVCMOS33 [get_ports o_data*];
set_property IOSTANDARD LVDS [get_ports o_outputClock*];

set_property PACKAGE_PIN AB10 [get_ports {o_data[0]}];
set_property SLEW FAST [get_ports o_data*];
set_property DRIVE 8 [get_ports o_data*];
set_property PACKAGE_PIN F8 [get_ports o_outputClockP];

set_max_delay -from [get_clocks -of_objects [get_pins {ClockGeneration/ClockGenerator/clk_out1}]] -to [get_clocks -of_objects [get_pins {ClockGeneration/ClockGenerator/clk_out3}]] [get_property PERIOD [get_clocks -of_objects [get_pins {ClockGeneration/ClockGenerator/clk_out1}]]];

create_generated_clock -name outputClock -source [get_pins ClockOutputDDR/C] -multiply_by 1 [get_ports o_outputClockP];

set tsu                 2.000;  # destination device setup time requirement
set thd                 1.500;  # destination device hold time requirement
set data_trce_dly_max   0;      # maximum data trace delay
set data_trce_dly_min   0;      # minimum data trace delay
set clk_trce_dly_max    0.4;    # maximum clock trace delay
set clk_trce_dly_min    0.4;    # minimum clock trace delay
# output maximum delay value: maximum trace delay for data + tSU of external register - minimum trace delay for clock
set_output_delay -clock outputClock -max [expr $data_trce_dly_max + $tsu - $clk_trce_dly_min] [get_ports o_data];
# output minimum delay value: minimum trace delay for data + tH of external register - maximum trace delay for clock
set_output_delay -clock outputClock -min [expr $data_trce_dly_min - $thd - $clk_trce_dly_max] [get_ports o_data];
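Substituting the Tcl variables above into the two set_output_delay expressions gives the values the tool actually uses; the -min result matches the "Output Delay: -1.900ns" line in the timing report. Here d_data and d_clk are just shorthand for the data_trce_dly_* and clk_trce_dly_* variables.

```latex
\begin{aligned}
\text{max output delay} &= d_{data,max} + t_{SU} - d_{clk,min} = 0 + 2.0 - 0.4 = 1.6\,\mathrm{ns} \\
\text{min output delay} &= d_{data,min} - t_H - d_{clk,max} = 0 - 1.5 - 0.4 = -1.9\,\mathrm{ns}
\end{aligned}
```

Read together, the two values require the data at the FPGA pin to be stable from at least 1.6 ns before until 1.9 ns after each forwarded clock edge at the FPGA pin, i.e. the 3.5 ns setup-plus-hold window shifted by the extra 0.4 ns of clock trace delay.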



My hold timing fails by roughly 1.5 ns (possibly a bit less if the setup side were optimized further):


Min Delay Paths
Slack (VIOLATED) :        -1.544ns  (arrival time - required time)
  Source:                 DataVectorOutputDDR[0].DataOutputDDR/CLK
                            (rising edge-triggered cell OSERDESE3 clocked by clk_out3_ClockGenerator  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            o_data[0]
                            (output port clocked by outputClock  {rise@4.625ns fall@7.125ns period=5.000ns})
  Path Group:             outputClock
  Path Type:              Min at Fast Process Corner
  Requirement:            -0.375ns  (outputClock rise@9.625ns - clk_out3_ClockGenerator rise@10.000ns)
  Data Path Delay:        1.542ns  (logic 1.540ns (99.870%)  route 0.002ns (0.130%))
  Logic Levels:           1  (OBUF=1)
  Output Delay:           -1.900ns
  Clock Path Skew:        1.368ns (DCD - SCD - CPR)
    Destination Clock Delay (DCD):    4.141ns = ( 13.766 - 9.625 ) 
    Source Clock Delay      (SCD):    2.900ns = ( 12.900 - 10.000 ) 
    Clock Pessimism Removal (CPR):    -0.126ns
  Clock Uncertainty:      0.194ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.129ns
    Phase Error              (PE):    0.120ns
  Clock Net Delay (Source):      0.848ns (routing 0.263ns, distribution 0.585ns)

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_out3_ClockGenerator rise edge)
                                                     10.000    10.000 r  
    D15                                               0.000    10.000 r  i_inputClockP (IN)
                         net (fo=0)                   0.000    10.000    ClockGeneration/InputBuffer/I
                                                      0.512    10.512 r  ClockGeneration/InputBuffer/DIFFINBUF_INST/O
                         net (fo=1, routed)           0.034    10.546    ClockGeneration/InputBuffer/OUT
    D15                  IBUFCTRL (Prop_IBUFCTRL_HDIOB_M_I_O)
                                                      0.000    10.546 r  ClockGeneration/InputBuffer/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.091    10.637    ClockGeneration/ClockGenerator/inst/clockIn
                                                      0.017    10.654 r  ClockGeneration/ClockGenerator/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.004    11.658    ClockGeneration/ClockGenerator/inst/clockIn_ClockGenerator
    MMCM_X0Y1            MMCME4_ADV (Prop_MMCM_CLKIN1_CLKOUT2)
                                                      0.230    11.888 r  ClockGeneration/ClockGenerator/inst/mmcme4_adv_inst/CLKOUT2
                         net (fo=2, routed)           0.147    12.035    ClockGeneration/ClockGenerator/inst/clk_out3_ClockGenerator
                                                      0.017    12.052 r  ClockGeneration/ClockGenerator/inst/clkout3_buf/O
    X0Y1 (CLOCK_ROOT)    net (fo=1, routed)           0.848    12.900    w_processingClockHalf
    HDIOLOGIC_M_X0Y11    OSERDESE3                                    r  DataVectorOutputDDR[0].DataOutputDDR/CLK
  -------------------------------------------------------------------    -------------------
                                                      0.175    13.075 r  DataVectorOutputDDR[0].DataOutputDDR/OQ
                         net (fo=1, routed)           0.002    13.077    o_data_OBUF[0]
    AB10                 OBUF (Prop_OUTBUF_HDIOB_M_I_O)
                                                      1.365    14.442 r  o_data_OBUF[0]_inst/O
                         net (fo=0)                   0.000    14.442    o_data[0]
    AB10                                                              r  o_data[0] (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock outputClock rise edge)
                                                      9.625     9.625 r  
    D15                                               0.000     9.625 r  i_inputClockP (IN)
                         net (fo=0)                   0.000     9.625    ClockGeneration/InputBuffer/I
                                                      0.751    10.376 r  ClockGeneration/InputBuffer/DIFFINBUF_INST/O
                         net (fo=1, routed)           0.045    10.421    ClockGeneration/InputBuffer/OUT
    D15                  IBUFCTRL (Prop_IBUFCTRL_HDIOB_M_I_O)
                                                      0.000    10.421 r  ClockGeneration/InputBuffer/IBUFCTRL_INST/O
                         net (fo=1, routed)           0.113    10.534    ClockGeneration/ClockGenerator/inst/clockIn
                                                      0.019    10.553 r  ClockGeneration/ClockGenerator/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.128    11.681    ClockGeneration/ClockGenerator/inst/clockIn_ClockGenerator
    MMCM_X0Y1            MMCME4_ADV (Prop_MMCM_CLKIN1_CLKOUT1)
                                                     -0.295    11.386 r  ClockGeneration/ClockGenerator/inst/mmcme4_adv_inst/CLKOUT1
                         net (fo=2, routed)           0.170    11.556    ClockGeneration/ClockGenerator/inst/clk_out2_ClockGenerator
                                                      0.019    11.575 r  ClockGeneration/ClockGenerator/inst/clkout2_buf/O
                         net (fo=1, routed)           1.065    12.640    w_processingClockPhaseShifted
                                                      0.311    12.951 r  ClockOutputDDR/OQ
                         net (fo=1, routed)           0.278    13.229    w_outputClockDDR
                         OBUFDS (Prop_DIFFOUTBUF_HPIOBDIFFOUTBUF_I_O)
                                                      0.537    13.766 r  ClockOutputBuffer/O
                         net (fo=0)                   0.000    13.766    o_outputClockP
    F8                                                                r  o_outputClockP (OUT)
                         clock pessimism              0.126    13.892    
                         clock uncertainty            0.194    14.086    
                         output delay                 1.900    15.986    
                         required time                        -15.986    
                         arrival time                          14.442    
                         slack                                 -1.544    
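The same hold violation can be expressed in terms of the DAC pins using only values from this (fast corner) report. t_data = 14.442 ns is the data arrival at o_data[0] and t_clk = 13.766 ns is the arrival of the forwarded clock edge at o_outputClockP; both names are shorthand introduced here.

```latex
\begin{aligned}
t_{data} - t_{clk} &= 14.442 - 13.766 = 0.676\,\mathrm{ns} && \text{(data change after clock edge, FPGA pins)} \\
(t_{data} - t_{clk}) - 0.4 &= 0.276\,\mathrm{ns} && \text{(same, at the DAC; clock trace is 0.4 ns longer)} \\
\text{slack} &= 0.276 - t_H - \text{pessimism} - \text{uncertainty} \\
&= 0.276 - 1.500 - 0.126 - 0.194 = -1.544\,\mathrm{ns}
\end{aligned}
```

In this corner the data therefore changes only about 0.28 ns after the clock edge reaches the DAC, while 1.5 ns of hold time is required; the clock pessimism and uncertainty terms account for the remaining 0.32 ns of the violation.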



My questions are:

  • Should these timing requirements actually be achievable with a proper design and constraints?
    • Which changes would be required? I would be thankful for any suggestions; I know my knowledge is very limited at this point.
  • Unlike the DAC discussed in the corresponding post, the DAC at hand does not have a second clock input. As far as I understood that post, a source-synchronous (FPGA-forwarded) clocking scheme is therefore not ideal with respect to jitter, but it enables "very tight skew between the clock and data", as is required here.
    • Is there a generally more advantageous approach to this clocking problem that involves the FPGA?


I hope I have not missed any information (or provided too much). Please let me know if either is the case.

Thank you for your time,

