Adventurer
Adventurer
1,512 Views
Registered: ‎10-31-2017

No input/output delays

I have several high-severity warning messages for both inputs and outputs.

 

The outputs implement SPI transmitter channels (clock, data out and cs#) with a 5MHz clock rate. They are internally clocked by an 80MHz clock. I am not sure how to set the delays, as the external devices are not source/system synchronous. Is this a case of a false path?

 

Likewise, several inputs are not synchronous. They are fed to deserializers, and setup/hold times are handled by the internal state machine with an oversampling scheme. In other words, setup and hold are not a concern, by design. Is this also a case of a false path?

 

0 Kudos
1 Solution

Accepted Solutions
Historian
Historian
1,465 Views
Registered: ‎01-23-2009

Re: No input/output delays

Are these false paths - well, yes and no.

First of all, without constraints they actually aren't paths. A static timing path starts and ends at a clocked element. In the case of a path that goes through a port of the design, one side of the path (the startpoint for an input or the endpoint for an output) is outside the FPGA. These cells do not exist (as far as STA is concerned) unless you have a set_input_delay/set_output_delay associated with the port. If the set_input_delay/set_output_delay don't exist then there are no paths, hence they can't be "false paths".

If you leave them like this, the logic between the final clocked cell and the port will not be part of a path, and hence will have no timing analysis done on it. This will also result in (critical?) warnings from "check_timing" which is specifically looking for ports without constraints.

So, to make these warnings go away, you need to define set_input_delay/set_output_delay on these ports. But as you say, they are not really related to any real clock, and yet you need a clock to specify a set_input_delay/set_output_delay.

Let's take the example of your deserializer input with dynamic phase adjustment. In this case, you know which clock the input is related to, but you cannot give a meaningful -min/-max for the set_input_delay. The solution here is to make something up, then override it with an exception (we can discuss which exception in a moment).

The same is true for your SPI output - all the outputs are related to the 80MHz internal clock (or what generated it), so you can use this clock. However, the real requirements on the output (the setup/hold which must be met between the generated SPI_CLK and the SPI data) are many many 80MHz clocks long - you cannot specify a meaningful set_output_delay on them.

BUT, in both of these cases, while you don't have proper static timing analysis numbers for the ports, you do actually have requirements!

For the dynamically calibrated inputs, your requirement is that the routing within the FPGA is short and predictable. Since this is to an ISERDES (through an IDELAY), you know that this is the case - the IBUF to IDELAY to ISERDES path is fixed and cannot be altered from run to run. So in this case, your requirement is met by architecture, and hence you can set this path false using "set_false_path -from <my_port>" - that is a correct and reasonable exception.
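
As a minimal sketch of the "make something up, then override it" approach for one input, the two constraints might look like the following. The port name `my_port` and the clock name `clk_80` are placeholders, not names from the actual design, and the 0.0 ns delay is arbitrary because the false path removes the port from timing analysis anyway.

```tcl
# Hypothetical names: my_port (input port), clk_80 (clock that samples it).
# The delay value is a placeholder; it only exists so that check_timing
# sees a constrained port.
set_input_delay -clock [get_clocks clk_80] 0.0 [get_ports my_port]

# Override the placeholder: the port-to-register path is not timed.
set_false_path -from [get_ports my_port]
```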

For the SPI output, you have the same kind of requirement - you want the path from the final FF to the port to be short and predictable. You can do this by forcing the output FF into the IOB, in which case the same argument exists - you can use a set_false_path.
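
For the IOB case, a sketch under the same assumptions could look like this. The port names `spi_clk`, `spi_mosi` and `spi_csn` and the clock name `clk_80` are illustrative only, and the 0.0 ns output delay is again a placeholder.

```tcl
# Hypothetical port names for the three SPI outputs.
set spi_ports [get_ports {spi_clk spi_mosi spi_csn}]

# Ask the tools to pack the final flip-flop of each output into the IOB.
set_property IOB TRUE $spi_ports

# Placeholder output delay so the ports are constrained, then the
# exception that removes them from timing analysis.
set_output_delay -clock [get_clocks clk_80] 0.0 $spi_ports
set_false_path -to $spi_ports
```

Since the argument for the false path rests on the flip-flops really being in the IOB, it is worth confirming in the implemented design that they were placed there.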

But, if (and this isn't recommended) your final FFs are not in the IOB, then the only way to ensure that you get the "short-ish, predictable-ish" routing from the final FF to the port is to set an output constraint with respect to the source clock. This is an "odd" output constraint since the requirement doesn't come from the needs of the destination device (which is where normal constraints should come from), but it looks like a real "set_output_delay" - set it to a value that is small enough to ensure the route from the final FF to the port is "reasonable" (but one that allows the timing to pass). You can also do this with a set_max_delay exception on the path, but there isn't really an advantage to this... Note - this delay will be fairly large since it will be timing the path from the clock input to the data output, which includes the clock insertion and OBUF delay, but there is a value that will be "just large enough to make this pass" but "still short enough to ensure that the path doesn't get routed with lots of additional delay".
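
A sketch of that "odd" output constraint, assuming an 80MHz clock named `clk_80` defined at the clock input and a port named `spi_mosi` (both names are made up, and 4.0 ns is only a starting value to be tuned until the path just passes):

```tcl
# Hypothetical names and values. With a 12.5 ns period (80 MHz), a -max
# output delay of 4.0 ns leaves 12.5 - 4.0 = 8.5 ns for clock insertion,
# clock-to-out, routing and the OBUF.
set_output_delay -clock [get_clocks clk_80] -max 4.0 [get_ports spi_mosi]

# A -min value keeps the hold check defined; 0.0 is a placeholder.
set_output_delay -clock [get_clocks clk_80] -min 0.0 [get_ports spi_mosi]
```

Raising the -max value tightens the budget; lowering it relaxes the budget.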

So, yes, these cases are both "not traditional synchronous" cases, and hence are not constrained "normally", but they do need constraints either just to prevent the warnings you are getting or also to ensure the requirements that really exist on these ports.

Avrum

20 Replies
Adventurer
Adventurer
1,458 Views
Registered: ‎10-31-2017

Re: No input/output delays

Hello, @avrumw , thank you for chiming in.

 

Regarding the outputs, as I was working on other timing issues of the design I tried setting all the delays to 0.0 to check the timings and, as expected, they failed (negative slack). I then added the IOB property to the pins and all of them met the timing requirement with a good margin.

I now will work on the  inputs. There are some failing inter-clock paths related to these inputs so I will work on these issues (kind of) at the same time.

The deserializers work similarly to a UART. I am not using ISERDES though. To be honest, it did not occur to me to check whether it would work with the frame format (falling edge, b"010" pattern followed by digital data, idling at '1').

0 Kudos
1,426 Views
Registered: ‎01-22-2015

Re: No input/output delays

Hi Elder,

Likewise, several inputs are not synchronous. They are fed to deserializers and setup/hold times are handled by the internal state machine with a oversampling scheme. In other words, setup and hold is not a concern, by design. Is this also a case of false path?
The oversampling scheme works well for a slow-speed interface like SPI. As you say, with carefully designed oversampling, these inputs need not undergo formal timing analysis (i.e. they pass timing analysis “by design”). It is safe to put meaningless set_input_delay constraints on these inputs and then false_path them. Further, since these inputs are asynchronous, you must put synchronizers on them to avoid metastability problems. The “perfect” synchronizer for this job is one built from the IDDR primitive, which was taught to me by Avrum in <this> post. This perfect synchronizer does double duty because it also locks the capture register for an asynchronous input into the IOB.

Cheers,
Mark

0 Kudos
Explorer
Explorer
1,375 Views
Registered: ‎12-11-2017

Re: No input/output delays

SPI can be constrained like any other interface using create_clock, set_input_delay and set_output_delay.

Example:

#80 MHz clock

create_clock -period 12.5 -name ref_clk [get_pins <your 80mhz clock ref>]

# 5MHz clock at pad

create_generated_clock -name spi_clk -source [get_pins <your 80mhz clock ref>] -divide_by 16 [get_ports spi_clk]

#far-end setup / hold for host -> device: assume 10/5ns, adjust as needed

set_output_delay -clock [get_clocks spi_clk] -min -5.00 [get_ports spi_mosi]

set_output_delay -clock [get_clocks spi_clk] -max 10.00 [get_ports spi_mosi]

set_output_delay -clock [get_clocks spi_clk] -min -5.00 [get_ports spi_csn]

set_output_delay -clock [get_clocks spi_clk] -max 10.00 [get_ports spi_csn]

#far-end delay device-> host: assume 5/10 min/max, adjust as needed

set_input_delay -clock [get_clocks spi_clk] -min 5.000 [get_ports spi_miso]
set_input_delay -clock [get_clocks spi_clk] -max 10.000 [get_ports spi_miso]

 

This constrains your IO timing to an external, virtual node. If you're having a hard time meeting the hold requirement at the far end, consider using a neg-edge clock for CSn and MOSI. This is actually common for SPI, which supports positive and negative edge modes (make sure you understand which type your peripheral needs.) In any event, your timing is slow enough that you should not need the IOB constraint, and it may actually be making your life more difficult if you need hold slack.

Your sampled / resynchronized inputs can be treated as false_path to the sampling clock.

Example:

set_false_path -from [get_ports my_async_sig]  -to [get_clocks ref_clk]

0 Kudos
Adventurer
Adventurer
1,339 Views
Registered: ‎10-31-2017

Re: No input/output delays

Hello all,

Thank you for your comments. Sorry for the late answer. I have been following the thread while working on some parts of the design in order to meet the timing constraints (still a work in progress).

@vortex1601 , the SPI clock outputs do not directly use clocks; they were implemented with logic as they are not exactly standard format-wise. At a 5MHz rate they meet the external devices' requirements with great margin (outputs only). I will keep your suggestions in mind though.

As mentioned before, part of the improvements involved using the IOB flip-flops as suggested by @avrumw, which helped to solve the timing errors of these outputs by simply setting the output delays to 0 (setup and hold).

Mark, I am confident the deserializers work as intended, as they were carefully crafted, so I am not worried about the setup/hold times for these.

However, the data streams on these inputs are controlled by a clock in the PL, and there are internal and external delays that must be accounted for. I designed the logic to take these into account with some margin, but it occurred to me that I should add constraints for them to raise a red flag if the timing is not met. I thought of opening a different thread, but I think this may be related to this topic. The sketch below shows the main signals:

              ________________________
sync    _____|                        |_____......
        _______________      ____      ____
clock                  |____|    |____|    |_.....
                            ________
data    ___________________|        |________.....

 

There is a maximum delay between the sync input (5MHz) and the first falling edge of the clock output (160MHz, N pulses). There are synchronizers to detect the edge of sync and to control the onset of clock; thus, there are several 160MHz clock cycles between them. The data input, connected to a deserializer, rises after a variable delay that depends on the tracks and devices between the clock and data. So, my question is what constraints I should add to define the maximum delay between sync and clock, if that makes sense. Likewise between clock and data as above.

Thanks.

0 Kudos
Explorer
Explorer
1,320 Views
Registered: ‎12-11-2017

Re: No input/output delays

Which SPI mode are you using? Please have a look at the diagram below (from https://www.corelis.com/education/tutorials/spi-tutorial/)

1-21.jpg

You'll notice that the data driving edge and sampling edge are opposite phase in SPI. This prevents any setup/hold issues. But you need to make sure you're using the correct clock type for your peripheral. Most SPI devices use Mode 3.

0 Kudos
Adventurer
Adventurer
1,309 Views
Registered: ‎10-31-2017

Re: No input/output delays

@vortex1601 , the devices controlled by the PL SPI outputs (no inputs in this design) are mode-2 (kind of) meaning the data outputs are changed together with the CLK outputs rising edges and they are stable at the falling edges. The maximum input clock of these devices is 50MHz, with min setup and hold times of 5ns so the margins are huge with a 5MHz clock.

 

While writing the paragraph above I noticed this may look confusing. There are two classes of SPI-like external devices. One of them is controlled by a low-frequency clock (5MHz); these have all the timing errors sorted out with small design tweaks and the output constraints. The other device class has a data channel as described at the bottom of my previous post.

0 Kudos
1,295 Views
Registered: ‎01-22-2015

Re: No input/output delays

Hi Elder,

Likewise, several inputs are not synchronous. They are fed to deserializers and setup/hold times are handled by the internal state machine with a oversampling scheme.
I just want to be clear on what you call “oversampling scheme”.  To me, this means you have a slow interface (eg. one with an interface clock, CLKI, running at 5MHz) – and you are not clocking any registers in the FPGA with CLKI.  Instead, you use an FPGA-generated fast clock (eg. CLKF running at 100MHz).  The interface clock, CLKI, and interface data, DATI, are brought into the FPGA through 2-flip-flop synchronizers that are clocked with CLKF.  So, the output of each synchronizer is a sampled version of CLKI and DATI.  Then, in your HDL, you watch the samples of CLKI.  When these samples indicate that a rising-edge of CLKI has occurred, you then wait a little (by counting cycles of CLKF) until you get a sample of DATI that is in the middle of the data-eye.  You keep this one sample of DATI as 1-bit of data.  Then, you again watch samples of CLKI to find the next rising-edge of CLKI, wait a little, get another bit of data – and so on.  This is what I know as an “oversampling scheme”. This scheme only works when the interface clock is slow – because then CLKF has a reasonable frequency (roughly 100-300MHz).

The sketch below shows the main signals: … There is a maximum delay between sync input (5MHz) and the first falling edge of clock output (160MHz, N pulses).
I understand that the interface clock has frequency of 160MHz.  This is too fast for using the “oversampling scheme”.

I confident the deserializers work as intended as they were carefully crafted so I am not worried about the setup/hold times for these.
It sounds like you are using the Xilinx IP called the SelectIO Wizard?  The SelectIO Wizard uses FPGA hardware called the ISERDES.  I haven't used the SelectIO Wizard - but am willing to learn with you.

So, my question is what constraints I should add to define the maximum delay between sync and clock, if that makes sense. Likewise between clock and data as above.
If you are using the SelectIO Wizard, then document PG065 (on about page 56) says that only a create_clock and a set_input_jitter constraint are needed to constrain this IP core. In other words, set_input_delay constraints are not needed!?! This seems amazing to me and suggests that the SelectIO IP is using some kind of internal oversampling scheme to find and sample the data line in the middle of the data-eye.
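
For illustration only, a create_clock plus set_input_jitter pair of the kind described could look like the sketch below. The port name `clk_in_p`, the clock name `clk_in` and the 0.100 ns jitter figure are assumptions and are not taken from PG065; only the 6.250 ns period follows from the 160MHz rate mentioned in this thread.

```tcl
# Hypothetical port name, clock name and jitter value.
create_clock -name clk_in -period 6.250 [get_ports clk_in_p]
set_input_jitter [get_clocks clk_in] 0.100
```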

Before we discuss things further, please tell me if you are using the SelectIO Wizard for this 160MHz interface?

It is not necessary to use the SelectIO Wizard for this interface.  That is, we could simply treat it as a source-synchronous SDR input interface.

Cheers,
Mark

0 Kudos
Adventurer
Adventurer
1,285 Views
Registered: ‎10-31-2017

Re: No input/output delays

Hello, Mark.

I am not using SelectIO Wizard or ISERDES. I designed the deserializer from scratch and a simplified description of it follows. It uses three 160MHz clocks 120 degrees apart. The data input is simultaneously fed to synchronizers on each of those clocks. The first synchronizer to detect the leading rising edge in the data stream determines the internal path chosen by the deserializer state machine to rebuild the data. There is a time window in the chosen path where the data is stable and satisfies the setup and hold times of the deserializer shift register. The use of three phases is equivalent to clocking the deserializer at 480MHz. That is what I meant by oversampling. This whole scheme is necessary as we do not have control over the delay of the data stream relative to the data clock: the delay between the clock falling edges and the corresponding data bits varies from a few ns to 12ns (delays from external components and tracks).

The rising edge of the sync clock (5MHz) indicates when there is new data to be transferred. It puts the deserializer state machine into a wait-for-first-rising-edge state (after a delay determined by a synchronizer). It also triggers the clock-generating state machine, also after the delay of a synchronizer. There is a maximum delay between the sync rising edge and the leading falling edge of the clock output. This delay is 35ns. This is the delay I would like to check.

I hope this answers your question.

1,262 Views
Registered: ‎01-22-2015

Re: No input/output delays

Hi Elder,

I designed the deserializer from scratch..
Wowser!  Thanks for giving me a peek at how you did it.

…the delay between the clock falling edges and the corresponding data bits varies from a few ns to 12ns
So, my question is what constraints I should add to define the maximum delay between sync and clock, if that makes sense. Likewise between clock and data as above.
Constraints (eg. set_input_delay) and timing analysis are used only for synchronous interfaces. Your 160MHz interface is asynchronous – because your deserializer uses asynchronous methods (eg. synchronizers).  Whether or not your 160MHz interface will work depends entirely on how well your HDL can “make sense” of the sampled data coming out of the synchronizers in your deserializer.

 

0 Kudos
Adventurer
Adventurer
1,242 Views
Registered: ‎10-31-2017

Re: No input/output delays

Hello, Mark.

Wowser!  Thanks for giving me a peek at how you did it.

Thanks for the kudos. I tried a version with a 540MHz clock first, but we are using the slowest part and the maximum frequency it supports is 520MHz (close call). I later found the basic concept of using three-phase clocks in a data sheet, so I was a little sad I did not pioneer it. :)

Constraints (eg. set_input_delay) are used only for synchronous interfaces. Your 160MHz interface is asynchronous – because your deserializer uses asynchronous methods (eg. synchronizers).  

OK, that makes sense. Is there a best way to constrain these inputs to eliminate the high severity warnings? Sorry if the answer is obvious but the learning curve is steep and I do not find the documentation particularly helpful.

I have a question about the high speed clock outputs but I will post in a separate post.

About the question on how to constrain the sync-to-clock-first-falling-edge delay, should I open a different thread (assuming there is a way to do it)?

 

 

0 Kudos
Adventurer
Adventurer
1,235 Views
Registered: ‎10-31-2017

Re: No input/output delays

I am trying to eliminate the high severity warnings of the high speed clock outputs in my design (note: transf_data_clk* is connected to AD_SCLK* in the block diagram):

ODDR_inst : ODDR
generic map(
	DDR_CLK_EDGE => "SAME_EDGE", -- "OPPOSITE_EDGE" or "SAME_EDGE" 
	INIT => '0',   -- Initial value for Q port ('1' or '0')
	SRTYPE => "SYNC") -- Reset Type ("ASYNC" or "SYNC")
port map (
	Q => transf_data_clk(i),   -- 1-bit DDR output
	C => clk_ph_0,    -- 1-bit clock input
	CE => '1',  -- 1-bit clock enable input
	D1 => '1', -- 1-bit data input (positive edge)
	D2 => '0', -- 1-bit data input (negative edge)
	R => '0',    -- 1-bit reset input
	S => not_ad_transfer_clk_en -- 1-bit set input
);

I tentatively added the following constraints trying to eliminate the messages:

 

create_clock -name virt_clk_160 -period 6.250
set_output_delay -clock [get_clocks {virt_clk_160}] -min 0.0 [get_ports {AD_SCLK_P[*]}]
set_output_delay -clock [get_clocks {virt_clk_160}] -max -add_delay 0.0 [get_ports {AD_SCLK_P[*]}]

 

 

I am getting the result below from report_timing. I tried set_output_delay for ODDR with similar results. I am trying to understand why it used the falling edge in this path.

 

-----------------------------------------------------------------------------------------------------------------------------------------------
| Tool Version : Vivado v.2018.3 (win64) Build 2405991 Thu Dec  6 23:38:27 MST 2018
| Date         : Tue Mar 19 15:23:04 2019
| Host         : TIMPEL-PD-0273 running 64-bit major release  (build 9200)
| Command      : report_timing -from [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[*].ODDR_inst/C}] -to [get_ports {AD_SCLK_P[*]}] -setup
| Design       : SAR_ZYNQ_wrapper
| Device       : 7z020-clg484
| Speed File   : -1  PRODUCTION 1.11 2014-09-11
-----------------------------------------------------------------------------------------------------------------------------------------------

Timing Report

Slack (VIOLATED) :        -4.435ns  (required time - arrival time)
  Source:                 SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[3].ODDR_inst/C
                            (falling edge-triggered cell ODDR clocked by clk_160M0_SAR_ZYNQ_clk_wiz_0_0  {rise@0.000ns fall@3.125ns period=6.250ns})
  Destination:            AD_SCLK_P[3]
                            (output port clocked by virt_clk_160  {rise@0.000ns fall@3.125ns period=6.250ns})
  Path Group:             virt_clk_160
  Path Type:              Max at Slow Process Corner
  Requirement:            3.125ns  (virt_clk_160 rise@6.250ns - clk_160M0_SAR_ZYNQ_clk_wiz_0_0 fall@3.125ns)
  Data Path Delay:        2.241ns  (logic 2.240ns (99.955%)  route 0.001ns (0.045%))
  Logic Levels:           1  (OBUFDS=1)
  Output Delay:           0.000ns
  Clock Path Skew:        -5.104ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    0.000ns = ( 6.250 - 6.250 ) 
    Source Clock Delay      (SCD):    5.104ns = ( 8.229 - 3.125 ) 
    Clock Pessimism Removal (CPR):    0.000ns
  Clock Uncertainty:      0.215ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.050ns
    Discrete Jitter          (DJ):    0.150ns
    Phase Error              (PE):    0.136ns
  Clock Domain Crossing:  Inter clock paths are considered valid unless explicitly excluded by timing constraints such as set_clock_groups or set_false_path.

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock clk_160M0_SAR_ZYNQ_clk_wiz_0_0 fall edge)
                                                      3.125     3.125 f  
    M19                                               0.000     3.125 f  CLK_SYS_P (IN)
                         net (fo=0)                   0.000     3.125    SAR_ZYNQ_i/util_ds_buf_0/U0/IBUF_DS_P[0]
    M19                  IBUFDS (Prop_ibufds_I_O)     0.905     4.030 f  SAR_ZYNQ_i/util_ds_buf_0/U0/USE_IBUFDS.GEN_IBUFDS[0].IBUFDS_I/O
                         net (fo=1, routed)           2.205     6.235    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1
    BUFGCTRL_X0Y17       BUFG (Prop_bufg_I_O)         0.102     6.337 f  SAR_ZYNQ_i/clk_wiz_0/inst/clkin1_bufg/O
                         net (fo=1, routed)           1.806     8.143    SAR_ZYNQ_i/clk_wiz_0/inst/clk_in1_SAR_ZYNQ_clk_wiz_0_0
    MMCME2_ADV_X1Y0      MMCME2_ADV (Prop_mmcme2_adv_CLKIN1_CLKOUT2)
                                                     -3.793     4.350 f  SAR_ZYNQ_i/clk_wiz_0/inst/mmcm_adv_inst/CLKOUT2
                         net (fo=1, routed)           1.889     6.239    SAR_ZYNQ_i/clk_wiz_0/inst/clk_160M0_SAR_ZYNQ_clk_wiz_0_0
    BUFGCTRL_X0Y1        BUFG (Prop_bufg_I_O)         0.101     6.340 f  SAR_ZYNQ_i/clk_wiz_0/inst/clkout3_buf/O
                         net (fo=1996, routed)        1.888     8.229    SAR_ZYNQ_i/sar_zynq_IP_0/U0/Clk_ph_0
    OLOGIC_X1Y92         ODDR                                         f  SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[3].ODDR_inst/C
  -------------------------------------------------------------------    -------------------
    OLOGIC_X1Y92         ODDR (Prop_oddr_C_Q)         0.472     8.701 r  SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[3].ODDR_inst/Q
                         net (fo=1, routed)           0.001     8.702    SAR_ZYNQ_i/util_ds_buf_1/U0/OBUF_IN[3]
    L17                  OBUFDS (Prop_obufds_I_O)     1.768    10.470 r  SAR_ZYNQ_i/util_ds_buf_1/U0/USE_OBUFDS.GEN_OBUFDS[3].OBUFDS_I/O
                         net (fo=0)                   0.000    10.470    AD_SCLK_P[3]
    L17                                                               r  AD_SCLK_P[3] (OUT)
  -------------------------------------------------------------------    -------------------

                         (clock virt_clk_160 rise edge)
                                                      6.250     6.250 r  
                         ideal clock network latency
                                                      0.000     6.250    
                         clock pessimism              0.000     6.250    
                         clock uncertainty           -0.215     6.035    
                         output delay                -0.000     6.035    
  -------------------------------------------------------------------
                         required time                          6.035    
                         arrival time                         -10.470    
  -------------------------------------------------------------------
                         slack                                 -4.435    

 

FWIW the delay between these outputs and the internal clock is not critical. I considered adding the following command and it did eliminate the warnings. But I do not know if this is a good or safe approach.

set_max_delay -from [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[*].ODDR_inst/C}] -to [get_ports {AD_SCLK_P[*]}] 8.0
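
On the falling-edge question, the Requirement line in the report gives a hint: the path is launched at fall@3.125ns and captured at the virtual clock's rise@6.250ns, which leaves half a period. An ODDR changes its output on both clock edges, so the tool presumably times the falling-edge launch as well and reports the tighter case. Most of the negative slack comes from the 5.104 ns source clock delay, which the zero-latency virtual clock does not share. One commonly used alternative for a clock forwarded through an ODDR is to declare the port itself as a generated clock, sketched below for index 3 only. The clock name `ad_sclk_3` is made up, and whether this alone clears the warnings in this design is an assumption that would need to be confirmed with check_timing.

```tcl
# Hypothetical clock name ad_sclk_3. The source pin is the ODDR clock pin
# taken from the report above; D1='1' and D2='0' forward the clock
# without inversion, hence -divide_by 1.
create_generated_clock -name ad_sclk_3 \
    -source [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[3].ODDR_inst/C}] \
    -divide_by 1 [get_ports {AD_SCLK_P[3]}]
```

Each of the other AD_SCLK_P outputs would need its own generated clock with the matching ODDR pin as the source.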

 

0 Kudos
1,201 Views
Registered: ‎01-22-2015

Re: No input/output delays

Hi Elder,

For the 3-wire, 160MHz interface:
I understand the three wires are called SYNC, CLOCK, and DATA. Your deserializer treats CLOCK and DATA as asynchronous inputs. I suggest that SYNC should also be treated as an asynchronous input (ie. run it through a synchronizer).

The rising edge of the sync clock (5MHz) indicates when there is new data to be transferred. It puts the deserializer state machine into a wait-for-first-rising-edge state …. There is a maximum delay between the sync rising edge and the leading falling edge of the clock ouput. This delay is 35ns. This is the delay I would like to check.
I suggest that you modify your HDL code for the deserializer to monitor the samples coming out of the synchronizer on the SYNC line. Your HDL can then check for the needed 35ns delay.

Is there a best way to constrain these inputs to eliminate the high severity warnings?
I understand that you cannot place the synchronizers for SYNC, CLOCK, and DATA into the IOB using Avrum’s method that I described earlier. So, these synchronizers will be in the FPGA fabric. However, you should hold these synchronizers near the FPGA input pins for SYNC, CLOCK, and DATA. A set of four constraints for each of SYNC, CLOCK, and DATA will do this in addition to eliminating the severe warnings on these inputs. The four constraints for each input look something like the following (which are for the SYNC input):

set_input_delay -clock [get_clocks CLK160] <delay1> [get_ports SYNC]
set_max_delay -datapath_only -from [get_ports SYNC] -to [get_cells SYNCHRO1_reg] <delay2>
set_property ASYNC_REG TRUE [get_cells SYNCHRO1_reg]
set_property ASYNC_REG TRUE [get_cells SYNCHRO2_reg]

In these constraints, CLK160, is the name of the 160MHz clock inside your FPGA. SYNCHRO1_reg and SYNCHRO2_reg are the two registers in the synchronizer for the SYNC signal.  Delays, <delay1>=3 and <delay2>=6, are numbers that you can fiddle with, making <delay2> smaller and smaller until the input just passes timing analysis.  In this way, the synchronizer for SYNC will be held near the FPGA pin for this input. It is important to understand that we are letting timing analysis run on these inputs (instead of using set_false_path) only as a means of placing the synchronizers in the right place (near the pins). That is, this timing analysis on the inputs is meaningless as far as your deserializer is concerned.

I have a question about the high speed clock outputs but I will post in a separate post.
Yes, please do.

Adventurer

Re: No input/output delays

I understand the three wires are called SYNC, CLOCK, and DATA. Your deserializer treats CLOCK and DATA as asynchronous inputs. I suggest that SYNC should also be treated as an asynchronous input (ie. run it through a synchronizer).

These names are generic, I just wanted to present the scenario. The sync input is also connected to synchronizers.

I suggest that you modify your HDL code for the deserializer to monitor the samples coming out of the synchronizer on the SYNC line. 

I was thinking about a way to make the P&R tool check the total delay between the sync input and the clock outputs, including the synchronizers and registers in between, and generate an error if it did not meet the timing. I spent a good part of today trying to figure out whether it could be done. I suspect it cannot. What I will do is make sure that the number of clock cycles between the sync inputs and the clock outputs, plus the input and output delays, is within the required limit. This requires a manual check. It just occurred to me that maybe I could add a Tcl command with an expression that calculates the input and output delays plus N clock cycles and triggers a warning if the result is above a threshold. I have to check whether this is possible and how to do it.
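A minimal sketch of such a check in plain Tcl might look like the following. The proc names, the cycle count of 4, and the use of the 6.25ns (160MHz) period for the internal stages are assumptions for illustration only; the 2.2ns and 3.5ns figures are borrowed from the max-delay constraints in this thread as placeholder pin delays, and the real values would have to come from the timing report.

```tcl
# Hypothetical helpers; none of these names are Vivado commands.

# Total latency from the SYNC pin to the clock output pin, in ns:
# input pin delay + N internal clock cycles + output pin delay.
proc sync_to_clock_latency_ns {clk_period_ns n_cycles input_delay_ns output_delay_ns} {
    return [expr {$input_delay_ns + $n_cycles * $clk_period_ns + $output_delay_ns}]
}

# Print a warning and return 0 if the latency exceeds the limit, else return 1.
proc check_latency_budget {total_ns limit_ns} {
    if {$total_ns > $limit_ns} {
        puts "WARNING: SYNC-to-clock latency $total_ns ns exceeds the $limit_ns ns limit"
        return 0
    }
    puts "OK: SYNC-to-clock latency $total_ns ns is within the $limit_ns ns limit"
    return 1
}

# Assumed example values: 6.25 ns period, 4 cycles, 2.2 ns in, 3.5 ns out.
set total [sync_to_clock_latency_ns 6.25 4 2.2 3.5]
check_latency_budget $total 35.0
```

Sourced from a Tcl script that runs after implementation, a check like this would flag a budget violation at build time rather than at run time, although the cycle count still has to be kept in step with the HDL by hand.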

Your HDL can then check for the needed 35ns delay.

My idea was detecting the error during compilation time. Your suggestion looks like a "run time check". This is an interesting idea. I have to think more about it.

So, these synchronizers will be in the FPGA fabric. However, you should hold these synchronizers near the FPGA input pins for SYNC.

I already had synchronizers for the inputs and recently added the ASYNC_REG attribute in the VHDL code after learning it is required (IIRC this is one more case where ISE was more "intelligent" than Vivado, so I was spoilt ;) ).

set_input_delay -clock [get_clocks CLK160] <delay1> [get_ports SYNC]
set_max_delay -datapath_only -from [get_ports SYNC] -to [get_cells SYNCHRO1_reg] <delay2>

In the end, I ended up with these for the outputs and inputs:

set_output_delay -clock [get_clocks clk_160M0_SAR_ZYNQ_clk_wiz_0_0] -min 0.0 [get_ports {AD_SCLK_P[*]}]
set_output_delay -clock [get_clocks clk_160M0_SAR_ZYNQ_clk_wiz_0_0] -max -add_delay 0.0 [get_ports {AD_SCLK_P[*]}]
set_max_delay -datapath_only -from [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/ODDR_inst_x[*].ODDR_inst/C}] -to [get_ports {AD_SCLK_P[*]}] 3.500

create_clock -name virt_clk_160 -period 6.25 -quiet
set_input_delay -clock [get_clocks {virt_clk_160}] -min 0.0 [get_ports {AD_SDO_P[*]}]
set_input_delay -clock [get_clocks {virt_clk_160}] -max -add_delay 0.0 [get_ports {AD_SDO_P[*]}]

Your tip about using the internal clock was very useful even though I used it for the outputs, not for the inputs. And the 3.500 is just a little longer than the delay in the ODDR plus the LVDS buffers. For the inputs, the zeros were enough to eliminate errors and severe warnings.

>>I have a question about the high speed clock outputs but I will post in a separate post.
Yes, please do.

In the end I understood what I was doing wrong. I was not using -datapath_only. After I added it, the timing report made more sense.

 

OT: Along the way I made some tweaks to the logic to fix timing errors within and between clock domains. The only exception was some errors on some paths connected to those lookup tables (no errors in some implementation runs, then a few paths with small negative slack in others). Because of that I set multicycle path constraints for these paths. Now the only remaining errors are related to the AXI interface (piece of cake ;) ).

 

Adventurer

Re: No input/output delays

It is a pity this forum does not allow marking more than one post as the accepted solution. I marked the first answer by @avrumw as it covers some fundamental concepts that answer my original question. But the posts by markg@prosensing.com provided some tips that helped me solve my problem.

 

Historian

Re: No input/output delays

I have a couple of comments...

First, sampling a signal with more than one clock is a bit dangerous. If you are using a single clock, then you can use the IOB flip-flop or the IDDR to sample the incoming signal. This has the advantage of repeatable results from place and route run to run. If you sample it with more than one clock, then either all or at least N-1 of the sampling flip-flops need to be in the fabric. When this happens, you now have a route delay from the pad to the sampling flip-flop. This route delay can vary from run to run, but more importantly, can be different to each of your N sampling flip-flops. At the speeds you are talking about (1/3 of the 160MHz period, which is about 2ns) your route times can be a significant portion of this 2ns, thus making your 3 samples not equally spaced. This is particularly true if one of the flip-flops does end up in the IOB flip-flop and the other 2 don't.

Next, this is quite overcomplicated... Assuming your 160MHz clock is coming from an MMCM (or can come from an MMCM), there is no problem with generating a 480MHz (or even faster) clock and using this as the high speed clock input to an ISERDES. Since the ISERDES can sample at rates up to 800MHz DDR (so 1600Msamples/sec) you can oversample your 160MHz input by up to 10 times (instead of 3). Even if you don't want to go more than 3, you can still use the ISERDES with a 480MHz clock in SDR mode (with 3:1 deserialization). All of these solutions avoid the problem outlined above - the N samples (be it 3 or 10) will always be equally spaced - there is no routing delay to worry about since it is a single cell that is taking all the N samples.
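As a rough check of the numbers above, the sample rate and sample spacing for a given oversampling factor can be worked out with a few lines of plain Tcl. This is only a sketch: the proc names are made up for illustration, and whether a particular device and speed grade supports the resulting clock has to be confirmed against its datasheet.

```tcl
# Hypothetical helpers; the names are illustrative, not part of Vivado.

# Sample rate in Msamples/s when taking `oversample` samples per bit.
proc sample_rate_msps {bit_rate_mhz oversample} {
    return [expr {double($bit_rate_mhz) * $oversample}]
}

# Spacing between consecutive samples, in ns.
proc sample_spacing_ns {bit_rate_mhz oversample} {
    return [expr {1000.0 / [sample_rate_msps $bit_rate_mhz $oversample]}]
}

# Compare 3x and 10x oversampling of a 160 MHz SDR input.
foreach n {3 10} {
    puts [format "%2dx oversampling: %6.1f Msamples/s, %.3f ns between samples" \
        $n [sample_rate_msps 160 $n] [sample_spacing_ns 160 $n]]
}
```

For the 160MHz input discussed here, 3x oversampling corresponds to 480 Msamples/s with roughly 2.08ns between samples, and 10x corresponds to 1600 Msamples/s with 0.625ns between samples.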

Finally, are you sure this interface can't be captured statically? At 160MHz SDR (which is what this looks like) your bit period is 6.25ns. For most clocking schemes this is absolutely no problem for the I/O of the FPGA - in some clocking schemes you can get down to around 1.5ns. So if your clock is in a clock capable pin, then you can sample this conventionally on the 160MHz clock (assuming the device provides at least a somewhat reasonable clock/data timing relationship). Even if the clock is not on a clock capable pin, this interface is so slow that it might still be possible to capture statically. If this clock is gapped, this presents some additional challenges (you can't use an MMCM on it, so you have to use direct BUFG or BUFR/BUFIO clocking), but you can still do the initial sampling on that clock, and then transfer it via a number of easier clock crossing schemes to a continuously running clock.

Avrum

 

Adventurer

Re: No input/output delays

Hello, @avrumw , thank you very much for your comments.

 

First, sampling a signal with more than one clock is a bit dangerous. If you are using a single clock, then you can use the IOB flip-flop or the IDDR to sample the incoming signal. This has the advantage of repeatable results from place and route run to run. If you sample it with more than one clock, then either all or at least N-1 of the sampling flip-flops need to be in the fabric. When this happens, you now have a route delay from the pad to the sampling flip-flop. This route delay can vary from run to run, but more importantly, can be different to each of your N sampling flip-flops. At the speeds you are talking about (1/3 of 160MHz, which is 2ns) your route times can be a significant portion of this 2ns, thus making your 3 samples not equally spaced. This is particularly true if one of the flip-flops does end up in the IOB flip-flop and the other 2 don't.

You have some excellent points here. In the first version, more than two and a half years ago (working as a contractor), I tried a deserializer based on a 3x clock, but the prototypes were built with the slowest speed grade device, which could not support that clock. So I worked on this second version. But I think I originally used 180MHz for the transfer clock, and the part could not support 540MHz. Later I reduced the clock to 160MHz to keep all the clocks in the system as integer (sub)multiples of the main 80MHz. To mitigate the risk of different path delays, I forced the P&R to place the synchronizer registers as close to the IOBs as possible by setting the lowest delay that would not cause an error:

set_max_delay -datapath_only -from [get_ports {AD_SDO_P[*]}] -to [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/SPI_ADC_Inst/CaptInst[*].ad_capture_inst/ad_data_0_r1_reg[*]/D}] 2.200;
set_max_delay -datapath_only -from [get_ports {AD_SDO_P[*]}] -to [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/SPI_ADC_Inst/CaptInst[*].ad_capture_inst/ad_data_120_r1_reg[*]/D}] 2.200;
set_max_delay -datapath_only -from [get_ports {AD_SDO_P[*]}] -to [get_pins {SAR_ZYNQ_i/sar_zynq_IP_0/U0/SPI_ADC_Inst/CaptInst[*].ad_capture_inst/ad_data_240_r1_reg[*]/D}] 2.200;

Slack varies from 0.14 to 0.49ns. I believe it is sufficiently small for the logic to work (I will have to check my notes though.)
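For anyone wanting to reproduce this kind of check, a sketch of how the per-input slack could be listed from the Vivado Tcl console on an implemented design is shown below. The port pattern is the one from the constraints above; the path counts are arbitrary choices, and the exact set of paths returned depends on the design and constraints, so treat it as a starting point rather than a verified script.

```tcl
# Text report of the worst max-delay path from each matching input port.
report_timing -from [get_ports {AD_SDO_P[*]}] -delay_type max -max_paths 100 -nworst 1

# The same information as a compact list: endpoint pin and slack per path.
foreach path [get_timing_paths -from [get_ports {AD_SDO_P[*]}] -delay_type max -max_paths 100 -nworst 1] {
    puts [format "%-80s slack %.3f ns" \
        [get_property ENDPOINT_PIN $path] [get_property SLACK $path]]
}
```

Comparing the slack values across the three capture registers of one input gives a direct view of how evenly the synchronizers were placed.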

Assuming your 160MHz clock is coming from an MMCM (or can come from an MMCM),

It is. :)

there is no problem with generating a 480MHz (or even faster) clock and using this as the high speed clock input to an ISERDES. Since the ISERDES can sample at rates up to 800MHz DDR (so 1600Msamples/sec) you can oversample your 160MHz input by up to 10 times (instead of 3). Even if you don't want to go more than 3, you can still use the ISERDES with a 480MHz clock in SDR mode (with 3:1 deserialization). All of these solutions avoid the problem outlined above - the N samples (be it 3 or 10) will always be equally spaced - there is no routing delay to worry about since it is a single cell that is taking all the N samples.

I will take a better look at SERDES. I assumed it was used for protocols such as video streams, so I did not consider it for the AD data I have to decode, but I do not know it well enough to claim it can or cannot be used in my design. If it can be used, I agree I would have a more robust solution. Maybe you can help me with this. I am reproducing the sketch of the timing chart of the AD (names are generic):

              ________________________
sync    _____|                        |_____......
        _______________      ____      ____
clock                  |____|    |____|    |_.....
                            ________
data    ___________________|        |________.....

Sync is a low frequency (5MHz) external clock that signals the AD to start a conversion and the PL logic to store previous deserialized data and to prepare to deserialize a new stream. Clock is the serial clock from the PL to the AD. As you see, it is idle for a while then pulses (@160MHz) for the AD to send its data and then returns to idle. A new bit of data is sent after a falling edge of clock but the delay between the falling edge and the new bit varies due to AD delays (it varies a LOT) and tracks delay.

Do you think SERDES could be used to decode this pattern?

Finally, are you sure this interface can't be captured statically? At 160MHz SDR (which is what this looks like) your bit period is 6.25ns. For most clocking schemes this is absolutely no problem for the I/O of the FPGA - in some clocking schemes you can get down to around 1.5ns. So if your clock is in a clock capable pin, then you can sample this conventionally on the 160MHz clock (assuming the device provides at least somewhat a reasonable clock/data timing relationship). Even if the clock is not on a clock capable pin, this interface is so slow that it might still be possible to capture statically. If this clock is gapped, this presents some additional challenges (you can't use an MMCM on it, so you have to use direct BUFG or BUFR/BUFIO clocking), but you can still do the initial sampling on that clock, and then transfer it via a number of easier clock crossing schemes to a continuously running clock.

Clock is not continuous, as described above. I am using an ODDR register to create the output clocks (one of the changes I made to improve timing and get timing closure). Because of the aforementioned variable delays, the data input is treated as asynchronous.

Registered: 01-22-2015

Re: No input/output delays

@avrumw 

In the oversampling mode you describe, how does ISERDES avoid the problem of metastability (ie. where are the synchronizers)?

Thanks,
Mark

Historian

Re: No input/output delays

markg@prosensing.com 

In the oversampling mode you describe, how does ISERDES avoid the problem of metastability (ie. where are the synchronizers)?

That's a good question.

We don't know the exact structure of the internals of the ISERDES, but we can speculate a bit. The high speed side of the ISERDES is likely a shift register; on each edge (rising, or rising and falling, depending on the mode) of CLK the D input is sampled by the "first" flip-flop in the shift register and the remaining bits are shifted down. When we look at it this way, the shift register looks like a set of metastability registers:

  • The output of the FF is sampled by one and only one receiver (I will revisit this in a minute)
  • The path between the two FFs is constrained in terms of maximum propagation delay (they are in the ISERDES so are fixed, and likely very small)

However the ISERDES is not just a shift register - at the rising edge of CLKDIV, the current content of the shift register is transferred to the slow speed clock domain in a parallel fashion. So that (slightly) violates the first point above - each flip-flop in the shift register chain has two things sampling it - the next element in the shift register and its corresponding bit in the parallel domain. But, since we only do the parallel transfer once every N clocks, on any one clock cycle, only one of these is true - it either shifts down the chain, or it loads into the parallel transfer. Once data is loaded into the parallel transfer registers, the contents of the shift register become irrelevant until the shift register has completely refilled (N clock edges later). So these two together don't technically violate the "sampled by only one receiver" rule. Of course, the shift register and the parallel register are all in the ISERDES so all paths meet the second requirement.

So, depending on which bit we are talking about (the one just before the parallel transfer or the next one) we have anywhere between 2 and N+1 "metastability chain like" flip-flops. This, of course, assumes that the parallel transfer uses bits 0:N-1 of the serial chain - we don't know that - there could be (say) N+6 bits in the serial chain and the parallel load uses 6:N+5 (thus ensuring there are at least 8 flip-flops in the metastability chain). Similarly, we don't know if the parallel sample goes directly to the Q pins of the ISERDES, or there is a 2nd set of FFs between the parallel FFs and the Q outputs (I suspect the latter is true). If you look at UG471 "ISERDESE2 Latencies", v1.9, p.156, we see that in NETWORKING mode, the latency is 2 CLKDIV cycles (which means there are more FFs somewhere).

Even if we assume that it is using 0:N-1, and there is only one level of parallel flops, then there are still (at least) 2 metastability-like flip-flops. These are on the high speed clock domain, or even twice that (since they are DDR), so 2 may not be enough. If we are really worried about that, we can metastability harden the output of the ISERDES for further metastability resolution; this will increase the latency, but is still valid, since we can see each bit sampled as being in its own metastability chain. And since we know that we are "oversampling", only one of the original samples can have been metastable - so only one bit in the oversampled parallel domain will be potentially unknown - all the others will be properly synchronously sampled values.

And we know this works - Xilinx tells us so. First of all, lots of Xilinx IP and whitepapers use the ISERDES for oversampling. But even the documentation of the ISERDES itself says it is OK to oversample - the ISERDES has an OVERSAMPLE mode (see UG471 "OVERSAMPLE Interface Type", v 1.9, p. 153). The configuration is different here - the sampling flip-flops have been moved around, but even here (where we have no shift register), it clearly shows that the transfer to the outputs of the ISERDES has at least 3 FFs in a row, all implemented with fixed resources.

Avrum

Adventurer

Re: No input/output delays

Hello, @avrumw

Thank you for the explanation. TBH I did not follow half of it but I think I may have grasped the general idea.

If ISERDES may be used as a super shift register where, say, the serial stream is shifted on the rising edges of the 3x clock and the parallel output is sampled on the falling edges, then it could be easily used as the front end of my deserializer logic. I cannot elaborate but with this logic metastability is not a problem.

Maybe there is even a simpler way of using ISERDES, depending on how this oversample mode works. I will take a deeper look at the documentation when I have a chance. 

Anyway, for the time being, using set_max_delay -datapath_only with a delay slightly higher than the minimum physical limit has forced the P&R to place the synchronizers close to the IOBs, making the skew between the inputs of the synchronizers of the three subchannels (one per clock phase) less than 0.3ns (worst case - most are around 0.15ns). FWIW the deserializer would work even with a skew of 1ns.

Some implementation runs have produced one timing error due to one input with a marginally negative slack. After changing the implementation strategy from Default to Performance_Explore there was no error, at the cost of a few extra seconds of compilation time.
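In project mode the same strategy change can also be scripted from the Tcl console. The sketch below assumes the implementation run has the default name impl_1, which may differ in other projects.

```tcl
# Switch the implementation run to the Performance_Explore strategy.
# "impl_1" is the default run name and is an assumption here.
set_property strategy Performance_Explore [get_runs impl_1]

# Re-run implementation with the new strategy and wait for it to finish.
reset_run impl_1
launch_runs impl_1 -to_step route_design
wait_on_run impl_1
```

Scripting the change makes it easier to compare slack between strategies on the same netlist.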
