Anonymous

Post-PAR timing issue using an MMCM-generated clock

 

 

 

 

Hello, I'm seeing some very strange post-PAR behaviour in my Virtex-6 based design.

I have a design with two clock domains. The first is driven from a GC pin by an external 250 MHz clock signal. The second should run from an internally generated 70 MHz clock signal.

I used the coregen wizard to generate a 250 MHz to 70 MHz PLL using an MMCM block (bufg input bufg output, balanced perf... internal feedback,...) -> 1 input and 1 output. And now the strange thing:

When I use the MMCM-generated 70 MHz clock signal I get a post-PAR maximum frequency of 35 MHz. That could make sense if the design had been built with bad design practices (which isn't the case). BUT, in order to troubleshoot my design, I removed the MMCM generating the 70 MHz clock and added an additional clock input driving the same net that was previously driven by the MMCM output, and the post-PAR maximum frequency jumps up to 206 MHz.

Since the design hasn't changed, and with an external clock signal the global timing constraint passes with a big margin, I assume there is something wrong with the MMCM generated by coregen.

Below are a few details for the two implementations and the timings obtained.

I have also attached the post-PAR static timing analysis and the .xco file for the PLL.

 

 

Internal MMCM generated clk_fx3 case:

 

ucf file content

 

NET "clk_virt_p" TNM_NET = "clk_virt";

NET "clk_virt_n" TNM_NET = "clk_virt";

TIMESPEC "TS_clk_virt" = PERIOD "clk_virt" 4 ns HIGH 50%;  #the 250 MHz differential clk

 

 

Clock Information:

-----------------------------------+--------------------------+-------+
Clock Signal                       | Clock buffer(FF name)    | Load  |
-----------------------------------+--------------------------+-------+
clk_virt_p                         | IBUFDS+BUFG              | 432   |
clk_virt_p                         | MMCM_ADV:CLKOUT0+BUFG    | 351   |
-----------------------------------+--------------------------+-------+

 

 

Synthesis maximum frequency:    385.840MHz

PAR Maximum frequency:     35.374MHz

 

 

External fake clk_fx3 case

 

ucf file content :

 

NET "clk_virt_p" TNM_NET = "clk_virt";

NET "clk_virt_n" TNM_NET = "clk_virt";

TIMESPEC "TS_clk_virt" = PERIOD "clk_virt" 4 ns HIGH 50%;  #the 250 MHz differential clk

NET "clk_fx3" TNM_NET = "clk_fx3";

TIMESPEC "TS_clk_fx3" = PERIOD "clk_fx3" 14.2 ns HIGH 50%;  # the external (fake) clk_fx3 clock, approx. 70 MHz

 

 

Clock Information:

-----------------------------------+--------------------------+-------+
Clock Signal                       | Clock buffer(FF name)    | Load  |
-----------------------------------+--------------------------+-------+
clk_virt_p                         | IBUFDS+BUFG              | 432   |
clk_fx3                            | IBUFG+BUFG               | 351   |
-----------------------------------+--------------------------+-------+

 

Synthesis maximum frequency:    326.052MHz

PAR Maximum frequency:     206.186MHz

 

Any help in solving this issue would be appreciated.

avrumw
Guide
Registered: 01-23-2009

You need to examine the worst path with the MMCM. You will almost certainly find that it is a path between the 250MHz clock domain and the 70MHz clock domain (or vice versa).

 

ISE has the concept of related and unrelated clocks.

 

When you bring the two clocks in on separate pins, they are treated as unrelated (unless you do something special). Any paths between unrelated clocks are unconstrained by default (you can look at the unconstrained path report to confirm this). Therefore the tool does not attempt to do any optimization or analysis of these paths.

 

When both clocks come from a common input (either through an MMCM or not), the two clocks are considered related clocks. Now the paths between them are constrained; the tools figure out the closest approach between the edges of the two clocks (and considering the frequencies here, that is going to be a VERY small number), and then time the paths against that (which is obviously failing badly).
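
As a rough worked example of that "closest approach" (an illustration only, assuming ideal 250 MHz and 70 MHz clocks whose rising edges coincide at t = 0, and ignoring jitter, skew and any MMCM phase offset):

```latex
\begin{aligned}
T_{250} &= 4\ \text{ns} = \tfrac{28}{7}\ \text{ns}, \qquad T_{70} = \tfrac{100}{7}\ \text{ns} \\
\Delta t(k, m) &= \left| k\,T_{250} - m\,T_{70} \right| = \tfrac{|28k - 100m|}{7}\ \text{ns}, \qquad k, m \in \mathbb{Z} \\
\min_{\Delta t > 0} \Delta t &= \tfrac{\gcd(28,\,100)}{7}\ \text{ns} = \tfrac{4}{7}\ \text{ns} \approx 0.571\ \text{ns} \\
\text{e.g.}\quad 18\,T_{250} - 5\,T_{70} &= 72.000\ \text{ns} - 71.429\ \text{ns} \approx 0.571\ \text{ns}
\end{aligned}
```

The edge pattern repeats every 100 ns (25 cycles of the 250 MHz clock, 7 cycles of the 70 MHz clock), so under these assumptions the cross-domain paths would be timed against roughly 0.57 ns rather than against either clock period.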

 

If you really are crossing between these two domains:

  a) you will need a proper synchronization circuit on the path and

  b) you will need a timing exception (a FROM:TO constraint) on this path to override the default timing analysis on the path

 

Avrum

Anonymous

Thank you for your valuable inputs.

 

There is still something I don't understand. Yes there are data crossing clock domains in both directions.

When data or control signals cross clock domains, they do so through classical double-FF synchronizers, dual-clock FIFOs and flag transport structures (handshaked double FFs), both when I use the internal clock and when I use the external one. Since a realistic post-P&R maximum frequency is reported with the external clock, I assumed the clock domain crossings had been handled correctly. Isn't that so?

 

When I use the external 70 MHz clock (no MMCM), I get a lot of reports like this in the post-P&R static timing analysis:

 

Paths for end point fx3_interface_inst/clk_cross_bus_size/s_bus_1_2 (SLICE_X64Y53.CX), 1 path

--------------------------------------------------------------------------------

Slack (hold path):      -0.182ns (requirement - (clock path skew + uncertainty - data path))

  Source:               fx3_interface_inst/s_cmd_in_data_size_2 (FF)

  Destination:          fx3_interface_inst/clk_cross_bus_size/s_bus_1_2 (FF)

  Requirement:          0.000ns

  Data Path Delay:      0.115ns (Levels of Logic = 0)

  Clock Path Skew:      0.262ns (1.710 - 1.448)

  Source Clock:         s_fx3_clk_in_g rising

  Destination Clock:    s_sys_clk_BUFG rising

  Clock Uncertainty:    0.035ns

 

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE

    Total System Jitter (TSJ):  0.070ns

    Total Input Jitter (TIJ):   0.000ns

    Discrete Jitter (DJ):       0.000ns

    Phase Error (PE):           0.000ns

 

  Minimum Data Path at Fast Process Corner: fx3_interface_inst/s_cmd_in_data_size_2 to fx3_interface_inst/clk_cross_bus_size/s_bus_1_2

    Location             Delay type         Delay(ns)  Physical Resource

                                                       Logical Resource(s)

    -------------------------------------------------  -------------------

    SLICE_X66Y53.CQ      Tcko                  0.098   fx3_interface_inst/s_cmd_in_data_size<3>

                                                       fx3_interface_inst/s_cmd_in_data_size_2

    SLICE_X64Y53.CX      net (fanout=4)        0.106   fx3_interface_inst/s_cmd_in_data_size<2>

    SLICE_X64Y53.CLK     Tckdi       (-Th)     0.089   fx3_interface_inst/clk_cross_bus_size/s_bus_1<3>

                                                       fx3_interface_inst/clk_cross_bus_size/s_bus_1_2

    -------------------------------------------------  ---------------------------

    Total                                      0.115ns (0.009ns logic, 0.106ns route)

                                                       (7.8% logic, 92.2% route)

 

... which I assumed reports a clock domain crossing (since the source and destination clocks aren't the same). This also tells me that ISE isn't treating the domains as independent. Am I wrong?

 

 

 

For crossing control signals I'm using the following  code:

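library ieee;
use ieee.std_logic_1164.all;

-- Crosses a single-cycle flag from clk_a to clk_b by toggling a level
-- in the source domain and detecting the toggle in the destination domain.
entity clk_cross_flag is
   port (
      clk_a    : in  std_logic;
      flag_in  : in  std_logic;   -- one-cycle pulse in the clk_a domain
      clk_b    : in  std_logic;
      flag_out : out std_logic    -- one-cycle pulse in the clk_b domain
   );
end clk_cross_flag;

architecture beh of clk_cross_flag is

   -- Initial values avoid 'U' propagation in simulation
   signal flagtoggle_clka : std_logic := '0';
   signal synca_clkb      : std_logic_vector(2 downto 0) := (others => '0');

begin

   -- Source domain: toggle the level each time flag_in is asserted
   proc_clka : process(clk_a)
   begin
      if rising_edge(clk_a) then
         if (flag_in = '1') then
            flagtoggle_clka <= not flagtoggle_clka;
         end if;
      end if;
   end process;

   -- Destination domain: 3-stage shift register samples the toggle
   proc_clkb : process(clk_b)
   begin
      if rising_edge(clk_b) then
         synca_clkb <= synca_clkb(1 downto 0) & flagtoggle_clka;
      end if;
   end process;

   -- A difference between the last two stages marks a new event
   flag_out <= synca_clkb(2) xor synca_clkb(1);

end beh;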

 

 

For buses whose bits do not need to cross simultaneously, I use this instead:

 

library ieee;
use ieee.std_logic_1164.all;

-- Two-stage register chain in the clk_b domain; bits are synchronized
-- independently, so bus coherency is not guaranteed.
entity clk_cross_bus_simple is
   generic ( width : natural := 1 );
   port (
      bus_in  : in  std_logic_vector(width-1 downto 0);
      clk_b   : in  std_logic;
      bus_out : out std_logic_vector(width-1 downto 0)
   );
end clk_cross_bus_simple;

architecture beh of clk_cross_bus_simple is

   signal s_bus_1 : std_logic_vector(width-1 downto 0);  -- first stage (may go metastable)
   signal s_bus_2 : std_logic_vector(width-1 downto 0);  -- second stage (output)

begin

   bus_out <= s_bus_2;

   proc_clkb : process(clk_b)
   begin
      if rising_edge(clk_b) then
         s_bus_2 <= s_bus_1;
         s_bus_1 <= bus_in;
      end if;
   end process;

end beh;

 

 

 

 

Are these structures the right way to cross a clock domain? If so, do I just have to set TIGs on all the crossing paths?

 

Thanks in advance and best regards.

Joel

avrumw

It's not a question of whether the clock crossing structures are correct or not.

 

The structure of the synchronizers is irrelevant to static timing analysis - whether they work properly or not is a functional issue. As far as static timing analysis (STA) is concerned these paths are "regular" paths until told otherwise with an exception; they start at a flip-flop on one domain and end on a flip-flop on another domain.

 

For most clock crossing circuits, an exception is required to make the clock crossing circuit complete. Without them, you are letting the tool do its default analysis of these paths, which is (generally) incorrect for the paths between clock domains. The only difference between the condition when the two clocks come from the same MMCM vs. when they come from independent inputs is that the tools do different incorrect things on these paths.

 

As I mentioned above, the tool treats paths between clocks from the same MMCM (related clocks) differently than paths between clocks that come from different inputs (unrelated clocks).

 

For unrelated clocks, the paths are considered unconstrained and are not timed. Since there are no constraints on the paths when the two clocks come from independent pins, obviously they do not fail STA - so your design meets timing. This may actually be underconstraining your design, which is bad (it may fail functionally - albeit very rarely).

 

For related clocks, the tools have a different behaviour; they attempt to find the closest approach of the rising edges of the two domains, and use that as the constraint between the clocks. Now your design fails to meet timing.

 

They are both incorrect. For (most) clock crossing circuits, you need a timing exception that is appropriate for the clock crosser. These may be TIGs (for simple synchronizers), but may also need to be FROM:TO constraints for more complex ones. When specified correctly, these will override the "default" behavior of the STA tool - be it the rule for unrelated or related clocks.

 

Avrum

 

avrumw

Looking at the two synchronization circuits, the one for the clk_cross_flag is simple enough that it doesn't need any additional constraints. Unless latency of the clock crosser is critical (i.e. it must happen on the very next edge vs. it can wait one more clock period with nothing bad happening), then a TIG is acceptable for this crosser.
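
A minimal UCF sketch of such a TIG, assuming the toggle register's output net keeps the name flagtoggle_clka after synthesis. The instance path my_flag_cross_inst is hypothetical; check the real net name in the timing report or in FPGA Editor before relying on it:

```ucf
# Hypothetical instance path "my_flag_cross_inst"; replace it with the real one.
# Ignore timing on the net that leaves the clk_a domain and feeds the
# first synchronizer flip-flop in the clk_b domain.
NET "my_flag_cross_inst/flagtoggle_clka" TIG;
```

Scoping the TIG to this one net, rather than to the whole clock pair, leaves every other path between the two domains visible to the timing analyzer.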

 

For the second one, it's less clear. What data is on the bus when you are trying to cross the domains? How do you ensure bus coherency through the clock crosser - the code here does not address the coherency issue, so either

  - this clock crosser may not function properly

  - you address the coherency issue outside this module - i.e.

       - you sample the data only when it is known stable (using some flag to determine when it is stable, which is crossed using the other synchronizer) or

       - the bus is Gray coded

 

If it is something like the last two (Gray coded, or sampled when stable), then the crosser will only work if the skew is controlled - either the skew between the different bits of the Gray code or the skew between the flag and the data bits if you sample when stable. In these cases, a TIG is not really correct - a FROM:TO (or MAXDELAY) is required to constrain the propagation time between the source domain and the destination domain. If this is the case, then (as I mentioned before) the crosser was underconstrained when using independent clocks (where the path wasn't timed).
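
A rough UCF sketch of such a constraint for the simple bus crosser, assuming the first-stage flip-flops keep the names seen in the timing report (s_bus_1_<n> under fx3_interface_inst/clk_cross_bus_size). The group name, the TIMESPEC name and the 4 ns value are placeholders, not recommendations; the right limit depends on how the crosser is built:

```ucf
# Group the first-stage flip-flops of the bus crosser (wildcard over bit index)
INST "fx3_interface_inst/clk_cross_bus_size/s_bus_1_*" TNM = "cross_bus_stage1";

# Bound the data path delay into those flip-flops. DATAPATHONLY leaves
# clock skew and phase out of the analysis. 4 ns is a placeholder value.
TIMESPEC "TS_cross_bus_size" = FROM "FFS" TO "cross_bus_stage1" 4 ns DATAPATHONLY;
```

Because every bit of the bus then has to arrive within the same bound, the bit-to-bit skew is limited to at most that bound.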

 

Avrum

Anonymous

Hi, as I mentioned, the second crossing structure is used only where transient bus coherency isn't required. In the other cases buses are crossed using a combination of the two structures: the bus is crossed into an intermediate register in the destination clock domain and is then moved into the final register using a "bus changed" flag signal that is crossed separately (as you mentioned as one of the possible solutions).

 

What about the Xilinx dual-clock FIFO macros? Do I have to set TIGs there as well, or are they assumed implicitly by ISE?

 

It is still unclear to me how the static timing analyzer is able to come to a meaningful conclusion (even at very low frequency) in the case of an internally generated MMCM clock which is drifting against the source clock. There is no stable timing relation; only the port and net delays are meaningful, since clock-to-clock and setup/hold times are continuously changing and drifting away.

 

Thanks, Joel

 

Anonymous

As long as I'm quite sure that all crossing points are "pretty clean" would it be correct to apply a global TIG between the two clock domains ?

Something like this:

NET "sys_clk" TNM_NET = sys_clk_grp; # the 250 MHz clock
NET "fx3_interface_inst/fx3_if_clk_pll_inst/fx3_if_clk_out" TNM_NET = fx3_clk_grp; # the MMCM generated 70 MHz clock

TIMESPEC TS_false_path1 = FROM sys_clk_grp TO  fx3_clk_grp  TIG;
TIMESPEC TS_false_path2 = FROM fx3_clk_grp TO  sys_clk_grp  TIG;

I tried it and it seems to remove the difference between the internal and external clock cases. The maximum frequency is now the same in both cases, but I don't know whether what I did is correct. Considering that the data paths inside the crossers have 0 logic levels (just FF to FF), data path constraints on the clock crossings can be avoided... Right?

Did I do the correct thing, or do I have to set a TIG individually on each instantiated clock crossing entity?

 

The above approach seems to work on all my data/ctrl clock domain crossing components; however, I still found some crossing paths. All these paths are inside CoreGen-generated asynchronous FIFOs:

 

Paths for end point fx3_interface_inst/strm_cap_fifo_inst/U0/xst_fifo_generator/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_9 (SLICE_X61Y41.BX), 1 path

--------------------------------------------------------------------------------

Delay (hold path):      -0.196ns (datapath - clock path skew - uncertainty)

  Source:               fx3_interface_inst/strm_cap_fifo_inst/U0/xst_fifo_generator/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/rd_pntr_gc_9 (FF)

  Destination:          fx3_interface_inst/strm_cap_fifo_inst/U0/xst_fifo_generator/gconvfifo.rf/grf.rf/gntv_or_sync_fifo.gcx.clkx/gsync_stage[1].wr_stg_inst/Q_9 (FF)

  Data Path Delay:      0.242ns (Levels of Logic = 0)

  Clock Path Skew:      -0.053ns (0.821 - 0.874)

  Source Clock:         s_fx3_if_clk rising at 71.428ns

  Destination Clock:    s_sys_clk rising at 68.000ns

  Clock Uncertainty:    0.491ns

 

They are not making timing fail, but they still make me a bit unsure about the effects and behaviour of TIGs. Why does the TIG work on all my own logic but not on the Xilinx FIFOs (FIFO Generator 9.1, with the "disable timing violations" checkbox checked)?

avrumw

This is a topic of much debate. Many people use TIGs between clocks that are covered by clock synchronizers. However, this is technically incorrect. As I mentioned above, most clock crossers do have a timing requirement, that the skew between bits of the crosser (either between the "valid" and the data, or between different bits of a Gray coded synchronizer) be within a certain limit. The limit is different depending on how the synchronizer is built, but there is a limit.

 

In an FPGA, an unconstrained path can be routed in any possible way. Without a constraint it is theoretically possible to incur ridiculous routing delays. However, in practice you generally won't. So, it's very unlikely that you could get a synchronizer with enough skew to make it fail. So, by TIGging the path, you will get the timing analyzer to stop complaining, and you will most likely get a functioning design.

 

But some day, someone is going to have a design fail due to excessive skew on a synchronizer. That's why I prefer using FROM:TO constraints on the paths - that way we still stop the timing analyzer from complaining and we guarantee that there won't be excessive skew on the synchronizer.

 

Whichever way you go, it is acceptable to apply the constraint to the clock groups; as long as you have verified that all paths are through synchronizers. This is true both for the TIG and the FROM:TO (and since the FROM:TO is no harder than the TIG, why not do the FROM:TO?)
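
For illustration, a group-level FROM:TO could look like the sketch below. It reuses the sys_clk_grp and fx3_clk_grp groups defined earlier in the thread; the TIMESPEC names and the 4 ns limit are assumptions chosen only for the example:

```ucf
# Replace the two TIG lines with bounded datapath-only constraints.
# 4 ns (one period of the faster clock) is a placeholder, not a recommendation.
TIMESPEC "TS_sys_to_fx3" = FROM "sys_clk_grp" TO "fx3_clk_grp" 4 ns DATAPATHONLY;
TIMESPEC "TS_fx3_to_sys" = FROM "fx3_clk_grp" TO "sys_clk_grp" 4 ns DATAPATHONLY;
```

With DATAPATHONLY the analyzer ignores the clock phase relationship, so the "closest approach" requirement no longer applies, while the routing delay on every crossing path stays bounded.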

 

As for the FIFOs, it depends. In Vivado (the new tool), when you use IP, a constraint file comes with it and is automatically used. In ISE, a UCF is generated for some cores, but you need to either add it to the project or copy its contents into your UCF file. Without the constraints (exceptions) the tools will time the path through the FIFO normally (and hence will likely fail).

 

For the Built-In FIFO, skew is not an issue - the clock crossing is all in hard logic, so there is no risk of excessive skew on the synchronizing paths. In this case a TIG is acceptable. However, it has to be put on the clocks (since the FFs inside the FIFO that do the synchronizing are part of the hard logic, so they have no names). Since this is pretty invasive (it would TIG all paths between the domains in your design not just in the FIFO), the XDC/UCF file does not include this TIG - you have to do it.

 

For BRAM/distributed RAM FIFOs, the UCF/XDC contain a FROM:TO (or set_max_delay in XDC) constraint for the FFs involved in the synchronization.

 

I don't know why you are still seeing the path in the FIFO - it should be covered by the clock domain TIG.

 

Avrum
