marcp
Observer
426 Views
Registered: ‎01-20-2017

Questions regarding Clock Domain Crossing

Hello,

I have already implemented a couple of clock domain crossings, using asynchronous FIFO for data and XPM_CDC_SINGLE, XPM_CDC_PULSE as well as XPM_CDC_ASYNC_RST macros for control signals. As I am now working on a big design with multiple clock crossings, I am currently diving a bit deeper into this topic. While plenty of resources can be found online and also in this forum (thanks for that!) there are still a couple of questions that I have been unable to answer yet. I hope someone could help me out.

1) I understand the concept behind an FF-Synchronizer and I assume that such a synchronizer is also used inside of XPM_CDC_SINGLE. However, it seems that a simple FF-Synchronizer is only working properly when the source clock is slower than the destination clock. Is this correct? What is the correct way to transfer single-bit signals (not only pulses, where a Toggle-Synchronizer could be used) from a fast to a slow domain? And is there some Xilinx module, IP or macro for that (like XPM_CDC_SINGLE)?

2) If my source clock and my destination clock are in phase (e.g. they are derived from the same main clock and one is 2x the other), do I still have to implement a proper CDC? In my understanding it should be possible to omit the synchronizer etc. and for 'slow to fast' only capture every second cycle in fast domain, while for 'fast to slow' just keep the signals stable for two clock-cycles in the fast domain.

3) Is it ok to use the XPM_CDC_ARRAY_SINGLE macro for multi-bit control values, if I can guarantee that the output value in the destination clock domain is only processed a couple of clock-cycles (> synchronizer-FF count) after a value change?

4) Just to be sure: When using an asynchronous FIFO, it does not matter if I go from 'slow to fast' or from 'fast to slow', right?

5) Are there maybe some additional sources covering the topic of CDC in greater detail than the usual introductory sources on the web? E.g. sources which also differentiate between 'slow to fast' and 'fast to slow' crossings or between clocks that are in phase and clocks that are completely asynchronous.

Thank you in advance,

Marc

4 Replies
avrumw
Guide
404 Views
Registered: ‎01-23-2009

However, it seems that a simple FF-Synchronizer is only working properly when the source clock is slower than the destination clock. Is this correct?

Not quite. It only works if the transition rate (or data rate) of the signal is slower than the rate of the destination clock (the rule of thumb is at most 1/2 the destination clock rate).

So even if the source comes from a faster clock domain, if this signal is guaranteed never to make more than one transition in N clocks, and N*Tfast_clk is significantly bigger (again, let's say 2x) than Tslow_clk, then this will work (the real number isn't two; it is really one plus an amount that depends on characteristics of the rest of the system, but people use two to be safe).
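As a rough illustration of this rule of thumb, the sketch below computes how many source clocks a signal has to stay stable so that N*Tfast_clk is at least 2x Tslow_clk. The function name min_hold_cycles and its margin parameter are made up for this example, and the factor of 2 is only the "safe" value mentioned above, not an exact figure.

```python
def min_hold_cycles(f_src_hz, f_dst_hz, margin=2):
    """Smallest N with N * T_src >= margin * T_dst.

    Frequencies are integers in Hz and margin is an integer, so the
    calculation stays exact (no floating-point rounding).
    """
    # N * (1 / f_src) >= margin * (1 / f_dst)  <=>  N >= margin * f_src / f_dst
    n = -(-(margin * f_src_hz) // f_dst_hz)  # ceiling division on integers
    return max(n, 1)  # the signal is always held for at least one source clock


# 200 MHz source, 50 MHz destination: hold each value for 8 source clocks.
print(min_hold_cycles(200_000_000, 50_000_000))   # 8
# Slow source, fast destination: one source clock is already long enough.
print(min_hold_cycles(100_000_000, 250_000_000))  # 1
```

The same helper with margin=1 gives the looser bound that applies when the two clocks are synchronous.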

An example is a reset signal or a global enable. Assuming these stay in one state "long enough" before transitioning to the other state, then a simple synchronizer (or metastability hardener) is sufficient for this clock domain crossing circuit (CDCC). It is important to note that the simple synchronizer (N back-to-back flip-flops) needs the ASYNC_REG property set on all flip-flops in the synchronizer chain and needs a timing exception on the path between the last flip-flop on the source domain and the first flip-flop in the synchronizer chain. The source data must come from a flip-flop on the source domain, not from combinatorial logic.

If the signal is not limited to one transition every 2*Tslow_clk, then this kind of synchronizer doesn't work - it can miss transitions (which are now effectively pulses). But, in this case, there is no synchronizer that will work, so you need to find a way of "making this true" (I will come back to this).

If my source clock and my destination clock are in phase (e.g. they are derived from the same main clock and one is 2x the other), do I still have to implement a proper CDC?

No, you don't need a "true" CDCC in this case. Well, technically you are clock domain crossing so you need a CDCC, but it isn't an asynchronous CDCC - you don't have to worry about metastability or bus coherency, just about not missing data. What you describe are the basics - edge detecting a pulse from the slow domain to the fast domain, holding data for multiple clocks on the fast domain to cross to the slow domain, etc. These are all clock domain crossing techniques, but are not asynchronous CDCCs. Furthermore, the timing of these paths is understood by the tools and hence you don't need any timing exceptions.
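To make the "not missing data" point concrete, here is a small cycle-level sketch of the fast to slow case with a 2:1, phase-aligned clock pair. It is only a behavioral model in Python, not HDL; the helper names stretch and sample_on_slow_clock are invented for this illustration, and the slow clock is modeled as sampling every second fast cycle.

```python
def stretch(fast_signal, cycles):
    """Hold every asserted fast cycle high for `cycles` fast cycles in total."""
    out = [0] * len(fast_signal)
    for i, value in enumerate(fast_signal):
        if value:
            for j in range(i, min(i + cycles, len(out))):
                out[j] = 1
    return out


def sample_on_slow_clock(fast_signal, ratio=2, phase=0):
    """Model the slow domain: it only sees every `ratio`-th fast cycle."""
    return [fast_signal[i] for i in range(phase, len(fast_signal), ratio)]


pulse = [0, 1, 0, 0, 0, 0]  # one-cycle pulse between two slow clock edges

# Unstretched, the pulse falls between slow edges and is lost.
print(sample_on_slow_clock(pulse))              # [0, 0, 0]
# Held for two fast cycles, it is always seen by exactly one slow edge.
print(sample_on_slow_clock(stretch(pulse, 2)))  # [0, 1, 0]
```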

Is it ok to use the XPM_CDC_ARRAY_SINGLE macro for multi-bit control values, if I can guarantee that the output value in the destination clock domain is only processed a couple of clock-cycles (> synchronizer-FF count) after a value change?

Yes. But you have to be very careful with this. You also need to ensure that the skew on the paths between the source and destination domain is controlled - either with a set_max_delay -datapath_only, or with a set_bus_skew, or both. This is known as a MUX synchronizer or a CE synchronizer - you properly synchronize a signal that tells you when a transition has occurred and then sample the data some delay after that, when the data is known to be stable. The main problem with this is that it is slow - you need to have multiple destination clock periods where the data is stable - something like 5 or 6 (since it needs to be stable before the earliest possible capture and after the latest possible capture).

Just to be sure: When using an asynchronous FIFO, it does not matter if I go from 'slow to fast' or from 'fast to slow', right?

From a static timing point of view, yes. But from a system point of view you also have a throughput issue. If you are going from a fast domain to a slow domain, and data is written to the FIFO every fast clock period, then the FIFO will quickly overflow since you can't empty it fast enough. This is the fundamental problem for all fast->slow clock domain crossings - you must manage the data rate so that the slow domain doesn't "miss" data.

A common way of dealing with this is by parallelizing it. Let's say your fast domain is just a bit faster than your slow domain. The way to bring "full rate" data across the domain is to widen the FIFO to hold 2 or more samples in parallel (let's say N). In the fast domain, collect N successive samples into a word that is N times the width of a single sample. Push that into the FIFO - this will happen at a rate of Ffast_clk/N. The slow domain can then pop these groups of samples at a rate of Fslow_clk as long as Fslow_clk > Ffast_clk/N (you don't need any margin - as long as it is strictly greater then you are OK). This is what I was alluding to before - you must control the data rate crossing the clock domain - the data rate in this case is 1/N of the clock rate.
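The condition Fslow_clk > Ffast_clk/N can be turned into a quick calculation. The sketch below (function names min_pack_factor and pack are hypothetical, chosen for this example) finds the smallest N that satisfies the strict inequality and shows the grouping of samples into wide words in software terms.

```python
def min_pack_factor(f_fast_hz, f_slow_hz):
    """Smallest integer N with f_slow > f_fast / N (strictly greater)."""
    # N must be strictly greater than f_fast / f_slow, hence floor + 1.
    return f_fast_hz // f_slow_hz + 1


def pack(samples, n):
    """Group successive samples into words of n samples each.

    A trailing partial word is dropped in this simple model.
    """
    return [tuple(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


print(min_pack_factor(125_000_000, 100_000_000))  # 2
# Exactly 2:1 needs N = 3, because N = 2 would give equal rates, not strictly greater.
print(min_pack_factor(200_000_000, 100_000_000))  # 3
print(pack([1, 2, 3, 4, 5], 2))                   # [(1, 2), (3, 4)]
```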

A similar technique can be used for pulses from the fast domain to the slow domain. You figure out how often you can transfer data from the fast domain to the slow domain (real data - say using the CE or MUX synchronizer described above). Let's say that's every 8 clocks. On the fast domain, every 8 clocks you count the number of events that occur - this can be a number somewhere between 0 and 8. You then cross this number to the slow domain at a rate of Ffast_clk/8. This is what I alluded to above about "making it true" that the data rate on the fast domain is slow enough to cross to the slow domain.
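A behavioral sketch of this event-counting idea, in Python rather than HDL: events on the fast domain are accumulated over fixed windows, and only the per-window count would be handed to the slow domain. The name count_events and the list-of-0/1 representation are assumptions made for the example; the window of 8 is the value used above.

```python
def count_events(events, window=8):
    """Sum the events (0 or 1 per fast clock) in each window of fast clocks.

    Each returned count is the value that would cross to the slow domain
    once per window.
    """
    return [sum(events[i:i + window]) for i in range(0, len(events), window)]


events = [1, 0, 1, 1, 0, 0, 0, 0,   # 3 events in the first window
          0, 0, 0, 0, 0, 0, 0, 1,   # 1 event in the second window
          1, 1, 1, 1, 1, 1, 1, 1]   # 8 events in the third window

counts = count_events(events)
print(counts)                       # [3, 1, 8]
# No event is lost: the counts add up to the number of events on the fast side.
print(sum(counts) == sum(events))   # True
```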

Are there maybe some additional sources covering the topic of CDC in greater detail than the usual introductory sources on the web? E.g. sources which also differentiate between 'slow to fast' and 'fast to slow' crossings or between clocks that are in phase and clocks that are completely asynchronous.

Not that I know of. But again the "fast to slow" vs. "slow to fast" isn't that complicated - you just need to manage the data rate. Clocks that are in phase are handled by the tools from a static timing point of view - you only need to manage the data rate. So, while there are lots of styles of CDCCs, they all boil down to the same things:

  • Manage the data rate
  • Manage metastability 
  • Manage bus coherency

If you take care of all of these (when they apply) including the timing exceptions that apply as a consequence of these, then that is all you need to do.

Avrum

marcp
Observer
331 Views
Registered: ‎01-20-2017

Dear @avrumw ,

Thank you very much for your detailed post! It answers all my questions and greatly improves my understanding of clock domain crossing. I really appreciate this!

In the design I’m currently working on, the majority of the clock domain crossings are between clocks that are in phase where one clock is 2x as fast as the other, exactly as described in question 2). After reading your answer to this question, I'm wondering if you could maybe elaborate a bit more on how to design such "synchronous CDCCs" so that they are properly recognized by the tool (Vivado in my case)?

The design, which I inherited from another developer, already includes some kind of "synchronous CDCC" at a few places. With a self-written module, input signal pulses are stretched to 3-6 clock cycles in source domain and then directly transferred to the destination domain where an edge detection is performed to generate the output signal. While this crossing seems to be fine from a theoretical timing point of view, Vivado does not recognize it and produces a setup-violation. Do you have an example for “synchronously crossing” single-bit and vector signals in such a way that Vivado does recognize it? Are there any (special) constraints required?

Thank you very much,

Marc

dsheils
Moderator
321 Views
Registered: ‎01-05-2017

Hi @marcp 

You mentioned above that "While this crossing seems to be fine from a theoretical timing point of view, Vivado does not recognize it and produces a setup-violation. Do you have an example for “synchronously crossing” single-bit and vector signals in such a way that Vivado does recognize it? Are there any (special) constraints required?"

Why do you think that Vivado does not recognise it? Vivado should recognise a 'synchronous cdc'. Do both of your clocks come from the same source? If so, then it sounds like you simply have a setup violation.

Vivado has commands such as report_cdc and report_clock_interaction (available from the GUI also) that can help you analyze clock interactions in your design. Is report_clock_interaction showing any unsafe crossings?

avrumw
Guide
298 Views
Registered: ‎01-23-2009

I'm wondering if you could maybe elaborate a bit more on how to design such "synchronous CDCCs" so that they are properly recognized by the tool (Vivado in my case)?

Paths that go between two different related clocks aren't "true" CDCCs, they are merely a different subset of normal synchronous timing paths. Vivado (and other tools) don't need to recognize them, since, from a static timing analysis point of view, they are "just normal synchronous paths". So anything that uses normal synchronous design and gets the job done is what you need here.

input signal pulses are stretched to 3-6 clock cycles in source domain and then directly transferred to the destination domain where an edge detection is performed to generate the output signal

Things like this are perfectly acceptable for these applications. Assuming that the number of cycles they are stretched is sufficient to ensure that N*Tsrc_clk >= Tdst_clk (for synchronous crossings greater than or equal is sufficient; for true CDCCs this needs to be "significantly bigger", say 2x), then this will work to bring an "event" from one domain to the other (I prefer toggle synchronizers, since they can handle more events per second). But there is nothing wrong with this. And, again, the tools don't need to recognize these as CDCCs, nor will they (the report_cdc commands will not report on synchronous CDCCs).

... and produces a setup-violation.

Now this is a problem. But it isn't because the tool does not recognize the CDCC, it is because of something "wrong" with your design. If these two clocks are truly synchronous then it should be very difficult to fail setup (unless there is a large combinatorial network on this path). But for the clocks to be synchronous they must

  • Come from the same clock source (i.e. the same MMCM)
    • The tools can deal with things that are more complex, but this is the best starting point
  • Not have a "really weird" ratio between them
    • 1:N and N:1 are clearly fine, as are reasonable N:M. But when N and M get really large with no common divisors (like, say 31:32), then these become difficult
  • Use the same kind of clock buffer
    • i.e. either both use BUFG or both use BUFH, but not one of each
    • mixing different flavors of BUFG (BUFG, BUFGCTRL, BUFGMUX, BUFGCE, BUFGCE_DIV) is fine - these are all really the same resource
  • Be in the same CLOCK_DELAY_GROUP (in UltraScale/UltraScale+/Versal)

If all these things are done, then there shouldn't be a setup violation between these domains. If you are still seeing one, then you need to figure out which one of these isn't being met (assuming the problem isn't as I mentioned above, simply due to too much combinatorial logic along the paths).
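To see why a ratio like 31:32 is difficult while 1:2 is not, it can help to look at the tightest launch-to-capture edge spacing over the common period of the two clocks. The sketch below is only an illustration under simple assumptions (ideal, phase-aligned clocks with integer periods in the same unit, e.g. picoseconds); the function name tightest_setup_window is made up here and this is not a description of how any particular tool implements its analysis.

```python
from math import gcd


def tightest_setup_window(t_launch, t_capture):
    """Smallest gap from any launch edge to the next capture edge.

    Periods are integers in the same unit and both clocks have a rising
    edge at t = 0. Only one common period needs to be examined because
    the edge pattern repeats after that.
    """
    common_period = t_launch * t_capture // gcd(t_launch, t_capture)
    best = None
    for launch in range(0, common_period, t_launch):
        # First capture edge strictly after this launch edge.
        capture = (launch // t_capture + 1) * t_capture
        gap = capture - launch
        if best is None or gap < best:
            best = gap
    return best


# 2:1 clocks (5 ns and 10 ns): the tightest path still gets a full 5 ns.
print(tightest_setup_window(5000, 10000))  # 5000
# 31:32 clocks (3.1 ns and 3.2 ns): some edge pair is only 100 ps apart.
print(tightest_setup_window(3100, 3200))   # 100
```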

Avrum
