Observer iso_larry
Observer
439 Views
Registered: ‎06-10-2015

Feeding multiple deserializers with a single clock

Hello, I have to acquire 32 LVDS input pairs clocked by a single 500 MHz LVDS clock with a Spartan 7. The deserializer primitives used by the SelectIO interface wizard have a maximum data width of 16 bits, therefore I need to use two of them fed by the same clock. If I understand correctly, I cannot configure the deserializers to have an external clock, because I cannot connect both of them to a single input pin pair. Therefore I need to use an internal clock generated by an MMCM.

If I understood correctly, by using an internal clock both the "input" clock and the "divided" clock are inputs for the deserializer. In other words I need to feed the MMCM with my input clock and generate two output clocks, one at the same frequency and phase (in this case 500 MHz) and the other divided by the multiplex ratio (in this case 500/8=62.5 MHz).

Is that correct? Sorry if the question is trivial but I'm quite new to these primitives.

Thanks!

L.

Accepted Solutions
Historian
Historian
407 Views
Registered: ‎01-23-2009

Re: Feeding multiple deserializers with a single clock

First, realize that the SelectIO Wizard is nothing other than a wizard to build an "interface" using a collection of cells that exist inside the FPGA. The Wizard is, in this case, interconnecting up to 17 IBUFDS components (for bringing 16 differential data signals + 1 differential clock into the FPGA), one BUFIO and one BUFR to buffer the clock, potentially 1, 16, or 17 IDELAY cells to delay the clock and/or data, and 16 ISERDES components for deserializing each of the data bits.

There is nothing magic about the number 16 other than the SelectIO Wizard doesn't support the building of wider interfaces. This is not a limitation of the device. So by manually instantiating all the required components you can build wider interfaces.

BUT - there are some physical limits based on different clocking schemes. Take a look at this post on input interface clocking schemes. In all of these schemes, the clock must be on a clock capable pin (or differential pair if it is LVDS).

Using the "fastest" clocking scheme - Direct using BUFIO and/or BUFR - all the data bits must be in the same I/O bank. However, there are only 24 pairs of differential I/O in a bank - this means that using this scheme you can only build interfaces up to 23 bits wide (one pair for the clock , the other 23 for data). If these signals were single ended, then you could do interfaces up to 49 bits (a bank has 50 pins, but only 24 pairs of them can be combined for differential data).

If you need to go wider, you can use the BUFMR to connect to the BUFIO/BUFR in the I/O banks above and below the clock capable pin. This would allow you to build interfaces that are 71 signals wide (using all 3*24 differential pairs in 3 adjacent banks). But the BUFMR reduces the maximum performance of the interface. The architecture of this is shown in UG472 Appendix A.

Any of the global clocking schemes (BUFG direct or BUFG via MMCM) can be used for any number of inputs - up to the total number of differential pairs of pins on the die.
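The bank limits quoted above can be checked with a few lines of plain Tcl. This is only a sketch of the arithmetic: the figure of 24 differential pairs per bank comes from the post, the 32 lanes come from the original question, and the proc name max_data_pairs is made up for illustration.

```tcl
# Plain Tcl arithmetic (tclsh or the Vivado Tcl console).
set pairs_per_bank 24   ;# differential pairs per bank, as quoted in the post
set data_lanes     32   ;# lanes required by the original question

# One differential pair is always taken by the forwarded clock.
proc max_data_pairs {banks pairs_per_bank} {
    return [expr {$banks * $pairs_per_bank - 1}]
}

# 1 bank = BUFIO/BUFR only, 3 banks = BUFMR feeding the adjacent banks.
foreach banks {1 3} {
    set capacity [max_data_pairs $banks $pairs_per_bank]
    set fits [expr {$data_lanes <= $capacity ? "yes" : "no"}]
    puts "banks=$banks capacity=$capacity fits_32_lanes=$fits"
}
# prints:
#   banks=1 capacity=23 fits_32_lanes=no
#   banks=3 capacity=71 fits_32_lanes=yes
```

With 32 LVDS data lanes, the single-bank BUFIO/BUFR scheme is therefore ruled out by pin count alone, before any timing is considered.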

Again, you can build any of these manually without the artificial limitation of 16 signals imposed by the SelectIO Wizard. One way to do this easily is to use the Wizard to build the interface for 16 bits (in a fake project), then copy the generated RTL source from the wizard and modify it to do the width you want (it should actually be pretty easy to do this). Then add this file as an RTL source to your real project. Note: Never modify the output product of an IP within the project where it was created - Vivado can (and will) overwrite your modifications during the build process.

But, if you choose to use an MMCM then your clocking scheme is correct, assuming your deserialization is 8:1 SDR or 16:1 DDR (since your DIVCLK is 1/8 of your CLK).
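On the constraint side, the MMCM scheme might be declared roughly as below. This is a minimal sketch, not a complete constraint set: the port name adc_clk_p and the clock name adc_clk are hypothetical, and it assumes the 500 MHz clock enters on a clock capable pin and drives the MMCM.

```tcl
# Hypothetical port and clock names; replace with the real clock input.
# 500 MHz input clock -> 2.000 ns period.
create_clock -name adc_clk -period 2.000 [get_ports adc_clk_p]

# No create_generated_clock is written for the MMCM outputs: Vivado
# normally derives them from the MMCM settings. A divide-by-8 output
# gives 2.000 ns * 8 = 16.000 ns, i.e. the 62.5 MHz divided clock.
```

Only the primary clock needs a manual definition in this sketch; the 500 MHz and 62.5 MHz MMCM outputs should then appear as auto-derived clocks in the clock report.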

For all of these approaches, though, be aware, 500MHz is pretty fast. If the interface is SDR this gives you 2ns per bit period, which is probably doable with most clocking schemes (as long as the sending device has reasonable clock-to-data skew). But it is not trivial to get the timing of this right - you need to write perfect timing constraints and tune your interface (with IDELAYs or with the MMCM phase shift) to the exact right setting. If it is 500MHz DDR, then the bit period is only 1ns - at this speed, no clocking scheme is fast enough for static capture of this interface.
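The bit periods mentioned above follow directly from the clock frequency. A small plain-Tcl sketch of that arithmetic (variable names are arbitrary):

```tcl
# Plain Tcl arithmetic (tclsh or the Vivado Tcl console).
set f_clk_mhz 500.0
set period_ns [expr {1000.0 / $f_clk_mhz}]

# SDR transfers one bit per clock period, DDR one bit per half period.
set sdr_bit_ns $period_ns
set ddr_bit_ns [expr {$period_ns / 2.0}]

puts [format "SDR bit period: %.3f ns" $sdr_bit_ns]   ;# SDR bit period: 2.000 ns
puts [format "DDR bit period: %.3f ns" $ddr_bit_ns]   ;# DDR bit period: 1.000 ns
```

Whatever window the FPGA input requires (setup plus hold, plus clocking uncertainty) has to fit inside this bit period after the source's clock-to-data skew is subtracted.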

Avrum

8 Replies
Observer iso_larry
Observer
395 Views
Registered: ‎06-10-2015

Re: Feeding multiple deserializers with a single clock

Thank you, very useful and comprehensive answer!

The device I'm going to interface is a dual channel ADC (Texas Instruments ADC07D1520) which provides both SDR and DDR output. Based on Spartan-7 datasheet (DS189 table 15), the -2 speed grade devices should handle both modes at 500 MHz, but I suppose that this is specified in the ideal situation: everything on one bank, no funky clock buffers in the middle, etc.

I was planning to start with SDR and then try DDR, which would be beneficial for reasons related to the system architecture (i.e. it would allow me to sample additional signals at the same ADC rate). With DDR I could fit everything on a single bank, since in this mode I would have only 8 data lanes (7 from the ADC plus the already mentioned additional signal). I will place the pins so that I can test both modes in the prototype.

However I will start easy: my first try will be SDR and MMCM. Do you think it would improve performance to derive all four clocks from the MMCM (two at 500 MHz and two at 62.5 MHz), in order to route them separately to the two deserializer blocks?

Thank you again!

L.

Historian
Historian
375 Views
Registered: ‎01-23-2009

Re: Feeding multiple deserializers with a single clock

Based on Spartan-7 datasheet (DS189 table 15), the -2 speed grade devices should handle both modes at 500 MHz,

Be very careful with Table 15 - you are still in the "marketing" portion of the datasheet. These numbers are only vaguely attainable under ideal conditions, and it even states in the comments, that these results are using Dynamic Phase Adjustment (DPA). As I have been saying all along, implementing DPA is very complicated - there are a few app notes, but no "hard" solution to this problem. Furthermore, the main downfall of all DPA solutions is that it is impossible to "prove" that they work across process/voltage/temperature (PVT) - you cannot do static timing analysis (STA) on a dynamic phase adjustment scheme. So, without skew lot parts (which Xilinx does not supply), a temperature chamber and a variable voltage supply (and a huge amount of engineering effort) to do shmoo testing on your system, there is no way to gain confidence that a DPA solution will work in production.

As a result, DPA should always be considered an absolute last resort.

All my focus has been on static capture - if you define and constrain this correctly and the tools say the interface meets timing, then this is guaranteed to work across PVT. However, I am telling you now that 500MHz DDR is impossible to do statically. At 500MHz SDR it will depend greatly on your solution - needing 32 channels will make this very challenging (if not impossible).

Using the MMCM for clocking is also a non-starter for 500MHz SDR. If you look at table 54 of DS189, you will see that all parts in all speed grades need more than 2ns data windows when using BUFG with MMCM clocking (which is what Tpsmmcm/Tphmmcm are showing).

So, your choices are

  • Get the pin count down to 23 or below so that you can use a single bank with BUFIO/BUFR clocking - this will probably work
  • Use the BUFMR and 3 adjacent banks so that you can handle all 32 bits, but I doubt that this will meet timing - the BUFMR adds a lot of extra window requirement
  • Use DPA. I will warn you now, you will not get a lot of help in designing and qualifying a DPA solution from the forum (or elsewhere)

Avrum

Observer iso_larry
Observer
351 Views
Registered: ‎06-10-2015

Re: Feeding multiple deserializers with a single clock

Ok, now I'm worried.

I've set up a "skeleton" project with this solution (Spartan7, SDR, MMCM). I still have to figure out how exactly timing constraints work in Vivado (in ISE everything seemed much easier, even if maybe less flexible) but assuming that I've set up everything correctly, it gives me very bad timing violations: -1 ns slack for each input signal, which with a 2 ns clock period sounds very bad.

Do you think my work would be easier if I switched to a higher 7 series model? Actually the numbers for Kintex or even Virtex don't seem very different.

P.S. Apparently the easier solution to my problems would be to duplicate the ADC output clock outside the FPGA, so that I can use a separate clock input for each channel and limit the I/Os to 16. Of course this will add external propagation delay (e.g. the NB6N11S adds between 270 and 470 ps), but I can quantify it and add a constraint to verify that it checks out.

L.

Observer iso_larry
Observer
215 Views
Registered: ‎06-10-2015

Re: Feeding multiple deserializers with a single clock

Hi, I've made some further tests by duplicating clocks externally, as I described in the previous message. There are some improvements indeed, but it still seems very hard to get correct timings. The clock path requires almost 3 ns:

without_delay.png

The ADC datasheet claims a data-to-clock delay of ±50 ps, therefore the input timing constraints were set as something like:

set_input_delay -clock [get_clocks adc1_clocki] -min -add_delay 0.95 [get_ports {adc1_input_in[*]}]
set_input_delay -clock [get_clocks adc1_clocki] -max -add_delay 1.05 [get_ports {adc1_input_in[*]}]
etc...

The only way I've found to remove timing violations is to add external delay composed by the sum of four contributions:

  1. The ADC data-to-clock delay, as in the previous case (min=0.95 max=1.05);
  2. An additional 2 ns delay representing the fact that I will sample the data clocked on the previous cycle;
  3. The input-to-output delay introduced by the LVDS clock buffer (min=-0.47 max=-0.27)
  4. The additional delay introduced by two-inches long microstrips added to the clock path (min=-0.30 max=-0.26)

Adding all of those contributions, I get:

set_input_delay -clock [get_clocks adc1_clocki] -min -add_delay 2.18 [get_ports {adc1_input_in[*]}]
set_input_delay -clock [get_clocks adc1_clocki] -max -add_delay 2.52 [get_ports {adc1_input_in[*]}]
etc...

And doing this, I'm (barely) within timing constraints with slacks of about 0.15 ns for both setup and hold:

with_delay.png

In theory the Vivado timing analysis should consider the worst-case scenario, and I've also considered the worst case for every external delay source. But something tells me that these kinds of tricks are not realistically achievable. Am I wrong?

L.

Historian
Historian
211 Views
Registered: ‎01-23-2009

Re: Feeding multiple deserializers with a single clock

The only way I've found to remove timing violations is to add external delay composed by the sum of four contributions:

I am not following what you mean by "adding delays", but let's put this aside for the moment.

Assuming the original constraints are correct, it is pretty easy to see if an interface is even possible or not, simply by looking at the Slack of the setup and hold check together.

Right now your setup check has a negative slack of 1.173ns. If you look at the hold time check for the same interface (or at least for the same bit of the same interface), the interface is only viable if it has positive slack that is greater than or equal to 1.173ns. If this is not the case (Setup Slack + Hold Slack < 0) then the interface cannot be captured statically.

If the sum is greater than 0 then the width of the provided window is larger than the width of the required window, but it is in the wrong place. If we can find a way to delay either the clock or the data accordingly, then it should work, as long as the mechanism for adding the delay doesn't widen the required window by more than the sum of the slacks. This delay is usually accomplished with an IDELAY, which does widen the window, so, going back to viability, you need the sum of the slacks to be fairly positive - if the sum is greater than about 0.5ns then you have a chance.
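The feasibility rule above can be written down as a tiny plain-Tcl helper. This is a sketch only: the proc name window_check is made up, the 0.5 ns IDELAY cost is the rough figure from this post rather than a datasheet value, and the hold slack values in the usage lines are hypothetical.

```tcl
# Plain Tcl sketch of the "setup slack + hold slack" feasibility rule.
# idelay_cost_ns: approximate margin an IDELAY is said to consume (~0.5 ns).
proc window_check {setup_slack_ns hold_slack_ns {idelay_cost_ns 0.5}} {
    set sum [expr {$setup_slack_ns + $hold_slack_ns}]
    if {$sum < 0.0} {
        # Provided window is narrower than the required window.
        return "not capturable statically"
    } elseif {$sum < $idelay_cost_ns} {
        # Wide enough as is, but an IDELAY would likely eat the margin.
        return "window wide enough, but probably too tight to shift with an IDELAY"
    }
    return "window wide enough to attempt shifting with an IDELAY"
}

# Setup slack from the post; the hold slack values are hypothetical.
puts [window_check -1.173 1.000]   ;# sum = -0.173 ns
puts [window_check -1.173 1.900]   ;# sum = +0.727 ns
```

The two slack values would come from the setup and hold timing reports for the same input bit.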

 

An additional 2 ns delay representing the fact that I will sample the data clocked on the previous cycle;

If you do need to change the edge relationship, you shouldn't do it by adding delay to the constraints - you should do it the "right way" using the set_multicycle_path command - take a look at this post on constraining DDR input interfaces.

Avrum

 

Observer iso_larry
Observer
185 Views
Registered: ‎06-10-2015

Re: Feeding multiple deserializers with a single clock

Thank you Avrum, very informative as usual! By "adding delays" I mean adding actual external delays such as microstrips and buffers, to change the alignment between the data and clock that enter the FPGA, so that I can change the timing constraints accordingly.

My thought process was as follows:

  • The ADC outputs data at the clock falling edge ±50 ps, therefore (assuming to sample the data at rising edge) we have a min (hold) constraint of 0.95 ns and a max (setup) constraint of 1.05 ns;
  • Then I've added the 2 ns (one cycle) delay. I suspected that this wasn't the correct way to do it, but could this actually introduce a different result than using the set_multicycle_path command?
  • Then I've considered the external delays that I will "physically" add to the external clock, which are the LVDS buffer (propagation delay between 0.27 and 0.47 ns) and the 2-inch microstrip (propagation delay between 0.26 and 0.30 ns for standard FR-4). Since I'm delaying the clock, these contributions have to be subtracted from the min and max constraints, and the worst case (the one that widens the "wrong" window the most) is to subtract the bigger delays from the min value and the smaller delays from the max value.

Adding all together,

min = 0.95+2-0.47-0.3 = 2.18 ns
max = 1.05+2-0.27-0.26 = 2.52 ns

Using these constraints, I get a worst slack of 0.153 ns (setup) and 0.057 ns (hold), which are positive, but summed together are quite a bit less than your 0.5 ns reference value. Should I conclude that this solution is not "safe" enough?
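The arithmetic above can be reproduced as a short Tcl sketch, meant for the Vivado Tcl console or a .tcl constraints script. The variable names are arbitrary; the numbers, the clock name and the port pattern are the ones already used in this thread. The one-cycle offset is kept as its own variable so that it can be set to 0 if the edge relationship is expressed with set_multicycle_path instead, as suggested in the earlier reply.

```tcl
# All values in ns, taken from the post above.
set adc_min   0.95   ;# ADC data vs. capturing edge, early
set adc_max   1.05   ;# ADC data vs. capturing edge, late
set cycle     2.00   ;# one-cycle offset used in this post
set buf_min   0.27   ;# LVDS clock buffer delay, min
set buf_max   0.47   ;# LVDS clock buffer delay, max
set trace_min 0.26   ;# 2-inch microstrip delay, min
set trace_max 0.30   ;# 2-inch microstrip delay, max

# Delaying the clock reduces the input delay: the largest clock delay
# pairs with the min input delay, the smallest with the max.
set in_min [format %.2f [expr {$adc_min + $cycle - $buf_max - $trace_max}]]
set in_max [format %.2f [expr {$adc_max + $cycle - $buf_min - $trace_min}]]
puts "min=$in_min max=$in_max"   ;# min=2.18 max=2.52

set_input_delay -clock [get_clocks adc1_clocki] -min -add_delay $in_min [get_ports {adc1_input_in[*]}]
set_input_delay -clock [get_clocks adc1_clocki] -max -add_delay $in_max [get_ports {adc1_input_in[*]}]
```

Keeping each contribution as a named value makes it easier to update the constraints if the buffer or trace length changes on the board.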

L.

Historian
Historian
171 Views
Registered: ‎01-23-2009

Re: Feeding multiple deserializers with a single clock

Using these constraints, I get a worst slack of 0.153 ns (setup) and 0.057 ns (hold), which are positive, but summed together are quite a bit less than your 0.5 ns reference value. Should I conclude that this solution is not "safe" enough?

Your analysis seems reasonable. My comment about needing 0.5ns of margin was on the "feasibility" analysis - before you adjust the timing of the clock/data relationship, and assuming that you would use the IDELAY to do this adjustment. Adding the IDELAY to the path costs you about 0.5ns of margin (or a little less).

But in your case, you are fixing the timing on the board, so if, after you have accounted for everything (i.e. duty cycle, clock jitter, etc...), the tool says both setup and hold are passing - even with 0 margin each - then your interface will work.

Avrum
