taghavi_x
Visitor
550 Views
Registered: ‎02-26-2020

OSERDES performance and LVDS_25 IO propagation delay

Hello,

I am having trouble reconciling two pieces of information.

On a Spartan-6 at speed grade -3, the OSERDES (SDR, LVDS I/O, BUFPLL clock, DW=4) can run at 1080 Mb/s (T_period < 1 ns), yet T_IOOP for an LVDS_25 output at speed grade -3 is 1.65 ns. How should these two figures be interpreted together?

 

Additionally, in post-place-and-route simulation of my design on an XC6SLX45T-FGG484-3, I cannot get a signal with a period of 2 ns through an LVDS_25 OBUFDS. The signal arrives at the I input of the OBUFDS, but the output stays constant. When I reduce the frequency, the output follows the input.

P.S. I'm working on SP605 eval board.

Thank you in advance for your help.

 

All the best

Amirali

5 Replies
drjohnsmith
Teacher
533 Views
Registered: ‎07-09-2009

The easy answer is 

  I don't know, I just believe the tools when I constrain them

     but I'd like to know what Xilinx say

My "guess" is that it depends on where the route comes from, one being a DDR and the other the SERDES, but that is a guess.

 

taghavi_x
Visitor
386 Views
Registered: ‎02-26-2020

Hi,

I think I have waited long enough for a comment from Xilinx.

I should say that I don't really understand your answer; I cannot relate "route", "DDR" and "SERDES" to my question. Maybe I was not clear enough, so I will restate my question in more detail.

In "Spartan-6 FPGA Electrical Characteristics", p. 18, Table 25, it is stated that an SDR/DDR LVDS transmitter (BUFPLL/2xBUFIOS, data width = 4-8, speed grade -3) should work at 1080 Mb/s (>500 MHz).

1) I'm assuming that when LVDS is mentioned, the figure covers the performance of the transmitter including the IOB; otherwise "LVDS" serves no purpose here.

2) If this is the internal performance of the O/ISERDES2 without the IOB, why does it even matter?

To make matters more complicated, there is another document, XAPP1064, titled "Source-Synchronous Serialization and Deserialization (up to 1050 Mb/s)".

In contrast, in the same document ("DC & switching char."), p. 221, Table 28, T_IOOP for an LVDS_25 IOB at speed grade -3 is given as 1.01 ns (my first post was wrong). This means the LVDS_25 frequency is < 500 MHz. I also double-checked this with the IBIS model "LVDS_25_TB_25" from the Spartan-6 IBIS model file: there is a delay of 1.2 ns (from time zero until the pad reaches its 50% level, in the best case, with 50 ohm termination and V_fixture = 1.14 V).

 

I investigated the situation further on my SP605 board.

I routed an LVDS signal out to the FMC connector and looped it back to the FPGA on the FMC card. On the transmitter side I have an OSERDES, SDR, 1000 MHz, DW = 4 bits (fixed value 1010). On the receiver side I have a BUFIO2 whose CLKDIV output is connected to a PLL_ADV, plus a simple 32-bit counter to create a blinker. When I program the FPGA, the PLL locks and the LED blinks. If I back-calculate from the blink frequency to the input frequency (BUFIO2 divider ratio, PLL_ADV divider and counter), I arrive at 500 MHz.
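For reference, the 500 MHz figure is what one would expect from the transmit pattern alone. The sketch below is only an illustration: it assumes the "1000 MHz" figure is the serial bit rate in Mb/s, and `pattern_frequency_mhz` is a hypothetical helper, not part of any Xilinx tool.

```python
def pattern_frequency_mhz(bit_rate_mbps, pattern):
    """Fundamental frequency of a repeating serial bit pattern.

    Counts rising edges over one cyclic repetition of the pattern;
    each rising edge starts one cycle of the output waveform.
    """
    n = len(pattern)
    rising = sum(
        1 for i in range(n)
        if pattern[i - 1] == "0" and pattern[i] == "1"  # i - 1 wraps around
    )
    return bit_rate_mbps * rising / n


# Fixed OSERDES word 1010 sent at an assumed 1000 Mb/s
print(pattern_frequency_mhz(1000, "1010"))  # 500.0
```

A pattern with longer runs, such as 1100, would halve the frequency again, which is one way to slow the output down without touching the clocking.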

I don't have a differential probe or suitable measurement equipment to carry out the test to an engineering standard.

I know this test is not very meaningful, as it depends on PVT and on which die sits in my FPGA (with respect to the speed corner), but everything is so confusing. I cannot see what can be guaranteed and what can be expected if a product is designed with this FPGA.

 

All the best

Amirali

avrumw
Expert
362 Views
Registered: ‎01-23-2009

So, you have sort of figured out the problem. The LVDS performance is supposed to be 1050Mbps (or maybe 1080, I'll use 1050), which is 952ps, but the propagation delay through the OBUF is 1.65ns. How can these two work together?
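The arithmetic behind this can be sketched as follows. The helper names are hypothetical; the numbers (1050 Mb/s and 1.65 ns) are the ones quoted in this thread.

```python
def bit_period_ps(rate_mbps):
    """Unit interval in picoseconds for a data rate given in Mb/s."""
    return 1e6 / rate_mbps


def bits_in_flight(prop_delay_ns, rate_mbps):
    """How many bit periods fit inside the buffer's propagation delay."""
    return prop_delay_ns * 1000.0 / bit_period_ps(rate_mbps)


print(round(bit_period_ps(1050)))             # 952
print(round(bits_in_flight(1.65, 1050), 2))   # 1.73
```

A ratio above 1 means a new bit enters the buffer before the previous one has reached the pad.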

The answer is that (since Xilinx says so) it can. This essentially means that you have more than one bit in the OBUF at the same time! This seems to be impossible, but it isn't. We are assuming the OBUF is one monolithic buffer, but it (almost certainly) isn't. It probably has several stages of pre-driver (and pre-pre-driver, etc...) before the very large final buffer (which is really a whole bunch of smaller buffers in parallel). Given that it is a chain of buffers or inverters, it is certainly possible to have more than one bit propagating through different parts of the buffer simultaneously. So this covers the physical "how can this possibly work".

Now for simulation. 

Timing simulation uses models of the cells. When it comes to this characteristic (the input changes more rapidly than the propagation through the cell), there is no single "correct" way to model this. The above example is a good one

  • If the OBUF were a monolithic cell, then the output could not switch this fast
    • The actual observed result would most likely be a signal that doesn't make any legal transitions - it would probably oscillate with very low amplitude around the midpoint
    • The simulator would be correct in modelling this as either not changing the output or driving it to an X
  • If the OBUF were a chain of buffers, then the output would be legal, allowing more than one transition to be "in the cell" at the same time

So simulators have to make a decision, and they do, and (at least in most simulators) the decision is customizable - defining the delay as "inertial" or "transport". 

If the OBUF is monolithic, the delay through it would be inertial - if the cell can't complete a propagation between two input edges, the output stays the same.

If the OBUF is a chain of buffers, the delay through it would be (at least partly) transport - the cell output will follow the cell input even if the input edges arrive faster than the propagation delay of the cell.  There is also an interaction with the pulse rejection (pulse_r) and pulse error (pulse_e) limits, which allow you to specify how small a pulse can get through a single element. 
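To make the difference concrete, here is a minimal event-based sketch of the two delay models in Python. It is a simplified illustration, not how any particular simulator implements them: it ignores the pulse_r/pulse_e limits, and the function names and the 1650 ps delay are assumptions taken from the numbers in this thread.

```python
def transport(events, delay):
    """Transport delay: every input edge appears at the output, shifted."""
    return [(t + delay, v) for t, v in events]


def inertial(events, delay, initial=0):
    """Inertial delay: an output change that is still pending when the
    next input edge arrives is cancelled, so short pulses are swallowed."""
    out = []
    current = initial   # value the output currently holds
    pending = None      # (time, value) of a scheduled output change
    for t, v in events:
        if pending is not None and pending[0] <= t:
            out.append(pending)      # pending change matured in time
            current = pending[1]
        pending = None               # otherwise it is pre-empted
        if v != current:
            pending = (t + delay, v)
    if pending is not None:
        out.append(pending)
    return out


# Times in ps. Edges every 1000 ps = a signal with a 2 ns period.
fast = [(i * 1000, (i + 1) % 2) for i in range(8)]
# Edges every 2000 ps = a signal with a 4 ns period.
slow = [(i * 2000, (i + 1) % 2) for i in range(4)]

print(inertial(fast, 1650))        # [] : output never changes
print(len(transport(fast, 1650)))  # 8  : every edge gets through
print(inertial(slow, 1650))        # all four edges, delayed by 1650 ps
```

With the purely inertial model the 2 ns signal produces a constant output, while the slower signal passes, which matches the behaviour described in the original post.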

I glanced at the XSIM documentation, but it isn't clear exactly how these are set, or how you should set them - it may not even be possible to change it in a way that makes sense; these parameters are global, and the internal interconnect of the FPGA will not work if things are changed to a purely (for example) inertial propagation delay.

So, I don't know how to fix it, but I am sure this explains why your simulations don't appear to be working. You can confirm this by sending a more complex pattern - try something like 010100110011000111000111 - you will see at some point the longer strings of 0's and 1's will start getting through... 
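As a rough way to predict what that test should show, the sketch below lists which runs of identical bits last at least as long as the buffer delay. It assumes a purely inertial model, a 1 ns bit period and the 1.65 ns delay from the first post; `runs_that_pass` is a hypothetical helper.

```python
from itertools import groupby


def run_lengths(bits):
    """Collapse a bit string into (bit, run length) pairs."""
    return [(b, len(list(g))) for b, g in groupby(bits)]


def runs_that_pass(bits, bit_period_ns, delay_ns):
    """Runs whose duration is at least the inertial delay."""
    return [(b, n) for b, n in run_lengths(bits)
            if n * bit_period_ns >= delay_ns]


pattern = "010100110011000111000111"
print(run_lengths(pattern))
print(runs_that_pass(pattern, 1.0, 1.65))
```

Under these assumptions the single-bit runs at the start are suppressed, and the runs of two and three bits are long enough to reach the output.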

But, even if you can't simulate it, this doesn't mean it will not work on the real FPGA.

Avrum

taghavi_x
Visitor
284 Views
Registered: ‎02-26-2020

Hi Avrumw,

Thank you for taking the time and for the extensive reply.

"The answer is that (since Xilinx says so) it can." seems to me like a verse from a holy scripture (just a gleeful opening from my side). Your answer triggered me to look in another direction, which I think I should have done before, but I'm not so smart and so fast after all.

I looked at LVDS repeaters and buffers from many other companies. The rule "f_max = 1/(2*propagation delay) and data rate <= 2*f_max" always applies (more on my thoughts about f_max below).

The idea that Xilinx has come up with some sort of travelling-wave amplifier for their LVDS buffer, or that the chain of pre-drivers and output driver should be interpreted as a distributed circuit, seems unrealistic to me. After all, the wavelength at 1 GHz in silicon (er=11) or in a low-k dielectric will be > 90 mm, which is several times the size of the package itself, if I understood your comment correctly.
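The wavelength figure can be checked with a short calculation. This is only a sketch: it assumes a plain TEM wave in a uniform dielectric with the relative permittivity quoted above, and `wavelength_mm` is a hypothetical helper.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength_mm(freq_hz, eps_r):
    """Wavelength in a uniform dielectric: c / (f * sqrt(eps_r))."""
    return C / (freq_hz * math.sqrt(eps_r)) * 1000.0


print(round(wavelength_mm(1e9, 11), 1))  # 90.4
```

A lower permittivity gives a longer wavelength, so a low-k dielectric only increases the figure.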

Most companies have treated their LVDS buffer as an analog circuit rather than a purely digital black box. Depending on jitter, Vod and the relation to the LVDS standard, they have characterized their buffers and come up with these numbers, which designers are still free to exploit for their own purposes. In the Xilinx LVDS case, f_max = 1/(2 * 1.01 ns) = 495 MHz = 990 Mbps (2 toggles per cycle). The extra mileage (I think) comes from the fact that Vid(min) for LVDS is |100| mV and Vod(min) for LVDS_25 is |247| mV, which leaves room for interpretation depending on the required noise margin in the circuit, trace loss, and so on.
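The rule of thumb used here can be written out as a small calculation. It is a sketch of the poster's own rule (f_max = 1/(2 * propagation delay), two bits per cycle), not a datasheet formula, and `toggle_limit` is a hypothetical name.

```python
def toggle_limit(prop_delay_ns):
    """Return (f_max in MHz, data rate in Mb/s) under the rule
    f_max = 1 / (2 * propagation delay), with two bits per cycle."""
    f_max_mhz = 1e3 / (2.0 * prop_delay_ns)
    return f_max_mhz, 2.0 * f_max_mhz


f_max, rate = toggle_limit(1.01)
print(round(f_max, 1), round(rate, 1))  # 495.0 990.1
```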

 

Your comment about the modeling of the buffer and the description of "transport" and "inertial" delay was very enlightening. But why Xilinx could not come up with a way to model it properly, or at least provide a configurable model, given the capabilities of HDL languages and their simulator (and assuming that I have done everything correctly, which I still doubt), is outside my area of expertise. Maybe a comment from Xilinx can resolve the issue.

About the limitations of IBIS in fully capturing the true behavior of the buffer: as far as my Google-style research goes, it is a known issue (with further issues for different line impedances, etc.). Maybe an HSPICE simulation of the buffer model would have been the best solution, but that is only for rich people.

After all is said, all the effort and time I've spent on this issue could have been saved with 10 minutes of a Xilinx expert's time, but apparently they don't care about small customers.

 All the best

Amirali

avrumw
Expert
241 Views
Registered: ‎01-23-2009

"The answer is that (since Xilinx says so) it can." seems to me like a verse from a holy scripture (just a gleeful opening from my side).

No, it's not. While there are some places where Xilinx can stretch the truth a little for marketing purposes, Xilinx does not outright lie in their datasheet. If the datasheet says something is possible then it is - they have characterized it and demonstrated that it works.

The idea that Xilinx has come up with some sort of travelling-wave amplifier for their LVDS buffer, or that the chain of pre-drivers and output driver should be interpreted as a distributed circuit, seems unrealistic to me.

It is not. Anyone who has worked with ASIC style I/O drivers knows that you cannot simply take the output of an internal (ASIC core) cell, which has nano-Ampere drive capabilities, and drive a huge buffer in the IO that has tens of milli-Ampere drive capability - it just can't be done. You need to take the signal through a chain of successively larger buffers (or inverters) to get from your min-sized buffer to the final buffer that can drive your load (it is even provable that the ideal ratio between the sizes of these buffers is e ≈ 2.71828 - I remember doing the proof when I was in University). When you have a chain of buffers or inverters, each one of these buffers/inverters will exhibit the characteristics you describe - they can't pass a signal through that is faster than the propagation through that buffer. But if you have (say) 5 buffers in this chain, each one of them has a delay that is roughly 1/5th of the total delay of this chain, and hence you can pass a signal through that is substantially faster than the aggregate delay of the 5 buffers. The OBUF delay you are seeing (T_IOOP) is this aggregate delay.
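A back-of-the-envelope version of this argument is sketched below. The equal split of delay across stages and the 5-stage count are the illustrative assumptions from the paragraph above, not Xilinx data, and both helper names are hypothetical.

```python
import math


def stages_for_ratio(drive_ratio, taper=math.e):
    """Approximate number of stages in a tapered buffer chain whose
    stages each grow by `taper` to cover a total drive ratio."""
    return max(1, round(math.log(drive_ratio) / math.log(taper)))


def per_stage_delay_ns(total_delay_ns, stages):
    """Delay of one stage if the total is split evenly."""
    return total_delay_ns / stages


stages = 5
print(round(per_stage_delay_ns(1.65, stages), 2))  # 0.33
print(stages_for_ratio(math.e ** 5))               # 5
```

With five equal stages, each stage only has to pass pulses wider than about 0.33 ns, which is comfortably shorter than the 952 ps bit period at 1050 Mb/s.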

Avrum