09-28-2015 08:06 AM
Hi members, I am seeking advice from timing experts to help me through my first high(er) speed interface task.
Basically, I have a Virtex 4 xc4vlx25-11ff668 that outputs a source-synchronous parallel bus, with a forwarded clock, at 450 MHz in DDR format (differential signals on all lines). I need to bring these signals into a Zynq xc7z100-2ffg900. Once the data is in the Zynq, I will operate on it at either half the clock rate (225 MHz) or a quarter of the clock rate (112.5 MHz), so I will need to do a little bit of buffering.
The Virtex 4 design uses ODDR and OBUFDS on the data lines. On the clock it uses an ODDR with D1 = '1' and D2 = '0' (the suggested clock-forwarding scheme) followed by an OBUFDS. We have a clock constraint on the input clock source telling the tools "this is a 450 MHz clock", and we DO end up with a timing score of 0. We could change how this works if needed.
In the Zynq, our plan is to pass the data and clock through IBUFDS and IDELAYE2. The clock will then go through a BUFG, and the data will go into an IDDR. I can put static delays on the IDELAYE2s to meet timing, and then use dynamic delay calibration at run time if needed.
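As a quick check of the buffering involved: a 450 MHz DDR lane carries 900 Mb/s, so each lane delivers 4 bits per cycle at 225 MHz or 8 bits per cycle at 112.5 MHz. That is the deserialization width the capture path has to produce. The Python below is only that arithmetic; the function name deser_ratio is hypothetical.

```python
def deser_ratio(clock_mhz: float, fabric_mhz: float, ddr: bool = True) -> tuple[float, int]:
    """Return (lane rate in Mb/s, bits per lane per fabric clock cycle)."""
    lane_rate = clock_mhz * (2 if ddr else 1)  # DDR carries one bit per clock edge
    ratio = lane_rate / fabric_mhz
    if ratio != int(ratio):
        raise ValueError("fabric clock must divide the lane rate evenly")
    return lane_rate, int(ratio)


for fabric_mhz in (225.0, 112.5):
    rate, bits = deser_ratio(450.0, fabric_mhz)
    print(f"{rate:.0f} Mb/s per lane -> {bits} bits per lane per {fabric_mhz} MHz cycle")
```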
1) How do I find out what the setup/hold time will be coming out of the Virtex 4? Can ISE generate a report for this? I need to know so I can write constraints in the Zynq's Vivado project that tell the timing closure tools what to expect.
2) Which document/spec can I look up to find the setup/hold times for the IDDR? I guess that will tell me if this design is even feasible.
3) Is the plan for the Zynq architecture good? There seem to be a lot of options for both clock and data: BUFG, BUFR, DCM, MMCM, IDDR, ISERDES. I'm not sure which choices are "appropriate" for this type of interface.
09-28-2015 08:56 AM
For the output timing of the Virtex-4, look at the datasheet report, which is part of the timing report output. If you constrain your output bus using the REFERENCE_PIN keyword, it will produce a report on the bus. An example is in UG625, v14.5 p.187. In your case you would create a timegroup for your forwarded data:
NET TxData[*] TNM = TNM_FwdData;
Then constrain the output pins relative to your input clock, but with the forwarded clock TxClock as the reference pin:
TIMEGRP TNM_FwdData OFFSET = OUT AFTER CLK_IN REFERENCE_PIN "TxClock" RISING;
TIMEGRP TNM_FwdData OFFSET = OUT AFTER CLK_IN REFERENCE_PIN "TxClock" FALLING;
The datasheet report will then show the maximum skew on the bus relative to this clock.
However, this isn't really necessary. If the ODDRs for the clock and the data are driven by the same clock and are relatively close together (ideally in the same bank), the skew will be very small; you can probably assume +/-100 ps.
The document for constraints in Vivado (which I recommend you use) is UG903.
If this is your first exposure to XDC/SDC constraints, then I highly recommend taking the Vivado Design Suite Advanced XDC and Static Timing Analysis for ISE Software Users class from Xilinx.
I also recommend taking a look at this forum post.
Message 4 of this post shows the three(ish) ways of clocking an input interface, each with different timing characteristics. The post was for a V4, but the choices are the same in Zynq, except that you would use an MMCM instead of the DCM (and hence the timing parameters Tpsmmcmcc/Tphmmcmcc), and the MMCM has no equivalent of the DCM's source-synchronous mode.
The one you are proposing is the worst of the three, and definitely will be too slow to capture your data.
Unfortunately, none of these is likely to be good enough. A 450 MHz DDR interface has a 1.11 ns unit interval (UI). The +/-100 ps of skew from the V4 brings this down to around 900 ps at the receiver. Duty-cycle distortion of the internal V4 clock, jitter, and board signal integrity will reduce it further. The best static clocking in the Zynq (Tpscs/Tphcs) requires a larger window; even in the fastest speed grade, you need over 1 ns...
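The budget arithmetic above, as a small Python sketch. It covers only the UI and the +/-100 ps skew estimate. REQUIRED_WINDOW_PS = 1000 is a stand-in for the "over 1 ns" Tpscs + Tphcs figure, not a datasheet value, and duty cycle, jitter and signal integrity losses are left out, so the real eye would be smaller still.

```python
def ddr_eye_ps(clock_mhz: float, skew_ps: float) -> tuple[float, float]:
    """Return (unit interval, eye left after +/- skew), both in picoseconds."""
    ui_ps = 1e6 / clock_mhz / 2.0     # DDR: one bit per half clock period
    eye_ps = ui_ps - 2.0 * skew_ps    # +/- skew erodes both edges of the eye
    return ui_ps, eye_ps


# Stand-in for the "over 1 ns" static capture window; not a datasheet value.
REQUIRED_WINDOW_PS = 1000.0

ui, eye = ddr_eye_ps(450.0, 100.0)
print(f"UI = {ui:.0f} ps, eye after +/-100 ps skew = {eye:.0f} ps")
print("static capture plausible:", eye >= REQUIRED_WINDOW_PS)
```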
So, to capture this interface, you will need to use some form of training or dynamic calibration...
09-28-2015 10:35 AM
Thank you very much for the detailed response, avrumw. Are you saying that static clocking into the Zynq at 450 MHz DDR is just not going to be possible no matter what I try, or just with the architecture I mentioned (IDDR)?
I guess I'm a bit confused by this, since DS191 Table 60 says the ILOGIC switching characteristics have a setup/hold time of 0.01/0.29 ns, Table 62 says the ISERDES setup/hold is -0.02/0.12 ns, Table 72 says the BUFIO maximum clock tree frequency is 800 MHz, etc. These all seem to be numbers smaller than 900 ps. How does Tpscs/Tphcs end up so much larger than all of them?
09-28-2015 11:10 AM
Yes, 450MHz DDR will not be possible with static capture using any clocking mechanism.
The ISERDES setup/hold is for the cell itself, referenced to the clock pin of the cell. To use the ISERDES, you have to get the clock to the ISERDES clock pin. This is done using an IBUF, a BUFIO and the I/O clock network. All of these exist on the die and incur delay and, more importantly, process/voltage/temperature (PVT) dependent delay variation. The cell numbers also don't include the IBUF on the data pin. These all contribute to the uncertainty of the capture, and hence result in the Tpscs/Tphcs of the complete clocking mechanism (which is what this is: the pin-to-pin setup/hold requirement when using IBUF, BUFIO and the I/O clock network for clock insertion, and IBUF and ISERDES for the data path).
The Tpscs/Tphcs is specified for a given I/O standard; the number might be slightly better for LVDS (for example), but you would have to implement the design and let Vivado time it for you. Either way, it's almost certainly not going to be "enough" better to capture a sub-900 ps window.
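The growth from cell numbers to pin numbers can be written out directly. For a capture flop with cell setup/hold t_su/t_h, fed by a data path and a clock path whose delays each vary over a min/max range across PVT, the worst-case pin-referred requirements are setup = t_su + data_max - clock_min and hold = t_h + clock_max - data_min. The window the pin must see is therefore t_su + t_h plus the spread of both paths. A minimal Python sketch follows; the cell values are the ILOGIC 0.01/0.29 ns quoted above, but the path delays are purely illustrative placeholders, not DS191 or Vivado numbers.

```python
from dataclasses import dataclass


@dataclass
class DelayRange:
    """Min/max delay of an on-die path across process, voltage and temperature."""
    min_ps: float
    max_ps: float

    @property
    def spread_ps(self) -> float:
        return self.max_ps - self.min_ps


def pin_setup_hold(tsu_ps: float, th_ps: float,
                   data: DelayRange, clock: DelayRange) -> tuple[float, float]:
    """Pin-referred setup/hold for a flop whose data and clock pass through on-die buffers."""
    setup_ps = tsu_ps + data.max_ps - clock.min_ps  # slowest data vs. fastest clock
    hold_ps = th_ps + clock.max_ps - data.min_ps    # slowest clock vs. fastest data
    return setup_ps, hold_ps


# Cell values: ILOGIC 0.01/0.29 ns as quoted from DS191 above.
# Path delays: ILLUSTRATIVE placeholders only, not datasheet or Vivado numbers.
data_path = DelayRange(min_ps=600.0, max_ps=900.0)     # IBUF (+ IDELAY)
clock_path = DelayRange(min_ps=1200.0, max_ps=1900.0)  # IBUF + BUFIO + I/O clock net

setup, hold = pin_setup_hold(10.0, 290.0, data_path, clock_path)
print(f"pin setup = {setup:.0f} ps, pin hold = {hold:.0f} ps, window = {setup + hold:.0f} ps")
# The window equals tsu + th plus the PVT spread of both paths.
assert setup + hold == 10.0 + 290.0 + data_path.spread_ps + clock_path.spread_ps
```

A negative pin setup just means the data may arrive after the clock edge at the pin; what limits capture is the total window, which no static choice of delays can shrink below the cell window plus the PVT spreads.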
09-28-2015 12:01 PM
Thanks again! You are helping me with some serious mental breakthroughs.
So the large Tpscs/Tphcs is related to PVT variations, and those uncertainties are too large to guarantee a successful static capture from a 450 MHz DDR interface. Cool, that makes sense. Dynamic calibration/training is the way to go, as it will effectively compensate for those variations, and then I'm back to just needing to satisfy the ILOGIC's setup/hold time (which should also be proven out after the calibration).
1) When I'm building the project in Vivado, do I just ignore any timing errors I get on that input path, knowing that I'm going to be doing dynamic calibration at run time? Or do I declare those signals as a "false path"?
2) Is there a rule of thumb for when to perform calibration? Just once at power-up? Or do I need to monitor temperature/voltage and recalibrate if they drift too far?
09-28-2015 12:35 PM
For dynamically calibrated inputs, it's probably best to define some set_input_delay commands for them, and then declare the paths false. This will keep them out of both the failed timing reports and the "check_timing" checks.
As for what kind of calibration to do, it's really hard to tell. I generally prefer continuous calibration mechanisms that keep tracking process/voltage/temperature over one-time calibration mechanisms, but it depends on the application. Many calibration mechanisms revolve around sampling the incoming clock with itself to determine where the transitions are, and then sampling the data out of phase with that (for a DDR interface, that means 90 degrees away). This can be done continuously, but I haven't found a "good" answer record that describes the mechanism...
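A sketch of the clock-sampling idea, under several assumptions not taken from the thread: the forwarded clock is edge-aligned with the data (as an ODDR-forwarded clock normally is), the clock and data delay paths are matched, the IDELAYE2 has 32 taps of about 78 ps each (1/(32 x 2 x F_REF) with a 200 MHz IDELAYCTRL reference), and the caller has already collected one sampled clock value per tap. The helper names are hypothetical. It finds the tap where the sampled clock flips and then moves a quarter period (90 degrees) away from it.

```python
from typing import Optional, Sequence

IDELAY_TAPS = 32                  # assumed: IDELAYE2 taps 0..31
TAP_PS = 1e6 / (32 * 2 * 200.0)   # assumed 200 MHz IDELAYCTRL reference: ~78.1 ps/tap


def first_transition(samples: Sequence[int]) -> Optional[int]:
    """Tap index at which the sampled clock value first changes, or None."""
    for tap in range(1, len(samples)):
        if samples[tap] != samples[tap - 1]:
            return tap
    return None


def centered_data_tap(clock_samples: Sequence[int], clock_mhz: float) -> Optional[int]:
    """Data tap a quarter clock period (90 degrees) away from the detected clock edge.

    For DDR with edge-aligned data, a point 90 degrees after any clock edge (or
    90 degrees before it) sits in the middle of a data eye.
    """
    edge = first_transition(clock_samples)
    if edge is None:
        return None
    offset = round(1e6 / clock_mhz / 4.0 / TAP_PS)  # quarter period in taps
    for tap in (edge + offset, edge - offset):
        if 0 <= tap < IDELAY_TAPS:
            return tap
    return None


# Hypothetical sweep: the sampled clock flips between taps 3 and 4.
sweep = [0] * 4 + [1] * 28
print(centered_data_tap(sweep, 450.0))
```

Re-running this periodically and nudging the data taps toward each new result is one way to make it continuous. Real samples near the edge will be noisy because of jitter, so a practical version would vote over many samples per tap.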
11-02-2015 07:50 AM - edited 11-02-2015 07:57 AM
I'm still struggling with this. I put the ADC into a training-pattern mode, capture the data, and adjust the IDELAYs accordingly on a per-bit basis. That generally works well for short bursts, and I get a decent range of working tap values (typically around taps 3 to 12). I always choose the center of the "good" range.
The problem is that once I capture enough training patterns, I eventually get some glitches/mistakes no matter which tap value I use. I tried to improve things by keeping the 24 bits within 2 adjacent clock regions and using BUFIO/BUFR, which did seem to help but did not completely fix it. I'm at about 99.5% accuracy, but that is not good enough for our application.
I then moved to ISERDES, but did not have much more luck. I then tried to make the tools treat the clock as an even worse clock by setting the jitter constraint to 300 ps. I still meet timing, except for the input pins to the IDDR (set as a false path in the constraints). We have a high-quality power supply powering the board and some nice fans keeping everything cool (yes, I did try turning the fans off to make sure they were not inducing issues).
I then played around with the differential termination on the Zynq and the output constraints on the Virtex 4 (drive strength, slew rate); nothing seemed to make a difference.
Is there anything left to try? Does this seem like a hardware signal integrity issue as opposed to an FPGA design flaw? I attached my IDDR setup (it only shows 1 bit per clock region, but there are actually 12 bits per clock region). For the FIFOs I use to cross the data into the global clock domain, I wait until both FIFOs report that they are not empty, wait a few more clock cycles, and then read continuously. Both should theoretically run at exactly the same frequency (CLK_A and CLK_B are generated from the same source on the Virtex 4).
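Choosing the center of the good range amounts to finding the longest run of passing taps and taking its midpoint. A minimal Python sketch, assuming the caller supplies one pass/fail result per tap; how many training words must be error-free for a tap to count as a pass is left to the caller (since errors only appeared over long captures, a short burst is probably not enough). The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def eye_center(passed: Sequence[bool]) -> Optional[Tuple[int, int, int]]:
    """Return (first_tap, last_tap, center_tap) of the longest run of passing taps."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # A trailing False acts as a sentinel so a run ending at the last tap is closed.
    for tap, ok in enumerate(list(passed) + [False]):
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            if best is None or (tap - 1 - start) > (best[1] - best[0]):
                best = (start, tap - 1)
            start = None
    if best is None:
        return None
    return best[0], best[1], (best[0] + best[1]) // 2


# Pass/fail per tap, matching the "taps 3 to 12 work" case described above.
results = [False] * 3 + [True] * 10 + [False] * 19
print(eye_center(results))
```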
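To see where the remaining 0.5% comes from, it may help to count errors per bit lane against the known training pattern rather than only per word. A minimal Python sketch, assuming 24-bit words and that the expected pattern is known for every captured word; the function names and the example pattern are hypothetical. Errors concentrated on a few lanes would point at those particular inputs, while errors spread evenly across all lanes may point at something they share, such as the clock.

```python
from typing import List, Sequence


def lane_error_counts(captured: Sequence[int], expected: Sequence[int],
                      width: int = 24) -> List[int]:
    """Count bit errors per lane by XOR-ing each captured word with the expected one."""
    if len(captured) != len(expected):
        raise ValueError("captured and expected must be the same length")
    counts = [0] * width
    for got, want in zip(captured, expected):
        diff = got ^ want
        for lane in range(width):
            if (diff >> lane) & 1:
                counts[lane] += 1
    return counts


def word_accuracy(captured: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of words captured with no bit errors."""
    good = sum(1 for got, want in zip(captured, expected) if got == want)
    return good / len(expected)


# Hypothetical alternating training pattern with one bit flipped on lane 5.
expected = [0xAAAAAA, 0x555555] * 100
captured = list(expected)
captured[17] ^= 1 << 5
print(word_accuracy(captured, expected), lane_error_counts(captured, expected)[5])
```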
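The FIFO start-up rule described above (wait for both FIFOs to be non-empty, wait a few more cycles, then read every cycle) can be modeled cycle by cycle to check its edge cases. This Python model is a sketch with a hypothetical settle_cycles parameter. It also flags any cycle where a read is issued while a FIFO is empty, which would indicate that the read and write rates are not actually matched.

```python
from typing import List, Sequence, Tuple


def read_enable_trace(empty_a: Sequence[bool], empty_b: Sequence[bool],
                      settle_cycles: int = 4) -> Tuple[List[bool], List[int]]:
    """Model of the FIFO read-start rule, one entry per read-clock cycle.

    Returns (rd_en per cycle, cycles where a read was issued while a FIFO was empty).
    """
    rd_en: List[bool] = []
    underflows: List[int] = []
    waited = None  # cycles since both FIFOs were first seen non-empty
    for cycle, (ea, eb) in enumerate(zip(empty_a, empty_b)):
        if waited is None:
            if not ea and not eb:
                waited = 0
        else:
            waited += 1
        reading = waited is not None and waited >= settle_cycles
        rd_en.append(reading)
        if reading and (ea or eb):
            underflows.append(cycle)
    return rd_en, underflows


# FIFO A goes non-empty at cycle 2, FIFO B at cycle 3; reads start 4 cycles later.
empty_a = [True, True] + [False] * 8
empty_b = [True, True, True] + [False] * 7
rd_en, underflows = read_enable_trace(empty_a, empty_b)
print(rd_en.index(True), underflows)
```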
11-03-2015 02:17 PM
I don't see anything wrong with the setup you have.
This kind of instability is often a signal integrity issue.
How are the signals getting from the V4 to the Zynq? Are they routed as proper differential signals on the board (and I assume these FPGAs are both on the same board)? Are you sure you are using compatible I/O standards (i.e. LVDS or LVDS_25, depending on the I/O banks)?
At these frequencies signal integrity is critical. Any little issue with how the signals are routed, the I/O standards, noise on the board, power planes, etc... can cause enough degradation to make these data eyes become unusable.