Visitor
676 Views
Registered: 05-29-2019

Constraining 200 MHz parallel output port to system clock

Hello everyone,

I am trying to figure out how to give the tool (Vivado 2018.1) the right constraints for my design.

We have a modulation circuit inside the FPGA that outputs I&Q samples at 200 MHz to the DAC, on a 16-bit bus for each.

The clock comes from the DAC, goes through a BUFG and then an MMCM, where we regenerate 200, 100 and 50 MHz for the different modules in the design.
So I have my "DATA_CLK" coming from the DAC to the FPGA pin, and my 200M_CLK coming from the MMCM. The latter is the one I use for my modulation circuit.
Both clocks are synchronous and phase-aligned.

 

The timings given by its datasheet for the input parallel ports are:
ts = 4.6 ns and th = -1.5 ns (worst case), which gives a data-valid window of 3.1 ns (which I think is large for a clock period of 5 ns...)


Now I tried several combinations of constraints, and I always ended up with timing failures...

I think I should define my constraints relative to the input DATA_CLK, and not the internal 200M_CLK, right?

The constraints that I think reflect the input timing requirements of the DAC are:

set_output_delay -clock [get_clocks iDATA_CLK] -max 4.6 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_output_delay -clock [get_clocks iDATA_CLK] -min 1.5 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]


With this I get timing failures on the intra-clock path from 200M_CLK to DATA_CLK:
Arrival time = 4.660 ns and Required time = 0.250 ns.
Obviously the hold time of the component is a problem against the 5 ns data clock period...

Is it possible to add delay to the data or the clock, e.g. +5 ns of delay on the clock, so that my data arrives one clock cycle later (which is not a problem)?
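For illustration, I imagine something like a multicycle path could express that one extra cycle (a sketch I have not verified against this design):

set_multicycle_path -setup 2 -to [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_multicycle_path -hold 1 -to [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]

but I don't know if that is the right approach here.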


I am a little bit lost, as it's the first time I've had to do this with Vivado.

The same design was working on a Spartan-6 with these constraints:

NET "B0_PORTA_IO(*)" TNM = PortA_out;
TIMEGRP "PortA_out" OFFSET = OUT 14 ns AFTER "B0_GCLK16";
INST "B0_PORTB_IO(*)" TNM = PortB_out;
TIMEGRP "PortB_out" OFFSET = OUT 14 ns AFTER "B0_GCLK16";

and I think the person who wrote that added extra delay just to make timing pass.


What's the best way to get this to work?

Thank you all for reading.

 

 

6 Replies
Guide
630 Views
Registered: 01-23-2009

This is a terrible clocking structure. In order to meet timing on this the clock has to:

  • come from the DAC
  • traverse the board to the pin of the FPGA (I will call this point A)
  • come in through the IBUF of the FPGA
  • go through the MMCM
  • go through the BUFG
  • go through the global clock network (I will call this point B)
  • go through the IOB flip-flop (I am assuming you are using the IOB flip flop)
  • go through the OBUF
  • traverse the board back to the DAC
  • meet the setup and hold time requirements of the DAC

It is true that the MMCM works to compensate for all the delay between point A and point B - so in an ideal world this would be 0ns, but it's not - it has some variation, and some fixed delay. But even without that, you have a LOT of delays, all adding uncertainty (across process/voltage/temperature - PVT) to the timing of the output. The biggest one is the OBUF; depending on the IOSTANDARD, SLEW and DRIVE of the OBUF, this delay can be REALLY LARGE - way larger than 5ns, with upwards of 3ns of PVT uncertainty. And there is nothing the tools can do about this - the constraints will not change this timing, just tell you by how much it is failing - and, since your constraints look right, they are likely telling you it is failing by a fair amount...

Most DACs have two clocks - a DAC clock (for actually clocking the analog section of the DAC) and a "data clock", which is intended to be generated by the device that is sending the data to the DAC (in this case the FPGA). This is done because it is far easier to meet timing in a source synchronous (or clock forwarded) clocking mechanism than what you have here, which is (what I sometimes call) "worse than system synchronous"...
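For reference, in a clock-forwarded design the FPGA would typically drive the data clock out through an ODDR and constrain the outputs against that forwarded clock; a sketch of the technique (the port and instance names here are hypothetical):

# forwarded clock, driven from an ODDR (hypothetical names)
create_generated_clock -name dac_data_clk -source [get_pins oddr_clk_inst/C] -divide_by 1 [get_ports oDAC_DCLK]
# DAC setup/hold constrained against the forwarded clock
set_output_delay -clock dac_data_clk -max 4.6 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_output_delay -clock dac_data_clk -min 1.5 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]

But, of course, this only applies if the DAC can accept a clock from the FPGA.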

I don't know if (or how) this system worked on your Spartan-6. Timing passed in ISE because the constraints were meaningless - basically the FPGA was told "take as long as you want for these outputs, the DAC has essentially no timing requirements (14ns!)" - so you wouldn't have gotten any failure reports from ISE. But any system built this way would likely be very unreliable - you would be regularly violating the setup/hold requirements of the DAC, and hence one would expect (at least at some combinations of PVT) to see improperly captured digital values, which would result in incorrect analog outputs.

Now, with a 3.1ns window, it might be possible to get this working with very fast I/O - for sure something set to SLEW=FAST and DRIVE=24mA (which will likely cause ringing and other bad SI issues) - but it also might not...
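For reference, these attributes are set on the ports in the XDC, something like this (values for illustration only - the legal DRIVE values depend on the IOSTANDARD):

set_property SLEW FAST [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_property DRIVE 12 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]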

So before we go looking at how to fix the timing violations, you should first look at the clocking structure. Does this DAC have a "data clock" (to be forwarded from the FPGA), and, if so, is it too late to modify the board to take advantage of it? If it isn't too late, then you should absolutely do that. Only if that is not an option should you try to fix this on the FPGA side (by modifying the I/O attributes and, almost certainly, the internal clocking...)

If you can, tell us what DAC you are using (part number) and/or send a link to the datasheet...

Avrum

Visitor
582 Views
Registered: 05-29-2019

Thank you for this complete answer!


I agree with you on everything: the architecture, the bad timing constraints in ISE (I think the previous engineer did that to get rid of the timing errors rather than to give the tools the proper numbers), and the fast I/O and SI problems I will get with that much drive.

 

Unfortunately the DAC doesn't have an input CLK to clock the input data, only an output one... So this prevents me from changing to a source-synchronous interface, which I would have preferred. I have to make it work with this architecture (right now it works, but without constraints it could stop working from one version to the next).


The DAC is an AD9788, with an input REF_CLK of 800 MHz, an oversampling ratio of 4, and an output DATA_CLK of 200 MHz to the FPGA (which is a Zynq Z020).

I can see that the part has some kind of input timing optimization circuitry, which can be manually controlled or set to auto. In the previous design, the auto functionality was not used, but this could be used to help.

The most important thing would be to constrain the design so that all 2x16 bits of the parallel samples are output at (almost) exactly the same moment.
Then, if the group arrives late, I could use the timing correction on the DAC input to help with setup or hold if one of them fails due to the delay.

 

How can I define a group constraint so that the tool forces all these signals to have the same delay? I did not find any constraint that would do that in an .xdc file.


Also, what can I modify in the internal clocking to optimize timing? In most of our designs I try to use fewer clocks and more clock enables, and to run clocks no faster than needed.
In this design, I use the 200 MHz clock for the last stages of the processing (RF FIR & CIC filters with x5 and x50 interpolation) and the derived 100 MHz to clock everything else, so there are only two clocks. I chose to run the 200 MHz DATA_CLK into the MMCM and then output a clk_200M and a clk_100M; would it be better to generate only the clk_100M and use the 200 MHz DATA_CLK from the DAC directly, instead of regenerating a 200 MHz clock through the MMCM?

 

Thank you for the help & advice!

Guide
572 Views
Registered: 01-23-2009

Unfortunately the DAC doesn't have an input CLK to clock the input data, only an output one... So this prevents me from changing to a source-synchronous interface

(Weird device...) - that appears to be true...

I can see that the part has some kind of input timing optimization circuitry, which can be manually controlled or set to auto. In the previous design, the auto functionality was not used, but this could be used to help.

(Again, weird device...). Yes - if, after doing the timing analysis, you determine that it is impossible to statically meet the setup/hold time of the device (which is the preferred mechanism), then you should turn this on. There is no way of knowing how much of a window this really needs, nor whether it is guaranteed to be "hitless" - some of these mechanisms only adjust after an error has occurred... But if this is all you have, then you should use it.

The most important thing would be to constrain the design so that all 2x16 bits of the parallel samples are output at (almost) exactly the same moment.
Then, if the group arrives late, I could use the timing correction on the DAC input to help with setup or hold if one of them fails due to the delay.

The best way to manage this is to use the IOB registers on the outputs. If all the data come directly from the IOB registers, then the skew between them is mostly the clock skew inside the FPGA (which is bounded) as well as some "on die" variation from pad to pad, and the substrate skew (which is also bounded). The only way to minimize it further is to place the I/Os carefully - for example, the 16 I/Os for a single interface should be placed in one I/O bank, and should be "closer" to the center (or reflected around the center) of the bank - that will minimize the clock skew.
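In XDC, that looks something like this (the pin location is hypothetical - just to show the idea):

set_property IOB TRUE [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
# keep each bus together in one bank, near its center, e.g. (hypothetical pin):
set_property PACKAGE_PIN Y11 [get_ports {oPORTA_IO[0]}]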

How can I define a group constraint so that the tool forces all these signals to have the same delay? I did not find any constraint that would do that in an .xdc file.

If you use the IOB registers then constraints won't matter - there is nothing for the tool to change that can introduce any change in delay. If you really needed to, I suspect the "set_bus_skew" command could be used to constrain the skew on the outputs (this command was introduced in one of the 2018 versions - I can't remember which one), but, again, if you use the IOB registers (which is the "best" solution) then this won't do anything.
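If you do want to try it, I would expect the syntax to be something like this (unverified - check the documentation for which startpoints/endpoints set_bus_skew accepts):

set_bus_skew -to [get_ports {oPORTA_IO[*] oPORTB_IO[*]}] 0.500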

Also, what can I modify in the internal clocking to optimize timing?

For skew, it doesn't matter if you use the DATA_CLK or the clk_200M (the input or output of the MMCM) - as long as both are on a dedicated clock network (a BUFG/BUFH/BUFR/BUFIO) then the skew should be the same. I don't even think a different clock resource (BUFG vs. BUFH vs. BUFR) will make a difference - maybe the BUFIO will be lower, but I'm not even sure of that (and there are lots of limitations with using the BUFIO). However, be careful with your clock crossings - bringing data between two outputs of the same MMCM is no problem (so your clk_100M to clk_200M is handled correctly), but crossing between different kinds of clock buffers (i.e. BUFR vs. BUFG) or between inputs of your MMCM and outputs (DATA_CLK to clk_200M) are more complicated - the tools will probably handle them, but they will likely result in larger hold time problems for the tool to fix.
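One thing you can do is check how the tools are treating these crossings with:

report_clock_interaction -delay_type min_max

which shows, for each clock pair, whether the paths are timed and with what uncertainty.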

Avrum

Visitor
518 Views
Registered: 05-29-2019

Sorry for the delayed answer, I've been developing other parts of the design and didn't have the time to look closely at the timings.

I have read all you said and updated my constraints; I'm now using these:

set_property IOB TRUE 	[get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_output_delay -clock [get_clocks iDATA_CLK] -max 4.6 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]
set_output_delay -clock [get_clocks iDATA_CLK] -min 1.5 [get_ports {oPORTA_IO[*] oPORTB_IO[*]}]


The iDATA_CLK is the clock from the DAC that enters the FPGA and then the MMCM.
The PORTA_IO and PORTB_IO registers are clocked on the CLK_200M from the MMCM's output.
I'm not sure whether the constraints should reference CLK_200M or DATA_CLK? The DAC's input section is clocked on DATA_CLK, so that's what I chose as the reference for the constraints, but the internal data is clocked with CLK_200M. Should I "resample" my PORTx_IO with one more register stage clocked on DATA_CLK (the register that is in the IOB)?

Anyway, I'm getting timing failures of 2 to 5 ns, so I'm guessing this will be really hard to get working...
In fact, it works every time, but without meeting the constraints...

Teacher
515 Views
Registered: 07-09-2009

Just a note,

set_output_delay does NOT set the output delay.

It is a constraint: it tells the tools that the output delay must be less than the max and more than the min. If it isn't, the path fails timing. It does not move the clock or the data for you.

There are IOB delay blocks that can be used to move the data, or the phase of the clock can be shifted using the MMCM.

All a bit advanced, though fun to learn.
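For example, if the clock comes from a Clocking Wizard IP, the output phase can be adjusted when customizing the IP, something like this (the IP name and value are hypothetical):

set_property CONFIG.CLKOUT1_REQUESTED_PHASE {90.000} [get_ips clk_wiz_0]

If the MMCM is instantiated directly, the equivalent is the CLKOUT1_PHASE parameter on the MMCME2 primitive.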

 

 

 

<== If this was helpful, please feel free to give Kudos, and close if it answers your question ==>
Visitor
510 Views
Registered: 05-29-2019

Don't worry, I'm not trying to set an output delay.
I use the set_output_delay constraint to tell the tool about the setup and hold requirements of the device at the destination of the output port.

 

But I was thinking of using a delay block or phase-shifting the input clock to help timing pass. That's my next move to try to improve the timing results.
