Observer
Registered: ‎09-12-2019

ADC DDR capturing using FMC interface even odd in Verilog

Hi all, what's the best way to capture the data from a 14-bit ADC that outputs the even bits on the rising edge and the odd bits on the falling edge of the clock?

My beginner's naive approach in Verilog is:

always @(posedge CLK_AB_P) // save P in even bits of CH_A_14
begin
    CH_A_14[00] <= CH_A_P[0];
    CH_A_14[02] <= CH_A_P[1];
    CH_A_14[04] <= CH_A_P[2];
    CH_A_14[06] <= CH_A_P[3];
    CH_A_14[08] <= CH_A_P[4];
    CH_A_14[10] <= CH_A_P[5];
    CH_A_14[12] <= CH_A_P[6];
end

always @(negedge CLK_AB_P) // save N in odd bits of CH_A_14
begin
    CH_A_14[01] <= CH_A_N[0];
    CH_A_14[03] <= CH_A_N[1];
    CH_A_14[05] <= CH_A_N[2];
    CH_A_14[07] <= CH_A_N[3];
    CH_A_14[09] <= CH_A_N[4];
    CH_A_14[11] <= CH_A_N[5];
    CH_A_14[13] <= CH_A_N[6];
end

CH_A_14 is a reg, and CH_A_P, CH_A_N are the differential inputs from the external ADC.

CLK_AB_P is the clock provided by the ADC card. 

Does anybody have a heads up/suggestion/guideline on how to deal with this problem?

FPGA: ZC706 eval kit

ADC: TI ADS62P49 (via an FMC LPC card - 4DSP FMC150 )

I'm attaching the snippet of the datasheet that talks about the differential output.

Thank you in advance.

Screen Shot 2020-03-31 at 2.06.02 PM.png
Screen Shot 2020-03-31 at 2.05.51 PM.png
10 Replies

Guide
Registered: ‎01-23-2009

So, no... 

The system you are working with is a pretty common one, and hence there are lots of "specialized" resources in the FPGA to deal with it. Let's take them one at a time.

First, let's talk about the differential signals. The differential signals (both the clock and data) are sent differentially for electrical reasons - you can do faster and better signalling on a printed circuit board (and especially across connectors) with differential signals. However, inside the FPGA they should be "converted" to normal (single ended) signals. As with all signals entering the FPGA, they need to go through input buffers - physically this is the only way for a signal to enter the FPGA. These are the IBUF and IBUFDS primitives in the Xilinx devices. It is recommended that you manually instantiate them; if you don't, the tools have some ability to infer them, but you should still instantiate them. The IBUF is for single ended signals, and the IBUFDS is specifically for differential signals. Each differential pair must be placed on a differential pair of pins on the FPGA; for each pin that carries a _P signal, there is one and only one corresponding _N pin for the _N signal - you must place them correctly. Since this is on an FMC card, I am certain the designer designed the FMC card correctly and the differential pins are on proper differential pairs on the FPGA (the FMC standard is designed for the proper handling of differential circuits). Furthermore, these will have to be properly placed and configured in your XDC file; they need to have the proper pin locations set (consistent with the FMC board) and the correct I/O standard (probably LVDS or LVDS_25, depending on the I/O voltage).

Second, for DDR signals, you should use the dedicated clocked elements for capturing the data. At slower speeds (up to 300MHz to maybe as high as 400MHz) you can use the IDDR, for faster speeds you need to use the ISERDES. 

Both the input transceiver (IBUFDS) and the capture logic (IDDR/ISERDES) are documented in the 7 Series SelectIO User Guide, UG471.

Next you need to deal with the proper clocking structure. The differential clock also needs to come in through an IBUFDS (or IBUFGDS, which is just a shorthand for an IBUFDS used for a clock input). From there it will need a proper clock distribution mechanism. The clock path also needs to be manually instantiated by the user, although the Clocking Wizard will help you with some of the more complex clocking structures. The choice of clocking structure can be critical to proper capture of the data. Take a look at this post on different I/O capture mechanisms.
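As a minimal sketch of the simplest of these (the "direct capture with BUFG" style), the clock path looks something like the following - the signal names here are just placeholders, not from any reference design:

```verilog
// Differential clock pin -> IBUFGDS -> BUFG -> IDDRs and user logic.
wire clk_ibuf;  // single ended clock right after the input buffer
wire clk_bufg;  // globally buffered clock for the IDDRs and the fabric

IBUFGDS #(
    .IOSTANDARD("LVDS_25")   // must match the XDC and the board design
) clk_ibufgds_i (
    .O (clk_ibuf),
    .I (CLK_AB_P),           // diff_p clock pin from the ADC
    .IB(CLK_AB_N)            // diff_n clock pin from the ADC
);

BUFG clk_bufg_i (
    .O(clk_bufg),
    .I(clk_ibuf)
);
```

The other capture styles replace or augment the BUFG with an MMCM, or a BUFIO/BUFR combination, as described in the post.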

Finally, you must have correct and complete timing constraints (in an XDC file) for the interface in order to ensure that your system will work.

The good news is that most of this (the XDC file for pin definition and maybe even timing, and maybe even a reference design) should be available from the manufacturer of the FMC card - it is really the components on the FMC card that determine most of this (although you will need one that is specific to the ZC706 board) - you seem to need to register with them in order to find out what is available.

Avrum

Observer

Thank you for your awesome explanation.

Each differential pair must be placed on a differential pair of pins on the FPGA; for each pin that carries a _P signal, there is one and only one corresponding _N pin for the _N signal - you must place them correctly. Since this is on an FMC card, I am certain the designer designed the FMC card correctly and the differential pins are on proper differential pairs on the FPGA (the FMC standard is designed for the proper handling of differential circuits). Furthermore, these will have to be properly placed and configured in your XDC file; they need to have the proper pin locations set (consistent with the FMC board) and the correct I/O standard (probably LVDS or LVDS_25, depending on the I/O voltage).

The schematics do show each pair in the way you mentioned; the FMC card is Vita standard. So I mapped them to the corresponding package pins of the XC7Z045 in the ZC706 (table 1-33, pages 71, 72, 73 of UG954). However, I don't get why the I/O standard specified in table 1-33 is LVCMOS25 if the pins can be used as differential. For example, the differential clock inputs are AE13 and AF13 respectively; the table says they're LVCMOS25, but in my constraints I will have them set as LVDS_25 as suggested by the ADC datasheet.
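In my constraints this currently looks something like the following (the port names are from my own top level; only the two clock pin locations are the ones from UG954 mentioned above):

```tcl
# Differential ADC clock from the FMC LPC connector (pins per UG954 Table 1-33)
set_property PACKAGE_PIN AE13 [get_ports CLK_AB_P]
set_property PACKAGE_PIN AF13 [get_ports CLK_AB_N]
set_property IOSTANDARD LVDS_25 [get_ports CLK_AB_P]
set_property IOSTANDARD LVDS_25 [get_ports CLK_AB_N]

# The data pairs get the same treatment (pin locations per the FMC card docs)
set_property IOSTANDARD LVDS_25 [get_ports CH_A_P[*]]
set_property IOSTANDARD LVDS_25 [get_ports CH_A_N[*]]
```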

Second, for DDR signals, you should use the dedicated clocked elements for capturing the data. At slower speeds (up to 300MHz to maybe as high as 400MHz) you can use the IDDR, for faster speeds you need to use the ISERDES. Both the input transceiver (IBUFDS) and the capture logic (IDDR/ISERDES) are documented in the 7 Series SelectIO User Guide, UG471.

Sampling rate will be no greater than 20.48 MHz, so it looks like the IDDR will do the job. I'm studying them in the guide you mentioned.

Next you need to deal with the proper clocking structure. The differential clock also needs to come in through an IBUFDS (or IBUFGDS, which is just a shorthand for an IBUFDS used for a clock input). From there it will need a proper clock distribution mechanism. The clock path also needs to be manually instantiated by the user, although the Clocking Wizard will help you with some of the more complex clocking structures. The choice of clocking structure can be critical to proper capture of the data. Take a look at this post on different I/O capture mechanisms.

I instantiated the IBUFGDS, forwarding its output to my logic. Is this a correct procedure, or should I use an additional BUFG first? Is that what you mean when you say "proper clock distribution mechanism"? Sorry, but I don't understand what you mean by "the clock path also needs to be instantiated manually". Does the tool not infer the clock path from the IBUF to the rest of the logic? I might be conceptually wrong, but I don't see the need for an MMCM. I prefer not using the Clocking Wizard because it does not have a differential output and I might want to use that in the future.

Finally, you must have correct and complete timing constraints (in an XDC file) for the interface in order to ensure that your system will work.

For this part I'm using the "Vivado Language Template > Input Delay Constraints > Source Sync > Edge-Aligned - CLK directly to FF". I'm using the setup/hold values defined in the datasheet of the ADC plus the board delay provided by the ZC706 documentation. Please let me know if that's a wrong approach. 

Many thanks again for your help!

PS: Why not the BSP? The BSP for this design does not serve our purpose for a couple of reasons. It only delivers the ADC outputs via ethernet to a host computer. You could suggest that I track it down to the DDR capture, but the project is a patchwork of several other modules generated by a proprietary tool, super buggy. The FPGA part is VHDL, and for research reasons I'm required to work in Verilog. Not to mention that it only supports Windows 7 and Vivado 2015.

Guide

I don't get why the I/O standard specified in table 1-33 is LVCMOS25 if the pins can be used as differential

(Almost) every pin on the FPGA can be used as a single ended input, or, with its P/N partner, as a differential input - it is up to the designer (in conjunction with the board design) to determine which it is. Similarly, any bank can be used with one of several I/O voltages - again, it is up to the designer (in conjunction with the board design) to choose that appropriately. Once it is chosen, the XDC and RTL for the FPGA project must be set accordingly; if you are going to use a pair of pins as single ended, then the RTL should use an IBUF and the XDC should declare it as a single ended standard; if you are going to use it as differential, then the RTL should use an IBUFDS and the XDC should declare the pair as a differential standard.

In your case, the ZC706 board can also be both - the board merely routes the pins to the FMC connector - as far as the board is concerned, each pair can still be either two single ended signals or a pair of differential signals. It is up to the designer of the FMC card that is ultimately plugged into the FMC connector to determine that. Table 1-33 has chosen to represent them as single ended, but the differential pairs are definitely allowed. In your case, the FMC card is using them as differential, so your RTL and XDC must match.

Is this a correct procedure or should I use an additional BUFG first? Is that what you mean when you say "proper clock distribution mechanism" ?

The post I referenced documents a number of "proper clock distribution mechanisms". The simplest is the "Direct capture with BUFG", which (as the name implies) needs a BUFG between the IBUFDS and the flip-flops. The other clocking mechanisms need other things; an MMCM with BUFGs, a BUFIO and BUFR combination (or just a BUFR). So, yes, adding a BUFG is sufficient, but it may not necessarily be the best. That being said, you are only running at 20.48MHz, so even an inefficient clocking scheme is probably sufficient.

For this part I'm using the "Vivado Language Template > Input Delay Constraints > Source Sync > Edge-Aligned - CLK directly to FF". I'm using the setup/hold values defined in the datasheet of the ADC plus the board delay provided by the ZC706 documentation. Please let me know if that's a wrong approach.

Yes, this is a reasonable approach. However, the drawing you sent me has the clock and data "center aligned", not "edge aligned", so you should be using the center-aligned template. This is good, though, since an edge aligned interface generally will not meet timing without a BUFG (or an IDELAY) to significantly modify the clock/data relationship - even at slow clock rates.

Avrum

Observer

Table 1-33 has chosen to represent them as single ended, but the differential pairs are definitely allowed

Thanks for the clarification on the I/O standards. From the FMC card documentation it looks like all the interfaces are LVDS_25. I'll double check my XDC.

So, yes, adding a BUFG is sufficient, but it may not necessarily be the best. That being said, you are only running at 20.48MHz, so even an inefficient clocking scheme is probably sufficient.

I did a quick test adding an IBUFGDS followed by a BUFG. I used this new single-ended clock to clock a couple of testing modules and it appears to be working. Is there any advantage (for my goal of having the ADC data captured) of using a Clocking Wizard instead of the IBUFGDS and the BUFG?

Considering that the clock situation is pacified, going back to the ADC data. From the UG you mentioned in your first answer, I get that the first component each pair CH_A_P & CH_A_N goes through is an IBUFDS. That will "transform" it to a single ended signal, let's say CH_A_DDR. Then it looks like I should feed this CH_A_DDR to an IDDR primitive, and it will again be transformed into a differential pair. Then I should use the new diff pairs in the module I described in my first question (I'm not sure about this; it looks like I'm running in circles). The clock for the IDDR of course must be the same one that comes from the ADC. Is my understanding correct?

Yes, this is a reasonable approach. However, the drawing you send me has the clock and data "center aligned", not "edge aligned", so you should be using the right template. This is good, though, since an edge aligned interface generally will not meet timing without a BUFG (or an IDELAY) to significantly modify the clock data relationship - even at slow clock rates.

Thanks for pointing that out. Now the diagram in the template makes much more sense.

About the IDELAY, I've seen other posts and reference designs using an IDELAY after the IBUFDS. Is this something that I should consider in my project?

Thank you very much. I really appreciate your help.

Guide

Then looks like I should input this CHA_DDR to an IDDR primitive and it will be again transformed to a differential pair.

No. The IDDR is an "Input Double Data Rate register" - it has a single D input, which it samples twice per clock; once on the rising edge of the clock, which puts out Q1 and once on the falling edge of the clock, which puts out Q2. There are various different modes of the IDDR, but in "Same Edge Pipelined" mode, this results in two bits (one on Q1 and one on Q2) which are updated on the rising edge of the clock where Q1 is the value sampled on the previous rising edge of the clock and Q2 is the value sampled on the falling edge before this rising edge. 

Specifically for your interface, on the rising edge of the clock where the even bits of Sample N+1 are on the interface, you would get 14 bits from the Q1 and Q2 outputs of the 7 IDDRs you have (connected to the 7 outputs of the 7 IBUFDS, which each take in a differential pair). The Q1 outputs would represent the 7 even bits of Sample N and the Q2 would be the 7 odd bits of Sample N.

The clock for IDDR of course must be the same one that comes from the ADC. Is my understanding correct?

Yes. The differential clock would come in through an IBUFDS (or IBUFGDS), would go to a BUFG, which would drive the C pins of all the IDDRs (on both edges of the clock) as well as the rest of your logic (using only the rising edge of the clock). Your logic which is running on the rising edge of the clock would then get a complete sample every clock from the interleaved bits of the Q1 and Q2 outputs of the 7 IDDRs.
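In other words (this is just a sketch - the signal names are made up, not from a reference design), the reassembly in the rising-edge domain is a simple interleave of the Q1 and Q2 buses:

```verilog
// Reassemble one 14-bit sample per rising clock edge from the
// Q1 (even bits) and Q2 (odd bits) outputs of the 7 IDDRs.
wire [6:0]  q1_even;   // Q1 outputs of the 7 IDDRs (rising-edge captures)
wire [6:0]  q2_odd;    // Q2 outputs of the 7 IDDRs (falling-edge captures)
reg  [13:0] sample;    // full sample, updated once per rising edge

integer k;
always @(posedge clk_bufg) begin
    for (k = 0; k < 7; k = k + 1) begin
        sample[2*k]     <= q1_even[k]; // even bits
        sample[2*k + 1] <= q2_odd[k];  // odd bits
    end
end
```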

About the IDELAY, I've seen another posts and ref designs using an IDELAY after the IBUFDS. Is this something that I should consider in my project? 

Refer back to my post on input capture styles. This may be necessary. If the constraints aren't met "naturally" by the relationship of the clock and data on the pins, then you may need IDELAYs (on either the clock or the data) to change the clock/data relationship internally to meet timing. However, if this is really a center aligned interface at 20MHz, timing should be really easy (unless the clock to output uncertainty of the ADC is really bad), so the center aligned interface should end up with tons of margin on both the setup check and the hold check; IDELAYs should be unnecessary.

Avrum

Observer

Hi avrumw, thanks again for your help.

Specifically for your interface, on the rising edge of the clock where the even bits of Sample N+1 are on the interface, you would get 14 bits from the Q1 and Q2 outputs of the 7 IDDRs you have (connected to the 7 outputs of the 7 IBUFDS, which each take in a differential pair). The Q1 outputs would represent the 7 even bits of Sample N and the Q2 would be the 7 odd bits of Sample N.

Thanks for the clear explanation. That's exactly the construction I have right now. I have a digital ramp as a test bench and in the simulation I'm able to collect it in the output of the IDDR after the positive edge of each clock cycle. Here is the code:

genvar i;
generate // input DDR buffers
    for (i = 0; i < 7; i = i + 1) begin : gen_ch_a
        IBUFDS #(
            .DIFF_TERM("FALSE"),    // Differential termination
            .IBUF_LOW_PWR("FALSE"), // Low power="TRUE", Highest performance="FALSE"
            .IOSTANDARD("LVDS_25")
        ) IBUFDS_inst (
            .O(CH_A_DDR_BUFF[i]), // 1-bit output: buffer output
            .I(CH_A_P[i]),        // 1-bit input: diff_p buffer input (connect directly to top-level port)
            .IB(CH_A_N[i])        // 1-bit input: diff_n buffer input (connect directly to top-level port)
        );

        IDDR #(
            .DDR_CLK_EDGE("SAME_EDGE_PIPELINED"), // "OPPOSITE_EDGE", "SAME_EDGE" or "SAME_EDGE_PIPELINED"
            .INIT_Q1(1'b0), // Initial value of Q1: 1'b0 or 1'b1
            .INIT_Q2(1'b0), // Initial value of Q2: 1'b0 or 1'b1
            .SRTYPE("SYNC") // Set/reset type: "SYNC" or "ASYNC"
        ) CH_A_IDDR_inst (
            .Q1(CH_A_14[2*i]),     // 1-bit output for positive edge of clock
            .Q2(CH_A_14[2*i + 1]), // 1-bit output for negative edge of clock
            .C(CLK_AB),            // 1-bit clock input
            .CE(ENA),              // 1-bit clock enable input
            .D(CH_A_DDR_BUFF[i]),  // 1-bit DDR data input
            .R(R),                 // 1-bit reset
            .S(S)                  // 1-bit set
        );
    end // for
endgenerate

So I ran the synth/impl with an ILA probe on CH_A_14. I set the "test pattern" of the ADC to output the same ramp. When I capture the output of the ILA, all I see in CH_A_14 are zeros with some random ones in the middle, even if I set the "test pattern" of the ADC to "all ones". Am I conceptually wrong in this test? Or do you think the ADC is not working correctly?

About the timing of the inputs: I'm using the template attached. The datasheet of the ADC says that for this sampling rate, the setup and hold times are both 2ns. So, as the graph suggests, I'm setting all the "dv_b" of the template to be 22.414, which is the half cycle minus 2 (24.414 - 2). The set_input_delay commands generated by the template are:

set_input_delay -clock CLK_AB_P -max 0.000 [get_ports CH_A_*]
set_input_delay -clock CLK_AB_P -min 22.414 [get_ports CH_A_*]
set_input_delay -clock CLK_AB_P -clock_fall -max -add_delay 0.000 [get_ports CH_A_*]
set_input_delay -clock CLK_AB_P -clock_fall -min -add_delay 22.414 [get_ports CH_A_*]

That's different from the post you referred to, but in that case it was edge-aligned data. In addition, I don't have any clock constraints, except for the creation of the clock with its respective period.

Sorry for the long post with more questions, I really need to make this work. Thank you again.

Screen Shot 2020-04-05 at 7.58.15 PM.png
Guide

Those constraints don't look right. Looking at the datasheet and using the template, all 4 timing parameters (dv_*) should be 2.0ns. You should literally use the template as your XDC constraints - copy the whole template and paste it into your XDC file with the values filled in:

set input_clock "CLK_AB_P"
set input_clock_period 24.212
set dv_bre 2.0
set dv_are 2.0
set dv_bfe 2.0
set dv_afe 2.0
set input_ports [get_ports CH_A_*]

set_input_delay -clock $input_clock -max [expr $input_clock_period/2 - $dv_bfe] $input_ports;
set_input_delay -clock $input_clock -min $dv_are $input_ports;
set_input_delay -clock $input_clock -max [expr $input_clock_period/2 - $dv_bre] $input_ports -clock_fall -add_delay;
set_input_delay -clock $input_clock -min $dv_afe $input_ports -clock_fall -add_delay;

That being said, I have two observations.

This interface is RIDICULOUSLY restricted. The bit period is 24.4ns, and out of that only 4ns carry valid data; that's over 20ns of uncertainty! Given that, this interface just got a whole lot more complicated. With just BUFG clocking, the device probably needs more than 2.0ns of setup time. The Zynq datasheet doesn't give pin-to-pin timing parameters, but the similar Kintex-7 325T-2 (DS182 Table 48) requires 2.94ns of setup (0.94ns more than the ADC gives). The total width of the window required is 2.88ns, which is smaller than the 4.0ns (setup+hold) provided by the ADC, so this is probably possible to capture even with BUFG clocking, but will require an IDELAY - most likely one on the clock to move the FPGA's setup/hold window a bit later.

Finally, while your constraints are not correct (and even if they were correct, they would fail timing without an IDELAY, or maybe an MMCM), this would probably not account for what you are seeing in the ILA. You might see some incorrect bits from time to time, but you should still be able to recognize the test patterns... So what you are seeing in the ILA is "something else".

Avrum 

Observer

In my first implementation I was using 2 for all dv_*, and yes, timing was failing. Since I'm using a low sample rate I thought that constraint was wrong; that's why I had a different interpretation in the last post. But as you said, this is ridiculous. Anyways...

So, according to the template, "input_clock_period" is the full period so it should be 48.212, right? The template divides it by 2 when setting the delays.

Right now, to do a quick test, I forced the implementation to not fail by setting the dv_* to 4.1. The output ramp is PERFECT in the ILA. Then, I decided to use a Clocking Wizard on the clock input from the ADC to try to meet the timing requirements. For a Clocking Wizard with MMCM, placement fails (the error is: [Place 30-575] sub-optimal placement for a clock-capable IO pin and MMCM pair). So I went ahead and changed the wizard to use a PLL instead of the MMCM. With the Clocking Wizard using a PLL, the timing analysis still fails (by a tiny amount compared to the design with only IBUFGDS and BUFG), so I "improved" the constraints by setting dv_bre and dv_bfe to 3, keeping dv_are and dv_afe at 2. However, the ramp in the ILA has errors now. They're not big in magnitude, but there are A LOT of little variations. They even look periodic in some sense.

I assume this is due to the phase difference between the data/clock and the uncertainty you mentioned in your last reply. So I started looking at the IDELAY as a way to sync the signals... Is there any documentation about delay primitives in general? UG471 just describes the primitives assuming that you already know everything about delays and are ready to plug in your values... for example, I need to know what a "tap" is, which clocks I should use, why there's an output counter, whether a fixed delay is enough, how to convert a delay in ns to the "IDELAY_VALUE", not to mention that an IDELAYCTRL is required...
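From the reference designs I've looked at, the structure seems to be something like the following - I'm not at all sure about the values, and the signal names are mine. UG471 does say the tap resolution is set by the IDELAYCTRL reference clock; with a 200 MHz reference each tap is nominally about 78 ps, so (if I understand correctly) IDELAY_VALUE of 9 would be roughly 0.7 ns:

```verilog
// Fixed-tap IDELAYE2 on one data bit, between the IBUFDS and the IDDR.
// Values and names are guesses, not from a verified design.
(* IODELAY_GROUP = "adc_delay_grp" *)
IDELAYE2 #(
    .DELAY_SRC("IDATAIN"),   // delay the signal coming from the pad/IBUFDS
    .IDELAY_TYPE("FIXED"),   // fixed tap count, no runtime adjustment
    .IDELAY_VALUE(9),        // taps (0-31); 9 taps ~ 0.7ns at 200MHz ref
    .REFCLK_FREQUENCY(200.0) // must match the IDELAYCTRL REFCLK
) data_idelay_i (
    .DATAOUT(ch_a_dly),        // delayed signal, goes to the IDDR D pin
    .IDATAIN(ch_a_ddr_buff),   // output of the IBUFDS
    .DATAIN(1'b0), .C(1'b0), .CE(1'b0), .INC(1'b0),
    .CINVCTRL(1'b0), .CNTVALUEIN(5'b0), .CNTVALUEOUT(),
    .LD(1'b0), .LDPIPEEN(1'b0), .REGRST(1'b0)
);

// One IDELAYCTRL per I/O bank that uses IDELAYs, calibrated by a 200 MHz clock.
(* IODELAY_GROUP = "adc_delay_grp" *)
IDELAYCTRL idelayctrl_i (
    .RDY(),             // asserted once the delay lines are calibrated
    .REFCLK(clk_200m),  // 200 MHz reference clock (e.g. from an MMCM/PLL)
    .RST(rst)
);
```

Is that roughly the right structure?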

Another option I have is to remove the PLL, use only the BUFG, and accept that my timing constraints are not correct... But that of course can bring errors when I input the real analog signal from the generator...

Thanks very much. Your help is being crucial. 

Guide

So, according to the template, "input_clock_period" is the full period so it should be 48.212, right? The template divides it by 2 when setting the delays.

Yes. My mistake. If the frequency is 20.48MHz, the period would be 48.83ns. The template will divide this by 2 for you.

Right now, to do a quick test, I forced the implementation to not fail by setting the dv_* to 4.1. The output ramp is PERFECT in the ILA.

This does not make sense.

If the interface is implemented correctly, then all the resources involved in the capture are "fixed": the clock pins are (presumably!) on a clock capable pair, the BUFGs are all almost identical, and (half of them) are reachable from the clock capable pin via a fixed route. The clock distribution trees (outputs of the BUFG) are fixed and are nearly identical regardless of which BUFG is used. The IDDRs are fixed in the IOBs. As a result (and as opposed to other internal paths that use resources that can change from run to run), the timing of this interface is fixed, and hence will not be affected by the constraints. The only thing constraints do for this kind of interface is tell you whether it is expected to work or not (and in your case, with the correct constraints and the simple BUFG capture with no IDELAY, the answer is "not").

Changing the constraints won't affect this behavior - it will merely change the report on the interface. The interface should behave identically regardless of whether you have entered constraints that are correct (and fail), constraints that are incorrect (and pass or fail), or even no constraints (other than the create_clock on the clock capable inputs, which constrains the internal paths as well as the interface paths).

So what you are reporting shouldn't happen. If it does, you have "something else" wrong - either an incorrectly constrained internal path, or a bad clock domain crossing are the two most common reasons for designs that sometimes work and sometimes don't.

Avrum

Observer

 incorrectly constrained internal path, or a bad clock domain crossing are the two most common reasons for designs that sometimes work and sometimes don't.

I don't have any other signal entering the clock domain of the ADC, just the diff inputs from the A and B channels. The other constraints I have are for the SPI interface, which I run using an internal FPGA clock (Clocking Wizard with MMCM). I marked the clocks as physically exclusive. Clock pins P and N from the FMC are on pins AE13 and AF13 respectively; I could not find in UG954 whether they're clock capable.

the timing of this interface is fixed, and hence will not be affected by the constraints.

I understand. It's just a report then... it does not affect the way the tool does its work. Since the ramp output is exact even though timing is failing, could it be that the ADC setup and hold times in the datasheet are not correct and are actually bigger than 2ns? This makes me infer that if I use an IDELAY to delay the outputs and make the implementation pass the timing analysis, the output of the ramp will likely be incorrect...

When I use the Clocking Wizard with a PLL, with the correct constraints, the timing analysis fails during synthesis, but it does not fail in the implementation step. I thought that was not possible; I thought you should always make sure your timing analysis is correct during synthesis before advancing to implementation. Anyways, the design with a PLL passes timing analysis for implementation, but gives wrong output results. To summarize:

With only IBUFGDS (or IBUFDS) & BUFG: synthesis passes timing analysis, implementation fails, but the output is as expected.

With the Clocking Wizard with PLL: synthesis timing analysis fails, implementation passes, but the output is wrong.

Thank you again.

 
