05-10-2012 08:03 PM
I inherited a pretty big FPGA modem project from a contractor, I am in the process of porting it to our new board and figured this would be an opportunity to think about improvements. This project features an AD9627 Dual ADC connected via two 12-bit parallel, single-ended, SDR buses to a Spartan-6 LX45 (an interface as simple as interface logic can be, according to some senior voices in this forum).
The FPGA supplies an LVDS clock to the ADCs, and each ADC returns the data and accompanying strobe at the same rate, delayed by a somewhat known amount of time (with a ~3 ns spread between min and max values).
A whole FPGA edge is dedicated to the ADCs, and for the sake of flexibility I have made sure each ADC gets a BUFIO2 zone for itself, with the strobes connected to matching clock inputs. For the moment we are only using a single channel of the ADC chip.
The code we have now already works, I am just looking for knowledgeable opinions on the way this kind of interface is typically implemented, if the current implementation could be changed to better suit the features of the Spartan-6 FPGA, or if there's anything glaringly wrong with it.
Just to give you an idea of my skill level, I have roughly one year of professional experience and I spent it writing, debugging and modifying VHDL code for Ethernet, DDR2, and various slow serial interfaces on Xilinx FPGAs with very little expert feedback.
I have read a certain amount of Xilinx datasheets and have good knowledge of the Spartan-6 I/O and clocking architectures although I have never used SelectIO advanced functions such as SERDES or IODELAY. My knowledge of timing constraints is also lacking.
So here goes:
On the original, contractor hardware+VHDL, the main oscillator drives a DCM that synthesises a fixed 150MHz clock sent to the ADC using ODDR2.
The return data is latched in the fabric using its associated strobe (made global) and is then downsampled/resynchronised by a (fabric) NCO-generated demodulator clock.
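For reference, the clock-forwarding part of that scheme typically looks like the sketch below (module and signal names are mine, not from the original code; the ODDR2 re-times the clock in the output tile so it leaves the FPGA with clean edges):

module adc_clk_forward (
input clk150, // 150MHz from the DCM (or a PLL)
output adc_clk_p, // LVDS pair to the ADC
output adc_clk_n
);
wire clk_fwd;
ODDR2 #(
.DDR_ALIGNMENT("NONE"),
.INIT(1'b0),
.SRTYPE("SYNC")
) oddr2_clk (
.Q (clk_fwd),
.C0(clk150),
.C1(~clk150),
.CE(1'b1),
.D0(1'b1), // Q goes high on the rising edge of clk150...
.D1(1'b0), // ...and low on the falling edge: a clean clock copy
.R (1'b0),
.S (1'b0)
);
OBUFDS obufds_clk (.I(clk_fwd), .O(adc_clk_p), .OB(adc_clk_n));
endmodule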
At some point the demodulator NCO clock was driving the ADC, and the return strobe was driving the rest of the demodulator logic (with no PLL on the return path). It was in my opinion a more elegant solution, but the developer was experiencing ADC glitches at certain sampling frequencies.
Pros: It works well enough, frequency can be changed dynamically.
Cons: Multiple clock domains (1 for ADC CLK, 1 for ADC strobe and 1 for demodulator).
The simplest improvement for me to bring would probably be to use a PLL to generate the fixed 150MHz ADC sampling clock, as PLLs offer superior jitter specifications, or even to drive the ADC straight from the crystal. From there I can see two choices apart from leaving the return path the way it is:
1) Try to delay the return signal using IODELAY and a fixed value, so that return data can be sampled using the same 150MHz clock that was sent to the ADC. This would essentially make everything fall into the same clock domain, but I would have to know the precise delay value between the ADC clock and the return data, or trust that the "typical" value from the ADC datasheet is indeed what we have (+ signal propagation delays).
The perfect solution would be to use the IODELAY on the strobe input path to compute the delay between the strobe and the ADC clock and cascade it down to the other IODELAY blocks on the bus, but I have seen no mention of that feature anywhere. Calibration would have been done when no signal is expected (destructive calibration as every signal is single-ended).
2) Insert an additional PLL on the return path, driven by the strobe through a BUFIO2. This PLL would in turn run the demodulator logic and allow any subsequent downsampling by the demodulator to be synchronous.
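For what it's worth, choice 1) would boil down to something like the following sketch. The tap value is a placeholder only (it would have to be derived from the ADC clock-to-data delay plus board propagation), and all names are illustrative:

wire [11:0] adc_data_pad; // from the data IBUFs
wire [11:0] adc_data_dly; // delayed copy, sampled by the 150MHz clock

genvar i;
generate for (i = 0; i < 12; i = i + 1) begin : dly
IODELAY2 #(
.IDELAY_TYPE ("FIXED"),
.IDELAY_VALUE(40), // taps -- placeholder value only
.DELAY_SRC ("IDATAIN"),
.DATA_RATE ("SDR")
) iodelay2_i (
.IDATAIN(adc_data_pad[i]),
.DATAOUT(adc_data_dly[i]), // feeds the IOB input register
// unused inputs tied off
.ODATAIN(1'b0), .T(1'b1), .CAL(1'b0), .CE(1'b0),
.CLK(1'b0), .INC(1'b0), .IOCLK0(1'b0), .IOCLK1(1'b0), .RST(1'b0)
);
end endgenerate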
I am of course envisioning going back to sending a variable sampling frequency to the ADC, but that would require more work. In this case would a reconfigurable PLL be more desirable than a fabric NCO?
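If it helps, the fixed 150MHz PLL option mentioned earlier could be sketched like this (assuming a 100MHz reference -- adjust the multiply/divide values for your actual oscillator, keeping the Spartan-6 VCO in its legal range):

wire clk150, clkfb, pll_locked;

PLL_BASE #(
.CLKIN_PERIOD (10.0), // 100MHz reference (assumed)
.CLKFBOUT_MULT (6), // VCO = 100MHz * 6 = 600MHz
.DIVCLK_DIVIDE (1),
.CLKOUT0_DIVIDE(4) // 600MHz / 4 = 150MHz
) pll_adc (
.CLKIN (clk100), // assumed reference clock net
.CLKFBIN (clkfb),
.CLKFBOUT(clkfb),
.CLKOUT0 (clk150),
.RST (1'b0),
.LOCKED (pll_locked)
);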
What are your opinions on this system? How are things typically done using this hardware? How would you drive signal processing logic using this interface?
Any opinions are greatly appreciated. Thank you very much!
05-12-2012 11:28 PM
I normally use a clock generator/PLL external to the FPGA (an AD9517 in my most recent design) to generate the ADC sample clock, since the jitter you get from the DCMs and PLLs in the FPGA is orders of magnitude higher than discrete clock generators can provide. This may or may not be important for your design - there's lots of documentation covering the effects of jitter on ADCs. The external clock generator also makes it easy to change speeds on the fly.
Anyhow, one clock generator output per ADC, plus one output going to a global clock input on the FPGA. The DCO from each ADC then goes to BUFIOs to capture the ADC data lines. It's likely you won't need to do anything beyond possibly inverting the BUFIO clock to meet input sample/hold timing at 150MHz SDR. The BUFIOs then clock BUFRs which in turn clock FIFOs, which are used to safely go from each individual ADC's clock domain to the global clock domain (driven by a BUFG from the clock generator). Data processing is then done in the global clock domain.
This clocking setup has been used by me successfully in several designs, I hope it helps.
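To make the FIFO handoff concrete, here is a minimal dual-clock FIFO sketch using gray-coded pointers (illustrative only -- in practice a CORE Generator FIFO or a block-RAM FIFO primitive does the same job; all names are mine):

module adc_cdc_fifo (
input wr_clk, // per-ADC strobe domain
input rd_clk, // global clock domain
input rst,
input wr_en,
input [11:0] wr_data,
input rd_en,
output reg [11:0] rd_data,
output empty,
output full
);
reg [11:0] mem [0:15];
reg [4:0] wptr_bin = 0, rptr_bin = 0; // extra MSB distinguishes full/empty
reg [4:0] wptr_gray = 0, rptr_gray = 0;
reg [4:0] rptr_gray_w1 = 0, rptr_gray_w2 = 0; // read ptr sync'd to wr_clk
reg [4:0] wptr_gray_r1 = 0, wptr_gray_r2 = 0; // write ptr sync'd to rd_clk

function [4:0] bin2gray(input [4:0] b); bin2gray = b ^ (b >> 1); endfunction

// write side
always @(posedge wr_clk) begin
if (rst) begin wptr_bin <= 0; wptr_gray <= 0; end
else if (wr_en && !full) begin
mem[wptr_bin[3:0]] <= wr_data;
wptr_bin <= wptr_bin + 1'b1;
wptr_gray <= bin2gray(wptr_bin + 1'b1);
end
{rptr_gray_w2, rptr_gray_w1} <= {rptr_gray_w1, rptr_gray}; // 2-FF sync
end

// read side
always @(posedge rd_clk) begin
if (rst) begin rptr_bin <= 0; rptr_gray <= 0; end
else if (rd_en && !empty) begin
rd_data <= mem[rptr_bin[3:0]];
rptr_bin <= rptr_bin + 1'b1;
rptr_gray <= bin2gray(rptr_bin + 1'b1);
end
{wptr_gray_r2, wptr_gray_r1} <= {wptr_gray_r1, wptr_gray}; // 2-FF sync
end

assign empty = (rptr_gray == wptr_gray_r2);
assign full = (wptr_gray == {~rptr_gray_w2[4:3], rptr_gray_w2[2:0]});
endmodule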
05-13-2012 10:45 AM - edited 05-13-2012 11:25 AM
I read the ADC datasheet, and there seems to be some crucial signal timing information missing.
On page 12, there are specifications for Ts and Th -- setup and hold time of the Data outputs with respect to the DCO output clock. But these specifications are listed as typical, not the required minimum and maximum specifications which are necessary to be confident of adequate timing margins.
Furthermore, if you add the setup and hold time numbers, the sum is 6.66 ns -- the entire clock period at 150MHz. That would mean absolutely no skew whatsoever on the 12 data outputs with respect to the DCO output. Do you believe this to be realistic?
If you are building a few prototypes, using typical datasheet timing specs is probably OK. If you are designing for production, using typical numbers is not OK.
Spartan-6 FPGAs have IODELAY2 blocks which will align input data to input clock at the input registers. All you need to provide are the timing constraints, and ISE will 'do the right thing' -- if timing constraints permit. For now, just use the AD9627 typical timing numbers, and see what happens.
In the following example:
module s6_forum_top (adc_clock150m, adc_data, inclock, out_data);
input adc_clock150m; // DCO from the ADC
input inclock; // internal system clock
input [11:0] adc_data;
output [11:0] out_data;
(* IOB="TRUE" *)
reg [11:0] adc_inreg; // input reg using clock from ADC
reg [11:0] sync_data=0; // fabric reg using clock from ADC
(* IOB="TRUE" *)
reg [11:0] out_data; // output register using internal clock
always @(posedge adc_clock150m) begin
adc_inreg <= adc_data; // input register using ADC output clock
sync_data <= adc_inreg; // fabric register using ADC output clock, requires BUFG
end
always @(posedge inclock)
out_data <= sync_data; // output register using internal clock
endmodule
timing constraints (note, I have reduced 'data valid time' from 6.66 ns to 4 ns)
NET "inclock" TNM_NET = "inclock";
TIMESPEC "TS_inclock" = PERIOD "inclock" 6.67 ns HIGH 50%;
NET "adc_clock150m" TNM_NET = "adc_clock150m";
TIMESPEC "TS_adc_clock150m" = PERIOD "adc_clock150m" 6.67 ns HIGH 50%;
NET "adc_data*" OFFSET = IN 3.83ns VALID 4ns BEFORE "adc_clock150m"; #typ timing numbers
The purpose of this design is to illustrate how simple a source-synchronous parallel input register (plus clock) can be, using the Xilinx tools. If you want to avoid the multiple clock domains problem in your post, the only approach which comes to mind is a fixed delay setting in the IODELAY2 blocks:
Without the dynamic timing adjustment in the IODELAY2 blocks, you might yet be able to avoid the problem of multiple clock domains, but the risks increase dramatically. If you fix a certain delay in the IODELAY2 blocks to align input data to the system clock, you are relying on a narrow range of delay variation over process, temperature, and voltage -- both in the FPGA and in the ADC. You can probably get away with this on a single board, with some hand-tuning, but it is probably not a safe and practical approach for replication in production.
The ADC tPD spec (clock to data output) is 2.2 to 6.4 ns, a range of (at least) 4.2 ns over which the data timing can vary with respect to the system clock. This skew range exceeds the reach of simple clock domain crossing techniques -- you will likely need a full-blown FIFO for clock domain crossing.
-- Bob Elkind