05-24-2012 05:11 AM
I am designing a data receiving component of a larger system.
We have what I would call a quasi-source-synchronous transmission:
Data is arriving at 160Mbps, but the clock is only 40MHz. So I have 4 bits arriving during a single 25ns cycle - let's call them bit 0,1,2,3. Bit 0 is edge aligned with the rising edge of the 40MHz clock. The plan is to generate a 160MHz clock from the 40MHz clock input using an MMCM and then use this generated 160MHz clock in a single data rate scheme to capture the 160Mbps data.
Since the data is edge-aligned to the 40MHz clock I configure the MMCM so that the 160MHz clock is phase-shifted by 180 degrees to put the rising edge in the center of the data transmission.
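The arithmetic behind this plan can be sketched in a few lines of Python. This is only an idealized illustration: it assumes four equally wide bit periods and ignores all MMCM, buffer and I/O delays, and the function names are made up for the example.

```python
def bit_windows(clock_period_ns=25.0, bits_per_cycle=4):
    """Return (start, center, end) in ns for each bit, measured from the
    rising edge of the slow clock. Assumes ideal, equally wide bit periods."""
    bit_time = clock_period_ns / bits_per_cycle
    return [(k * bit_time, (k + 0.5) * bit_time, (k + 1) * bit_time)
            for k in range(bits_per_cycle)]


def phase_to_delay_ns(phase_deg, fast_period_ns):
    """Convert a phase shift of the fast clock into a delay in ns."""
    return phase_deg / 360.0 * fast_period_ns


fast_period_ns = 25.0 / 4                      # 160 MHz -> 6.25 ns
windows = bit_windows()
shift = phase_to_delay_ns(180.0, fast_period_ns)
# Rising edges of the shifted 160 MHz clock within one 40 MHz cycle
sample_edges = [shift + k * fast_period_ns for k in range(4)]

for k, ((start, center, end), edge) in enumerate(zip(windows, sample_edges)):
    print(f"bit {k}: window {start:.3f}..{end:.3f} ns, sampled at {edge:.3f} ns")
```

In this ideal model a 180 degree shift of the 160MHz clock is a 3.125 ns delay, so each rising edge falls on the center of one bit period.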
My question is how to code up the constraints for this scheme? How do I tell the tools that data is arriving at 160Mbps and what are the validity periods of the data?
I tried applying multiple OFFSET=IN constraints like this:
#these are the data inputs at 160Mbps:
INST "P<0><0>" TNM = sdr_c_0;
INST "P<0><1>" TNM = sdr_c_0;
INST "P<0><2>" TNM = sdr_c_0;
INST "P<0><3>" TNM = sdr_c_0;
INST "P<0><4>" TNM = sdr_c_0;
#...
#P<0><24> is the clock line at 40MHz
TIMEGRP "sdr_c_0" OFFSET = IN 24.0 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
TIMEGRP "sdr_c_0" OFFSET = IN 17.75 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
TIMEGRP "sdr_c_0" OFFSET = IN 11.5 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
TIMEGRP "sdr_c_0" OFFSET = IN 5.25 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
However, translate complains that the first constraint is overridden by the second, the second by the third... so only the 4th is taken into account.
I have also tried to put an OFFSET=IN constraint using the MMCM generated clock
TIMEGRP "sdr_c_0" OFFSET = IN 3.125 ns VALID 6.25 ns BEFORE "CMX_input_inst/BCLK160PLL<0>" RISING;
but that is also ignored and the warning says:
"ConstraintSystem:168 - Constraint <TIMEGRP "sdr_c_0" OFFSET = IN 3.125
ns VALID 6.25 ns BEFORE "CMX_input_inst/BCLK160PLL<0>" RISING;>
[sources/CMX_top.ucf(1792)]: This constraint will be ignored because
NET "CMX_input_inst/BCLK160PLL<0>" could not be found or was not
connected to a PAD."
- which is correct, since that net is not an external input...
Any help on how to formulate this constraint is greatly appreciated.
05-24-2012 05:21 AM
What I would do is constrain only the data window nearest the input clock edge. If you believe
that a 4x frequency multiplier works correctly, then the other three data windows should line
up as well.
TIMEGRP "sdr_c_0" OFFSET = IN 0 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
Note that this means that the valid period starts right at the rising edge and lasts
for 4 ns. If it actually starts 1 ns after the edge write:
TIMEGRP "sdr_c_0" OFFSET = IN -1 ns VALID 4.0 ns BEFORE "P<0><24>" RISING;
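The sign convention can be captured in a small Python helper that builds the constraint text from where the valid window opens relative to the clock edge. The helper name and its parameters are hypothetical; only the OFFSET = IN syntax and the group and pad names come from the constraints above.

```python
def offset_in(group, clock_pad, valid_start_after_edge_ns, valid_ns):
    """Build an OFFSET = IN ... BEFORE constraint string.

    The OFFSET value says how long BEFORE the clock edge the data becomes
    valid, so a window that opens AFTER the edge gets a negative value.
    """
    offset = 0.0 - valid_start_after_edge_ns
    return (f'TIMEGRP "{group}" OFFSET = IN {offset:g} ns '
            f'VALID {valid_ns:g} ns BEFORE "{clock_pad}" RISING;')


# Window opens exactly at the rising edge and lasts 4 ns
print(offset_in("sdr_c_0", "P<0><24>", 0.0, 4.0))
# Window opens 1 ns after the rising edge and lasts 4 ns
print(offset_in("sdr_c_0", "P<0><24>", 1.0, 4.0))
```

The generated lines should still be checked against the timing report; the helper only encodes the sign convention.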
In the end what you really want is to make sure that all of the input registers are in the
IOB (this would be the case if you use DDR registers or the ISERDES). Then with
the proper clock phase, there is really nothing that place & route can do to change
the timing and everything should just work. The constraint then just becomes a
way to check that the build meets the timing requirements, not a way to enforce it.
By the way, if you use DDR input registers you don't need to multiply the clock
by 4, only by 2. If you don't need 160 MHz for some other reason this could reduce
the power of the overall design.
05-24-2012 08:00 AM
Thank you for your suggestion - I am still a bit confused though.
In the scheme you are suggesting how is the 2nd 3rd and 4th 'tick' of the 160MHz clock treated? (where I labeled as the '1st' the rising edge right after the rising edge of the 40MHz clock). Won't the tools try to place components (and/or check) such that data is valid on input FF on all rising edges of the 160MHz clock? (clearly this is not possible to satisfy if I specify only one valid window).
About the placement of input registers in the IOB - is this necessary? Currently I am not using IDDR or ISERDES. I simply have a process clocked by the 160MHz clock whose inputs are connected to the IODELAY outputs. I would think that the path delay from the IODELAY to the FF could actually help to satisfy the constraint - since the FF can be placed 'mid way' between the IODELAY and MMCM. Am I misguided here?
If you think it is needed, I could select the "-pr" option for Map ("Pack I/O Registers/Latches into IOBs"). Would that do it?
Thank you for your help,
05-24-2012 01:14 PM
Gabor is probably working on the assumption that there will be one and only one flip-flop at 160MHz sampling the data input (which is the "right" way to do this).
Since there is only one FF involved, the data relationship between the clock edges and data windows in the 2nd, 3rd and 4th window will be the same as the first - there is one clock path and one data path to the sampling flip-flop. So constraining the first window should be sufficient. This, of course, assumes that each of the 4 data eyes are identical (are all exactly 1/4 of the 40MHz clock period). One UCF constraint that specifies the timing relationship at the pins between the clock pin and the data pin will define the characteristics of (only) the first data eye.
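A quick Python sketch illustrates why one window is enough. It assumes an ideal 4x clock multiplication and four identical eyes; the numbers (a window opening 1 ns after each bit boundary and lasting 4 ns, and a sampling edge 3.125 ns after each boundary) are example values only, and the function name is made up.

```python
def eye_margins(valid_start_ns, valid_ns, sample_phase_ns,
                clock_period_ns=25.0, bits_per_cycle=4):
    """Return (setup_margin, hold_margin) in ns for each data eye.

    valid_start_ns  : when data becomes valid, measured from each bit boundary
    valid_ns        : how long the data stays valid
    sample_phase_ns : position of the sampling edge after each bit boundary
    """
    bit_time = clock_period_ns / bits_per_cycle
    margins = []
    for k in range(bits_per_cycle):
        boundary = k * bit_time
        sample = boundary + sample_phase_ns
        setup = sample - (boundary + valid_start_ns)
        hold = (boundary + valid_start_ns + valid_ns) - sample
        margins.append((setup, hold))
    return margins


margins = eye_margins(valid_start_ns=1.0, valid_ns=4.0, sample_phase_ns=3.125)
for k, (setup, hold) in enumerate(margins):
    print(f"eye {k}: setup margin {setup:.3f} ns, hold margin {hold:.3f} ns")
```

Because the same flip-flop, clock path and data path are used for every eye, the margins come out identical in this model, so checking the first eye covers the other three.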
As for using the ISERDES or IOB FF instead of a slice FF, your point is valid (not forcing the FF into the IOB can help your internal timing path), but generally it is the I/O timing that is the most critical - you can always fix an internal path by additional pipelining. The nice thing about using the IOB FF is that once you create your sampling mechanism to sample in the middle of the data eye (either using phase shifting of the MMCM or the IDELAY on the data), the clock/data relationship will be guaranteed by the architecture of the FPGA - every PAR run will have identical timing results for the interface. Using an internal FF, the timing will vary run-to-run, so you will have to make sure your constraints are exactly correct, taking into account all possible factors (jitter, inter-symbol interference, edge rates, etc...)
05-27-2012 02:14 AM
So I tried this:
- I added an FF which samples the input port exactly once per cycle (of the 160MHz clock) (no clock enable)
- I enabled the Map switch "Pack I/O Registers/Latches into IOBs" -> "For Inputs Only" (I looked in the FPGA editor later and I see that these FFs are indeed placed in "ILOGIC" sites)
- I applied the constraint as suggested by Gabor (on the first eye only)
- I also placed the MMCMs and BUF(R/G) in the most logical locations - so that there are really only a few almost equivalent routes PAR can use (i.e. different clock lines - but each should have almost identical delay). So I am really relying on the constraint to tell me if I have adjusted the phase of the 160MHz clock so that I am within the data valid eye for all inputs (and if not, by how much).
When I run like this PAR after about a minute of running says:
WARNING:Route:466 - Unusually high hold time violation detected among 562 connections. The top 20 such instances are printed below. The
router will continue and try to fix it
P<6><10>:I -> CMX_input_inst/Inst_CMX_data_delay/iodelgen_chan[6
P<6><15>:I -> CMX_input_inst/Inst_CMX_data_delay/iodelgen_chan[6
P<6><12>:I -> CMX_input_inst/Inst_CMX_data_delay/iodelgen_chan[6
I kept it running overnight (>11hrs!!!) but then my patience ran out...
So it seems that PAR is somehow trying to satisfy mutually exclusive constraints - i.e. validity of data on the IOB on all cycles of the 160MHz clock (which is impossible to satisfy, since if data is valid on the 1st cycle it cannot be valid on the 2nd)
Do you see any solution?
05-27-2012 02:59 PM
It is hard to say without seeing more information (ideally the failing path after PAR - look into "smartPreview" to halt PAR even if it hasn't finished fixing the hold times).
However, I don't think the warning messages are coming from the 4 phases of the data.
I am concerned about your statement about placing BUFG/BUFR and MMCM "in the most logical location". The structure of the clock capture mechanism has to be designed to do two things
- capture the data from the incoming pin and
- make this data available to the rest of the internal system
Using an IBUFG->MMCM->BUFG is probably the right way to do this, but since you are doing clock multiplication, you will need to use two BUFGs
- one BUFG on CLKFBOUT back to CLKFBIN (to provide the deskewing phase for the MMCM in order to be able to control the clock data eye) - this will be a 40 MHz clock (since your input clock is 40 MHz)
- one BUFG on CLKOUT0 (which is the 160MHz clock) - this clock will be used to capture the data at the IOB FFs
The phase of the 160MHz clock will need to be adjusted to be consistent with what your board provides, which also needs to be consistent with the constraints. If the data is "edge aligned" (the first window starts at the rising edge of the 40MHz clock), then Gabor's constraints are right; however, you will need to adjust the phase of CLKOUT0 to capture the data. The amount of phase shift is knowable, though, since in this configuration you need to satisfy the datasheet constraints of Tpsmmcmcc and Tphmmcmcc (the setup and hold times when using an MMCM for your clock and clock-capable inputs). The actual values depend on your device, but you will need to shift your clock forward by at least enough to satisfy Tpsmmcmcc. Using this mechanism, you shouldn't need the IDELAY - the MMCM phase shift can take care of any phase offset you need.
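As a rough first-order sketch of that reasoning, the Python below works out which clock delays keep the required setup/hold window inside the data valid window, and converts a delay into degrees of the 160MHz clock. The setup and hold numbers are placeholders, not datasheet values; substitute the Tpsmmcmcc and Tphmmcmcc figures for your device and speed grade. The function names are hypothetical, and the model ignores jitter and board skew.

```python
def clock_delay_range(valid_start_ns, valid_ns, setup_ns, hold_ns):
    """Return (earliest, latest) clock delay in ns, relative to the input
    clock edge, for which data is valid from setup_ns before the sampling
    edge until hold_ns after it. Returns None if no delay works."""
    earliest = valid_start_ns + setup_ns
    latest = valid_start_ns + valid_ns - hold_ns
    if earliest > latest:
        return None
    return earliest, latest


def delay_to_phase_deg(delay_ns, period_ns=6.25):
    """Express a delay as a phase of the fast clock (default 160 MHz)."""
    return (delay_ns / period_ns * 360.0) % 360.0


# Placeholder requirement: 1.5 ns setup, 0.5 ns hold (NOT datasheet values)
window = clock_delay_range(valid_start_ns=0.0, valid_ns=4.0,
                           setup_ns=1.5, hold_ns=0.5)
if window is None:
    print("no clock delay satisfies both setup and hold")
else:
    earliest, latest = window
    middle = (earliest + latest) / 2
    print(f"usable delay: {earliest:.3f}..{latest:.3f} ns")
    print(f"mid-range phase: {delay_to_phase_deg(middle):.1f} degrees")
```

Picking the middle of the usable range splits the leftover margin evenly between setup and hold; the timing report for the OFFSET = IN constraint remains the real check.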
The next question is what you do with this data. If all the remaining logic is running on the clock generated by the BUFG on CLKOUT0, then everything is fine. However, if you move this data to another clock, then you need to either
- ensure this "other" clock is in phase with the capturing clock - generally this means another clock coming from the MMCM with an identical buffer (i.e. a BUFG), with a "reasonable" phase difference (there are combinations that won't work)
- if this is not the case, then you will need to use a clock crossing circuit to transfer data from one domain to the other (which will likely require a TIG or FROM:TO constraint to override the normal PERIOD constraints).
It's more likely that your hold failures are coming from missing the incoming data eye (the one that you have defined) or, even more likely, from transferring this data to another domain (particularly since the tool is complaining about 562 connections...)