Registered: 11-10-2012

RMII Timing

Hi

 

I have a Spartan-6 device with a 200 MHz oscillator and an Ethernet PHY, the LAN8720A.

The interface to the PHY is driven by a clock signal generated by the FPGA.

[Image: C5 Details - Bus Tracing]

 

To get REF_CLK out to the PHY, I use the following clock2pin entity:

 

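library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;   -- provides the ODDR2 primitive

entity clock2pin is
  port
  (
    CLOCK_IN  : in  std_logic;   -- internal clock to forward
    CLOCK_OUT : out std_logic    -- drives the REF_CLK pin
  );
end entity clock2pin;

architecture IMP of clock2pin is
  signal CLOCK_INn : std_logic;
begin
  -- Inverted copy of the clock for the second ODDR2 clock input
  CLOCK_INn <= not CLOCK_IN;

  -- Forward the clock through an output DDR register: '1' is launched on the
  -- rising edge of CLOCK_IN and '0' on the rising edge of CLOCK_INn, so the
  -- pin toggles like the clock but leaves the FPGA from an IOB register.
  ODDR2_inst : ODDR2
    port map (
      Q  => CLOCK_OUT,  -- 1-bit output data
      C0 => CLOCK_IN,   -- 1-bit clock input
      C1 => CLOCK_INn,  -- 1-bit clock input (inverted clock)
      CE => '1',        -- 1-bit clock enable input
      D0 => '1',        -- 1-bit data input (associated with C0)
      D1 => '0',        -- 1-bit data input (associated with C1)
      R  => '0',        -- 1-bit reset input
      S  => '0'         -- 1-bit set input
    );
end architecture IMP;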

For the internal logic I use a 50 MHz clock at 0° phase; for the outgoing REF_CLK, a 270° phase works "most of the time".

But when I change the design, it sometimes fails to meet timing, and sometimes the Ethernet does not work.

 

Datasheet of PHY LAN8720A

 

How should I write the timing constraints for this scenario?

 

The PHY's timing requirements are listed on page 72 of the datasheet.

 

Thanks for any answers.

 

Frank

Guide
Registered: 01-23-2009

Accommodating this RMII PHY with a forwarded clock is going to be somewhat difficult. These kinds of PHYs are generally easier to handle with a system synchronous approach.

 

As with any interface, there are two directions to consider: transmit (FPGA -> PHY) and receive (PHY -> FPGA).

 

Using a forwarded clock, the FPGA -> PHY direction is trivially simple. With the clock coming from an ODDR and the data coming from IOB FFs, there is virtually no skew at the outputs of the FPGA (maybe a few 100ps). The PHY requires only 4ns of setup and 1.5ns of hold, a total required window of 5.5ns (call it 6ns) out of the 20ns period. Of course, this window is in exactly the "wrong" place for using the same internal FPGA clock for clock and data, but using different outputs of the DCM (as you described) lets you generate two clocks with a known phase shift (say the 270 degree shift you suggested) with high precision (the phase error is +/-350ps at this frequency). This means that you have more than 13ns of margin on the transmit side of the interface.
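A minimal sketch of the transmit-side budget, assuming the figures from the paragraph above. The names are illustrative, and `OUTPUT_SKEW_NS = 0.3` is an assumed stand-in for "maybe a few 100ps" of clock/data skew at the pins:

```python
# Rough FPGA -> PHY (transmit) margin for the forwarded-clock RMII link.
# PHY numbers are from the LAN8720A figures quoted above; OUTPUT_SKEW_NS is an
# assumption standing in for "a few 100ps" of ODDR2-clock vs IOB-data skew.

PERIOD_NS = 20.0        # 50 MHz REF_CLK
PHY_SETUP_NS = 4.0      # PHY input setup requirement
PHY_HOLD_NS = 1.5       # PHY input hold requirement
PHASE_ERROR_NS = 0.35   # +/- phase error between two clock-generator outputs
OUTPUT_SKEW_NS = 0.3    # assumption: skew between clock and data pins


def tx_margin_ns(period, setup, hold, phase_error, skew):
    """Slack left in one period after the PHY's setup/hold window, the
    +/- phase error (counted on both edges) and the output skew."""
    required_window = setup + hold
    return period - required_window - 2 * phase_error - skew


margin = tx_margin_ns(PERIOD_NS, PHY_SETUP_NS, PHY_HOLD_NS,
                      PHASE_ERROR_NS, OUTPUT_SKEW_NS)
print(f"TX margin: {margin:.2f} ns")
```

With these assumptions the result lands a little above the 13ns quoted in the post.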

 

However, the receive side is a big problem. We need to get the clock out of the FPGA to the PHY, get the data back, and then capture it. In theory, the total path runs from the output FF of the clock to the input FF of the data, but the numbers for that analysis aren't easily available, so I will use the published data for a conventional clock input deskewed internally (Tickofdcm and Tpsdcm/Tphdcm from the Spartan-6 datasheet), as that is the most accurate data we have without designing the system and looking at the trace results. This is a little pessimistic, but still indicative of the overall problem.

 

Let's assume a 50MHz clock arrives at the input of the FPGA, and let's call the rising edge of this clock "time 0". I know this is not what you have in your system, but it gives me a reference point (and the analysis will still be reasonably accurate).

 

Based on this, the forwarded output clock (REF_CLK) generated by your clock2pin would exit the FPGA at Tickofdcm (from the Spartan-6 datasheet). You didn't specify the actual device or speed grade, but the -2 parts seem to fall in the range of 4.5-6ns, so let's assume 5ns. We don't have a direct number for the minimum, but it's probably reasonable to assume it's somewhere around 2.5ns. I am using the 3:1 rule of thumb for the min-max spread of a silicon device (due to process, voltage and temperature variation, or PVT), but not all of Tickofdcm is PVT dependent, so 2.5ns is probably reasonable.

 

Now it needs to get to the PHY. There will be some board delay, including some PVT variation; the variation is pretty small on a board, so let's say 0.4-0.5ns (assuming a short trace).

 

Then the PHY generates its data. From the datasheet you provided, the clock-to-Q time is 3ns to 14ns (Tohold to Toval).

 

Then the board trace back to the FPGA adds another 0.4-0.5ns.

 

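Let's sum this up:

Component                        Min (ns)   Max (ns)
FPGA clock out (Tickofdcm)          2.5        5.0
Board trace to PHY                  0.4        0.5
PHY clock-to-Q (Tohold/Toval)       3.0       14.0
Board trace back to FPGA            0.4        0.5
---------------------------------------------------
Total                               6.3       20.0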
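The same round-trip sum as a small script, a sketch that simply adds the min and max estimates from the table (all values in ns, taken from the post; nothing here is measured):

```python
# Sum the min/max delays of the receive round trip:
# fictitious input clock edge at time 0 -> forwarded REF_CLK -> PHY -> data at FPGA pins.

path = [
    ("FPGA clock out (Tickofdcm)", 2.5, 5.0),
    ("Board trace to PHY", 0.4, 0.5),
    ("PHY clock-to-Q (Tohold/Toval)", 3.0, 14.0),
    ("Board trace back to FPGA", 0.4, 0.5),
]

total_min = sum(lo for _, lo, _ in path)
total_max = sum(hi for _, _, hi in path)

for name, lo, hi in path:
    print(f"{name:32s} {lo:5.1f} {hi:6.1f}")
print(f"{'Total':32s} {total_min:5.1f} {total_max:6.1f}")
```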

 

Thus, referenced to this (fictitious) clock arriving at time 0, the data is available between 20ns and 26.3ns after the rising edge of this clock.
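The step from the 6.3-20ns delay range to the 20-26.3ns window can be written out explicitly. This sketch assumes a single clock period and treats the window as running from the slowest arrival (launched by the edge at 0) until the fastest arrival launched by the next edge starts changing the data:

```python
# Turn a min/max round-trip delay into the guaranteed-stable data window.

PERIOD_NS = 20.0
T_MIN_NS = 6.3    # fastest round trip (sum of all minimum delays)
T_MAX_NS = 20.0   # slowest round trip (sum of all maximum delays)


def valid_window(period, t_min, t_max):
    """Data launched by the edge at 0 is stable from t_max until data launched
    by the next edge (at `period`) can start arriving, i.e. period + t_min."""
    if t_max - t_min >= period:
        raise ValueError("no guaranteed valid window: delay spread >= period")
    return t_max, period + t_min


start, end = valid_window(PERIOD_NS, T_MIN_NS, T_MAX_NS)
print(f"data valid from {start:.1f} ns to {end:.1f} ns (width {end - start:.1f} ns)")
```

The window width is simply the period minus the total delay spread, which is why every extra nanosecond of PVT variation on this path comes straight out of the eye.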

 

In order to accurately sample data in an IOB, you need to satisfy the setup/hold (SU/H) requirement of the FF. Using Tpsdcm/Tphdcm (again, this varies from part to part), let's say the device needs 2ns of setup and 0ns of hold (which is about average for S6 devices in the -2 speed grade). Thus we need to find a clock that places this SU/H requirement inside the data window described above (20ns to 26.3ns). If you use a different phase of the DCM, you incur the +/-350ps phase error, so the window narrows to 20.35ns to 25.95ns. Subtracting the FPGA's 2ns SU/H requirement leaves 3.6ns of margin on this side of the interface (assuming you can find the "perfect" clock to sample it).
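A sketch of the receive-side margin calculation from the paragraph above. The 2ns setup and 0ns hold are the rough Tpsdcm/Tphdcm values assumed in the post, not datasheet numbers for a specific part:

```python
# Receive-side (PHY -> FPGA) margin for the forwarded-clock RMII link.

WINDOW_START_NS = 20.0   # data guaranteed valid (slowest round trip)
WINDOW_END_NS = 26.3     # data may start changing (next edge + fastest round trip)
PHASE_ERROR_NS = 0.35    # +/- error of a phase-shifted clock-generator output
FPGA_SETUP_NS = 2.0      # assumed IOB setup (Tpsdcm)
FPGA_HOLD_NS = 0.0       # assumed IOB hold (Tphdcm)


def rx_margin_ns(start, end, phase_error, setup, hold):
    """Shrink the data window by the phase error on both edges, then
    subtract the capture flip-flop's setup/hold requirement."""
    usable_start = start + phase_error
    usable_end = end - phase_error
    return (usable_end - usable_start) - (setup + hold), usable_start, usable_end


margin, usable_start, usable_end = rx_margin_ns(
    WINDOW_START_NS, WINDOW_END_NS, PHASE_ERROR_NS, FPGA_SETUP_NS, FPGA_HOLD_NS
)
print(f"usable window {usable_start:.2f}-{usable_end:.2f} ns, RX margin {margin:.2f} ns")
```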

 

This shows the problem with using a forwarded clock for this interface: the TX side has 13ns of margin and the RX side has only 3.6ns. With a more conventional "system synchronous" interface (where both the FPGA and the PHY are clocked from the same external clock), the margins on the two sides are better balanced.

 

That being said, you should still be (BARELY) able to capture this interface. The clocking structure you suggest captures the data a quarter period after the generation of the clock, which is at 25ns. This satisfies the 2ns setup time (the data arrives at 20.35ns) and just BARELY satisfies the 0ns hold requirement (with 0.95ns of margin).
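Checking one concrete capture edge against the window can be sketched as below. It assumes, as in the post, that the 0° internal clock captures at 25ns (a quarter of the 20ns period after the reference edge at 20ns) and reuses the assumed 2ns/0ns FPGA setup/hold:

```python
# Setup/hold slack for a single capture edge inside the RX data window.

CAPTURE_NS = 25.0              # 0-degree capture edge in the 270/0 scheme
DATA_VALID_NS = 20.0 + 0.35    # slowest arrival, incl. +/-350 ps phase error
DATA_CHANGE_NS = 26.3 - 0.35   # earliest change of the next data word
FPGA_SETUP_NS = 2.0            # assumed, as in the post
FPGA_HOLD_NS = 0.0             # assumed, as in the post


def capture_slack(capture, valid, change, setup, hold):
    """Return (setup_slack, hold_slack); both must be >= 0 for a safe capture."""
    setup_slack = (capture - valid) - setup
    hold_slack = (change - capture) - hold
    return setup_slack, hold_slack


setup_slack, hold_slack = capture_slack(
    CAPTURE_NS, DATA_VALID_NS, DATA_CHANGE_NS, FPGA_SETUP_NS, FPGA_HOLD_NS
)
print(f"setup slack {setup_slack:.2f} ns, hold slack {hold_slack:.2f} ns")
```

Moving `CAPTURE_NS` earlier trades setup slack for hold slack, which is what changing the phase of the forwarded clock does in practice.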

 

Now, this analysis is not entirely accurate. In some ways it's a bit pessimistic (using Tpsdcm/Tphdcm/Tickofdcm double counts any inaccuracy in the clock trees, which really cancels out), but it doesn't take jitter into account, which is pretty significant: the DCM adds +/-150ps, which takes 300ps more out of the margins on each side. Taking all this into account, the hold margin is VERY tight, and this may be what is causing your system to be intermittent.
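A sketch of the jitter deduction, following the post's accounting of 300ps removed from each side. The 2.65ns setup slack is derived from the numbers above (25ns capture minus 20.35ns arrival minus 2ns setup); it is not stated in the thread itself:

```python
# Deduct clock-generator jitter from the setup/hold slacks of the 25 ns capture.

SETUP_SLACK_NS = 2.65       # 25 - 20.35 - 2.0
HOLD_SLACK_NS = 0.95        # 25.95 - 25 - 0.0
JITTER_PER_SIDE_NS = 0.30   # post's figure: +/-150 ps takes 300 ps from each side


def with_jitter(setup_slack, hold_slack, jitter):
    """Remove the same jitter allowance from both sides of the capture."""
    return setup_slack - jitter, hold_slack - jitter


setup_j, hold_j = with_jitter(SETUP_SLACK_NS, HOLD_SLACK_NS, JITTER_PER_SIDE_NS)
print(f"setup slack {setup_j:.2f} ns, hold slack {hold_j:.2f} ns")
```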

 

So the next questions:

  - how are you generating the 50MHz clocks? Are you going through two DCMs in series? If so, what feedback are you using for each DCM?

      - regardless, this will add additional jitter

  - are you sure you are using IOB FFs for the inputs?

  - is the board already designed? If not, there may be better clock topologies that better distribute the margins

 

Avrum

Anonymous

Hi,

 

For my project I have to do similar calculations, but there is something I don't understand. How did you get this:


@avrumw wrote:

Thus, referenced to this (fictitious) clock arriving at time 0, the data is available between 20ns and 26.3ns after the rising edge of this clock.

 


from this: 6.3 - 20ns? Could you clarify this for me?

 

Guide
Registered: 01-23-2009

Let's assume that there is an edge of this fictitious clock at time 0ns. Since it is a 50MHz clock, the next one will be at 20ns.

 

If the total delay from the rising edge of the clock ranges from 6.3 to 20ns, then after the first clock edge (at time 0) we are not guaranteed to have stable data until 20ns later (the longest possible clock-to-Q of the whole system): the slowest possible combination of all components in the path results in the data becoming valid at 20ns.

 

However, the next clock edge also happens at time 20ns (coincidentally). Since the propagation delay in the fastest possible system can be as short as 6.3ns, the data will start to change from the old value to the new value 6.3ns later, hence at 26.3ns.

 

So, across all combinations of process, temperature and voltage, we are only "guaranteed" to have stable data from 20ns to 26.3ns after each rising edge of the clock (thus between 0ns and 6.3ns after each rising edge of the clock).

 

Avrum

Anonymous

Thanks a lot.

What can I do when I don't have a clock edge (positive or negative) inside the narrow time window in which I should capture the incoming data? Is there a constraint that affects the relationship between clock and data? I have always wondered when OFFSET is useful. Is it for this purpose?

Historian
Registered: 02-25-2008




If you don't have a clock edge during the data-valid window, you have to use the various delay mechanisms available in the FPGA (for example, the IODELAY2 elements on the inputs or a phase-shifted DCM/PLL output) to make it work.

 

OFFSET is the constraint you want to use.

Registered: 11-10-2012

  - how are you generating the 50MHz clocks? Are you going through two DCMs in series? If so, what feedback are you using to each DCM

 

I use the clock_generator IP in Platform Studio (XPS) 14.4:

BEGIN clock_generator
PARAMETER INSTANCE = clock_generator_0
PARAMETER HW_VER = 4.03.a

PARAMETER C_CLKIN_FREQ = 200000000
PARAMETER C_CLKOUT2_FREQ = 100000000
PARAMETER C_CLKOUT2_GROUP = PLL0
PARAMETER C_CLKOUT3_FREQ = 12000000
PARAMETER C_CLKOUT3_GROUP = PLL0
PARAMETER C_CLKOUT2_BUF = TRUE
PARAMETER C_CLKOUT2_PHASE = 0
PARAMETER C_CLKOUT3_BUF = TRUE
PARAMETER C_CLKOUT3_PHASE = 0
PARAMETER C_CLKOUT4_BUF = TRUE
PARAMETER C_CLKOUT4_FREQ = 50000000
PARAMETER C_CLKOUT4_GROUP = PLL0
PARAMETER C_CLKOUT4_PHASE = 270
PARAMETER C_CLKOUT2_DUTY_CYCLE = 0.500000
PARAMETER C_CLKOUT3_DUTY_CYCLE = 0.500000
PARAMETER C_CLKOUT4_DUTY_CYCLE = 0.500000
PARAMETER C_CLKOUT5_BUF = TRUE
PARAMETER C_CLKOUT5_DUTY_CYCLE = 0.500000
PARAMETER C_CLKOUT5_FREQ = 50000000
PARAMETER C_CLKOUT5_GROUP = PLL0
PARAMETER C_CLKOUT5_PHASE = 0
PARAMETER C_CLKOUT0_FREQ = 600000000
PARAMETER C_CLKOUT0_GROUP = PLL0
PARAMETER C_CLKOUT0_BUF = FALSE
PARAMETER C_CLKOUT1_FREQ = 600000000
PARAMETER C_CLKOUT1_PHASE = 180
PARAMETER C_CLKOUT1_GROUP = PLL0
PARAMETER C_CLKOUT1_BUF = FALSE
PARAMETER C_CLKOUT0_DUTY_CYCLE = 0.500000
PARAMETER C_CLKOUT0_PHASE = 0
PARAMETER C_CLKOUT1_DUTY_CYCLE = 0.500000
PORT CLKIN = CLK
PORT CLKOUT0 = clk_600_0000MHzPLL0_nobuf
PORT CLKOUT1 = clk_600_0000MHz180PLL0_nobuf
PORT CLKOUT2 = clk_100_0000MHzPLL0
PORT CLKOUT3 = clk_12_0000MHzPLL0
PORT LOCKED = proc_sys_reset_0_Dcm_locked
PORT CLKOUT4 = clk_50_0000MHz270PLL0
PORT CLKOUT5 = clk_50_0000MHzPLL0
END

 

 

  - regardless, this will add additional jitter

  - are you sure you are using IOB FFs for the inputs?

 

I am not sure. For TXD I have IOB = TRUE in the UCF; setting it for RXD produced errors.

 

  - is the board already designed? If not, there may be better clock topologies that better distribute the margins

 

Yes, the board is finished and works in most cases. I have many different FPGA designs for it, and with some of them this communication does not work.

 

I can change the C_CLKOUT4_PHASE parameter in the .mhs.

But I am completely lost as to what the constraint lines should look like.

 

NET LAN_RXD<0> LOC = U23 | IOSTANDARD = "LVCMOS33" | PULLUP;
NET LAN_RXD<1> LOC = U24 | IOSTANDARD = "LVCMOS33" | PULLUP;
NET LAN_CLK LOC = N17 | IOSTANDARD = "LVCMOS33" | USELOWSKEWLINES;
NET LAN_RXER LOC = U25 | IOSTANDARD = "LVCMOS33" | PULLDOWN; # PHY Address sel 0/1
NET LAN_CRS_DV LOC = R26 | IOSTANDARD = "LVCMOS33" | PULLUP;
NET LAN_MDIO LOC = R25 | IOSTANDARD = "LVCMOS33" | PULLUP;
NET LAN_MDC LOC = T23 | IOSTANDARD = "LVCMOS33" ;
NET F_IO_LAN_INT LOC = R24 | IOSTANDARD = "LVCMOS33" ; # F_LAN_INT
NET F_LAN_RST LOC = P26 | IOSTANDARD = "LVCMOS33" ;
NET LAN_TXEN LOC = P22 | IOSTANDARD = "LVCMOS33" | IOB = True;
NET LAN_TXD<0> LOC = R21 | IOSTANDARD = "LVCMOS33" | IOB = True;
NET LAN_TXD<1> LOC = P19 | IOSTANDARD = "LVCMOS33" | IOB = True;
NET F_IO_MAC_ADDR LOC = N18 | IOSTANDARD = "LVCMOS33" ; # MAC_ADDR
NET F_IO_RESET_SWITCH LOC = B3 | IOSTANDARD = "LVCMOS33" ; # RESET_SWITCH

# constraints for mii_to_rmii
NET ethernet_0_PHY_tx_clk USELOWSKEWLINES;
NET ethernet_0_PHY_rx_clk USELOWSKEWLINES;

 

Registered: 11-10-2012

... and the device is a Spartan-6 LX150T, speed grade -3C.

Teacher
Registered: 03-31-2012

@avrumw
Is there really a big difference between the FPGA sourcing the 50 MHz clock and getting it from an external oscillator? What matters is the clock-to-Q of the PHY; where that clock comes from doesn't matter much. If it is external, there will be some delay when the FPGA receives it, and we can always add that delay internally too.

Guide
Registered: 01-23-2009

@muzaffer

 

The only difference has to do with the uncertainties.

 

In a system synchronous approach, there is one "large-ish" PVT-dependent component on each path:

  - on the write path, it is the clk-Q of the FPGA (mostly the delay of the output driver)

  - on the read path, it is the clk-Q of the RMII PHY (this one is generally the larger of the two).

 

When you use a source synchronous (forwarded clock) approach, you basically move both uncertainties onto the same path:

   - on the write path, the forwarded clock and forwarded data come from the same source, so the PVT uncertainty largely cancels out. This path becomes super easy to meet.

   - on the read path, both PVT uncertainties add. The clock goes out from the FPGA, incurring the PVT variation of the final output driver, and then drives the clock of the RMII PHY. The clock-to-Q of the PHY is then added to this for the data returning to the FPGA.

 

From the above analysis, this is doable, but the margins are smaller than in a system synchronous approach.
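To put rough numbers on this, here is a simplified sketch that reuses the min/max estimates from the earlier analysis and computes the receive-side eye width as the period minus the summed delay spread of the components on the read path. It assumes (this is not from the thread) that in the system synchronous case the oscillator-to-PHY and oscillator-to-FPGA clock arrivals are matched, and it ignores FPGA input-side uncertainty and jitter:

```python
# Compare the RX eye width of a forwarded-clock vs a system synchronous read path.

PERIOD_NS = 20.0

# (min, max) delay estimates in ns, from the earlier analysis in this thread
FPGA_CLOCK_OUT = (2.5, 5.0)
BOARD_TRACE = (0.4, 0.5)
PHY_CLK_TO_Q = (3.0, 14.0)


def spread(*components):
    """Total PVT-dependent spread (max - min) of the listed path components."""
    return sum(hi - lo for lo, hi in components)


def eye_width(period, *components):
    return period - spread(*components)


# Forwarded clock: FPGA output driver, trace out, PHY clk-Q and trace back all add.
source_sync_rx = eye_width(PERIOD_NS, FPGA_CLOCK_OUT, BOARD_TRACE,
                           PHY_CLK_TO_Q, BOARD_TRACE)
# System synchronous: the FPGA output driver drops out of the read path.
system_sync_rx = eye_width(PERIOD_NS, BOARD_TRACE, PHY_CLK_TO_Q, BOARD_TRACE)

print(f"forwarded clock RX eye:    {source_sync_rx:.1f} ns")
print(f"system synchronous RX eye: {system_sync_rx:.1f} ns")
```

The forwarded-clock figure matches the 20-26.3ns window derived earlier; the difference between the two is exactly the FPGA output driver's spread.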

 

The only way to "fix" this in the FPGA is to use the DCM with external feedback to cancel out the PVT uncertainty of the output buffer. This improves the margins to something similar to a system synchronous approach (which it sort of is, since the output clock becomes phase aligned with the input clock to the FPGA, making it essentially a system synchronous clock).

 

Avrum

Teacher
Registered: 03-31-2012

on the read path, the both PVT uncertainties add. The clock goes out from the FPGA, incurring the PVT variation of the final output driver,

 

Isn't there a corresponding PVT-varying delay on the input of the FPGA? The clock has to come through an I/O receiver, which is a largish buffer with its own PVT variation, right? As far as I can tell, the only difference would be whether a clock-capable input buffer has less PVT variation than an output driver.

Guide
Registered: 01-23-2009

Actually, no.

 

When a DCM or MMCM is in the path and is set to "system synchronous" mode (which is the default when only CLKIN comes from an IBUF), the DCM/MMCM switches in a buffer on the CLKFB path that is architecturally similar to the IBUF's input buffer, thus cancelling out the PVT variation of the IBUF. This is done because in a system synchronous application this PVT variation directly affects the width of the input data eye you need for accurate sampling, which is a critical parameter.

 

Also, even if this weren't the case, input buffers generally have less overall delay than output drivers: an input buffer is around 1ns, whereas an output buffer can be around 3ns, or significantly more depending on the drive strength. While not all of this delay is PVT dependent, the larger the overall delay, the larger its PVT-variable portion.

 

Avrum

Contributor
Registered: 01-09-2009

Hello Frank,

 

I know this is an old thread about RMII timing!

 

Were you able to constrain your interface to the PHY using ISE? I have the same issue.

 

Regards,

Faraj
