Observer
Observer
924 Views
Registered: ‎04-24-2020

Best option for 5 Gbps ethernet between UltraScale+ FPGAs?


Vivado version: 2020.1

Product family: Kintex UltraScale+

Background:

Our FPGA devices can be connected via a backplane and communicate over Ethernet. Today we use SGMII (1 Gbps data rate, 1.25 Gbps line rate) between the devices with the "1G/2.5G Ethernet PCS/PMA or SGMII (16.2)" Xilinx IP. Now we want to increase the throughput to roughly 5 Gbps data rate. Our hardware engineers have verified that the backplane can reliably handle a data rate of 5 Gbps, even with the ~3% overhead from the 64b/66b encoding of e.g. 10GBASE-R.
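
The line rates mentioned above follow directly from the line-code overhead. A minimal sketch of that arithmetic, assuming 8b/10b coding for SGMII and 64b/66b coding for a 10GBASE-R-style link (the function name is only for illustration):

```python
from fractions import Fraction

def line_rate_bps(data_rate_bps, payload_bits, coded_bits):
    """Serial line rate needed to carry data_rate_bps with a
    payload_bits/coded_bits line code (e.g. 8b/10b or 64b/66b)."""
    return Fraction(data_rate_bps) * coded_bits / payload_bits

sgmii = line_rate_bps(1_000_000_000, 8, 10)     # 8b/10b, as used by SGMII
target = line_rate_bps(5_000_000_000, 64, 66)   # 64b/66b, as used by 10GBASE-R

print(float(sgmii) / 1e9)                  # 1.25 (Gbps)
print(float(target) / 1e9)                 # 5.15625 (Gbps)
print(float(Fraction(66, 64) - 1) * 100)   # 3.125 (percent overhead)
```

So the backplane has to carry about 5.16 Gbps on the wire for a 5 Gbps data rate.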

Question:

I want to have a 5 Gbps Ethernet link between two Kintex UltraScale+ FPGAs over a backplane medium that cannot handle a 10 Gbps line rate. Since I control both ends of the link, there is a lot of room for custom solutions. I have identified two options:

  1. Use Xilinx IP "Universal Serial XGMII Ethernet Subsystem" on both ends. Hard code link speed at 5 Gbps on both ends.
  2. Use Xilinx IP "10G/25G Ethernet Subsystem" on both ends. Hard code link speed at 10G on both ends. Clock the core with half the expected GT RefClk rate, halving the data rate. (E.g. configure the IP with GT RefClk 312.5 MHz, but the actual clock is 156.25 MHz)
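
Option 2 relies on the transceiver PLL multiplying whatever reference clock it receives by a fixed ratio. A rough sketch of the numbers, assuming the nominal 10GBASE-R serial rate of 10.3125 Gbps and an ideal PLL (it says nothing about whether the PLL and CDR settings actually tolerate the lower frequency, which is the open question):

```python
from fractions import Fraction

NOMINAL_LINE_RATE = Fraction(10_312_500_000)  # 10GBASE-R serial rate, bit/s
CONFIGURED_REFCLK = Fraction(312_500_000)     # what the IP is told, Hz
ACTUAL_REFCLK = Fraction(156_250_000)         # what the board would supply, Hz

# Ratio the transceiver is configured for; it stays fixed when only
# the physical clock changes.
ratio = NOMINAL_LINE_RATE / CONFIGURED_REFCLK

actual_line_rate = ACTUAL_REFCLK * ratio
actual_data_rate = actual_line_rate * 64 / 66  # remove 64b/66b overhead

print(ratio)                           # 33
print(float(actual_line_rate) / 1e9)   # 5.15625 (Gbps)
print(float(actual_data_rate) / 1e9)   # 5.0 (Gbps)
```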

Option 1 (USXGMII) seems to be a reasonable option, and it is the one recommended here on the forum in the related threads that I found ( https://forums.xilinx.com/t5/Ethernet/USXGMII-IP-in-PCS-PMA-mode-only/td-p/1163870 , https://forums.xilinx.com/t5/Ethernet/5G-Ethernet-on-Kintex7-FPGA/m-p/985039 ). However, it currently has the drawback that it cannot be configured in PCS/PMA-only mode (see the first thread linked above). Also, I am not sure that USXGMII is made for the situation where the physical medium cannot handle a line rate of 10G and individual words at 10G might get corrupted.

The specification document EDCS-1150953 page 18 ( https://developer.cisco.com/site/usgmii-usxgmii ) reads:
> SOP word (4-bytes) is transmitted only once; in remainder of the 1 (2.5G), 4(1G) or 49 (100M) 4-byte words are replaced by 0xAA.

In my case, what if this single word gets corrupted? The payload is replicated to keep the effective data rate down, but this control word is not?
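
To make the concern concrete, here is a small sketch of the replication behaviour as I read the quoted sentence. It is only an illustration of that one sentence, not verified against the rest of the specification; the function name, the `extra_copies` parameter, and the 4-byte placeholder words are all made up for the example.

```python
def replicate_words(words, extra_copies, sop_index=0):
    """Replicate each 4-byte word extra_copies additional times.

    Per the quoted sentence, the SOP word is sent only once and its
    extra slots are filled with 0xAA bytes instead of copies.
    extra_copies would be 1 (2.5G), 4 (1G) or 49 (100M) in the quote.
    """
    out = []
    for i, word in enumerate(words):
        out.append(word)
        filler = b"\xaa" * 4 if i == sop_index else word
        out.extend([filler] * extra_copies)
    return out

# Placeholder words, not real USXGMII encodings.
frame = [b"SOP!", b"DAT1", b"DAT2"]
stream = replicate_words(frame, extra_copies=1)
print(stream)
# [b'SOP!', b'\xaa\xaa\xaa\xaa', b'DAT1', b'DAT1', b'DAT2', b'DAT2']
```

Every data word appears twice in the stream, but the SOP word appears exactly once, which is the single point of failure asked about above.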

Option 2 (underclocking a 10G interface) seems a little sketchy. I am not sure that it is even technically possible? Setup/hold requirements are probably fine being clocked with a slower clock than what is specified during place/route. But what about for example transceiver PLL configuration? Will it work when fed with a slower clock? Maybe there are very specific constraints generated by the PCS/PMA core that depend on the clock speed?

I would appreciate any input on this. Thank you in advance.

 


Best regards

Lukas

1 Solution

Accepted Solutions
Xilinx Employee
Xilinx Employee
846 Views
Registered: ‎04-16-2008

Xilinx doesn't specifically test the core at lower line rates, but underclocking the core should be possible.  Running with half rate reference clock or adjusting the QPLL dividers will likely work, but if you want GT generated with all of the settings tuned for a 5G link then the 10G/25G Ethernet Subsystem can be generated with the GT in the example design. Then the GT wizard GUI is exposed and the GT wizard can be regenerated for your desired line rate and reference clock.  In this case the common block is outside the GT wizard core and will also have to be updated.


9 Replies
Adventurer
Adventurer
890 Views
Registered: ‎03-21-2011

Rather than lie about the clock rate in the constraints, you could try adjusting the QPLL block. Generate a 10G block with "support in the ref design", edit the GT_common divider, and then try dividing the bit clock in the ref design sim. If it fails, you will most likely notice that it never gets block lock and forever reports local fault 0x9e000001. That is what happened when I messed up my clock trying to run 10G off of 156.25 MHz, until I got it right.


Xilinx Employee
Xilinx Employee
841 Views
Registered: ‎04-16-2008

USXGMII always operates with a fixed 10G line rate and duplicates blocks when configured for 5G or lower.  

Observer
Observer
802 Views
Registered: ‎04-24-2020

Hello @ejanney @bitjockey and thank you for your input. I really appreciate it!


@ejanney wrote:

... if you want GT generated with all of the settings tuned for a 5G link then the 10G/25G Ethernet Subsystem can be generated with the GT in the example design. Then the GT wizard GUI is exposed and the GT wizard can be regenerated for your desired line rate and reference clock.  In this case the common block is outside the GT wizard core and will also have to be updated.


This does seem to be a viable solution. I think this is the same thing that @bitjockey suggests.

Observer
Observer
801 Views
Registered: ‎04-24-2020

@ejanney wrote:

USXGMII always operates with a fixed 10G line rate and duplicates blocks when configured for 5G or lower.  


Does it indeed duplicate EVERY single octet like this? As I wrote in my original post the standard mentions something about a SOP word not being duplicated?

SOP word (4-bytes) is transmitted only once; in remainder of the 1 (2.5G), 4(1G) or 49 (100M) 4-byte words are replaced by 0xAA.

(From document EDCS-1150953 page 18 https://developer.cisco.com/site/usgmii-usxgmii )

 

Adventurer
Adventurer
735 Views
Registered: ‎03-21-2011

It's not clear from the thread. If the coreclk is still run at the original speed, is that what "duplicating blocks" means, i.e. words of data coming out repeated because a FIFO is being fed from txusrclkout half as fast as needed? Why not just run the coreclk at half speed too? It is easier to meet timing and everything stays in lockstep in the logic. The QPLL is still set explicitly for the desired refclk, so phases/bandwidths shouldn't matter the way they might when lying about the frequency.

Note that even though the qpllrefclkout will be "wrong" it is unused by the 10G implementation, only the 5G (turned 2.5G in your case) ddr sample clock is actually used by the fabric.  I disconnected refclkout and it sims and runs in h/w just fine.  I think txusrclkout is the 5G clock divided by 32 and the refclk mux inputs in the channel are never selected.  Not sure why xilinx even bothers running an unused signal between the blocks unless it is boilerplate used by other standards and easier to just leave in?  They even show it in the pdf documentation though.

Observer
Observer
589 Views
Registered: ‎04-24-2020

A little update:

I have started using the 1/10/25G PCS/PMA IP, and in 10G mode I modify the transceiver CHANNEL parameters over DRP to get a line rate of 5G. About ten parameters need modification, some of which require read-modify-write accesses since the DRP registers are packed with other parameters.

  • ADAPT_CFG1
  • RXCDR_CFG2
  • RXCDR_CFG2_GEN2
  • RXCDR_CFG2_GEN3
  • RXOUT_DIV
  • RX_PROGDIV_CFG
  • RX_WIDEMODE_CDR
  • TXOUT_DIV
  • TXPH_CFG
  • TX_PROGDIV_CFG

It seems to work well in simulation. It took a while to get there, since this is somewhat unexplored territory for me. I am waiting on new boards that have the correct GTREFCLK oscillator before I can try it on hardware.
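
For anyone unfamiliar with the read-modify-write part mentioned above, here is a minimal sketch of the idea. It assumes 16-bit DRP data words and takes hypothetical `drp_read`/`drp_write` callables; the address, mask and value in the usage example are placeholders, not the real DRP location of any of the listed attributes.

```python
def drp_read_modify_write(drp_read, drp_write, addr, mask, value):
    """Update only the bits selected by mask in the 16-bit word at addr.

    drp_read(addr) -> int and drp_write(addr, data) are hypothetical
    access functions for whatever DRP bridge the design uses.
    """
    if mask == 0 or mask > 0xFFFF:
        raise ValueError("mask must select bits within a 16-bit word")
    shift = (mask & -mask).bit_length() - 1   # position of lowest mask bit
    if (value << shift) & ~mask:
        raise ValueError("value does not fit in the masked field")
    old = drp_read(addr) & 0xFFFF
    new = (old & ~mask & 0xFFFF) | (value << shift)
    drp_write(addr, new)
    return new

# Usage with a dict standing in for the DRP address space.
regs = {0x0063: 0xABC8}                       # placeholder address/content
new = drp_read_modify_write(regs.__getitem__, regs.__setitem__,
                            addr=0x0063, mask=0x0070, value=0x2)
print(hex(new))                               # 0xaba8
```

Only the three bits under the mask change; the other parameters packed into the same word keep their values.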

Thanks for pointing me in the right direction @bitjockey @ejanney .

Adventurer
Adventurer
472 Views
Registered: ‎03-21-2011

> It seems to work well in simulation. It took a while to get there, since this is somewhat unexplored territory for me. I am waiting on new boards that have the correct GTREFCLK oscillator before I can try it on hardware.

If you have hardware with a refclk that is 2x or 1/2 the desired speed, you may be able to play games with the QPLL settings to multiply or divide by 2 more. Note that the QPLL refclkout is a forwarded copy of the "wrong" clock but is actually unused in the core (the tx and rx clocks do not select it in the internal muxing; not sure why Xilinx wires it up). You may need to turn the BUFG for coreclk into a BUFG+PLL circuit to also get a coreclk of the expected frequency.

What you can't do is use a PLL to adjust refclk and feed the PLL output into the QPLL (Xilinx says it isn't stable enough, even if it sims). You have to do a parallel div 2 or mult 2 for the 5GHz sample QPLL clock and the logic core clock.

Observer
Observer
426 Views
Registered: ‎04-24-2020

Thank you for the very informative reply @bitjockey !

In my case, the oscillator on the current hardware is 250 MHz, while I need 156.25 MHz for the Ethernet IP, so there is no simple relation between the two. However, the new hardware arrives in just one or two weeks, so I will just wait for that.
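
A quick check of why the 2x or 1/2 trick does not apply here, as a small sketch (the helper name is only for illustration):

```python
from fractions import Fraction

def refclk_ratio(available_hz, required_hz):
    """Exact ratio between the oscillator on the board and the
    reference clock the IP expects."""
    return Fraction(available_hz, required_hz)

def is_factor_of_two(ratio):
    """True if the clocks differ by exactly 2x or 1/2, the case
    suggested above for adjusting the QPLL settings."""
    return ratio in (Fraction(2), Fraction(1, 2))

current = refclk_ratio(250_000_000, 156_250_000)
print(current)                     # 8/5
print(is_factor_of_two(current))   # False

doubled = refclk_ratio(312_500_000, 156_250_000)
print(is_factor_of_two(doubled))   # True
```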
