LukasVik
Contributor
Registered: ‎04-24-2020

1/2.5G Ethernet PCS/PMA IP version 16.2 does not leave reset

IP: "1/2.5G Ethernet PCS/PMA or SGMII" version 16.1/16.2

Vivado version: 2019.2/2020.2

Device: Kintex UltraScale+

 

Hello,

After upgrading Vivado from 2019.2 to 2020.2, I noticed that my design no longer establishes a link on SGMII. Debug probing indicates that the IP never leaves reset: the signals gt_rxresetdone and resetdone never go high.

I also tested building with 2020.1 and the same error occurred.

I tried building the design in Vivado 2020.2 but with the IP version from 2019.2 (version 16.1 instead of 16.2) and it worked fine. So it seems the issue is related to the new version of the IP, not the new Vivado version.

I have studied Appendix F in PG047 but have not been able to solve it.

Is this a known issue? Is there a workaround?

 

Best regards

Lukas

guozhenp
Xilinx Employee
Registered: ‎05-01-2013

Is this a simulation issue, or does it happen on the board?

Do you mean that 2019.2 works well?

What is the IP core configuration? Have you tried the IP example first, and does it work?

LukasVik
Contributor

Hello @guozhenp and thank you for the reply!

> Is this a simulation issue, or does it happen on the board?

This issue is on board.

> Do you mean that 2019.2 works well?

Yes, when I build the design with Vivado 2019.2 it works well. Also, when I build the design with 2020.2 but use the IP files from 2019.2 (for this IP only, version 16.1), it works well. But when I build with 2020.2 and the most recent version of the IP (16.2), it does not leave RX reset.

> What is the IP core configuration?

I create the IP with the attached TCL file. The free-running and DRP clock is 50 MHz and the GT reference clock is 250 MHz. I inspected the GT within the core and all properties seem to be set correctly.

[Attached images: gig_0.png, gig_1.png, gig_2.png, gig_3.png, gig_4.png, gig_5.png, gt_0.png, gt_1.png]

All the properties are the same for version 16.1 and 16.2, both in the IP and in the GT.

> Have you tried the IP example first, and does it work?

I am not sure what you mean by this, could you clarify?

As you see from my configuration above I do use "additional transceiver control and status" as well as "include shared logic in example design". However I tried disabling both of those, to minimize the risk of user error, but the result was still the same: The IP does not leave rx reset.

 

Best regards

Lukas

guozhenp
Xilinx Employee

1. What is the exact FPGA device? Is this GTH or GTY?

2. As "additional transceiver control and status" is enabled and GT resetdone is not completed, how about gtpowergood and cplllock on the board?

LukasVik
Contributor

Thanks for the reply @guozhenp !

> 1. What is the exact FPGA device? Is this GTH or GTY?

It is a Kintex UltraScale+ (xcku5p-ffva676-1-e). The transceiver is GTYE4.

> 2. As "additional transceiver control and status" is enabled and GT resetdone is not completed, how about gtpowergood and cplllock on the board?

gtpowergood and cplllock are both '1'.

guozhenp
Xilinx Employee

Right-click the IP core .xci file and select the option to generate the IP example design.

Please try the example design on the board first and check whether it works.

 

Regarding your design, are all the reset inputs released? Is rxpmaresetdone asserted?

Can you add an ILA for debugging? You can connect all the IP core (or transceiver) input/output signals to the ILA to check.

LukasVik
Contributor

Hello @guozhenp 

I am terribly sorry for the late reply. I was away from work for a while.

I have tried the example design on my board, but unfortunately there was no difference. It does not leave RX reset. The gt_txresetdone signal is asserted, but not gt_rxresetdone, gt_rxpmaresetdone, or resetdone.

In my design the reset inputs are for sure released. The reset and pma_reset inputs are connected like in the example design.

I have tried connecting all the IP core input/output signals to ILA. The only differences are with the gt_rxresetdone, gt_rxpmaresetdone, and resetdone outputs. They are asserted when I build with version 16.1 of the IP, but not when I build with 16.2.

 

Best regards

Lukas

guozhenp
Xilinx Employee

I have not heard of this issue in 16.2.

Could you upload ILA screenshots of the signals in the GT reset flow?

LukasVik
Contributor

Hello @guozhenp ,

Sorry again for the late reply. I was sick for a few days.

I set up an ILA with all the signals to and from the transceiver. It proved a little complicated to set attribute mark_debug on signals within an IP core, but I used a TCL snippet like this:

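# Collect every net directly under the GT wrapper inside the IP core
set nets [get_nets gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/*]
# Print how many nets matched, as a sanity check
puts [llength ${nets}]

# Mark each net for debug so that it can be connected to an ILA
foreach net ${nets} {
  set_property MARK_DEBUG true ${net}
}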

It is also a little complicated since the different signals are in different clock domains, so there were five ILAs created with a few signals in each. Also the order of signals is different between builds, so it is hard to compare.

 

When it is working. This is what happens when I build my design with Vivado 2020.2, but version 16.1 of the IP:

[Attached images: working_ila_1_part_1_at_reset.png, working_ila_1_part_2_at_reset.png, working_ila_2_at_reset.png, working_ila_3_at_reset.png, working_ila_4_at_reset.png, working_ila_5.png]

The screenshots above are at the moment when reset and pma_reset are released. Below are screenshots of the signals in steady state, i.e. ILA triggered manually a while after:

[Attached images: working_ila_1_part_1_after_reset.png, working_ila_1_part_2_after_reset.png, working_ila_2_after_reset.png, working_ila_3_after_reset.png, working_ila_4_after_reset.png]

 

 

When it is NOT working. This is what happens when I build my design with Vivado 2020.2 and version 16.2 of the IP:

[Attached images: broken_ila_1_part_1_at_reset.png, broken_ila_1_part_2_at_reset.png, broken_ila_2_at_reset.png, broken_ila_3.png, broken_ila_4.png, broken_ila_5.png]

The screenshots above are at the moment when reset and pma_reset are released. Below are screenshots of the signals in steady state, i.e. ILA triggered manually a while after:

[Attached images: broken_ila_1_part_1_after_reset.png, broken_ila_1_part_2_after_reset.png, broken_ila_2_after_reset.png]

 

 

Like I said, it is hard to compare the screenshots since the signals move around between the builds. But I have looked closely at them, and from what I gather, the only difference is that rxpmaresetdone_out and rxresetdone_out never go high for version 16.2 of the IP. There is also an observable difference in the rxcommadet signal. It fluctuates in the build that is working (16.1) but is static zero in the broken build (16.2). But I would think that is caused by RX never leaving reset, so that seems like a consequence rather than a cause.

The interesting thing however is the fourth screenshot in "When it is not working..." above (ila_3). In this case, the ILA complains that the clock is not running. So it does seem that the clock that drives these signals is not running. I found the clock to be rxuserclk and rxuserclk2. These clocks are constructed from rxoutclk fed to a BUFG_GT. So my conclusion is that rxoutclk from the transceiver does not seem to run.

I am not an expert on the transceivers, so I do not know what could cause this.

The cplllock_out signal is high, so the PLLs are clearly locked. Also the TX clocks, userclk and userclk2, are clearly running, since we can sample ILA signals in those clock domains. I do not know why the RX clock would not be running.

Let me know if I can provide any more information for you.

 

Best regards

Lukas

LukasVik
Contributor

Hello again @guozhenp  ,

I looked at the logs from when I build with IP version 16.1 and 16.2. I compared the files

  • impl_1/drc_opted.rpt
  • impl_1/drc_routed.rpt
  • impl_1/methodology_drc_routed.rpt
  • impl_1/runme.log

There were no differences, except for in methodology_drc_routed.rpt, where the warning

LUTAR-1#21 Warning
LUT drives async reset alert
LUT cell gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i_i_2, with 2 or more inputs, drives asynchronous preset/clear pin(s) gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/inst/gen_gtwizard_gtye4_top.gig_ethernet_pcs_pma_0_250mhz_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_reset_controller_internal.gen_single_instance.gtwiz_reset_inst/reset_synchronizer_gtwiz_reset_tx_datapath_inst/rst_in_meta_reg/PRE,
gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/inst/gen_gtwizard_gtye4_top.gig_ethernet_pcs_pma_0_250mhz_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_reset_controller_internal.gen_single_instance.gtwiz_reset_inst/reset_synchronizer_gtwiz_reset_tx_datapath_inst/rst_in_out_reg/PRE,
gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/inst/gen_gtwizard_gtye4_top.gig_ethernet_pcs_pma_0_250mhz_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_reset_controller_internal.gen_single_instance.gtwiz_reset_inst/reset_synchronizer_gtwiz_reset_tx_datapath_inst/rst_in_sync1_reg/PRE,
gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/inst/gen_gtwizard_gtye4_top.gig_ethernet_pcs_pma_0_250mhz_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_reset_controller_internal.gen_single_instance.gtwiz_reset_inst/reset_synchronizer_gtwiz_reset_tx_datapath_inst/rst_in_sync2_reg/PRE
gt2100_inst/sgmii.generate_with_250mhz_reference.i_gig_ethernet_pcs_pma_0/U0/transceiver_inst/gig_ethernet_pcs_pma_0_250mhz_gt_i/inst/gen_gtwizard_gtye4_top.gig_ethernet_pcs_pma_0_250mhz_gt_gtwizard_gtye4_inst/gen_gtwizard_gtye4.gen_reset_controller_internal.gen_single_instance.gtwiz_reset_inst/reset_synchronizer_gtwiz_reset_tx_datapath_inst/rst_in_sync3_reg/PRE. The LUT may glitch and trigger an unexpected reset, even if it is a properly timed path.

exists in the old version (16.1) but not in the new (16.2). This does not seem related to the issue I am having.

So in summary I did not really find anything by looking at the logs. No warnings/errors or DRC problems that could explain the error.


Best regards

Lukas

guozhenp
Xilinx Employee

Normally, resetdone only fails to assert when a clock or a reset is not ready.

1. Open the place-and-route result and make sure the GT and its refclk are at the correct locations in the 2020.2 design.

2. Are rxuserclk/2 never available in the ILA, or only sometimes? Try triggering on each of the RX reset inputs to check whether any of them has been asserted.

3. I compared all the related code generated by the IP core. There are only a few differences, in "gig_ethernet_pcs_pma_0_transceiver.v". They do not seem related to your issue, but you can try replacing the 2020.2 code with the 2019.2 code.

LukasVik
Contributor

Hello @guozhenp ,

Regarding your point 1. I opened the designs and compared version 16.1 and 16.2, both built with Vivado 2020.2. The placement of the pins and transceivers is correct in both.

Version 16.1 (working):

[Attached images: refclk_old.png, ibufds_old.png, gtye4_channel_old.png]

(Screenshots show 1: refclk pin. 2: refclk ibufds. 3: gtye4_channel)

Version 16.2 (not working):

[Attached images: refclk_new.png, ibufds_new.png, gtye4_channel_new.png]

So to me it looks like everything is correct!

 

Best regards

Lukas

LukasVik
Contributor

Regarding your point 2.

My ILA screenshots above show that the reset and pma_reset pins are asserted. These are connected to the same bit in a software-accessible register that is set to '1' at startup. The software then deasserts the reset register bit after a while. The other reset signals (gt_rxdfelpmreset, gt_rxprbscntreset, gt_txpmareset, gt_txpcsreset, gt_rxpmareset, gt_rxpcsreset, gt_rxbufreset) are set to '0', just like in the example design.

The reference clock is an oscillator on the PCB that is running as soon as the power is turned on. So it most definitely is running stable when the FPGA is loaded and the software has booted.

I have done a lot of testing these past days, and it seems that rxuserclk/2 is running, and the RX circuit leaves reset, about 50% of the time. That is with the latest version (16.2) of the IP built with Vivado 2020.2. Version 16.1 of the IP, built with Vivado 2020.2, works 100% consistently.

So about 50% of the time when I program the FPGA with version 16.2, it works: the clock is running, rxresetdone is asserted, and it is available on Ethernet. The other 50% of the time it does not work: the clock is not running (I have a tick counter attached to a register, and it shows zero rising edges) and rxresetdone is not asserted.

When it is not working, it does not matter how many times I toggle the reset/pma_reset, it never works. Only a re-program of the FPGA can make it work. When it does work however, I can toggle the reset/pma_reset any number of times. The clock runs and rxresetdone is asserted after releasing reset.
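
Because the failure is intermittent, a single successful programming attempt says little about whether a build is actually good. As a back-of-the-envelope sketch, assuming each attempt fails independently with the roughly 50% rate observed above (the independence is an assumption, and the function name is made up for illustration), the chance that a still-broken build passes several attempts in a row can be estimated like this:

```python
def chance_of_false_pass(failure_rate, attempts):
    """Probability that a still-broken build passes `attempts` times in a row,
    assuming every attempt fails independently with `failure_rate`."""
    return (1.0 - failure_rate) ** attempts


# 0.5 is the failure rate observed with IP version 16.2 in this thread.
for attempts in (1, 5, 10, 20):
    print(attempts, chance_of_false_pass(0.5, attempts))
```

Under these assumptions, ten consecutive successful programming attempts would happen by chance only about 0.1% of the time on a build that still has the problem.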

This is indeed a very curious problem. I really have no idea what could cause this. Let me know if you need any more information. Thank you for your help.

 

Best regards

Lukas

LukasVik
Contributor

Regarding your point 3.

I also looked at a diff of the IP versions, to see if I could find anything that might cause this error.
I generated IP cores from Vivado 2020.2 with version 16.1 and 16.2 of the IP, and compared the folders.
From what I can tell there is not a large difference, at least not in the files that are not encrypted.

It seems that some sort of "waiver" constraints are created in the top level constraint file:

[Attached image: Screenshot from 2021-01-15 13-27-44.png]

I am not sure what this means.

Also there is a code change related to rx and tx reset at the GT instance (*_transceiver.vhd):

[Attached image: Screenshot from 2021-01-15 13-27-49.png]

Again, I cannot read the encrypted code, so maybe there are changes there.

 

Best regards

Lukas

guozhenp
Xilinx Employee

The encrypted code is the Ethernet IP and it should not affect the issue. The failure is still in GT initialization.

The constraints look more suspicious to me. Could you replace the 16.2 design constraints with the 16.1 constraints and have a try?

LukasVik
Contributor

Hello @guozhenp ,

As you suggested, I generated files from a core with version 16.2 and replaced the top level .xdc with the corresponding file from 16.1. It did not work.

Instead I tried with the files from 16.2, but reverting the changes in *_transceiver.vhd to correspond to 16.1. This did work!

So it seems that the changes at the GT instance (*_transceiver.vhd) between version 16.1 and 16.2 are causing my issue.

These are the changes I am talking about (16.1 on left and 16.2 on right):

[Attached images: transceiver_changes.png, transceiver_changes_2.png]

 

In fact, I went even deeper. Since the issue seems to be related to RX, I tried reverting only the portion of the changes that are related to RX, while keeping the changes that are related to TX. And it still worked. So indeed it seems that the RX related changes in *_transceiver.vhd are causing my issue.

 

Best regards

Lukas

guozhenp
Xilinx Employee

I compared them before. The difference is in the logic for the RX data reset, which is derived from rx_reset (input) and not-rxresetdone.

Since RXRESETDONE is never asserted in your case, I assumed this logic would not come into play yet.

Could you add all the signals involved in the changed logic to an ILA? There are not many of them. When it fails, trigger on them one by one to check which one is asserted and causes the failure.

Thanks.

LukasVik
Contributor

Hello @guozhenp ,

I ran an ILA as you suggested, and indeed there are some interesting differences.

But first of all, I had some trouble getting an ILA up and running. Since most of these signals are clocked by rxuserclk, which is not running before reset is released, I could not set up a trigger. The Hardware Manager crashed, saying the ILA clock was not running. So instead I set the ILA clock to a free-running 300 MHz clock that I have in my design. This gave setup/hold failures in timing analysis, as expected, but I think it should be fine for this purpose.

In my ILA I have the signals

  • gtwiz_reset_rx_done_out_int_reg
  • rxreset_rec
  • gtwiz_reset_rx_done_out_reg
  • gtwiz_reset_rx_done_out
  • rxreset

from the *_transceiver.vhd within the IP. I also attached my reset signal ("software_reset") which is connected to reset and pma_reset of the IP. Also a tick counter for the rxuserclk, which is a simple counter that increments on each rising edge. Note that rxusrclk is rxoutclk from the core, passed through a BUFG_GT.
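
As a rough software illustration of what such a tick counter reports, the sketch below counts rising edges in a list of sampled clock levels. It is only a model: the real counter is a register clocked by rxuserclk, and the function name and sample data here are made up for illustration.

```python
def count_rising_edges(samples):
    """Count 0 -> 1 transitions in a sequence of sampled logic levels."""
    edges = 0
    previous = samples[0] if samples else 0
    for level in samples[1:]:
        if previous == 0 and level == 1:
            edges += 1
        previous = level
    return edges


# Hypothetical captures: a clock that never toggles, and one that
# produces a couple of edges and then stops.
stopped_clock = [0, 0, 0, 0, 0, 0]
brief_activity = [0, 1, 0, 1, 1, 1]
print(count_rising_edges(stopped_clock))
print(count_rising_edges(brief_activity))
```

A count that stays constant between two readings indicates that the clock saw no further rising edges in between.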

This is how it looks when it works:

[Attached images: works_1.png, works_2.png, works_3.png]

The first screenshot is before reset/pma_reset is released, the second screenshot is the moment reset/pma_reset is released, and the third is a while after.

This is how it looks when it fails:

[Attached images: fails_1.png, fails_2.png, fails_3.png]

Interesting to note that the tick counter reports that there have been two rising edges on the rxuserclk, while it is zero when the core works. Also that gtwiz_reset_rx_done_out_reg is already high before reset is released, while it is low when the core works.

 

Best regards

Lukas

guozhenp
Xilinx Employee

OK. It's clear to me now.

In the successful case, gtwiz_reset_rx_datapath_in is always 0, while in the failing case it is always 1.

While asserted, this reset prevents RXPMARESETDONE from asserting and RXUSRCLK from running.

 

gtwiz_reset_rx_datapath_in is generated from rxreset AND rxresetdone. This means:

1. The rxreset input is only used to drive gtwiz_reset_rx_datapath_in.

2. We do not want it to block GT initialization, so it only takes effect after rxresetdone is asserted (hence the AND with rxresetdone).

 

However, in the new version rxreset and its synchronized copy rxreset_rec can deadlock.

rxreset_rec is initialized to 1 and is clocked by rxusrclk. It therefore stays at 1, rxresetdone is never asserted, and rxusrclk never toggles.

 

I suggest changing the initial value of rxreset_rec to 0 to fix the issue.
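
The deadlock described above can be illustrated with a toy behavioral model. This is only a sketch under stated assumptions, not the IP's actual RTL: it assumes the datapath reset is rxreset_rec AND a registered done flag, that both registers are clocked by rxusrclk, and that rxusrclk stops while the datapath reset is asserted. The class and function names are hypothetical, and the done flag starting high follows the ILA observation earlier in the thread that gtwiz_reset_rx_done_out_reg was already high in the failing case.

```python
from dataclasses import dataclass


@dataclass
class RxResetModel:
    """Toy model of the RX reset loop; NOT the IP's actual RTL."""
    rxreset_rec: int        # synchronized rxreset, clocked by rxusrclk
    rx_done_reg: int        # registered "RX reset done" flag, same clock
    rxusrclk_ticks: int = 0

    def datapath_reset(self) -> int:
        # Assumed gating: the reset only takes effect once "done" was seen.
        return self.rxreset_rec & self.rx_done_reg

    def step(self, rxreset: int) -> None:
        """Advance one time step with the given rxreset input."""
        if self.datapath_reset():
            # Assumption: while the datapath reset is asserted, rxusrclk
            # does not toggle, so both registers hold their values.
            return
        self.rxusrclk_ticks += 1
        self.rxreset_rec = rxreset   # synchronizer samples the input
        self.rx_done_reg = 1         # RX reset sequence completes


def simulate(rxreset_rec_init: int, rx_done_init: int = 1, steps: int = 100):
    model = RxResetModel(rxreset_rec_init, rx_done_init)
    for _ in range(steps):
        model.step(rxreset=0)        # the rxreset input has been released
    return model


stuck = simulate(rxreset_rec_init=1)
fixed = simulate(rxreset_rec_init=0)
print(stuck.datapath_reset(), stuck.rxusrclk_ticks)
print(fixed.datapath_reset(), fixed.rxusrclk_ticks)
```

In this model, an initial value of 1 leaves the datapath reset asserted with zero clock ticks no matter how long the rxreset input stays released, while an initial value of 0 lets the clock run. With rx_done_init=0 the model recovers for either initial value, which is at least consistent with the failure being intermittent, although the sketch cannot confirm that this is what happens in hardware.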

LukasVik
Contributor

Hello @guozhenp ,

Indeed, setting the initial value of rxreset_rec to 0 solves the problem for me. Here is the change I made:

[Attached images: diff_initial_value_component.png, diff_initial_value_instance.png]

Can I expect this fix to be part of the next Vivado release?

Thank you so much for the help.

 

Best regards

Lukas

guozhenp
Xilinx Employee
LukasVik
Contributor

Great! Thanks again for the help. Take care.

 

Best regards

Lukas
