06-24-2020 05:16 AM - edited 06-24-2020 06:55 AM
We're using a Zynq UltraScale+ MPSoC device and trying to establish a 1000BASE-KX connection over the PS-GTR PHY under Linux. We're using the macb driver with a fixed-link patch found on these forums, which made it all work.
The clock generator that feeds the GTR banks for this connection is configured in the FSBL. The connection normally works without issue after power-on: the link establishes immediately and performance is solid. Once it works, you can unplug and replug and the link always re-establishes, until you power-cycle the board.
At power-up, however, approximately 20% of the time the link comes up but there is no communication. The other end of the connection reports a healthy, up link, yet no data gets through. This is eventually fixed after several reboots (each reboot reconfigures the clock generator).
Approximately 5% of the time there is no link at all, and there never will be, no matter how many times we reboot. The clock generator is nevertheless reported as properly configured. You need to power-cycle to get another chance at link-up.
We went from failing 50% of the time to working 75% of the time by reconfiguring the PS-GTRs in the FSBL after the clock generator is running and locked. Before that, the clock generator was configured after PS-GTR initialization and the range of failure symptoms was broad: intermittent links (going up and down every 10 seconds) and similar. Now, with that change, we see only the three scenarios described: (1) the link works on the first try and stays up indefinitely; (2) link but no data, which is fixed after several reboots and then behaves like the first scenario; or (3) an unrecoverable no-link. We have checked that the clock generator locks every time and shows no apparent issue.
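To log which of the three scenarios each boot lands in, a small script run from Linux can help. The following is a minimal sketch, not a tested tool: the interface name `eth0`, the 5-second window and the category labels are assumptions, and it assumes the link partner sends some traffic (for example a periodic ping) during the window. With a fixed-link configuration the local carrier flag may always read 1, so the no-link case may have to be taken from the far end's link status instead.

```python
from pathlib import Path
import time


def read_counter(iface: str, name: str) -> int:
    """Read one counter from /sys/class/net/<iface>/statistics/."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def read_carrier(iface: str) -> bool:
    """Return True if the kernel reports carrier on the interface."""
    try:
        return Path(f"/sys/class/net/{iface}/carrier").read_text().strip() == "1"
    except OSError:
        # Reading 'carrier' fails while the interface is administratively down.
        return False


def classify(link_up: bool, rx_delta: int) -> str:
    """Map one observation onto the three scenarios described in the post."""
    if not link_up:
        return "no-link"
    if rx_delta <= 0:
        return "link-no-data"
    return "working"


def probe(iface: str = "eth0", wait_s: float = 5.0) -> str:
    """Sample rx_packets twice and classify the current link state."""
    before = read_counter(iface, "rx_packets")
    time.sleep(wait_s)
    after = read_counter(iface, "rx_packets")
    return classify(read_carrier(iface), after - before)
```

Calling `probe()` once per boot and appending the result to a file gives a per-module record of outcomes that is easier to compare than impressions from the console.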
We think this is a bad configuration sequence which, depending on several factors, may leave the GEM or the PHYs in a weird state. The FSBL, U-Boot and the Linux kernel (with the phy_zynqmp driver) can each configure the PS-GTRs for proper operation, but right now we think only the FSBL is actually configuring them, as no phy_zynqmp trace appears in the kernel log while booting. The FSBL is auto-generated, with some extra custom hooks that configure the clock generator and re-call the PS-GTR initialization sequence functions.
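One way to double-check whether any kernel driver touches the PS-GTRs is to filter the kernel log for the relevant driver names. This is only a sketch: the keywords (`zynqmp`, `psgtr`, `macb`) are guesses at what the drivers print, not confirmed message formats, and reading `dmesg` may require root on some systems.

```python
import subprocess
from typing import Iterable, List


def find_driver_lines(log_text: str, keywords: Iterable[str]) -> List[str]:
    """Return the log lines that mention any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in wanted)
    ]


def kernel_log() -> str:
    """Capture the current kernel ring buffer via the dmesg command."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Keywords are assumptions; adjust them to the drivers actually in use.
    for line in find_driver_lines(kernel_log(), ["zynqmp", "psgtr", "macb"]):
        print(line)
```

An empty result is not proof that the driver did not probe, since a driver can bind without printing anything. Checking whether the PHY node is enabled in the device tree is a useful complement.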
Any clue about this?
EDIT: I forgot to mention that this issue is device-dependent. We have 5 boards like these, and the Zynq UltraScale+ devices are mounted on modules which we can swap between boards. One of the modules works 100% of the time (and always has), no matter which board it is in, and another never worked before we changed the clock distributor configuration sequence. The rest worked sometimes. Now, after the change, the one that always worked still works, and the other 4 have essentially identical failure rates.
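When comparing failure rates between modules, or before and after a change, it helps to know how uncertain each percentage is for the number of power cycles tried. Below is a minimal sketch using the Wilson score interval; the trial counts in the usage note are hypothetical and not measurements from these boards.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must be between 0 and trials")
    p = failures / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, 5 failures in 20 power cycles gives an interval of roughly 11% to 47%, so a run that short cannot separate a 20% failure rate from a 25% one. Many more cycles per module are needed before calling two failure rates identical or different.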
06-26-2020 06:09 AM
Hi @dmg ,
It looks to me like it's related to the clocking.
Is there a way you could track down the differences between the modules with regard to the clocks?
06-26-2020 06:40 AM
The clock generator, clocking tree and configuration are exactly the same for each module; in fact, the clock circuitry is mounted on the modules. The clock generator successfully locks, and the clocks it outputs are fine and within spec. Some other things on the board use the same clock generator and work every time without issues.
The thing is, because of how this clock generator is connected on the board, it comes up unconfigured and is configured by the PS in the FSBL. The initial state of the clock output feeding bank 505 is off, so the PS starts with no clock and sees the initial transient of the clock output before it settles to the nominal clock.
That's why we chose to configure the PS-GTRs and GEMs after the clock generator is locked (and that improved things dramatically). The FSBL currently configures the clock generator first and, once it is configured, runs the recommended PS-GTR initialization sequence (lane calibration, GEM reset, GEM configuration, GEM reset release and PS-GTR PLL lock check). We do suspect that it is clocking related, but in some subtle way. Maybe the FSBL doesn't do the whole GEM/PS-GTR reset sequence (we checked and it seems to do it properly, but we're not 100% sure), or maybe there's something else that needs to be reset and reconfigured after the clock is stable that we aren't handling.
It seems like something doesn't like the initial clock transient, or something along those lines, and enters a weird state that doesn't let the link come up. Do you know what it could be?
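The ordering described above can be modelled as follows. This is only an illustrative Python model of the sequence, not FSBL code (the FSBL itself is C): every callable passed in is a hypothetical stand-in for a real hook, and the timeout value is arbitrary. The point it shows is that the PS-GTR steps run only after the clock lock has been positively confirmed, and that a lock timeout aborts instead of falling through.

```python
import time
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def wait_until(condition: Callable[[], bool], timeout_s: float,
               poll_s: float = 0.001) -> bool:
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def bring_up(configure_clock: Callable[[], None],
             clock_locked: Callable[[], bool],
             gtr_steps: Sequence[Step],
             gtr_pll_locked: Callable[[], bool],
             timeout_s: float = 1.0) -> List[str]:
    """Run the clock-first sequence and return a log of what happened."""
    log = []
    configure_clock()
    log.append("clock configured")
    if not wait_until(clock_locked, timeout_s):
        # Never touch the PS-GTRs on an unlocked reference clock.
        log.append("clock lock timeout")
        return log
    log.append("clock locked")
    for name, step in gtr_steps:
        step()
        log.append(name)
    if wait_until(gtr_pll_locked, timeout_s):
        log.append("gtr pll locked")
    else:
        log.append("gtr pll lock timeout")
    return log
```

Recording the returned log on every boot (for instance over the UART) would show whether the failing boots differ from the working ones in any step of the sequence, or whether every step reports success in both cases.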
10-29-2020 02:19 AM
Hi @astone21480 ,
So 1000BASE-KX is not supported with PS-GTR. It requires backplane auto-negotiation, and the PS-GTR has not been characterized for this standard. If you read UG1085, you will see that it only supports 1000BASE-SX/LX.
If you do not need backplane AN, it might work as a "fixed-link".