05-24-2021 07:08 AM
We have a board design based on a Virtex-7 690T. The configuration scheme is copied from the VC709 dev board, in that it uses an MT28GU01 BPI flash chip. We have built around 60 of these boards without seeing configuration issues, but in the last batch around 10 boards exhibit a very intermittent problem that I can only describe as the FPGA not completing its configuration / startup sequence correctly. I'll describe the symptoms:
On power-on the FPGA is configured from the BPI flash chip (in Master BPI mode) and the DONE pin goes high (lighting an LED). The default design takes some of the many clocks on the board and generates a 1 PPS strobe from each, which we tie to LEDs so we can check the clocks are correct. Most times all the LEDs flash as expected. However, on the faulty boards, intermittently only some of the LEDs flash, as if the clocks are not reaching the FPGA or the 1 PPS strobes are not being output. I've checked the clock splitter chips and the clocks are all being generated, and when the FPGA boots I can see the waveforms getting loaded and reducing in amplitude, so I am confident the signals are making it through to the FPGA. The same design is used to test all boards and normally works; on these boards it fails intermittently. On some boards the error shows up on 1 in 5 power cycles, on others 1 in 50.
And now the strange bit: you can make it spring to life, with all the LEDs flashing, by rubbing your finger up the JTAG programming header or, as I also discovered, by querying the status registers over JTAG. It's almost as if the startup sequence or configuration hasn't quite finished properly and some clock strobes on the JTAG port (either noise from a finger or actual JTAG instructions) are enough to clock it through to completion. Obviously I can't check the status of the config registers, as the moment I hook JTAG up and query them the device springs to life and returns a successful boot status.
The LEDs are all driven from the same bank, so I don't think there's an issue with the outputs. It's as if some of the input banks are not loaded properly, or something is blocking the internal logic from running correctly, but only some of it.
I can't see any issues, spikes, etc, on the power rails at startup and they all look like they are regulating at the correct voltages.
I've checked the mode pins and replaced the pull up / down resistors with 0 ohm shorts.
I've checked the pull-up resistors and signalling on INIT_B, PROG_B, WE_B, OE_B & FCS_B. The only issue I can find is that when the system is in its partially booted state, WE_B, OE_B & FCS_B all appear to be held low, which is very peculiar as they shouldn't all be low at the same time.
Strangely, if I pulse PROG_B, the FPGA reconfigures, DONE goes off and back on again, and configuration pauses at exactly the same point, with only a few LEDs flashing. The only way to get a proper boot is to power cycle or do the JTAG clock injection.
I was thinking it was a BPI issue, as a repeated PROG_B gives the same state, but then why do JTAG clocks sort it out?
So that made me think it must be the FPGA, but then why does a PROG_B not sort it out?
I don't think it's a startup issue on the rails, as PROG_B doesn't fix it even when the rails are all up and running in steady state.
And I don't think it's the regulators, as a few JTAG clocks, which don't affect the power supplies, make correct behaviour resume.
It seems to be some weird FPGA configuration interaction with the BPI flash chip and I've run out of ideas of what else to look at / try, any ideas would be appreciated.
05-24-2021 08:24 AM
If you have a bunch working and a new bunch not, then you are very lucky.
a) Check the new boards' power supplies.
b) Compare the signals on the config bus between a good and a bad board during startup.
c) Check the pull-ups / pull-downs on the config bus.
d) Check the parts are from a good supplier, not grey imports.
e) Inspect the board very carefully for solder mistakes.
f) Check the boards are the same version and all layers are present.
05-24-2021 07:36 PM - edited 05-24-2021 07:37 PM
Many customers have had the same issue as yours. It is due to not having enough CCLK cycles. Does the design include something like MIG or DCI calibration? The number of clock cycles needed in that case can vary, so when the environment changes (voltage, temperature, or the device itself), the number of clocks needed changes as well. Once you wake up JTAG, it sends TCK into the device and the FPGA then boots up.
A simple way to fix this: add padding FFs after the MCS file and reprogram it into the BPI flash.
05-25-2021 02:37 AM
Thanks for the reply. That is exactly what I was assuming it was; I just couldn't work out how. As it is in Master BPI mode, I assumed the FPGA would keep clocking CCLK until it had finished its startup. Is this not the case? Does it stop once it has read a final word from the flash?
The design does include 3 MIG cores, so there are DCI-calibrated pins and, as you say, a variable startup length.
Is there an easy way to add the FFs? Is it a promgen setting, or is it a case of opening the file in a hex editor and adding some more? I thought the flash chip was set to FFs when erased, so the rest of the storage should already contain FFs. Or is there a final word the FPGA reads and then stops?
Do you have a link to a doc describing the configuration file format, I know I've read it somewhere years back, just struggling to re find it.
05-25-2021 02:41 AM
If you are looking for the bitstream composition, check https://www.xilinx.com/support/documentation/user_guides/ug470_7Series_Config.pdf#page=104
05-25-2021 10:29 PM
@iguo mentioned it clearly:
05-26-2021 01:11 AM
I know he mentioned it clearly, but if you read my previous posts you'd see why I asked. When erased, the flash is full of FFs anyway, so putting some FFs on the end of the MCS and then flashing it should make no difference, as there were FFs there already. Unless there is some specific end word the FPGA reads that I could put the FFs before, which is why I asked that as well.
To complicate things, my MCS file contains two FPGA configs. I have my BPI flash partitioned into 4 RS's and I'm using the first two, booting from the first one. I generate the MCS from two separate bin files, so if I want to modify the startup behaviour of the default fill I will need to modify the bin file, then generate the MCS file from that.
Having opened that bin file up I can see that it clearly just ends with a batch of NOOPs, hence my question of do you mean NOOPs or FF's.
Please read my replies before assuming I can't read.
05-26-2021 05:31 PM
Hi, the padding bits can actually be any data (as long as it is not a valid FPGA command, of course). In your case, adding 20 00 00 00 (NO OP) would be OK. The purpose is to keep CCLK running; the data itself does not matter.
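As a rough illustration of the NO OP padding suggested above, here is a minimal Python sketch that appends NO OP words to a raw image. The function name `pad_with_noops` is hypothetical, and the sketch assumes the file stores each 32-bit word most significant byte first, exactly as the bytes are written above. If your flow byte-swaps or bit-swaps the data for the flash, the padding would need the same treatment.

```python
import struct

NOOP = 0x20000000  # NO OP word, written in the post above as bytes 20 00 00 00


def pad_with_noops(image: bytes, n_words: int) -> bytes:
    """Return the image followed by n_words NO OP words (hypothetical helper)."""
    if len(image) % 4:
        # Assumption: the image is a whole number of 32-bit words.
        raise ValueError("image length is not a multiple of 4 bytes")
    return image + struct.pack(">I", NOOP) * n_words
```

For a file on disk this would be used as `open(dst, "wb").write(pad_with_noops(open(src, "rb").read(), 1024))`, where `src`, `dst` and the word count are placeholders to choose for your own image.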
Another way to fix this, modify the startup sequence settings - by default, bitstream settings are like 'No DCI wait' at cycle 2 or 3. You can change this to 'wait for DCI to lock'. It should have the same effect as adding padding bits but padding is the most frequently used work around.
05-27-2021 03:13 PM
What exactly is it that tells the FPGA to stop generating CCLK? Is it a lack of valid data words, or is it a certain word like the DESYNC word? I've tried padding 1 Mbyte of NOOPs onto the end of my bin file and writing it to the flash using our flash interface core, and it does the same thing: it intermittently fails to complete booting. I did try doing the same to the bit file and using promgen, but that seems to generate exactly the same file, so I'm guessing I've missed a payload length somewhere at the start of the bit file and it's ignoring my extra pad. I'll try to correct that tomorrow.
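On the suspected payload length in the bit file: the sketch below shows one way to extend the payload and update the length to match. The header layout is not described anywhere in this thread, so everything about it here is an assumption: a 2-byte-length-prefixed leading field, a 2-byte marker, fields keyed `a` to `d` with 2-byte big-endian lengths, then key `e` followed by a 4-byte big-endian payload length. The function names are hypothetical. Check the layout against a real file in a hex editor before relying on it.

```python
import struct


def find_length_offset(bit: bytes) -> int:
    """Return the offset of the 4-byte payload length after key 'e'.

    Assumed layout (not confirmed in this thread): leading field with a
    2-byte length, a 2-byte marker, then keyed fields 'a'..'d'.
    """
    (n,) = struct.unpack_from(">H", bit, 0)  # length of the leading field
    pos = 2 + n + 2                          # skip field and 2-byte marker
    while True:
        key = bit[pos:pos + 1]
        pos += 1
        if key == b"e":
            return pos
        if key not in (b"a", b"b", b"c", b"d"):
            raise ValueError(f"unexpected header key {key!r}")
        (n,) = struct.unpack_from(">H", bit, pos)
        pos += 2 + n                         # skip this field's contents


def append_to_payload(bit: bytes, extra: bytes) -> bytes:
    """Append extra bytes to the payload and rewrite the length field."""
    off = find_length_offset(bit)
    (length,) = struct.unpack_from(">I", bit, off)
    payload = bit[off + 4:off + 4 + length]
    return bit[:off] + struct.pack(">I", length + len(extra)) + payload + extra
```

If the length field is left unchanged, a tool that trusts it would stop reading at the old end of the payload, which would match the symptom of promgen producing an identical file.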
But back to what stops CCLK: I'm conscious that if it is DESYNC, I've added all the NOOPs after it. Should I be inserting them between the start end of sequence command and the DESYNC command to maximise clock cycles? If it is something else that flags when to stop generating CCLK, can you please let me know? I've trawled through the documentation and can find no mention of it at all.
06-11-2021 02:11 AM
Does anyone have any information on what exactly triggers the FPGA to stop sending CCLK's, as I'm struggling to find any mention. As mentioned above, my initial 'just' padding the end of the file doesn't seem to work, I just want to check I am padding in the correct place. I was hoping for a definitive answer, otherwise I'm just going to have to hack around with the file and see what I can discover.
06-14-2021 06:24 AM
Can anyone answer the question of what causes the FPGA to stop CCLK clocking? I am finding mixed messages. This forum thread ("How does FPGA know the size of the bitstream", Community Forums, xilinx.com) seems to imply it is receipt of the DESYNC word that causes the FPGA to stop the configuration interface. The user guide UG470, pg 111, describes the START command as 'Begins the Startup Sequence: The startup sequence begins after a successful CRC check and a DESYNC command are performed.', implying the startup state machine starts its work after the DESYNC and so finishes some amount of time later.
I've tried padding a massive amount (~1 Mb) of NOOPs between the second-to-last command and the DESYNC command. It made no difference: it still very occasionally hangs and requires some TCKs to kick it into life. I've tried adding the same amount of NOOPs at the end of the file after the DESYNC, with the same result. It seems nothing I do extends the number of CCLKs the FPGA produces. The original reply by iguo seemed to imply this was relatively easy to achieve, so I am obviously missing a step somewhere. Any advice would be gratefully received.
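For anyone wanting to reproduce the "NOOPs before DESYNC" experiment described above, here is a minimal Python sketch. It only makes the placement repeatable; as reported above, it did not change the behaviour on these boards. The encodings are assumptions not given in this thread: `0x30008001` as the packet header for a one-word write to the CMD register and `0x0000000D` as the DESYNC command code, both stored most significant byte first. The function names are hypothetical.

```python
import struct

NOOP = 0x20000000         # NO OP word quoted earlier in the thread
CMD_WRITE_1 = 0x30008001  # assumed: one-word write to the CMD register
DESYNC = 0x0000000D       # assumed: DESYNC command code


def find_last_desync(image: bytes) -> int:
    """Return the byte offset of the last CMD-write + DESYNC pair, or -1."""
    return image.rfind(struct.pack(">II", CMD_WRITE_1, DESYNC))


def insert_noops_before_desync(image: bytes, n_words: int) -> bytes:
    """Insert n_words NO OP words immediately before the last DESYNC write."""
    pos = find_last_desync(image)
    if pos < 0:
        raise ValueError("no DESYNC command found (wrong byte order?)")
    return image[:pos] + struct.pack(">I", NOOP) * n_words + image[pos:]
```

With two configurations in one MCS, this would be applied to each bin file separately before the MCS is generated.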
I am currently rebuilding the design with the bitgen options set to NoWait for DCI Match cycle, so hopefully it will just crack on and finish the start up rather than waiting an unspecified amount of time for the DCI to complete before moving on and running out of CCLK cycles, but this doesn't feel like quite the right thing to do, and I am also concerned if this will affect my MIG cores if they attempt to start before the DCI is matched.
Hopefully someone can shed some light on this strange behaviour or at least point me at a reliable way of extending the number of CCLK's.