Anonymous

Zynq FSBL stuck after 'DMA Done'!

Hello, I am facing the problem of the FSBL getting stuck after 'DMA Done' for a simple Hello World application on a Z-Turn 7020 booting from BOOT.bin on an SD card. The attachment contains all the steps of the Hello World project in Vivado 2017.3 (Windows). Any idea why the FSBL stops at 'DMA Done' and does not go further?

Advisor
Registered: 10-10-2014

I've also reproduced this issue (following the steps in the PDF). Some SDK builds show the issue systematically: the FSBL always gets stuck and prints a line of 'dots' right after the 'DMA done!' message, never reaching the 'FPGA done!' message. It looks like some DMA timeout, but I cannot immediately find it in the source code. To resolve the issue, I had to start over from the export of the .hdf (deleting the entire SDK folder first). I see no reason why this resolves the issue, but it does. I'm starting to think it has something to do with an 'almost empty PL', but that's probably a crazy idea :-)

 

Once, my very first cold boot was OK, but all subsequent (cold) boots showed the FSBL getting stuck. I could not get a correct boot again; only the very first one worked. Very strange. I kept a serial port log; here's a visual diff between the very first correct boot (left side) and a failed boot (right side). There's no difference in the PCAP registers:

 

[Image: diff fsbl boot.png]

 

If we knew what the 'dots' mean, maybe we could investigate this further. Could it be a timing issue in the SD unit/driver?

** kudo if the answer was helpful. Accept as solution if your question is answered **
Xilinx Employee
Registered: 09-01-2014

The dots are just a waiting indicator, printed while polling the devcfg.INT_STS[PCFG_DONE_INT] bit to confirm that PL configuration has completed.
Please check pcap.c in the FSBL sources.
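For readers without pcap.c at hand, the behavior discussed in this thread (a bounded poll on the done bit that prints a dot for the first 100 iterations and then gives up) looks roughly like the sketch below. It is not the FSBL source: the mask value, the DOT_LIMIT name, the injected register-read function and the simulated register are assumptions added so the loop can run off-target.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed value of the PCFG_DONE bit in devcfg.INT_STS; check
 * XDCFG_IXR_PCFG_DONE_MASK in xdevcfg_hw.h for the real one. */
#define PCFG_DONE_MASK 0x00000004u
/* Per this thread, dots are printed only for the first 100 polls. */
#define DOT_LIMIT 100u

/* The register read is injected so the loop can run off-target. */
typedef uint32_t (*reg_read_fn)(void);

/* Poll until all bits in `mask` are set. Returns 0 on success, or -1 once
 * max_count reads have passed without the bits (the timeout case). */
static int poll_done(reg_read_fn read_int_sts, uint32_t mask, uint32_t max_count)
{
    for (uint32_t i = 0; i < max_count; i++) {
        if ((read_int_sts() & mask) == mask) {
            return 0;
        }
        if (i < DOT_LIMIT) {
            putchar('.');   /* the "dots" seen on the UART */
        }
    }
    return -1;
}

/* Simulated INT_STS: the done bit appears on read number sim_ready_after. */
static uint32_t sim_reads, sim_ready_after;
static uint32_t sim_read_int_sts(void)
{
    return (++sim_reads >= sim_ready_after) ? PCFG_DONE_MASK : 0u;
}
```

With sim_reads = 0 and sim_ready_after = 3, poll_done(sim_read_int_sts, PCFG_DONE_MASK, 1000) prints two dots and returns 0. If the bit never sets, it returns -1 after max_count reads, which is the point where a real loader would report a timeout.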

Does the DONE LED light up on the board? If you never see 'FPGA Done', the bitstream may be damaged.
Advisor
Registered: 10-10-2014

@ritakur, when the 'dots' appear, the FPGA DONE LED does not turn on.

 

Thanks for pointing out where the 'dots' come from. I checked XDcfgPollDone() in pcap.c: it looks like it only prints a dot for the first 100 iterations. Also, if I wait long enough, shouldn't there be a timeout message? I'd guess the timeout is within a few seconds, but I never see it.

 

I can follow your idea of a damaged bitstream, however:

 

1) The very first boot was OK; all subsequent boots 'showed the dots'. All I did was power-cycle the board. I didn't touch the SD card, so how could the bitstream get damaged just by reading it?

 

2) The issue can be 'resolved' by deleting the SDK folder, re-exporting the .hdf file (so without regenerating the bitstream), and recreating the FSBL, application and boot image.

 

3) Though it's probably a strange idea, it seems to be related to having almost nothing in the PL, i.e., a bitstream with almost all 'fuses' off. I have never seen this issue before with other designs, only when I started looking into the design of @Anonymous, where he tried to instantiate only a PS, or at most a single gate in the PL.

 

I've been thinking about possible causes:

 

1) The SD card is not written correctly (some caching issue; however, I run 'sync' twice in Linux after the copy and eject the SD card properly). Also, I would expect some CRC check on the bitstream; is that the case, @ritakur?

2) Some timing issue when reading the SD card

3) Some timing issue when copying/loading the bitstream into the PL

4) Some strange driver/DMA issue in pcap.c

...

 

But as long as we cannot reproduce it systematically, it's just guessing. I'll come back to this thread if I ever find a 100% reproducible way.

 

 

Xilinx Employee
Registered: 09-01-2014

Since the SD card has a file system, the bitstream is first copied to DDR and then transferred to the PCAP (PL), so the bitstream might be damaged during that copy.
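One way to test the "damaged during the copy" theory is to checksum the image in DDR after the copy and compare it with a value computed on the host from the same file. The sketch below uses a plain additive byte checksum. That choice and the names sum32() and ddr_copy_matches() are assumptions for illustration; this is not necessarily the checksum the FSBL prints when FSBL_DEBUG_INFO is defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum over a byte buffer, wrapping modulo 2^32.
 * Illustrative only: cheap, but blind to reordered bytes. */
static uint32_t sum32(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Compare the image copied into DDR against a checksum computed on the
 * host from the same file. Returns 1 on match, 0 on mismatch. */
static int ddr_copy_matches(const uint8_t *ddr_copy, size_t len, uint32_t expected)
{
    return sum32(ddr_copy, len) == expected;
}
```

A mismatch would point at the SD read/copy path. A match would shift suspicion to the PCAP transfer itself, which is consistent with the later report in this thread of a valid checksum followed by a hang.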

If the bitstream fails the CRC check, you will not see FPGA DONE.

I would expect the timeout message, but you see nothing, so it looks like the CPU hangs. Did you try another SD card?

 

My suggestion is to debug the FSBL in SDK using JTAG boot mode, so that you can step through the program.

FSBL main() detects the boot mode; you need to fool that check by changing the detected JTAG boot mode to SD boot.

After that, the FSBL behaves exactly as it does in SD boot.
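The "cheat" could look like the sketch below. The mask and mode values are assumed to mirror the FSBL's fsbl.h (NAND = 0x4 matches the value another poster uses in this thread), but treat them and the helper name effective_boot_mode() as assumptions to check against your generated FSBL. In the FSBL, the override would go right after main() reads and masks the boot-mode register.

```c
#include <stdint.h>

/* Assumed boot-mode encodings; verify against your FSBL's fsbl.h. */
#define BOOT_MODES_MASK 0x00000007u
#define JTAG_MODE       0x00000000u
#define NAND_FLASH_MODE 0x00000004u
#define SD_MODE         0x00000005u

/* Debug-only hack: if the strap pins report JTAG, pretend the board booted
 * from `forced_mode` so the normal SD/NAND load path runs under the
 * debugger. Never ship an FSBL with this override enabled. */
static uint32_t effective_boot_mode(uint32_t boot_mode_reg, uint32_t forced_mode)
{
    uint32_t mode = boot_mode_reg & BOOT_MODES_MASK;
    if (mode == JTAG_MODE) {
        mode = forced_mode;
    }
    return mode;
}
```

Non-JTAG values pass through unchanged, so the same FSBL still behaves normally when it is booted from the card.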

 

>>2) the issue can be 'resolved' by deleting the SDK folder, re-exporting the .hdf file (so without regenerating the bitstream), and recreating the FSBL, application and boot image.

I'm not sure about this; the only related issue we know of is the HDF not being updated:

https://www.xilinx.com/support/answers/69489.html

Adventurer
Registered: 10-22-2017

@ritakur

I now have the same problem. I debugged the FSBL in SDK in JTAG boot mode, changing the boot mode to 0x00000004 (NAND boot), but the problem remains.

Advisor
Registered: 10-10-2014

@ritakur, it's indeed strange that there is no timeout message. A few of these SD cards have been running Linux and application code 24/7 for several months, so I doubt that anything is wrong with the SD cards (?).

 

Are there any specific requirements for the SD card, like minimum speed or anything else?

 

I'll try to regenerate an FSBL that consistently gives me the error, and from there try JTAG debugging. Thanks for the tip on fooling the boot loader.

Xilinx Employee
Registered: 09-01-2014

Where did the software get stuck? Or did the CPU hang?

Xilinx Employee
Registered: 09-01-2014

I am not aware of any unsupported SD cards.
Maybe try a different card, or the SD test example design in:
<SDK_INSTALL_FOLDER>\data\embeddedsw\lib\sw_services\xilffs_v3_7\examples
Adventurer
Registered: 10-22-2017

Hi, my problem is solved.

The cause is that new bad blocks have developed in the NAND (or SD card) and are not marked. As a result the system won't start, and debug mode behaves the same way. The fix requires keeping backup copies of BOOT.BIN in multiple partitions: when the BootROM cannot find a correct image to load the FPGA within a certain time, it searches for the next BOOT.BIN. If a new, unmarked bad block appears in the NAND of the first partition, the second partition can still boot correctly. U-Boot in the second partition then starts the Linux kernel, and the kernel manages the NAND bad blocks, so the bad blocks of the first partition get managed and backed up. For bare-metal systems, you need to detect and mark bad blocks yourself.
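For bare-metal code, "managing" bad blocks usually means never reading from a block marked bad and shifting the data to the next good block. A minimal sketch of that mapping follows. The 16-block geometry, the bad-block table and the marker convention in is_marked_bad() are assumptions (check your NAND datasheet). It only helps once blocks are actually marked, which is exactly what was missing in this case.

```c
#include <stdint.h>

/* Hypothetical device geometry for the sketch. */
#define NUM_BLOCKS 16u

/* Map a logical block index to a physical one, skipping blocks flagged
 * bad. Returns -1 if the logical block lies past the last good block. */
static int logical_to_physical(const uint8_t bad[NUM_BLOCKS], unsigned logical)
{
    unsigned good_seen = 0;
    for (unsigned phys = 0; phys < NUM_BLOCKS; phys++) {
        if (bad[phys]) {
            continue;               /* marked bad: never read from it */
        }
        if (good_seen == logical) {
            return (int)phys;
        }
        good_seen++;
    }
    return -1;
}

/* Common factory-marker convention (assumption): a block is bad if the
 * first spare-area byte of its first page is not 0xFF. */
static int is_marked_bad(uint8_t first_spare_byte)
{
    return first_spare_byte != 0xFFu;
}
```

With blocks 1 and 3 marked bad, logical blocks 0, 1 and 2 land on physical blocks 0, 2 and 4.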
You also need to modify one place in the FSBL: the MAX_COUNT value determines the waiting time, and it is best to change it to 1000.
    /*
     * Poll for FPGA Done (MAX_COUNT bounds the number of polls)
     */
    Status = XDcfgPollDone(XDCFG_IXR_PCFG_DONE_MASK, MAX_COUNT);
    if (Status != XST_SUCCESS) {
        fsbl_printf(DEBUG_INFO, "PCAP_FPGA_DONE_FAIL\r\n");
        return XST_FAILURE;
    }

The default is to wait eight minutes.
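Because the loop bound is a plain iteration count, the wait time scales linearly with MAX_COUNT, so a timeout measured for one value predicts the timeout for another. The two helpers below are hypothetical illustrations of that arithmetic. The cost of one iteration depends on the board (register-read latency, UART output while dots are printed) and has to be measured; no default MAX_COUNT value is assumed here.

```c
#include <stdint.h>

/* Worst-case wait of a bounded poll: iterations x cost per iteration.
 * ns_per_iteration is board-specific and must be measured (assumption). */
static double poll_timeout_seconds(uint64_t max_count, double ns_per_iteration)
{
    return (double)max_count * ns_per_iteration * 1e-9;
}

/* The wait scales linearly with the bound, so a timeout measured with
 * old_count predicts the timeout with new_count. */
static double scaled_timeout(double measured_s, uint64_t old_count, uint64_t new_count)
{
    return measured_s * (double)new_count / (double)old_count;
}
```

For example, if some bound old_count gives the eight-minute wait mentioned above, scaled_timeout(480.0, old_count, 1000) estimates the wait after the change.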

Registered: 01-16-2018

I occasionally see this when booting from QSPI flash after soft-resetting the PS in code (writing 0x1 to slcr.PSS_RST_CTRL). I can confirm that the bitstream data is copied to DDR correctly: with FSBL_DEBUG_INFO defined, a valid checksum value is printed. After a power-on reset, the same checksum value is printed, but the boot proceeds normally and doesn't get stuck.
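For reference, the soft reset mentioned here is a two-step register sequence, because SLCR registers are write-protected until they are unlocked. The addresses and unlock key below are based on the Zynq-7000 TRM (UG585) and should be verified there. The injected write function and the recorder exist only so the sequence can be checked off-target; on hardware you would call Xil_Out32() instead.

```c
#include <stdint.h>

/* SLCR addresses and unlock key (assumed from UG585; verify there). */
#define SLCR_BASE        0xF8000000u
#define SLCR_UNLOCK_OFF  0x008u
#define PSS_RST_CTRL_OFF 0x200u
#define SLCR_UNLOCK_KEY  0x0000DF0Du

typedef void (*reg_write_fn)(uint32_t addr, uint32_t value);

/* Software reset of the whole PS: unlock SLCR, then set SOFT_RST. */
static void ps_soft_reset(reg_write_fn write32)
{
    write32(SLCR_BASE + SLCR_UNLOCK_OFF, SLCR_UNLOCK_KEY);
    write32(SLCR_BASE + PSS_RST_CTRL_OFF, 0x1u);
}

/* Off-target recorder used to check the write sequence. */
static uint32_t log_addr[4], log_val[4];
static unsigned log_n;
static void record_write(uint32_t addr, uint32_t value)
{
    if (log_n < 4u) {
        log_addr[log_n] = addr;
        log_val[log_n] = value;
        log_n++;
    }
}
```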

 

Any ideas?

 

Thanks!

Advisor
Registered: 10-10-2014

@ritakur, I came across this post, which mentions an issue with the SD card clock speed setting; by lowering the clock speed, the poster solved the issue. Can you confirm that there is a maximum clock setting for the SD card controller, and that the default value is not OK?
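If the SD clock does turn out to be the culprit, lowering it means picking a larger divider. The sketch assumes an SD Host Controller v2.00-style divider, where the output clock is the base clock divided by 1 or by a power of two up to 256. Whether and how the Zynq SDIO controller and the XSdPs driver expose this is an assumption to check in UG585 and the driver sources. The frequencies in the example are hypothetical.

```c
#include <stdint.h>

/* Smallest divider d in {1, 2, 4, ..., 256} with base_hz / d <= target_hz.
 * Compared as base_hz <= target_hz * d in 64 bits to avoid rounding. */
static uint32_t pick_sd_divisor(uint32_t base_hz, uint32_t target_hz)
{
    uint32_t d = 1;
    while (d < 256u && (uint64_t)base_hz > (uint64_t)target_hz * d) {
        d <<= 1;
    }
    return d;   /* 256 if even the largest divider cannot reach target */
}
```

With a hypothetical 100 MHz base clock, a 25 MHz target gives a divider of 4, and a 400 kHz identification-phase target needs 250, which rounds up to 256.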

Advisor
Registered: 10-10-2014

@caoshouqi, thank you for sharing that. Could you explain in a bit more detail, please? You say that the kernel manages and marks bad blocks, but what about the FSBL and U-Boot? These are 'bare-metal' applications. Do we have to add code there to manage bad-block marking, and do we always need to add multiple partitions?

 

I would not expect blocks that are mainly read rather than written, like the FSBL and U-Boot, to go bad. Or does this happen more often?

 

@ritakur, is there any Xilinx documentation on this kind of issue?

Adventurer
Registered: 10-22-2017

@ronnywebers

This problem is mainly caused by new, unmarked bad blocks. If bad blocks are marked, this situation is usually not a concern. Unmarked bad blocks may be caused by excessive temperature during soldering.
When I encountered this situation, the board was able to boot from the first partition again, because the kernel performed bad-block management and ECC corrected the flipped bit.
If you only run bare-metal programs, I recommend modifying the waiting time and keeping multiple backup images: the BootROM will find another valid image when the first image fails to boot.
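Roughly, "finding another valid image" means scanning the flash at fixed offsets for a boot header. The sketch below is a simplified illustration, not the BootROM's actual logic. The 32 KB step and the two header words at offsets 0x20 (width detection) and 0x24 (image identification) are assumptions based on the boot-header description in UG585 and should be verified there. The real BootROM also validates the header checksum, which this sketch skips.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed search step and header layout (verify in UG585). */
#define SEARCH_STEP 0x8000u
#define WIDTH_OFF   0x20u
#define IDENT_OFF   0x24u
#define WIDTH_WORD  0xAA995566u
#define IDENT_WORD  0x584C4E58u   /* bytes "XNLX" in little-endian order */

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offset of the first plausible boot header at or after `start`,
 * stepping SEARCH_STEP bytes at a time; -1 if none is found. */
static long find_boot_header(const uint8_t *flash, size_t len, size_t start)
{
    for (size_t off = start; off + IDENT_OFF + 4u <= len; off += SEARCH_STEP) {
        if (rd32le(flash + off + WIDTH_OFF) == WIDTH_WORD &&
            rd32le(flash + off + IDENT_OFF) == IDENT_WORD) {
            return (long)off;
        }
    }
    return -1;
}
```

Placing a backup BOOT.BIN at a later slot lets a search like this fall through to it when the first image is unreadable.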

Registered: 01-22-2019

 

Well, in my case the problem was simply that I had the Xilinx Platform Cable USB II connected to the JTAG port!

Disconnecting the cable (disconnecting just the USB side is enough) when booting from QSPI solves the problem.

I have no idea why this happens. Maybe it messes up the boot mode register? (Though it still chooses QSPI boot and only hangs during the PCAP transfer.)

The JTAG converter in question is this little devil:

https://media.digikey.com/photos/Xilinx%20Photos/HW-USB-II-G.JPG

Registered: 01-22-2019

OK, now I realize the real problem:

It was Vivado connected to the JTAG cable. When I close the server in Vivado, the device is able to boot normally.

Attached is an image of what I'm talking about.

 

[Image: vivado_hw_man.png]