03-20-2014 09:57 PM
I'm currently using SDK 14.6 for a MicroBlaze (MB) running on a Spartan-6 (S6). I've found that when I compile a particular large C++ program (final code size about 2MB), it intermittently crashes in strange ways (alignment errors, invalid bus accesses, invalid opcodes, infinite loops, etc.). The crash type is fairly consistent for a given set of code, even after clean rebuilds, but changes as I add or remove source files; below a certain ill-defined threshold it seems to run without issues.
Currently I have it crashing with an invalid opcode exception. The PC at the time was in code memory, and the disassembly of the ELF file at that location seems reasonable. Using the debugger, I examined memory at that location as soon as the initial main() breakpoint was hit, and the code had been overwritten with data that looks like addresses near the start of the .data section, including several repeated values.
I put a write watchpoint on that memory, and it is apparently being overwritten by __register_exitproc, which, from what I can tell, is an internal CRT function related to destructors and/or exception handling. Unfortunately I can't find any source for this function, so I'm not sure exactly what it's doing, but it is clobbering a large amount of code around that area. (The values most commonly seen are 0xFFFFFFFF and the address of the __dso_handle symbol, but there are also many addresses of global variables.)
In particular, it appears to have clobbered code memory from 0x2009680C to 0x20096913 (which is somewhere in the middle of the code; I can't see any particular reason why this region was chosen) with values around 0x2015AE40, which are the addresses of global variables that have C++ destructors. (.init starts at 0x2013D950; __data_start is 0x20157D30.) I'm using the default SDK-generated link script, with code and data in external RAM.
The application in question has a large number of global variables that have constructors/destructors, which is one of the triggers for this issue, I think. But that's something I can't really change (it's from a unit test framework).
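For context on why those particular values show up: when -fuse-cxa-atexit is in effect, GCC registers each static object that has a non-trivial destructor with a call of the form __cxa_atexit(destructor, &object, &__dso_handle), and in newlib-based toolchains that call is, as far as I know, forwarded to __register_exitproc. So a table of exit handlers would naturally contain object addresses and the address of __dso_handle. The sketch below is only a toy model of such a table, not the CRT's actual code; ExitTable, register_exit, run_exit_handlers and the capacity of 32 are hypothetical names and numbers picked for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical, NOT the real CRT implementation) of the
// bookkeeping behind __cxa_atexit-style registration.
struct ExitEntry {
    void (*fn)(void*);  // destructor thunk to call at exit
    void* arg;          // address of the object to destroy
    void* dso;          // stands in for &__dso_handle
};

constexpr std::size_t kMaxEntries = 32;  // assumed capacity, illustration only

struct ExitTable {
    ExitEntry entries[kMaxEntries];
    std::size_t count = 0;
};

// Record one handler. Returns 0 on success, -1 if the table is full.
// The bounds check matters: a table that kept writing past its end
// would scribble object addresses over whatever memory follows it.
int register_exit(ExitTable& t, void (*fn)(void*), void* arg, void* dso) {
    if (t.count >= kMaxEntries) {
        return -1;
    }
    t.entries[t.count++] = ExitEntry{fn, arg, dso};
    return 0;
}

// Run the handlers in reverse order of registration, mirroring how
// static objects are destroyed in reverse order of construction.
void run_exit_handlers(ExitTable& t) {
    while (t.count > 0) {
        ExitEntry e = t.entries[--t.count];
        e.fn(e.arg);
    }
}
```

Each global with a destructor consumes one entry, so a program with many such globals (a unit test framework, for example) puts far more pressure on this bookkeeping than a typical embedded application does.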
Has anyone else seen this before? Any ideas how to resolve it?
03-20-2014 11:25 PM
Ok, I have yet to do sufficient testing to confirm this fully (since the original problem kept shifting intermittently) but so far adding the compiler flag "-fno-use-cxa-atexit" appears to have resolved the issue.
Perhaps this should be set as the default for C++ compiles, unless later versions of the SDK/compiler/CRT fix this a different way? (Corrupting code memory is not friendly behaviour.)
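For anyone wanting to try the same workaround, here is a minimal sketch of the kind of code affected. With -fno-use-cxa-atexit, GCC registers static destructors through atexit instead of __cxa_atexit; the GCC documentation notes that -fuse-cxa-atexit is needed for fully standards-compliant handling of static destructors, so this is a trade-off rather than a free fix. Tracked, g_first, g_second and live_count are hypothetical names, and the mb-g++ command in the comment assumes the usual name of the MicroBlaze cross compiler.

```cpp
#include <cassert>
#include <cstdio>

// Build (host):       g++ -fno-use-cxa-atexit demo.cpp
// Build (MicroBlaze): mb-g++ -fno-use-cxa-atexit ...   (assumed tool name)

// Hypothetical type with a non-trivial destructor. Every static
// instance needs one exit-handler registration at construction time.
struct Tracked {
    static int live;   // number of currently constructed instances
    const char* name;
    explicit Tracked(const char* n) : name(n) { ++live; }
    ~Tracked() {
        --live;
        std::printf("destroying %s\n", name);
    }
};
int Tracked::live = 0;

// Two globals with destructors: two registrations before main() runs.
// They are destroyed in reverse order of construction at exit.
Tracked g_first("g_first");
Tracked g_second("g_second");

// Helper so the caller can check how many instances are alive.
int live_count() { return Tracked::live; }
```

Calling live_count() from main() should report both globals as constructed; the flag only changes how their destructors are registered for exit, not when the constructors run.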
02-13-2017 10:39 AM
We use Xilinx tools 14.5, on an MB running on S6, as well.
We were getting intermittent initialization issues with our embedded device. Periodically, on boot, we would see memory corruption in which the address of the __dso_handle symbol, the addresses of static variables with destructors (usually function-scoped), and the value 0xFFFFFFFF showed up throughout the corrupted region. The issue appeared as the code base grew larger and we added more static and global variables. A sister product, based on the same codebase but with fewer static variables, did not exhibit this problem.
I stumbled upon this thread and added the compiler flag. The issue disappeared. Using the -fno-use-cxa-atexit flag fixes this problem.