03-16-2009 01:30 AM
I am having some problems booting Linux on a Virtex-4.
I have built the kernel using ELDK, the Xilinx device tree generator and the linux-2.6-xlnx sources from git.xilinx.com.
I get the following output on the terminal:
zImage starting: loaded at 0x00400000 (sp: 0x0070feb0)
Allocating 0x38a79c bytes for kernel ...
gunzipping (0x00000000 <- 0x0040d000:0x0059ee42)...done 0x367f58 bytes
Attached initrd image at 0x0059f000-0x0070ef20
initrd head: 0x1f8b0808
Linux/PowerPC load: console=ttyUL0 root=/dev/ram
Finalizing device tree... flat tree at 0x71c300
The strange thing is that when I check the value of the pc register, it does not hold virtual addresses but values like:
pc: 0x00002150 ...
instead of values like 0xcXXXXXXX.
Could there be a problem with the MMU or something similar?
I have seen similar problems on the forum, but they did not help me much:
If anyone has experienced a similar problem, please let me know.
Any suggestion is welcome.
03-16-2009 07:00 AM
When an exception occurs the PPC 405 reverts to real mode (the MMU is disabled). The Linux exception handler immediately re-enables the MMU and handles it.
The addresses you seem to be seeing are in the general area of the Debug Interrupt (0x2000). Are you trying to fix your boot error using a debugger? If so, remember to _only_ use a hardware breakpoint (hbreak in gdb, bps hw in XMD), etc...
If your bootup is dying due to an entirely unexpected exception (i.e. you were not using a software debugger), I'd recommend setting a hardware breakpoint at the applicable exception vector and then working backwards to the ultimate cause (this could be as simple as an error parsing the device tree). Debugging silent crashes before the kernel has finished booting can... enhance your debugging knowledge? Insert favorite euphemism here.
See UG011 (PowerPC Processor Reference Guide) for what registers might be useful to look at when an exception occurs.
You may also find a peek at XAPP1036 handy.
03-16-2009 07:27 AM
Are you using a custom board or a Xilinx board?
If you have a Xilinx board (ML403/ML405), you should first get a baseline running, which is easy with the reference design we have on the wiki (http://xilinx.wikidot.com).
In general we don't test much with the UART Lite (ttyUL0) that you are using; our reference designs use the UART 16550. I plan to do some testing with it soon and will see whether there are any issues with a UART Lite console.
You should also review the wiki as I have a kernel debug section there for common issues that I have seen. http://xilinx.wikidot.com/debugging-kernel-boot-problems
We pull from the git tree every night, build it and run it on a board (using our standard reference design), so that we know it works.
Your console output could indicate a device tree issue. I would compare your device tree against one that works (like virtex405-ml405.dts in our tree) and look for differences.
03-25-2009 10:46 PM
Sorry for late reply.
I have a custom board: the TB-4V-FX60-PRO from Inrevium. It seems that the problem is related to the MMU, since the addresses are not translated correctly.
I tried to debug it with XMD. As said before, an exception occurs, which is why I do not see virtual addresses in the pc register.
Here is the trace I got by single-stepping with "stp" in XMD:
pc: 0000f040: 7d 28 02 a6 mflr r9 <----- this corresponds to : c000f040 <transfer_to_handler_cont>
pc: 0000f044: 81 69 00 00 lwz r11,0(r9)
pc: 0000f048: 81 29 00 04 lwz r9,4(r9)
pc: 0000f04c: 7d 7a 03 a6 mtsrr0 r11
pc: 0000f050: 7d 5b 03 a6 mtsrr1 r10
pc: 0000f054: 7d 28 03 a6 mtlr r9
pc: 0000f058: 4c 00 00 64 rfi
pc: c000d954: 00 00 00 00 .long 0x0 <------ This corresponds to: c000d954 <machine_check_exception>, but I read 00 00 00 00 instead of 94 21 ff e0
pc: c000d954: 00 00 00 00 .long 0x0
pc: c000d954: 00 00 00 00 .long 0x0
pc: 00001100: 7d 50 43 a6 mtsprg 0,r10 <------ This corresponds to: c0001100 <DTLBMiss>
The problem occurs inside the <transfer_to_handler_cont> function @ 0x0000f040.
The "rfi" (return from interrupt) instruction is read @ 0x0000f058.
So the pc goes back to the <machine_check_exception> function @ 0xc000d954. It stays at that address for 3 steps before going to the <DTLBMiss> handler @ 0x00001100.
I think it read "00 00 00 00" @ 0xc000d954, which corresponds to the '.long 0x0' assembly code, instead of "94 21 ff e0", which corresponds to the 'stwu' instruction.
When I type XMD% mrd 0xc000d954 it returns " C000D954: 00000000"
When I type XMD% mrd 0x0000d954 it returns " D954: 9421FFE0"
As suggested in another thread (http://forums.xilinx.com/xlnx/board/message?board.id=EDK&message.id=6235&query.id=367949#M6235) and here (http://ce.et.tudelft.nl/publicationfiles/1367_700_thesis.pdf), I removed the OCM from my design, but it did not solve the problem.
Do you think the problem could be that my design does not match the kernel configuration in virtex4_defconfig?
My design only has a UART Lite, an Ethernet Lite, Flash and DDR. It has neither DMA nor an FPU; should I add them?
03-26-2009 09:10 AM
So the issue is that your design is far from our baseline, and it's not clear that you started from our baseline.
My experience is that too many changes at one time result in problems that are extremely hard to isolate.
I think you should get a baseline working using our ML405 design, adapting that design to your board. Once you have it working, make changes step by step toward your goal.
Sorry for the bad news, but my experience says it's still quicker to do it this way.
03-26-2009 09:50 AM
When you see the code executing an instruction 00000000 (which is illegal I think), then it is most likely caused by the linux kernel doing a panic() due to a BUG_ON() macro. This is usually due to something inconsistent in your DTB. Have you gone through the bootlog as suggested by John on the kernel-debugging wiki page? This is a very powerful method.
03-27-2009 03:38 AM
Unfortunately I have neither an ML405 nor any other Xilinx board.
Reading "__log_buf" gives me only garbage... I continued debugging and found that the machine check exception occurs very early in the boot sequence, just after the "turn_on_mmu" function.
This could explain why XMD returns only 0 every time I read a virtual address. However, I also read somewhere that it is not possible to access the virtual address space while debugging, which would also explain why I always read 0 (while the PPC itself may actually read the right value).
Anyway, I managed to boot PetaLinux (uClinux) on this board, not through fs-boot/u-boot, because of a problem detecting the Flash, but by downloading the image directly to the SDRAM.
This is only temporary, since MicroBlaze is so slow and space-consuming. I will investigate the Flash problem to see whether it is related to why Linux does not boot.
04-29-2009 04:21 AM
It looks like you have problems because of PPC errata 213 of the Virtex-4 FX.
What's the filename of your device tree?
It needs to be named something like virtex405-<something>.dts, because you need to build the kernel with something like:
to include the arch/powerpc/boot/virtex405-head.S file.
Look into arch/powerpc/boot/wrapper and you will see what I am talking about:
platformo="$object/virtex405-head.o $object/simpleboot.o $object/virtex.o"
BTW, it would make sense to change the device tree generator to create the "right" filenames directly.