cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
jpl@xiphos.ca
Adventurer
Adventurer
5,162 Views
Registered: ‎10-28-2007

BUG_ON in exit_mmap during full fsck - microblaze

Jump to solution

Gidday there,

 

I don't know how many people have a chance to use fsck at boot using a microblaze, but I have a problem that only occurs when I have to perform a full fsck on a custom block device (custom logic and driver for SD card).  During simple fsck operations and nominal file operations (mounting, reading, writing etc.) everything appears to be fine.  I have a silly test where I reboot 60 seconds after booting up, and eventually the maximum number of mounts is exceeded and a full fsck is performed.

 

This causes the following output and stack trace:

 

 

/dev/q5sda2 was not cleanly unmounted, check forced.
[    3.580000] BUG: failure at mm/mmap.c:2187/exit_mmap()!
[    3.580000] Kernel panic - not syncing: BUG!
[    3.580000] About to dump_stack()
[    3.580000] About to show_stack
[    3.580000] 
[    3.580000] Stack:
[    3.580000]   c6c31d0c 00000001 000017ef c04b3fcc 000017ef c006a27c c02bd0d4 c02e16f0 
[    3.580000]   00000001 000017c7 c04b3fcc 000017c7 c6c31d54 c6c33d4c 00000000 c6c30000 
[    3.580000]   c001578c c02c8fc8 c02e16f0 00000001 00001794 c04b3fcc 00001794 c037de38 
[    3.580000] Call Trace: 
[    3.580000] 
[    3.580000] [<c006a27c>] exit_mmap+0x230/0x248
[    3.580000] [<c001578c>] mmput+0x64/0x120
[    3.580000] [<c001a920>] exit_mm+0x134/0x1a4
[    3.580000] [<c001a914>] exit_mm+0x128/0x1a4
[    3.580000] [<c0037128>] hrtimer_cancel+0x14/0x30
[    3.580000] [<c001c298>] do_exit+0x138/0x778
[    3.580000] [<c001c240>] do_exit+0xe0/0x778
[    3.580000] [<c001c910>] do_group_exit+0x38/0xe8
[    3.580000] [<c0028c20>] dequeue_signal+0x28/0x1ec
[    3.580000] [<c000a904>] do_page_fault+0x2e0/0x4b8
[    3.580000] [<c0028cc0>] dequeue_signal+0xc8/0x1ec
[    3.580000] [<c0065f54>] handle_mm_fault+0x5b4/0x674
[    3.580000] [<c00298b8>] get_signal_to_deliver+0x244/0x408
[    3.580000] [<c0005fc4>] do_signal+0x64/0x27c
[    3.580000] [<c0038274>] up_read+0xc/0x20
[    3.580000] [<c00091b8>] ret_from_exc+0x48/0x1b8
[    3.580000] [<c0009168>] page_fault_instr_trap+0x1f8/0x200
[    3.580000] [<c000f414>] __enqueue_entity+0xc0/0xe8
[    3.580000] [<c001ed04>] do_softirq+0x4c/0x58
[    3.580000] [<c02b1e28>] _raw_spin_lock+0x1c/0x30
[    3.580000] [<c02b1d1c>] _raw_spin_unlock_irq+0xc/0x54
[    3.580000] [<c02af474>] schedule+0x148/0x4dc
[    3.580000] [<c02af53c>] schedule+0x210/0x4dc
[    3.580000] [<c02af448>] schedule+0x11c/0x4dc
[    3.580000] [<c02af4d4>] schedule+0x1a8/0x4dc
[    3.580000] [<c0009514>] ret_from_irq+0x1c/0x48
[    3.580000] [<c00094f0>] irq_call+0x0/0x8
[    3.580000] 
[    3.580000] Panic Timeout: 60
[    3.580000] Rebooting in 60 seconds..
[    3.580000] Machine restart...

 

Please note, in order to get the stack trace and the automatic reboot, I needed to modify the default code from the git repository.  I can post the diffs if anyone is interested.

 

The driver for our logic (and SD cards) had been running just dandy on PPC405, under 2.6.28 for several years.  Unfortunately, the port to microblaze was not a simple re-compile because the block driver API changed before 2.6.33 was released, so I'm dealing with two simultaneous environment changes.  All this to say, the problem may well be in my poorly coded block driver.

 

So, has anyone seen this before?

 

Thanks

 

Joshua

 

0 Kudos
Reply
1 Solution

Accepted Solutions
linnj
Xilinx Employee
Xilinx Employee
6,355 Views
Registered: ‎09-10-2008

Hi Joshua,

 

I don't run a lot of debug all the time, just turning it on when I see something wrong.

 

I don't have any great advice here.

 

Sorry.

View solution in original post

0 Kudos
Reply
5 Replies
linnj
Xilinx Employee
Xilinx Employee
5,159 Views
Registered: ‎09-10-2008

Hi Joshua,

 

I realize this is yet another permutation, but I pushed out 2.6.34 to the public git server yesterday.

 

You could give it a try as arch/microblaze continues to change as stuff matures.

 

Thanks,

John

0 Kudos
Reply
jpl@xiphos.ca
Adventurer
Adventurer
5,150 Views
Registered: ‎10-28-2007

Gidday John,

 

I updated the kernel to 2.6.34 (which meant that I had to deal with other internal changes such as a change to blk_queue_max_sectors (http://lwn.net/Articles/376508/) as well as a change in the behaviour of the of_compatible formatting or parsing.

 

Anyway, this didn't fix my problem.

 

 

/dev/q5sda2 was not cleanly unmounted, check forced.
[    3.370000] BUG: failure at mm/mmap.c:2265/exit_mmap()!
[    3.370000] Kernel panic - not syncing: BUG!
[    3.370000] 
[    3.370000] Stack:
[    3.370000]   c70d1d14 c70b9a00 c70d0000 00000001 c70d1d30 c006a5c8 c02c59cc c04c0af0 
[    3.370000]   00000000 c70d1d54 42554721 00000004 c70d1d54 c70d0000 c0014dc0 c02d1b78 
[    3.370000]   c02ea87c 00000001 0000169f c04c0fb4 0000169f c0389e38 00000089 00000000 
[    3.370000] Call Trace: 
[    3.370000] 
[    3.370000] [<c006a5c8>] exit_mmap+0x230/0x248
[    3.370000] [<c0014dc0>] mmput+0x64/0x120
[    3.370000] [<c0019e30>] exit_mm+0x134/0x1a4
[    3.370000] [<c0019e24>] exit_mm+0x128/0x1a4
[    3.370000] [<c0036828>] hrtimer_cancel+0x14/0x30
[    3.370000] [<c001b7b8>] do_exit+0x138/0x778
[    3.370000] [<c001b760>] do_exit+0xe0/0x778
[    3.370000] [<c001be30>] do_group_exit+0x38/0xe8
[    3.370000] [<c00282d4>] dequeue_signal+0x28/0x1ec
[    3.370000] [<c0009e18>] do_page_fault+0x2e4/0x478
[    3.370000] [<c0028374>] dequeue_signal+0xc8/0x1ec
[    3.370000] [<c0066164>] handle_mm_fault+0x5b4/0x674
[    3.370000] [<c0028f74>] get_signal_to_deliver+0x244/0x408
[    3.370000] [<c0005664>] do_signal+0x64/0x27c
[    3.370000] [<c0037974>] up_read+0xc/0x20
[    3.370000] [<c00085b8>] ret_from_exc+0x48/0x1b8
[    3.370000] [<c0008568>] page_fault_instr_trap+0x1f8/0x200
[    3.370000] [<c000eaf4>] __enqueue_entity+0xc0/0xe8
[    3.370000] [<c001e1fc>] do_softirq+0x4c/0x58
[    3.370000] [<c02bc808>] _raw_spin_lock+0x1c/0x30
[    3.370000] [<c02bc6fc>] _raw_spin_unlock_irq+0xc/0x54
[    3.370000] [<c02b9e44>] schedule+0x148/0x4dc
[    3.370000] [<c02b9f0c>] schedule+0x210/0x4dc
[    3.370000] [<c02b9e18>] schedule+0x11c/0x4dc
[    3.370000] [<c02b9ea4>] schedule+0x1a8/0x4dc
[    3.370000] [<c0008914>] ret_from_irq+0x1c/0x48
[    3.370000] [<c00088f0>] irq_call+0x0/0x8
[    3.370000] 
[    3.370000] Rebooting in 60 seconds..
�    3.370000] Machine restart...
Autobooting ELF Kernel in 4 seconds, press SPACE to stop
4
Initializing SD Card...
Loading ELF Image...
Starting ELF Image...

 

 

Normally (with 2.6.33) it would continue past 'Starting ELF Image...' e.g.

 

 

Starting ELF Image...
[    0.000000] Linux version 2.6.34 (jpl@megatron.xiphos.ca) (gcc version 4.1.2) #216 PREEMPT Wed Jun 23 15:58:12 EDT 2010

 However, something appears to be left over in memory and the system hangs at this point.  If I re-init the FPGA and look at the memory from my bootloader, I see the following.

 

 

[jpl@megatron linux-2.6-xlnx]$ /home/jpl/dev/exnet/Q6/xilinx/microblaze_v1.0/microblaze-unknown-linux-gnu/bin/mb-linux-objdump -t vmlinux |grep __log_buf

c04c0fb4 l     O .bss   00020000 __log_buf


[Q6]> md 0x804c0fb0
804C0FB0: 00000000 3C353E5B 20202020 302E3030    ....<5>[    0.00
804C0FC0: 30303030 5D204C69 6E757820 76657273    0000] Linux vers
804C0FD0: 696F6E20 322E362E 33342028 6A706C40    ion 2.6.34 (jpl@
804C0FE0: 6D656761 74726F6E 2E786970 686F732E    megatron.xiphos.
804C0FF0: 63612920 28676363 20766572 73696F6E    ca) (gcc version
804C1000: 20342E31 2E322920 23323136 20505245     4.1.2) #216 PRE
804C1010: 454D5054 20576564 204A756E 20323320    EMPT Wed Jun 23 
804C1020: 31353A35 383A3132 20454454 20323031    15:58:12 EDT 201
804C1030: 300A3C36 3E5B2020 2020302E 30303030    0.<6>[    0.0000
804C1040: 30305D20 73657475 705F6370 75696E66    00] setup_cpuinf
804C1050: 6F3A2069 6E697469 616C6973 696E670A    o: initialising.
804C1060: 3C343E5B 20202020 302E3030 30303030    <4>[    0.000000
804C1070: 5D207365 7475705F 63707569 6E666F3A    ] setup_cpuinfo:
804C1080: 204E6F20 50565220 73757070 6F72742E     No PVR support.
804C1090: 20557369 6E672073 74617469 63204350     Using static CP
804C10A0: 5520696E 666F2066 726F6D20 4644540A    U info from FDT.
[Q6]> 
804C10B0: 3C363E5B 20202020 302E3030 30303030    <6>[    0.000000
804C10C0: 5D206361 6368653A 2077745F 6D73725F    ] cache: wt_msr_
804C10D0: 6E6F6972 710A0000 00000000 00000000    noirq

 

 

John, what debugging options do you use when you are running/testing your kernels?  Sometimes I run with a few slab debug and memory poisoning options set.  Could this be having an effect?  The kernel output after this should be....

 

 

[    0.000000] setup_memory: max_mapnr: 0x8000
[    0.000000] setup_memory: min_low_pfn: 0x80000
[    0.000000] setup_memory: max_low_pfn: 0x88000

 Thanks in advance for any comments.

 

Joshua

 

 

0 Kudos
Reply
linnj
Xilinx Employee
Xilinx Employee
6,356 Views
Registered: ‎09-10-2008

Hi Joshua,

 

I don't run a lot of debug all the time, just turning it on when I see something wrong.

 

I don't have any great advice here.

 

Sorry.

View solution in original post

0 Kudos
Reply
jpl@xiphos.ca
Adventurer
Adventurer
5,050 Views
Registered: ‎10-28-2007

Gidday John,

 

Thank you, yes, it was the kernel debugging options that were causing the havoc.  I eventually realised this when looking through the source and seeing a note about the Page Table Entries overflowing if the spinlock debugging was enabled without some sort of PTE locking being configured.

 

Anyway, I've turned off the kernel debugging and am able to do my full fsck.ext2.  I still have a reboot problem between 2.6.33 and 2.6.34, but I'll start a new thread about that once I have some useful information to share.

 

Thanks again.

 

Joshua

0 Kudos
Reply
linnj
Xilinx Employee
Xilinx Employee
5,047 Views
Registered: ‎09-10-2008

Great! Thanks for the update.

0 Kudos
Reply