* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
[not found] <bug-39632-10286@https.bugzilla.kernel.org/>
@ 2011-07-28 0:01 ` Andrew Morton
2011-07-28 0:23 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2011-07-28 0:01 UTC (permalink / raw)
To: linux-mm; +Cc: bugme-daemon, KAMEZAWA Hiroyuki, greenhostnl, Tejun Heo
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Wed, 20 Jul 2011 15:25:32 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=39632
>
> Summary: kernel BUG at arch/x86/mm/fault.c:395
> Product: Memory Management
> Version: 2.5
> Kernel Version: 3.0.0-RC7
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: akpm@linux-foundation.org
> ReportedBy: greenhostnl@gmail.com
> Regression: No
I think this is a plain old oops in mem_cgroup_charge_statistics(), but
for some reason it's treating the oopsing address as part of the
vmalloc arena. Perhaps this is what a use-after-free looks like on the
new percpu area implementation?
>
> This bug is triggered when the cgroup oom-killer is invoked and kills a child
> process in the cgroups hierarchy. It does not happen every time, but sometimes.
> The immediate result is a process hanging in the 'D' state.
>
> The machine is AMD 64, kernel 3.0.0rc7, running as a paravirtualised Xen guest.
> Cgroups are configured. CONFIG_CGROUP_MEM_RES_CTLR=y (swap not used).
>
> This kernel has been patched with Daniel Kiper's XEN memory-hotplug-ballooning
> patchset, queued for Linux 3.1, otherwise vanilla. I am unable to determine how
> relevant the patchset is to this problem.
>
> Bug output follows:
>
> [426900.196014] Memory cgroup out of memory: Kill process 22433 (php-cgi) score 924 or sacrifice child
> [426900.196014] Killed process 22433 (php-cgi) total-vm:289680kB, anon-rss:134272kB, file-rss:7136kB
> [426900.218250] ------------[ cut here ]------------
> [426900.218262] kernel BUG at arch/x86/mm/fault.c:395!
> [426900.218268] invalid opcode: 0000 [#1] SMP
> [426900.218276] CPU 0
> [426900.218279] Modules linked in: ipv6 evdev pcspkr xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod
> [426900.218307]
> [426900.218312] Pid: 22433, comm: php-cgi Not tainted 3.0.0-rc7+ #1
> [426900.218323] RIP: e030:[<ffffffff8135854a>] [<ffffffff8135854a>] vmalloc_fault+0x15a/0x2a0
> [426900.218339] RSP: e02b:ffff8800a53b38c8 EFLAGS: 00010046
> [426900.218345] RAX: 00000000c5cc2000 RBX: ffffe8fffff994e0 RCX: ffff880000000ff8
> [426900.218352] RDX: 0000000000000000 RSI: ffff8800c5cc2ff8 RDI: 0000000000000000
> [426900.218359] RBP: ffff88003c167e88 R08: 00003ffffffff000 R09: ffffffff81505880
> [426900.218367] R10: ffff880000000000 R11: dead000000200200 R12: ffffffff814cde88
> [426900.218372] R13: ffff8800a53b39f8 R14: 0000000000000029 R15: 0000000000000000
> [426900.218386] FS: 00007ff0228a8720(0000) GS:ffff88003fd61000(0000) knlGS:0000000000000000
> [426900.218393] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [426900.218399] CR2: ffffe8fffff994e0 CR3: 000000003c167000 CR4: 0000000000000660
> [426900.218407] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [426900.218415] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [426900.218423] Process php-cgi (pid: 22433, threadinfo ffff8800a53b2000, task ffff8800bf556aa0)
> [426900.218432] Stack:
> [426900.218436] ffff8800a53b3fd8 0000000000000001 ffffe8fffff994e0 0000000000000002
> [426900.218453] ffff8800a53b39f8 ffffffff81358bd9 0000000000000060 ffff8800bf556aa0
> [426900.218467] ffff88003c2fd180 0000000000000002 0000000000000000 0000000200020200
> [426900.218483] Call Trace:
> [426900.218491] [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> [426900.218501] [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> [426900.218510] [<ffffffff81355915>] ? page_fault+0x25/0x30
> [426900.218519] [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60
> [426900.218594] [<ffffffff810e241d>] ? __mem_cgroup_uncharge_common+0xcd/0x1f0
> [426900.218604] [<ffffffff810d0068>] ? page_remove_rmap+0x38/0x60
> [426900.218613] [<ffffffff810c907b>] ? unmap_vmas+0x60b/0x8f0
> [426900.218622] [<ffffffff810cb608>] ? exit_mmap+0x78/0x110
> [426900.218632] [<ffffffff81041475>] ? mmput+0x25/0xe0
> [426900.218640] [<ffffffff81045b45>] ? exit_mm+0x125/0x160
> [426900.218647] [<ffffffff8104780b>] ? do_exit+0x16b/0x870
> [426900.218655] [<ffffffff81047f4f>] ? do_group_exit+0x3f/0xb0
> [426900.218667] [<ffffffff8105524d>] ? get_signal_to_deliver+0x1dd/0x400
> [426900.218676] [<ffffffff8100a8cd>] ? __switch_to+0x26d/0x350
> [426900.218684] [<ffffffff8100b360>] ? do_notify_resume+0x100/0x7f0
> [426900.218693] [<ffffffff810e7b31>] ? vfs_read+0x161/0x180
> [426900.218700] [<ffffffff8135575c>] ? retint_signal+0x48/0x8c
> [426900.218706] Code: 39 48 85 ff 74 25 ff 14 25 40 99 4d 81 48 89 c2 48 8b 3e ff 14 25 40 99 4d 81 4c 21 c2 4c 21 c0 4c 01 d2 4c 01 d0 48 39 c2 74 41 <0f> 0b eb fe 0f 0b eb fe 48 89 ef e8 66 d8 ca ff 66 90 e9 67 ff
> [426900.218826] RIP [<ffffffff8135854a>] vmalloc_fault+0x15a/0x2a0
> [426900.218835] RSP <ffff8800a53b38c8>
> [426900.218844] ---[ end trace 20f6f5477696edd2 ]---
> [426900.218850] Fixing recursive fault but reboot is needed!
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
2011-07-28 0:01 ` [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395 Andrew Morton
@ 2011-07-28 0:23 ` KAMEZAWA Hiroyuki
2011-07-28 3:03 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 3+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-28 0:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, bugme-daemon, greenhostnl, Tejun Heo
On Wed, 27 Jul 2011 17:01:48 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Wed, 20 Jul 2011 15:25:32 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=39632
> >
> > Summary: kernel BUG at arch/x86/mm/fault.c:395
> > Product: Memory Management
> > Version: 2.5
> > Kernel Version: 3.0.0-RC7
> > Platform: All
> > OS/Version: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Other
> > AssignedTo: akpm@linux-foundation.org
> > ReportedBy: greenhostnl@gmail.com
> > Regression: No
>
> I think this is a plain old oops in mem_cgroup_charge_statistics(), but
> for some reason it's treating the oopsing address as part of the
> vmalloc arena. Perhaps this is what a use-after-free looks like on the
> new percpu area implementation?
>
> [426900.218491] [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> [426900.218501] [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> [426900.218510] [<ffffffff81355915>] ? page_fault+0x25/0x30
> [426900.218519] [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60
Hmm, it touched an unmapped part of the vmalloc area, and that caused the oops.
And yes, mem_cgroup_charge_statistics() touches the per-cpu area, which is
allocated in the vmalloc() area.
The percpu area is allocated at cgroup creation and freed at destroy.
I wonder why oom-kill triggers the issue. If there were a double free or some
other such problem, I would expect to see other trouble as well.
Thanks,
-Kame
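As a rough illustration of why the fault was routed to vmalloc_fault(), the CR2 value from the report can be compared against the x86_64 vmalloc window. This is only a sketch: the two range constants are assumed from the documented 4-level-paging layout of kernels of that era, they are not stated anywhere in this thread, and in_vmalloc_range() is a hypothetical helper rather than a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed x86_64 vmalloc/ioremap window (4-level paging, 3.0-era layout).
 * These values are an assumption for illustration, not taken from the report.
 */
#define VMALLOC_START_ASSUMED 0xffffc90000000000ULL
#define VMALLOC_END_ASSUMED   0xffffe8ffffffffffULL

/* Hypothetical helper: true if addr lies inside the assumed vmalloc window. */
static bool in_vmalloc_range(uint64_t addr)
{
    return addr >= VMALLOC_START_ASSUMED && addr <= VMALLOC_END_ASSUMED;
}
```

Under these assumptions, in_vmalloc_range(0xffffe8fffff994e0ULL) holds for the CR2 value in the oops, while a direct-mapped address such as the reported RSP (0xffff8800a53b38c8) falls outside the window. That would be consistent with the percpu area living in vmalloc space, as described above.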
* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
2011-07-28 0:23 ` KAMEZAWA Hiroyuki
@ 2011-07-28 3:03 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 3+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-28 3:03 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Andrew Morton, linux-mm, bugme-daemon, greenhostnl, Tejun Heo
On Thu, 28 Jul 2011 09:23:33 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 27 Jul 2011 17:01:48 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
> >
> > (switched to email. Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Wed, 20 Jul 2011 15:25:32 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=39632
> > >
> > > Summary: kernel BUG at arch/x86/mm/fault.c:395
> > > Product: Memory Management
> > > Version: 2.5
> > > Kernel Version: 3.0.0-RC7
> > > Platform: All
> > > OS/Version: Linux
> > > Tree: Mainline
> > > Status: NEW
> > > Severity: normal
> > > Priority: P1
> > > Component: Other
> > > AssignedTo: akpm@linux-foundation.org
> > > ReportedBy: greenhostnl@gmail.com
> > > Regression: No
> >
> > I think this is a plain old oops in mem_cgroup_charge_statistics(), but
> > for some reason it's treating the oopsing address as part of the
> > vmalloc arena. Perhaps this is what a use-after-free looks like on the
> > new percpu area implementation?
> >
>
> > [426900.218491] [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> > [426900.218501] [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> > [426900.218510] [<ffffffff81355915>] ? page_fault+0x25/0x30
> > [426900.218519] [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60
>
> Hmm, touches unmapped vmalloc area and caused OOps.
>
> And yes, mem_cgroup_charge_statistics() touches per-cpu area, which is allocated
> in vmalloc() area....
>
> The percpu area is allocated at a cgroup creation and freed at destroy.
>
> I wonder why oom-kill is a trigger for the issue...if there is
> double-free or some other issue, other trouble can be seen...
>
Sorry, I missed another point of view.
page_cgroup->mem_cgroup may point to a stale memcg.
IIUC, pre_destroy() checks res->usage == 0 before destroy(). So I think
no page_cgroup can point to a destroyed cgroup, hmm. I'll check again.
Thanks,
-Kame
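The invariant described above (no destroy while usage is non-zero, so no page_cgroup should be left pointing at a freed memcg) can be modelled in user space. The sketch below is a deliberately simplified assumption: memcg_model, page_cgroup_model, and all four functions are hypothetical stand-ins, not the kernel's structures or API, and a single heap-allocated counter stands in for the percpu statistics area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory cgroup. */
struct memcg_model {
    unsigned long usage; /* models res->usage: pages still charged */
    long *stat;          /* models the percpu statistics area */
};

/* Hypothetical stand-in for struct page_cgroup. */
struct page_cgroup_model {
    struct memcg_model *mem_cgroup; /* group the page is charged to */
};

/* Allocate the group and its statistics area together (cgroup creation). */
static struct memcg_model *memcg_create(void)
{
    struct memcg_model *m = calloc(1, sizeof(*m));
    if (!m)
        return NULL;
    m->stat = calloc(1, sizeof(*m->stat));
    if (!m->stat) {
        free(m);
        return NULL;
    }
    return m;
}

/* Charge one page: record the owner and bump usage and statistics. */
static void charge(struct page_cgroup_model *pc, struct memcg_model *m)
{
    pc->mem_cgroup = m;
    m->usage++;
    (*m->stat)++;
}

/* Uncharge one page: this touches m->stat, so m must still be alive. */
static void uncharge(struct page_cgroup_model *pc)
{
    struct memcg_model *m = pc->mem_cgroup;
    (*m->stat)--;
    m->usage--;
    pc->mem_cgroup = NULL;
}

/* Models the pre_destroy() check: refuse to free while pages are charged. */
static bool memcg_destroy(struct memcg_model *m)
{
    if (m->usage != 0)
        return false;
    free(m->stat);
    free(m);
    return true;
}
```

In this model, memcg_destroy() fails while a page is still charged and succeeds only after uncharge(). If that check were bypassed, uncharge() would dereference a freed stat pointer, which is the kind of stale access the oops would suggest.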