Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395

* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
       [not found] <bug-39632-10286@https.bugzilla.kernel.org/>
@ 2011-07-28  0:01 ` Andrew Morton
  2011-07-28  0:23   ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2011-07-28  0:01 UTC (permalink / raw)
  To: linux-mm; +Cc: bugme-daemon, KAMEZAWA Hiroyuki, greenhostnl, Tejun Heo


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 20 Jul 2011 15:25:32 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=39632
> 
>            Summary: kernel BUG at arch/x86/mm/fault.c:395
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.0.0-RC7
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: greenhostnl@gmail.com
>         Regression: No

I think this is a plain old oops in mem_cgroup_charge_statistics(), but
for some reason it's treating the oopsing address as part of the
vmalloc arena.  Perhaps this is what a use-after-free looks like on the
new percpu area implementation?
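
The CR2 value in the oops below (ffffe8fffff994e0) can be checked against
the x86_64 vmalloc range. A minimal sketch in user-space C, assuming the
fixed layout documented for kernels of this era (vmalloc/ioremap space from
ffffc90000000000 to ffffe8ffffffffff, no address randomization); the macro
and function names are made up for illustration and are not kernel
identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bounds of the x86_64 vmalloc/ioremap space for a 3.0-era kernel.
 * These constants are an assumption taken from the documented memory map,
 * not values read from the reporter's machine.
 */
#define SKETCH_VMALLOC_START 0xffffc90000000000ULL
#define SKETCH_VMALLOC_END   0xffffe8ffffffffffULL

/* Returns true if addr lies inside the assumed vmalloc range (inclusive). */
static bool in_vmalloc_range(uint64_t addr)
{
    return addr >= SKETCH_VMALLOC_START && addr <= SKETCH_VMALLOC_END;
}
```

Under that assumption, in_vmalloc_range(0xffffe8fffff994e0ULL) is true, while
a direct-map address such as the RSI value ffff8800c5cc2ff8 is not in the
range. That would be consistent with the fault being routed to
vmalloc_fault().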


> 
> This bug is triggered when the cgroup oom-killer is invoked and kills a child
> process in the cgroups hierarchy. It does not happen every time, but sometimes.
> The immediate result is a process hanging in the 'D' state.
> 
> The machine is AMD 64, kernel 3.0.0rc7, running as a paravirtualised Xen guest.
> Cgroups are configured. CONFIG_CGROUP_MEM_RES_CTLR=y (swap not used).
> 
> This kernel has been patched with Daniel Kiper's XEN memory-hotplug-ballooning
> patchset, queued for Linux 3.1, otherwise vanilla. I am unable to determine how
> relevant the patchset is to this problem.
> 
> Bug output follows:
> 
> [426900.196014] Memory cgroup out of memory: Kill process 22433 (php-cgi) score 924 or sacrifice child
> [426900.196014] Killed process 22433 (php-cgi) total-vm:289680kB, anon-rss:134272kB, file-rss:7136kB
> [426900.218250] ------------[ cut here ]------------
> [426900.218262] kernel BUG at arch/x86/mm/fault.c:395!
> [426900.218268] invalid opcode: 0000 [#1] SMP
> [426900.218276] CPU 0
> [426900.218279] Modules linked in: ipv6 evdev pcspkr xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod
> [426900.218307]
> [426900.218312] Pid: 22433, comm: php-cgi Not tainted 3.0.0-rc7+ #1
> [426900.218323] RIP: e030:[<ffffffff8135854a>]  [<ffffffff8135854a>] vmalloc_fault+0x15a/0x2a0
> [426900.218339] RSP: e02b:ffff8800a53b38c8  EFLAGS: 00010046
> [426900.218345] RAX: 00000000c5cc2000 RBX: ffffe8fffff994e0 RCX: ffff880000000ff8
> [426900.218352] RDX: 0000000000000000 RSI: ffff8800c5cc2ff8 RDI: 0000000000000000
> [426900.218359] RBP: ffff88003c167e88 R08: 00003ffffffff000 R09: ffffffff81505880
> [426900.218367] R10: ffff880000000000 R11: dead000000200200 R12: ffffffff814cde88
> [426900.218372] R13: ffff8800a53b39f8 R14: 0000000000000029 R15: 0000000000000000
> [426900.218386] FS:  00007ff0228a8720(0000) GS:ffff88003fd61000(0000) knlGS:0000000000000000
> [426900.218393] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [426900.218399] CR2: ffffe8fffff994e0 CR3: 000000003c167000 CR4: 0000000000000660
> [426900.218407] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [426900.218415] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [426900.218423] Process php-cgi (pid: 22433, threadinfo ffff8800a53b2000, task ffff8800bf556aa0)
> [426900.218432] Stack:
> [426900.218436]  ffff8800a53b3fd8 0000000000000001 ffffe8fffff994e0 0000000000000002
> [426900.218453]  ffff8800a53b39f8 ffffffff81358bd9 0000000000000060 ffff8800bf556aa0
> [426900.218467]  ffff88003c2fd180 0000000000000002 0000000000000000 0000000200020200
> [426900.218483] Call Trace:
> [426900.218491]  [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> [426900.218501]  [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> [426900.218510]  [<ffffffff81355915>] ? page_fault+0x25/0x30
> [426900.218519]  [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60
> [426900.218594]  [<ffffffff810e241d>] ? __mem_cgroup_uncharge_common+0xcd/0x1f0
> [426900.218604]  [<ffffffff810d0068>] ? page_remove_rmap+0x38/0x60
> [426900.218613]  [<ffffffff810c907b>] ? unmap_vmas+0x60b/0x8f0
> [426900.218622]  [<ffffffff810cb608>] ? exit_mmap+0x78/0x110
> [426900.218632]  [<ffffffff81041475>] ? mmput+0x25/0xe0
> [426900.218640]  [<ffffffff81045b45>] ? exit_mm+0x125/0x160
> [426900.218647]  [<ffffffff8104780b>] ? do_exit+0x16b/0x870
> [426900.218655]  [<ffffffff81047f4f>] ? do_group_exit+0x3f/0xb0
> [426900.218667]  [<ffffffff8105524d>] ? get_signal_to_deliver+0x1dd/0x400
> [426900.218676]  [<ffffffff8100a8cd>] ? __switch_to+0x26d/0x350
> [426900.218684]  [<ffffffff8100b360>] ? do_notify_resume+0x100/0x7f0
> [426900.218693]  [<ffffffff810e7b31>] ? vfs_read+0x161/0x180
> [426900.218700]  [<ffffffff8135575c>] ? retint_signal+0x48/0x8c
> [426900.218706] Code: 39 48 85 ff 74 25 ff 14 25 40 99 4d 81 48 89 c2 48 8b 3e ff 14 25 40 99 4d 81 4c 21 c2 4c 21 c0 4c 01 d2 4c 01 d0 48 39 c2 74 41 <0f> 0b eb fe 0f 0b eb fe 48 89 ef e8 66 d8 ca ff 66 90 e9 67 ff
> [426900.218826] RIP  [<ffffffff8135854a>] vmalloc_fault+0x15a/0x2a0
> [426900.218835]  RSP <ffff8800a53b38c8>
> [426900.218844] ---[ end trace 20f6f5477696edd2 ]---
> [426900.218850] Fixing recursive fault but reboot is needed!
> 
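
The "invalid opcode" line and the "kernel BUG" line describe the same event:
on x86, BUG() is emitted as the ud2 instruction (bytes 0f 0b), and the byte
marked <..> in the "Code:" line is where RIP pointed. A small sketch,
assuming the usual oops "Code:" formatting; the helper name is hypothetical
and this is not part of any kernel tool:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true if the byte marked <xx> in an oops "Code:" line, together
 * with the byte that follows it, is 0f 0b (the x86 ud2 instruction).
 * Hypothetical helper for illustration only.
 */
static bool faulting_insn_is_ud2(const char *code_line)
{
    const char *open = strchr(code_line, '<');
    const char *next;
    char *end;
    unsigned long first, second;

    if (open == NULL)
        return false;

    /* Parse the marked byte and require the closing '>'. */
    first = strtoul(open + 1, &end, 16);
    if (end == open + 1 || *end != '>')
        return false;

    /* Parse the byte right after the marker. */
    next = end + 1;
    second = strtoul(next, &end, 16);
    if (end == next)
        return false;

    return first == 0x0f && second == 0x0b;
}
```

Applied to the tail of the Code: line above ("... 48 39 c2 74 41 <0f> 0b eb
fe ..."), the helper reports ud2, which matches a BUG() inside
vmalloc_fault() rather than a stray bad instruction.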

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
  2011-07-28  0:01 ` [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395 Andrew Morton
@ 2011-07-28  0:23   ` KAMEZAWA Hiroyuki
  2011-07-28  3:03     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 3+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-28  0:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, bugme-daemon, greenhostnl, Tejun Heo

On Wed, 27 Jul 2011 17:01:48 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 20 Jul 2011 15:25:32 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=39632
> > 
> >            Summary: kernel BUG at arch/x86/mm/fault.c:395
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 3.0.0-RC7
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: greenhostnl@gmail.com
> >         Regression: No
> 
> I think this is a plain old oops in mem_cgroup_charge_statistics(), but
> for some reason it's treating the oopsing address as part of the
> vmalloc arena.  Perhaps this is what a use-after-free looks like on the
> new percpu area implementation?
> 

> [426900.218491]  [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> [426900.218501]  [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> [426900.218510]  [<ffffffff81355915>] ? page_fault+0x25/0x30
> [426900.218519]  [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60

Hmm, so it touched an unmapped vmalloc area and caused the oops.

And yes, mem_cgroup_charge_statistics() touches the per-cpu area, which is
allocated in the vmalloc() area...

The percpu area is allocated at cgroup creation and freed at destroy.

I wonder why oom-kill triggers the issue... If there were a double-free or
some similar bug, I would expect to see other trouble as well.

Thanks,
-Kame

* Re: [Bugme-new] [Bug 39632] New: kernel BUG at arch/x86/mm/fault.c:395
  2011-07-28  0:23   ` KAMEZAWA Hiroyuki
@ 2011-07-28  3:03     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 3+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-07-28  3:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, bugme-daemon, greenhostnl, Tejun Heo

On Thu, 28 Jul 2011 09:23:33 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Wed, 27 Jul 2011 17:01:48 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Wed, 20 Jul 2011 15:25:32 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=39632
> > > 
> > >            Summary: kernel BUG at arch/x86/mm/fault.c:395
> > >            Product: Memory Management
> > >            Version: 2.5
> > >     Kernel Version: 3.0.0-RC7
> > >           Platform: All
> > >         OS/Version: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: Other
> > >         AssignedTo: akpm@linux-foundation.org
> > >         ReportedBy: greenhostnl@gmail.com
> > >         Regression: No
> > 
> > I think this is a plain old oops in mem_cgroup_charge_statistics(), but
> > for some reason it's treating the oopsing address as part of the
> > vmalloc arena.  Perhaps this is what a use-after-free looks like on the
> > new percpu area implementation?
> > 
> 
> > [426900.218491]  [<ffffffff81358bd9>] ? do_page_fault+0x339/0x4e0
> > [426900.218501]  [<ffffffff810b0d64>] ? __alloc_pages_nodemask+0x144/0x860
> > [426900.218510]  [<ffffffff81355915>] ? page_fault+0x25/0x30
> > [426900.218519]  [<ffffffff810df69a>] ? mem_cgroup_charge_statistics+0x3a/0x60
> 
> Hmm, so it touched an unmapped vmalloc area and caused the oops.
> 
> And yes, mem_cgroup_charge_statistics() touches the per-cpu area, which is
> allocated in the vmalloc() area...
> 
> The percpu area is allocated at cgroup creation and freed at destroy.
> 
> I wonder why oom-kill triggers the issue... If there were a double-free or
> some similar bug, I would expect to see other trouble as well.
> 

Sorry, I missed another possibility:
page_cgroup->mem_cgroup may point to a stale memcg.

IIUC, pre_destroy() checks res->usage == 0 before destroy(). So I think
no page_cgroup can point to a destroyed cgroup, hmm. I'll check again.
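
The lifetime rule under discussion can be modelled in user space. The sketch
below is a toy, not memcg code: every toy_* name is hypothetical, a plain
heap array stands in for the per-cpu statistics area, and a single counter
stands in for res->usage. It only illustrates why a usage == 0 check before
destroy should leave no page_cgroup pointing at a freed memcg.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a memcg: a usage counter plus a separately
 * allocated statistics area (modelling the per-cpu area). */
struct toy_memcg {
    long usage;   /* models res->usage */
    long *stat;   /* models the per-cpu statistics area */
};

/* Toy stand-in for page_cgroup: remembers which memcg was charged. */
struct toy_page_cgroup {
    struct toy_memcg *mem_cgroup;
};

/* "cgroup creation": allocate the memcg and its statistics area. */
static struct toy_memcg *toy_create(void)
{
    struct toy_memcg *m = calloc(1, sizeof(*m));

    if (m == NULL)
        return NULL;
    m->stat = calloc(1, sizeof(*m->stat));
    if (m->stat == NULL) {
        free(m);
        return NULL;
    }
    return m;
}

static void toy_charge(struct toy_page_cgroup *pc, struct toy_memcg *m)
{
    pc->mem_cgroup = m;
    m->usage++;
    m->stat[0]++;
}

/* Uncharge dereferences pc->mem_cgroup->stat; if the memcg had already
 * been destroyed, this is where a stale pointer would be touched. */
static void toy_uncharge(struct toy_page_cgroup *pc)
{
    struct toy_memcg *m = pc->mem_cgroup;

    m->stat[0]--;
    m->usage--;
    pc->mem_cgroup = NULL;
}

/* Models the pre_destroy() check: destroy is allowed only at usage == 0. */
static bool toy_pre_destroy(const struct toy_memcg *m)
{
    return m->usage == 0;
}

/* "destroy": frees the statistics area and the memcg, or refuses. */
static bool toy_destroy(struct toy_memcg *m)
{
    if (!toy_pre_destroy(m))
        return false;
    free(m->stat);
    free(m);
    return true;
}
```

In this model toy_destroy() refuses while any charge is outstanding, so
toy_uncharge() never sees freed memory. The oops would then require either a
charge that is not reflected in the usage counter or a destroy path that
skips the check; the toy does not say which, if either, happens in the
kernel.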



Thanks,
-Kame
