* kmemcheck: OS boot failed because NMI handlers access the memory tracked by kmemcheck
From: Xishi Qiu @ 2014-03-17 9:19 UTC (permalink / raw)
To: Andrew Morton, vegard.nossum, Pekka Enberg, Peter Zijlstra,
David Rientjes, Vegard Nossum
Cc: Linux MM, LKML, Xishi Qiu, Li Zefan
The OS fails to boot when kmemcheck=1 is set on the kernel command line. The reason is that
NMI handlers access memory allocated by kmalloc(), and this causes a
page fault because memory from kmalloc() is tracked by kmemcheck.
watchdog_nmi_enable()
    perf_event_create_kernel_counter()
        perf_event_alloc()
            event = kzalloc(sizeof(*event), GFP_KERNEL);
The reason we don't support page faults in NMI context is that we
may already be handling an existing fault (or trap) when the NMI hits,
so the nested fault would mess up kmemcheck's working state.
Here is the failed log:
[ 1.731052] WARNING: CPU: 0 PID: 1 at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
[ 1.731053] Modules linked in:
[ 1.731056] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc3-0.1-default+ #1
[ 1.731057] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
[ 1.731061] 000000000000027a ffff880c39c07678 ffffffff814ca491 ffff880c39c076b8
[ 1.731063] ffffffff8104ce97 0000000000000000 ffff880c39c07838 ffff880c210281d4
[ 1.731065] 0000000000000000 0000000000000000 ffff880c210281d4 ffff880c39c076c8
[ 1.731065] Call Trace:
[ 1.731073] <NMI> [<ffffffff814ca491>] dump_stack+0x6a/0x79
[ 1.731077] [<ffffffff8104ce97>] warn_slowpath_common+0x87/0xb0
[ 1.731079] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
[ 1.731081] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
[ 1.731087] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
[ 1.731092] [<ffffffff81272cd2>] ? put_dec+0x72/0x90
[ 1.731093] [<ffffffff812730ba>] ? number+0x33a/0x360
[ 1.731096] [<ffffffff814d2829>] do_page_fault+0x9/0x10
[ 1.731098] [<ffffffff814cf222>] page_fault+0x22/0x30
[ 1.731104] [<ffffffff81348b4c>] ? vt_console_print+0x8c/0x400
[ 1.731106] [<ffffffff81348b2c>] ? vt_console_print+0x6c/0x400
[ 1.731111] [<ffffffff8109cd9b>] ? msg_print_text+0x18b/0x1f0
[ 1.731113] [<ffffffff8109bed1>] call_console_drivers+0xc1/0xe0
[ 1.731115] [<ffffffff8109d746>] console_unlock+0x236/0x280
[ 1.731117] [<ffffffff8109e095>] vprintk_emit+0x2b5/0x450
[ 1.731119] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
[ 1.731120] [<ffffffff814ca3f7>] printk+0x4a/0x4c
[ 1.731122] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
[ 1.731124] [<ffffffff8104ce4e>] warn_slowpath_common+0x3e/0xb0
[ 1.731126] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
[ 1.731128] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
[ 1.731130] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
[ 1.731132] [<ffffffff814d2829>] do_page_fault+0x9/0x10
[ 1.731134] [<ffffffff814cf222>] page_fault+0x22/0x30
[ 1.731138] [<ffffffff81015b52>] ? x86_perf_event_update+0x2/0x70
[ 1.731142] [<ffffffff8101de21>] ? intel_pmu_save_and_restart+0x11/0x50
[ 1.731144] [<ffffffff8101eb02>] intel_pmu_handle_irq+0x142/0x3a0
[ 1.731146] [<ffffffff814d0655>] perf_event_nmi_handler+0x35/0x60
[ 1.731148] [<ffffffff814cfe83>] nmi_handle+0x63/0x150
[ 1.731150] [<ffffffff814cffd3>] default_do_nmi+0x63/0x290
[ 1.731151] [<ffffffff814d02a8>] do_nmi+0xa8/0xe0
Another NMI handler, registered when CONFIG_ACPI_APEI_GHES=y, has the same problem:
ghes_probe()
    register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
This is not easy to change. For example:
ghes_ioremap_init()
    ghes_ioremap_area = __get_vm_area() -> this eventually calls kmalloc(), and we
    cannot change the general interface.
We also cannot use kmem_cache_alloc() (on a new slab cache created with SLAB_NOTRACK) instead of
kmalloc() when the size is variable.
Thanks,
Xishi Qiu
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: kmemcheck: OS boot failed because NMI handlers access the memory tracked by kmemcheck
From: Michal Hocko @ 2014-03-17 9:51 UTC (permalink / raw)
To: Xishi Qiu
Cc: Andrew Morton, vegard.nossum, Pekka Enberg, Peter Zijlstra,
David Rientjes, Vegard Nossum, Linux MM, LKML, Li Zefan
On Mon 17-03-14 17:19:33, Xishi Qiu wrote:
> The OS fails to boot when kmemcheck=1 is set on the kernel command line. The reason is that
> NMI handlers access memory allocated by kmalloc(), and this causes a
> page fault because memory from kmalloc() is tracked by kmemcheck.
>
> watchdog_nmi_enable()
> perf_event_create_kernel_counter()
> perf_event_alloc()
> event = kzalloc(sizeof(*event), GFP_KERNEL);
Where is this path called from an NMI context?
Your trace below points at something else, and it doesn't seem to
allocate any memory either. It looks more like x86_perf_event_update
sees an invalid perf_event or something like that...
> The reason we don't support page faults in NMI context is that we
> may already be handling an existing fault (or trap) when the NMI hits,
> so the nested fault would mess up kmemcheck's working state.
>
> Here is the failed log:
> [ 1.731052] WARNING: CPU: 0 PID: 1 at arch/x86/mm/kmemcheck/kmemcheck.c:634 kmemcheck_fault+0xb1/0xc0()
> [ 1.731053] Modules linked in:
> [ 1.731056] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc3-0.1-default+ #1
> [ 1.731057] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
> [ 1.731061] 000000000000027a ffff880c39c07678 ffffffff814ca491 ffff880c39c076b8
> [ 1.731063] ffffffff8104ce97 0000000000000000 ffff880c39c07838 ffff880c210281d4
> [ 1.731065] 0000000000000000 0000000000000000 ffff880c210281d4 ffff880c39c076c8
> [ 1.731065] Call Trace:
> [ 1.731073] <NMI> [<ffffffff814ca491>] dump_stack+0x6a/0x79
> [ 1.731077] [<ffffffff8104ce97>] warn_slowpath_common+0x87/0xb0
> [ 1.731079] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
> [ 1.731081] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
> [ 1.731087] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
> [ 1.731092] [<ffffffff81272cd2>] ? put_dec+0x72/0x90
> [ 1.731093] [<ffffffff812730ba>] ? number+0x33a/0x360
> [ 1.731096] [<ffffffff814d2829>] do_page_fault+0x9/0x10
> [ 1.731098] [<ffffffff814cf222>] page_fault+0x22/0x30
> [ 1.731104] [<ffffffff81348b4c>] ? vt_console_print+0x8c/0x400
> [ 1.731106] [<ffffffff81348b2c>] ? vt_console_print+0x6c/0x400
> [ 1.731111] [<ffffffff8109cd9b>] ? msg_print_text+0x18b/0x1f0
> [ 1.731113] [<ffffffff8109bed1>] call_console_drivers+0xc1/0xe0
> [ 1.731115] [<ffffffff8109d746>] console_unlock+0x236/0x280
> [ 1.731117] [<ffffffff8109e095>] vprintk_emit+0x2b5/0x450
> [ 1.731119] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
> [ 1.731120] [<ffffffff814ca3f7>] printk+0x4a/0x4c
> [ 1.731122] [<ffffffff810452c1>] ? kmemcheck_fault+0xb1/0xc0
> [ 1.731124] [<ffffffff8104ce4e>] warn_slowpath_common+0x3e/0xb0
> [ 1.731126] [<ffffffff8104ced5>] warn_slowpath_null+0x15/0x20
> [ 1.731128] [<ffffffff810452c1>] kmemcheck_fault+0xb1/0xc0
> [ 1.731130] [<ffffffff814d262b>] __do_page_fault+0x39b/0x4c0
> [ 1.731132] [<ffffffff814d2829>] do_page_fault+0x9/0x10
> [ 1.731134] [<ffffffff814cf222>] page_fault+0x22/0x30
> [ 1.731138] [<ffffffff81015b52>] ? x86_perf_event_update+0x2/0x70
> [ 1.731142] [<ffffffff8101de21>] ? intel_pmu_save_and_restart+0x11/0x50
> [ 1.731144] [<ffffffff8101eb02>] intel_pmu_handle_irq+0x142/0x3a0
> [ 1.731146] [<ffffffff814d0655>] perf_event_nmi_handler+0x35/0x60
> [ 1.731148] [<ffffffff814cfe83>] nmi_handle+0x63/0x150
> [ 1.731150] [<ffffffff814cffd3>] default_do_nmi+0x63/0x290
> [ 1.731151] [<ffffffff814d02a8>] do_nmi+0xa8/0xe0
>
> Another NMI handler, registered when CONFIG_ACPI_APEI_GHES=y, has the same problem:
> ghes_probe()
>     register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
>
> This is not easy to change. For example:
> ghes_ioremap_init()
>     ghes_ioremap_area = __get_vm_area() -> this eventually calls kmalloc(), and we
>     cannot change the general interface.
>
> We also cannot use kmem_cache_alloc() (on a new slab cache created with SLAB_NOTRACK) instead of
> kmalloc() when the size is variable.
>
> Thanks,
> Xishi Qiu
--
Michal Hocko
SUSE Labs
* Re: kmemcheck: OS boot failed because NMI handlers access the memory tracked by kmemcheck
From: Vegard Nossum @ 2014-03-17 9:55 UTC (permalink / raw)
To: Michal Hocko, Xishi Qiu
Cc: Andrew Morton, Pekka Enberg, Peter Zijlstra, David Rientjes,
Vegard Nossum, Linux MM, LKML, Li Zefan
On 03/17/2014 10:51 AM, Michal Hocko wrote:
> On Mon 17-03-14 17:19:33, Xishi Qiu wrote:
>> The OS fails to boot when kmemcheck=1 is set on the kernel command line. The reason is that
>> NMI handlers access memory allocated by kmalloc(), and this causes a
>> page fault because memory from kmalloc() is tracked by kmemcheck.
>>
>> watchdog_nmi_enable()
>> perf_event_create_kernel_counter()
>> perf_event_alloc()
>> event = kzalloc(sizeof(*event), GFP_KERNEL);
>
> Where is this path called from an NMI context?
>
> Your trace below points at something else, and it doesn't seem to
> allocate any memory either. It looks more like x86_perf_event_update
> sees an invalid perf_event or something like that...
>
It's not important whether kzalloc() is called from NMI context; what
matters is that the allocated memory is touched (read/written)
from NMI context.
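To make the distinction concrete, here is a minimal userspace model, not kernel code. It assumes a tracked object behaves as if every access faults, and it counts separately the faults taken in ordinary context and those taken while an NMI is being handled. All names (tracked_obj, model_alloc, model_read, model_nmi, in_nmi_ctx) are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one tracked allocation; not a kernel structure. */
struct tracked_obj {
	bool tracked;	/* true: the checker hides the page, so accesses fault */
	int value;
};

static bool in_nmi_ctx;		/* stands in for the kernel's in_nmi() */
static int faults_handled;	/* faults taken in ordinary context */
static int faults_in_nmi;	/* faults taken while handling an NMI */

/* Allocation runs in ordinary context and only marks the object tracked. */
static void model_alloc(struct tracked_obj *obj)
{
	obj->tracked = true;
	obj->value = 0;
}

/* Any read of a tracked object counts as a fault in the current context. */
static int model_read(const struct tracked_obj *obj)
{
	if (obj->tracked) {
		if (in_nmi_ctx)
			faults_in_nmi++;	/* the case the WARNING reports */
		else
			faults_handled++;
	}
	return obj->value;
}

/* The NMI handler never allocates; it merely touches the object. */
static void model_nmi(const struct tracked_obj *obj)
{
	in_nmi_ctx = true;
	(void)model_read(obj);
	in_nmi_ctx = false;
}
```

In this model, calling model_alloc() leaves both counters at zero, while a later model_nmi() on the same object increments faults_in_nmi. That mirrors the trace: the allocation site is harmless, and the access from the handler is what triggers the fault.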
I'm currently looking into the possibility of handling recursive faults
in kmemcheck (using the approach outlined by peterz; see
https://lkml.org/lkml/2014/2/26/141).
Vegard
* Re: kmemcheck: OS boot failed because NMI handlers access the memory tracked by kmemcheck
From: Peter Zijlstra @ 2014-03-17 10:15 UTC (permalink / raw)
To: Xishi Qiu
Cc: Andrew Morton, vegard.nossum, Pekka Enberg, David Rientjes,
Vegard Nossum, Linux MM, LKML, Li Zefan
On Mon, Mar 17, 2014 at 05:19:33PM +0800, Xishi Qiu wrote:
> The reason we don't support page faults in NMI context is that we
> may already be handling an existing fault (or trap) when the NMI hits,
> so the nested fault would mess up kmemcheck's working state.
I think it was suggested earlier that kmemcheck could maybe have a stack
of states.
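A rough userspace sketch of what a stack of states could look like follows. It is not kmemcheck's actual code: the structure fields, the MAX_DEPTH bound, and the function names state_push/state_pop are assumptions made for illustration, and a real version would need one stack per CPU with NMI-safe updates of the depth counter.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 4	/* assumed bound on nesting, e.g. task -> fault -> NMI */

/* Hypothetical working state for one in-flight fault. */
struct fault_state {
	unsigned long addr;	/* address whose access is being checked */
	bool busy;		/* slot is in use */
};

/* A single stack is enough for the model; a real design needs one per CPU. */
struct state_stack {
	struct fault_state slot[MAX_DEPTH];
	size_t depth;
};

/*
 * Entering a fault claims a fresh slot, so the state of an interrupted
 * outer fault is left untouched. Returns NULL if nesting is too deep.
 */
static struct fault_state *state_push(struct state_stack *s, unsigned long addr)
{
	struct fault_state *st;

	if (s->depth == MAX_DEPTH)
		return NULL;
	st = &s->slot[s->depth++];
	st->addr = addr;
	st->busy = true;
	return st;
}

/*
 * Leaving a fault releases the innermost slot and returns the state that
 * was interrupted, or NULL when no fault remains in flight.
 */
static struct fault_state *state_pop(struct state_stack *s)
{
	if (s->depth == 0)
		return NULL;
	s->slot[--s->depth].busy = false;
	return s->depth ? &s->slot[s->depth - 1] : NULL;
}
```

With a single shared state, a fault taken inside an NMI would overwrite the address recorded by the interrupted fault. Here the nested fault works on its own slot, and state_pop() hands back the outer state unchanged.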
* Re: kmemcheck: OS boot failed because NMI handlers access the memory tracked by kmemcheck
From: Michal Hocko @ 2014-03-17 10:42 UTC (permalink / raw)
To: Vegard Nossum
Cc: Xishi Qiu, Andrew Morton, Pekka Enberg, Peter Zijlstra,
David Rientjes, Vegard Nossum, Linux MM, LKML, Li Zefan
On Mon 17-03-14 10:55:28, Vegard Nossum wrote:
> On 03/17/2014 10:51 AM, Michal Hocko wrote:
> >On Mon 17-03-14 17:19:33, Xishi Qiu wrote:
> >>The OS fails to boot when kmemcheck=1 is set on the kernel command line. The reason is that
> >>NMI handlers access memory allocated by kmalloc(), and this causes a
> >>page fault because memory from kmalloc() is tracked by kmemcheck.
> >>
> >>watchdog_nmi_enable()
> >> perf_event_create_kernel_counter()
> >> perf_event_alloc()
> >> event = kzalloc(sizeof(*event), GFP_KERNEL);
> >
> >Where is this path called from an NMI context?
> >
> >Your trace below points at something else, and it doesn't seem to
> >allocate any memory either. It looks more like x86_perf_event_update
> >sees an invalid perf_event or something like that...
> >
>
> It's not important whether kzalloc() is called from NMI context; what
> matters is that the allocated memory is touched (read/written) from
> NMI context.
OK, I see. I thought that kzalloc already touches that memory but my
knowledge of kmemcheck is basically zero...
Anyway, sorry for the noise.
> I'm currently looking into the possibility of handling recursive faults in
> kmemcheck (using the approach outlined by peterz; see
> https://lkml.org/lkml/2014/2/26/141).
>
>
> Vegard
--
Michal Hocko
SUSE Labs