linux-mm.kvack.org archive mirror
* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
       [not found] ` <20181012155533.2f15a8bb35103aa1fa87962e@linux-foundation.org>
@ 2018-10-12 22:56   ` Andrew Morton
  2018-10-13 12:57     ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2018-10-12 22:56 UTC (permalink / raw)
  To: Vlastimil Babka, bugzilla-daemon, leozinho29_eu; +Cc: linux-mm

(cc linux-mm, argh)

On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> Vlastimil, it looks like your August 21 smaps changes are failing. 
> This one is pretty urgent, please.
> 
> Leonardo (yes?): thanks for reporting.  Very helpful.
> 
> On Thu, 11 Oct 2018 18:13:31 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=201377
> > 
> >             Bug ID: 201377
> >            Summary: Kernel BUG under memory pressure: unable to handle
> >                     kernel NULL pointer dereference at 00000000000000f0
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.19-rc7
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: leozinho29_eu@hotmail.com
> >         Regression: No
> > 
> > Created attachment 278997
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=278997&action=edit
> > dmesg and kernel config
> > 
> > I'm using Xubuntu 18.04 and I noticed that under memory pressure the script
> > from https://github.com/pixelb/ps_mem.git (HEAD
> > 1ed0bc5519d889d58235f2c35db01e4ede0d8231) is causing a kernel BUG and locking
> > up a CPU. In dmesg the following appears:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
> > 
> > After this BUG the computer's performance becomes greatly degraded: some
> > programs do not close, some fail to open, some fail to work properly. As an
> > example, bash fails to autocomplete.
> > 
> > Steps to reproduce:
> > 
> > 1) Be under memory pressure. Using dd to write a large file at /dev/shm works
> > for this;
> > 2) Run the script from https://github.com/pixelb/ps_mem.git
> > 
> > Expected result: script will print information and system will keep working
> > normally;
> > 
> > Observed result: script is killed, kernel BUG happens, a CPU gets stuck and
> > the computer misbehaves.
> > 
> > I did not observe this with 4.17.19, I'll bisect and see if I can find which
> > commit is causing this.
> > 
> > I'm sorry if I'm reporting to the wrong product and component.
> > 
> > -- 
> > You are receiving this mail because:
> > You are the assignee for the bug.
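
For context, the call trace reported in this thread shows a python process reaching the oops through an ordinary read() of a smaps_rollup file. A minimal C sketch of such a reader follows. The helper names `read_smaps_rollup` and `smaps_field_kb` are hypothetical and chosen for illustration only; this is not the ps_mem code, and it assumes the usual "Key:   value kB" line layout of smaps-style files.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read /proc/<pid>/smaps_rollup into buf and
 * NUL-terminate it. Returns the number of bytes read, or -1 if the
 * file cannot be opened (for example on kernels without smaps_rollup).
 */
static long read_smaps_rollup(const char *pid, char *buf, size_t cap)
{
    char path[64];
    FILE *f;
    size_t n;

    if (cap == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%s/smaps_rollup", pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}

/*
 * Hypothetical helper: return the number on the first line that starts
 * with `key` (e.g. "Swap:"), or -1 if no line starts with that key.
 * Including the colon in the key keeps "Swap:" from matching "SwapPss:".
 */
static long smaps_field_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0) {
            long kb;
            if (sscanf(line + klen, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return -1;
}
```

Calling `read_smaps_rollup("self", buf, sizeof buf)` and then `smaps_field_kb(buf, "Swap:")` would report the calling process's rolled-up swap figure. Each such read makes the kernel walk every VMA of the target process with a single shared statistics structure.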


* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
  2018-10-12 22:56   ` [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0 Andrew Morton
@ 2018-10-13 12:57     ` Vlastimil Babka
  2018-10-14  7:17       ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2018-10-13 12:57 UTC (permalink / raw)
  To: Andrew Morton, bugzilla-daemon, leozinho29_eu
  Cc: linux-mm, Greg Kroah-Hartman

On 10/13/18 12:56 AM, Andrew Morton wrote:
> (cc linux-mm, argh)
> 
> On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>>
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> Vlastimil, it looks like your August 21 smaps changes are failing. 
>> This one is pretty urgent, please.

Thanks, will look in a few hours. Glad that there will be rc8...



* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
  2018-10-13 12:57     ` Vlastimil Babka
@ 2018-10-14  7:17       ` Vlastimil Babka
  2018-10-14 18:07         ` Leonardo Soares Müller
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2018-10-14  7:17 UTC (permalink / raw)
  To: Andrew Morton, bugzilla-daemon, leozinho29_eu
  Cc: linux-mm, Greg Kroah-Hartman, Daniel Colascione, Alexey Dobriyan

On 10/13/18 2:57 PM, Vlastimil Babka wrote:
> On 10/13/18 12:56 AM, Andrew Morton wrote:
>> (cc linux-mm, argh)
>>
>> On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>>>
>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>> bugzilla web interface).
>>>
>>> Vlastimil, it looks like your August 21 smaps changes are failing. 
>>> This one is pretty urgent, please.
> 
> Thanks, will look in a few hours. Glad that there will be rc8...

I think I found it, and it seems the bug was there all the time for smaps_rollup.
Dunno why it was hit only now. Please test?

----8<----


* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
  2018-10-14  7:17       ` Vlastimil Babka
@ 2018-10-14 18:07         ` Leonardo Soares Müller
  2018-10-14 20:14           ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: Leonardo Soares Müller @ 2018-10-14 18:07 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, bugzilla-daemon
  Cc: linux-mm, Greg Kroah-Hartman, Daniel Colascione, Alexey Dobriyan

This patch, applied on 4.19-rc7, fixed the problem for me and the
script no longer triggers the kernel bug.

I completely skipped 4.18 because there were multiple regressions
affecting my computer. 4.19-rc6 and 4.19-rc7 have most regressions fixed
but then this issue appeared.

The first released kernel version in which I found this problem is 4.18-rc4,
but bisecting between 4.18-rc3 and 4.18-rc4 failed: on boot there was
one message starting with [UNSUPP] and with something about "Arbitrary
File System".

Em 14/10/2018 04:17, Vlastimil Babka escreveu:
> On 10/13/18 2:57 PM, Vlastimil Babka wrote:
>> On 10/13/18 12:56 AM, Andrew Morton wrote:
>>> (cc linux-mm, argh)
>>>
>>> On Fri, 12 Oct 2018 15:55:33 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>>>
>>>>
>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>> bugzilla web interface).
>>>>
>>>> Vlastimil, it looks like your August 21 smaps changes are failing. 
>>>> This one is pretty urgent, please.
>>
>> Thanks, will look in a few hours. Glad that there will be rc8...
> 
> I think I found it, and it seems the bug was there all the time for smaps_rollup.
> Dunno why it was hit only now. Please test?
> 
> ----8<----
> From 948be25ee1bdddca8244d1a055fbf812022571e7 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Sun, 14 Oct 2018 08:59:44 +0200
> Subject: [PATCH] mm: /proc/pid/smaps_rollup: fix NULL pointer deref in
>  smaps_pte_range
> 
> Leonardo reports an apparent regression in 4.19-rc7:
> 
>  BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0
>  PGD 0 P4D 0
>  Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 3 PID: 6032 Comm: python Not tainted 4.19.0-041900rc7-lowlatency #201810071631
>  Hardware name: LENOVO 80UG/Toronto 4A2, BIOS 0XCN45WW 08/09/2018
>  RIP: 0010:smaps_pte_range+0x32d/0x540
>  Code: 80 00 00 00 00 74 a9 48 89 de 41 f6 40 52 40 0f 85 04 02 00 00 49 2b 30 48 c1 ee 0c 49 03 b0 98 00 00 00 49 8b 80 a0 00 00 00 <48> 8b b8 f0 00 00 00 e8 b7 ef ec ff 48 85 c0 0f 84 71 ff ff ff a8
>  RSP: 0018:ffffb0cbc484fb88 EFLAGS: 00010202
>  RAX: 0000000000000000 RBX: 0000560ddb9e9000 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: 0000000560ddb9e9 RDI: 0000000000000001
>  RBP: ffffb0cbc484fbc0 R08: ffff94a5a227a578 R09: ffff94a5a227a578
>  R10: 0000000000000000 R11: 0000560ddbbe7000 R12: ffffe903098ba728
>  R13: ffffb0cbc484fc78 R14: ffffb0cbc484fcf8 R15: ffff94a5a2e9cf48
>  FS:  00007f6dfb683740(0000) GS:ffff94a5aaf80000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00000000000000f0 CR3: 000000011c118001 CR4: 00000000003606e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   __walk_page_range+0x3c2/0x6f0
>   walk_page_vma+0x42/0x60
>   smap_gather_stats+0x79/0xe0
>   ? gather_pte_stats+0x320/0x320
>   ? gather_hugetlb_stats+0x70/0x70
>   show_smaps_rollup+0xcd/0x1c0
>   seq_read+0x157/0x400
>   __vfs_read+0x3a/0x180
>   ? security_file_permission+0x93/0xc0
>   ? security_file_permission+0x93/0xc0
>   vfs_read+0x8f/0x140
>   ksys_read+0x55/0xc0
>   __x64_sys_read+0x1a/0x20
>   do_syscall_64+0x5a/0x110
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Decoded code matched to local compilation+disassembly points to
> smaps_pte_entry():
> 
>         } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
>                                                         && pte_none(*pte))) {
>                 page = find_get_entry(vma->vm_file->f_mapping,
>                                                 linear_page_index(vma, addr));
> 
> Here, vma->vm_file is NULL. mss->check_shmem_swap should be false in that case,
> however for smaps_rollup, smap_gather_stats() can set the flag true for one vma
> and leave it true for subsequent vma's where it should be false.
> 
> To fix, reset the check_shmem_swap flag to false. There's also a related bug
> which sets mss->swap to shmem_swapped, which in the context of smaps_rollup
> overwrites any value accumulated from previous vma's. Fix that as well.
> 
> Note that the report suggests a regression between 4.17.19 and 4.19-rc7,
> which makes the 4.19 series ending with commit 258f669e7e88 ("mm:
> /proc/pid/smaps_rollup: convert to single value seq_file") suspicious. But the
> mss was reused for rollup since 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
> so let's play it safe with the stable backport.
> 
> Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201377
> Reported-by: Leonardo Mueller <leozinho29_eu@hotmail.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: <stable@vger.kernel.org>
> ---
>  fs/proc/task_mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 5ea1d64cb0b4..a027473561c6 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -713,6 +713,8 @@ static void smap_gather_stats(struct vm_area_struct *vma,
>  	smaps_walk.private = mss;
>  
>  #ifdef CONFIG_SHMEM
> +	/* In case of smaps_rollup, reset the value from previous vma */
> +	mss->check_shmem_swap = false;
>  	if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) {
>  		/*
>  		 * For shared or readonly shmem mappings we know that all
> @@ -728,7 +730,7 @@ static void smap_gather_stats(struct vm_area_struct *vma,
>  
>  		if (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
>  					!(vma->vm_flags & VM_WRITE)) {
> -			mss->swap = shmem_swapped;
> +			mss->swap += shmem_swapped;
>  		} else {
>  			mss->check_shmem_swap = true;
>  			smaps_walk.pte_hole = smaps_pte_hole;
> 
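
The failure mode described in the patch can be modelled in user space. The sketch below is only an illustration under stated assumptions: every `toy_*` type and function is hypothetical and merely imitates the shape of smap_gather_stats(); none of them exist in the kernel, and the per-PTE accounting done by the real page walk is left out. It shows one statistics structure being reused across VMAs, as smaps_rollup does, with and without the two changes in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real ones. */
struct toy_file {
    unsigned long swapped_pages;  /* plays the role of shmem_swapped */
};

struct toy_vma {
    struct toy_file *file;        /* NULL for an anonymous mapping */
    bool shared_or_readonly;      /* VM_SHARED set or VM_WRITE clear */
};

struct toy_stats {
    unsigned long swap;
    bool check_shmem_swap;
};

typedef void (*toy_gather_fn)(const struct toy_vma *, struct toy_stats *);

/* Pre-patch shape: the flag is never cleared and swap is overwritten. */
static void toy_gather_buggy(const struct toy_vma *vma, struct toy_stats *mss)
{
    if (vma->file) {
        if (!vma->file->swapped_pages || vma->shared_or_readonly)
            mss->swap = vma->file->swapped_pages;
        else
            mss->check_shmem_swap = true;
    }
}

/* Post-patch shape: reset the flag per VMA and accumulate swap. */
static void toy_gather_fixed(const struct toy_vma *vma, struct toy_stats *mss)
{
    mss->check_shmem_swap = false;
    if (vma->file) {
        if (!vma->file->swapped_pages || vma->shared_or_readonly)
            mss->swap += vma->file->swapped_pages;
        else
            mss->check_shmem_swap = true;
    }
}

/*
 * Walk all VMAs with one shared stats structure, as a rollup would.
 * Returns how many VMAs were visited with the flag set while their
 * file pointer was NULL, i.e. where the real page walk would have
 * dereferenced vma->vm_file.
 */
static int toy_rollup(toy_gather_fn gather, const struct toy_vma *vmas,
                      size_t n, struct toy_stats *mss)
{
    int hazards = 0;
    size_t i;

    mss->swap = 0;
    mss->check_shmem_swap = false;
    for (i = 0; i < n; i++) {
        gather(&vmas[i], mss);
        if (mss->check_shmem_swap && vmas[i].file == NULL)
            hazards++;
    }
    return hazards;
}
```

In this model, a private writable shmem VMA with swapped pages followed by an anonymous VMA leaves the buggy variant with the flag still set on the anonymous VMA, which corresponds to the NULL vma->vm_file dereference in the report. Two shared shmem VMAs in a row show the second bug: the buggy variant keeps only the last VMA's swap count, while the fixed variant sums them.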


* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
  2018-10-14 18:07         ` Leonardo Soares Müller
@ 2018-10-14 20:14           ` Vlastimil Babka
  2018-10-14 22:07             ` Leonardo Soares Müller
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2018-10-14 20:14 UTC (permalink / raw)
  To: Leonardo Soares Müller, Andrew Morton, bugzilla-daemon
  Cc: linux-mm, Greg Kroah-Hartman, Daniel Colascione, Alexey Dobriyan

On 10/14/18 8:07 PM, Leonardo Soares Müller wrote:
> This patch, applied on 4.19-rc7, fixed the problem for me and the
> script no longer triggers the kernel bug.

Great! Can we add your Tested-by: then?

> I completely skipped 4.18 because there were multiple regressions
> affecting my computer. 4.19-rc6 and 4.19-rc7 have most regressions fixed
> but then this issue appeared.
> 
> The first released kernel version in which I found this problem is 4.18-rc4,

OK, that confirms the smaps_rollup problem is indeed older than my
rewrite. Unless it's a typo and you mean 4.19-rc4 since you "skipped 4.18".

> but bisecting between 4.18-rc3 and 4.18-rc4 failed: on boot there was
> one message starting with [UNSUPP] and with something about "Arbitrary
> File System".
> 


* Re: [Bug 201377] New: Kernel BUG under memory pressure: unable to handle kernel NULL pointer dereference at 00000000000000f0
  2018-10-14 20:14           ` Vlastimil Babka
@ 2018-10-14 22:07             ` Leonardo Soares Müller
  0 siblings, 0 replies; 6+ messages in thread
From: Leonardo Soares Müller @ 2018-10-14 22:07 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, bugzilla-daemon
  Cc: linux-mm, Greg Kroah-Hartman, Daniel Colascione, Alexey Dobriyan

I did mean 4.18; that is right. Although I skipped 4.18 for normal use,
when this issue appeared I also tested with 4.18 and noticed that the
issue has existed since 4.18-rc4.

Yes, you can add my Tested-by, as this patch solved the issue for me:
no problems with the kernel and the script runs normally. Thank you.


