Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]


From: Vlastimil Babka <vbabka@suse.cz>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
Date: Mon, 7 Oct 2019 15:28:17 +0200
Message-ID: <2af04718-d5cb-1bb1-a789-be017f2e2df0@suse.cz>
In-Reply-To: <87lfu4i79z.fsf@mid.deneb.enyo.de>

On 10/1/19 9:40 PM, Florian Weimer wrote:
> * Vlastimil Babka:
> 
>> On 9/30/19 11:17 PM, Dave Chinner wrote:
>>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>>> * Dave Chinner:
>>>>
>>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>>> killed because of kernel paging request failure in the XFS code.
>>>>>
>>>>> dmesg output? if the system was still running, then you might be
>>>>> able to pull the trace from syslog. But we can't do much without
>>>>> knowing what the actual failure was....
>>>>
>>>> Huh.  I actually have something in syslog:
>>>>
>>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>>> [ 4001.238418] PGD 0 P4D 0 
>>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
>>>> [ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>>
>>> That's memory compaction code it's crashed in.
>>>
>>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
>>
>> Tried to decode it, but couldn't match it to source code, my version of
>> compiled code is too different. Would it be possible to either send
>> mm/compaction.o from the matching build, or output of 'objdump -d -l'
>> for the __reset_isolation_pfn function?
> 
> See below.  I don't have debuginfo for this build, and the binary does
> not reproduce for some reason.  Due to the heavy inlining, it might be
> quite hard to figure out what's going on.

Thanks, but I'm still not able to "decompile" that in my head.

> I've switched to kernel builds with debuginfo from now on.  I'm
> surprised that it's not the default.

Let's see if you can reproduce it with that.

However, I've noticed at least something weird:

>      37e:	49 8b 16             	mov    (%r14),%rdx
>      381:	48 89 d0             	mov    %rdx,%rax
>      384:	48 c1 ea 35          	shr    $0x35,%rdx
>      388:	48 8b 14 d7          	mov    (%rdi,%rdx,8),%rdx
>      38c:	48 c1 e8 2d          	shr    $0x2d,%rax
>      390:	48 85 d2             	test   %rdx,%rdx
>      393:	74 0a                	je     39f <__reset_isolation_pfn+0x27f>

IIUC, this will jump to 39f when rdx is zero.

>      395:	0f b6 c0             	movzbl %al,%eax
>      398:	48 c1 e0 04          	shl    $0x4,%rax
>      39c:	48 01 c2             	add    %rax,%rdx
>      39f:	48 8b 02             	mov    (%rdx),%rax

And this is where we crash, because rdx is zero. So the test+branch
above may have sent us straight here to crash. That looks like an
inverted condition somewhere, or possibly a result of compiler
optimizations.
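
To make the control flow above easier to follow, here is a minimal
user-space C sketch of what the instructions from 37e to 39c appear to
compute. It is only a model of the disassembly, not the kernel source:
the names struct entry, lookup_entry, leaf, roots and check_demo are
hypothetical, and the 16-byte entry size is inferred only from the
"shl $0x4". The shape resembles a two-level table lookup where a missing
first-level entry yields NULL and the caller dereferences the result
without checking, but which kernel helper this really is remains an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte table entry; the stride comes from "shl $0x4". */
struct entry {
	uint64_t first;   /* the word that "mov (%rdx),%rax" at 39f would load */
	uint64_t second;
};

_Static_assert(sizeof(struct entry) == 16, "entry stride must be 16 bytes");

/*
 * Mirrors 37e..39c. "word" stands for the value loaded from (%r14) and
 * "roots" for the table that %rdi points to. Returns the pointer that
 * the instruction at 39f would dereference.
 */
static const struct entry *lookup_entry(const struct entry *const *roots,
					uint64_t word)
{
	const struct entry *p = roots[word >> 53];  /* shr $0x35; mov (%rdi,%rdx,8),%rdx */
	uint64_t idx = (word >> 45) & 0xff;         /* shr $0x2d; movzbl %al,%eax */

	if (p != NULL)                              /* test %rdx,%rdx; je 39f */
		p += idx;                           /* shl $0x4; add %rax,%rdx */

	return p;                                   /* NULL here means 39f faults at address 0 */
}

/* Demo data: only first-level slot 1 is populated, slot 0 stays NULL. */
static struct entry leaf[256];
static const struct entry *const roots[2048] = { [1] = leaf };

/* Exercises both paths through the branch. */
static void check_demo(void)
{
	/* Populated slot: the offset is added and a valid entry comes back. */
	assert(lookup_entry(roots, (1ULL << 53) | (5ULL << 45)) == &leaf[5]);

	/* Empty slot: the branch skips the add and the pointer stays NULL. */
	assert(lookup_entry(roots, 5ULL << 45) == NULL);
}
```

In this model the branch itself is not inverted: it only skips the
offset addition when the first-level slot is empty. The fault would then
come from the unconditional load that follows, which is consistent with
the reported address of 0000000000000000.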
