From: Phil Auld <pauld@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: Boot fails with 59faa4da7cd4 and 3accabda4da1
Date: Fri, 10 Oct 2025 14:42:59 -0400
Message-ID: <20251010184259.GB436967@pauld.westford.csb>
In-Reply-To: <e4f2a3e3-649a-423b-9696-6406ef56340f@suse.cz>

On Fri, Oct 10, 2025 at 08:27:30PM +0200 Vlastimil Babka wrote:
> On 10/10/25 20:19, Linus Torvalds wrote:
> > On Fri, 10 Oct 2025 at 08:11, Phil Auld <pauld@redhat.com> wrote:
> >>
> >> After several days of failed boots I've gotten it down to these two
> >> commits.
> >>
> >> 59faa4da7cd4 maple_tree: use percpu sheaves for maple_node_cache
> >> 3accabda4da1 mm, vma: use percpu sheaves for vm_area_struct cache
> >>
> >> The first is such an early failure it's silent. With just 3acca I
> >> get:
> >>
> >> [    9.341152] BUG: kernel NULL pointer dereference, address: 0000000000000040
> >> [    9.348115] #PF: supervisor read access in kernel mode
> >> [    9.353264] #PF: error_code(0x0000) - not-present page
> >> [    9.358413] PGD 0 P4D 0
> >> [    9.360959] Oops: Oops: 0000 [#1] SMP NOPTI
> >> [    9.365154] CPU: 21 UID: 0 PID: 818 Comm: kworker/u398:0 Not tainted 6.17.0-rc3.slab+ #5 PREEMPT(voluntary)
> >> [    9.374982] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.26.0 07/30/2025
> >> [    9.382641] RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
> >> [    9.388048] Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89
> > 
> > That decodes to
> > 
> >    0:           mov    0x10(%rsi),%rax
> >    4:           mov    0x8(%rsi),%rsi
> >    8:           test   %rax,%rax
> >    b:           je     0x18
> >    d:           mov    0x18(%rax),%ecx
> >   10:           test   %ecx,%ecx
> >   12:           jne    0xfd
> >   18:           movslq %gs:0x250eee4(%rip),%rax
> >   20:           mov    0xe0(%r14,%rax,8),%rax
> >   28:*          mov    0x40(%rax),%r13          <-- trapping instruction
> >   2c:           mov    %r13,%rdi
> >   2f:           call   0xffffffffffff81e4
> >   34:           mov    %rax,%rbp
> >   37:           test   %rax,%rax
> >   3a:           je     0x59
> > 
> > which is the code around that barn_replace_empty_sheaf() call.
> > 
> > In particular, the trapping instruction is from get_barn(), it's the "->barn" in
> > 
> >         return get_node(s, numa_mem_id())->barn;
> > 
> > so it looks like 'get_node()' is returning NULL here:
> > 
> >         return s->node[node];
> > 
> > That 0x250eee4(%rip) is from "get_node()" becoming
> > 
> >   18:           movslq  %gs:numa_node(%rip), %rax  # node
> >   20:           mov    0xe0(%r14,%rax,8),%rax # ->node[node]
> > 
> > instruction, and then that ->barn dereference is the trapping
> > instruction that tries to read node->barn:
> > 
> >   28:*          mov    0x40(%rax),%r13   # node->barn
> > 
> > but I did *not* look into why s->node[node] would be NULL.
> > 
> > Over to you Vlastimil,
> 
> Thanks, yeah will look ASAP. I suspect the "nodes with zero memory" is
> something that might not be handled well in general on x86. I know powerpc
> used to do these kinds of setups first and they have some special handling,
> so numa_mem_id() would give you the closest node with memory in there and I
> suspect it's not happening here. CPU 21 is node 6 so it's one of those
> without memory. I'll see if I can simulate this with QEMU and what's the
> most sensible fix.
>
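
To make the failure mode above concrete, here is a minimal userspace sketch
of it. This is not the kernel code and not a proposed fix: the struct
names, MAX_NODES, nearest_node_with_memory() and get_barn_checked() are all
hypothetical stand-ins, and "nearest" is plain index distance rather than
real NUMA distance. It only models the two things quoted above: get_node()
is a bare array lookup that yields NULL for a node without memory, and
reading ->barn through that NULL is the trap (the faulting address 0x40 is
just the 0x40 displacement of the trapping instruction applied to a NULL
base).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8 /* arbitrary size for this model, not taken from the report */

/* Hypothetical stand-ins for the slab structures named in the thread. */
struct barn { int nr_full; };
struct cache_node { struct barn *barn; };
struct cache { struct cache_node *node[MAX_NODES]; };

/* Same shape as the quoted get_node(): NULL when the node has no memory. */
static struct cache_node *get_node(struct cache *s, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return s->node[node];
}

/*
 * Hypothetical helper: walk outward from 'node' and return the closest
 * index that has a cache_node. Returns -1 if no node has one.
 */
static int nearest_node_with_memory(struct cache *s, int node)
{
	for (int dist = 0; dist < MAX_NODES; dist++) {
		int lo = node - dist;
		int hi = node + dist;

		if (lo >= 0 && s->node[lo])
			return lo;
		if (hi < MAX_NODES && s->node[hi])
			return hi;
	}
	return -1;
}

/* Guarded variant of the quoted get_barn(): never dereferences NULL. */
static struct barn *get_barn_checked(struct cache *s, int node)
{
	int mem_node = nearest_node_with_memory(s, node);

	if (mem_node < 0)
		return NULL;
	return get_node(s, mem_node)->barn;
}
```

With only node 4 populated, get_node(&s, 6) returns NULL, which is the
case the oops hit, while get_barn_checked(&s, 6) falls back to node 4's
barn.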

Thanks for taking a look.  I thought the NPS4 thing might be playing a role.

I'm happy to take any test/fix code you have for a spin on this system. 

Cheers,
Phil


> >             Linus
> 
