From: Jerome Glisse <jglisse@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH] mm: fix maxnode for mbind(), set_mempolicy() and migrate_pages()
Date: Tue, 23 Jul 2024 09:33:44 -0700
Message-ID: <CAPTQFZSgNHEE0Ub17=kfF-W64bbfRc4wYijTkG==+XxfgcocOQ@mail.gmail.com>
In-Reply-To: <0c390494-e6ba-4cde-aace-cd726f2409a1@redhat.com>
On Mon, 22 Jul 2024 at 06:09, David Hildenbrand <david@redhat.com> wrote:
>
> On 20.07.24 19:35, Jerome Glisse wrote:
> > Because of the maxnode bug, there is no way to bind or migrate pages to
> > the last node in a multi-node NUMA system unless you lie about maxnode
> > when making the mbind(), set_mempolicy() or migrate_pages() syscall.
> >
> > The man pages for those syscalls describe maxnode as the number of bits
> > in the node bitmap ("bit mask of nodes containing up to maxnode bits").
> > Thus if maxnode is n, we expect an n-bit bitmap, which means that the
> > mask of valid bits is ((1 << n) - 1). The decrement in get_nodes()
> > leads to the mask being ((1 << (n - 1)) - 1).
> >
> > The three syscalls use a common helper, get_nodes(), and the first thing
> > this helper does is decrement maxnode by 1, which leads to using n-1
> > bits of the provided mask of nodes (see get_bitmap(), a helper function
> > of get_nodes()).
> >
> > This leads to two bugs: either the last node in the provided bitmap is
> > not used by any of the three syscalls, or the syscall errors out and
> > returns EINVAL if the only bit set in the bitmap was the last bit in
> > the mask of nodes (which is ignored because of the bug, and an empty
> > mask of nodes is an invalid argument).
> >
> > I am surprised this bug was never caught ... it has been in the kernel
> > since forever.
>
> Let's look at QEMU: backends/hostmem.c
>
> /*
> * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
> * as argument to mbind() due to an old Linux bug (feature?) which
> * cuts off the last specified node. This means backend->host_nodes
> * must have MAX_NODES+1 bits available.
> */
>
> Which means that it's been known for a long time, and the workaround
> seems to be pretty easy.
>
> So I wonder if we rather want to update the documentation to match reality.
[Sorry resending as text ... gmail insanity]
I think it is odd to ask callers to pass maxnode+1 to work around the
bug. If we apply this patch, QEMU would continue to work as is, and
users who were not aware of the bug would be fixed. So I would say
applying this patch does more good. Long term, QEMU can drop its
workaround or keep it for backward compatibility with older kernels.
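To make the arithmetic from the patch description concrete, here is a
minimal sketch in C. It is not the kernel code: the helper names are
made up for illustration, and it assumes a single 64-bit word rather
than the multi-word bitmap the syscalls actually take. It only models
the two masks discussed above, ((1 << n) - 1) and ((1 << (n - 1)) - 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask of the low n bits, for 0 <= n <= 64.
 * The n == 64 case is handled separately because shifting a 64-bit
 * value by 64 is undefined behavior in C. */
static uint64_t low_bits_mask(unsigned int n)
{
    assert(n <= 64);
    return n == 64 ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
}

/* Mask of valid bits a caller would expect from the man page wording:
 * maxnode bits, i.e. ((1 << n) - 1). */
static uint64_t documented_mask(unsigned int maxnode)
{
    return low_bits_mask(maxnode);
}

/* Mask that results if maxnode is decremented first, as the patch
 * description says get_nodes() does: ((1 << (n - 1)) - 1). */
static uint64_t decremented_mask(unsigned int maxnode)
{
    return maxnode == 0 ? 0 : low_bits_mask(maxnode - 1);
}

/* Nodes from the caller's bitmap that survive a given valid-bit mask.
 * An empty result models the case reported as EINVAL in the thread. */
static uint64_t effective_nodes(uint64_t nodemask, uint64_t valid)
{
    return nodemask & valid;
}
```

With maxnode = 4, documented_mask(4) is 0xf but decremented_mask(4) is
0x7, so a bitmap asking for node 3 alone (0x8) comes out empty. Passing
maxnode = 5 instead, which is the maxnode+1 workaround in the QEMU
comment, gives 0xf and node 3 is kept.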
Thank you,
Jérôme