From: Vlastimil Babka <vbabka@suse.cz>
To: Florian Weimer <fweimer@redhat.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
Ralph Campbell <rcampbell@nvidia.com>,
Linux MM <linux-mm@kvack.org>,
longman@redhat.com, Linux API <linux-api@vger.kernel.org>,
Andi Kleen <ak@linux.intel.com>
Subject: Re: No system call to determine MAX_NUMNODES?
Date: Wed, 13 Feb 2019 15:48:25 +0100
Message-ID: <c4032ba4-4fe6-f591-ee72-6530d449a97c@suse.cz>
In-Reply-To: <87d0nvepf9.fsf@oldenburg2.str.redhat.com>

On 2/13/19 3:25 PM, Florian Weimer wrote:
> * Vlastimil Babka:
>
>> On 2/7/19 1:27 AM, Alexander Duyck wrote:
>>> On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>>>>
>>>> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
>>>> a new issue compared to 5.0.0-rc5.
>>>>
>>>> It looks like there is no convenient way to query the kernel's value for
>>>> MAX_NUMNODES yet this is used in kernel_get_mempolicy() to validate the
>>>> 'maxnode' parameter to the GET_MEMPOLICY(2) system call.
>>>> Otherwise, EINVAL is returned.
>>>>
>>>> Searching the internet for get_mempolicy yields some references that
>>>> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>>>>
>>>> Running "cat /proc/self/status | grep Mems_allowed:" I get:
>>>> With 5.0.0-rc5:
>>>> Mems_allowed: 00000000,00000001
>>>> With 5.0.0-rc5-mm1:
>>>> Mems_allowed: 1
>>>> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>>>>
>>>> Clearly, there should be a better way to query MAX_NUMNODES like
>>>> sysconf(), sysctl(), or libnuma.
>>>
>>> Really we shouldn't need to know that. That just tells us about how
>>> the kernel was built, it doesn't really provide any information about
>>> the layout of the system.
>>>
>>>> I searched for the patch that changed /proc/self/status but didn't find it.
>>>
>>> The patch you are looking for is located at:
>>> http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@redhat.com
>>
>> Hmm looks like libnuma [1] uses that /proc/self/status parsing approach for
>> numa_num_possible_nodes() and it's also mentioned in man numa(3), and comment in
>> code mentions that libcpuset does that as well. I'm afraid we can't just break this.
>
> Oh-oh. This looks utterly broken to me in the face of process
> migration.

MAX_NUMNODES, and thus the layout of /proc/self/status, is a build-time constant
of the kernel, so it won't change after migration between VMs, if that's what
you're asking. CRIU might be affected if the restore is done on a kernel with a
different MAX_NUMNODES.

> Is this used for anything important? Perhaps sizing data structures in
> user space?

libnuma seems to parse it only once and then remembers the result for
everything else, so there shouldn't be, e.g., a mismatch between allocating a
buffer and writing to it.

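For illustration, a minimal sketch of the parsing approach under discussion.
This is not libnuma's actual code; the function names mask_width_bits() and
mems_allowed_width() are made up for this example, and it assumes the
zero-padded mask format shown earlier in the thread:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the bits represented by a hex mask such as "00000000,00000001".
 * Each hex digit stands for 4 bits; commas and whitespace are skipped.
 */
static int mask_width_bits(const char *mask)
{
	int bits = 0;

	for (; *mask != '\0'; mask++) {
		if (isxdigit((unsigned char)*mask))
			bits += 4;
	}
	return bits;
}

/*
 * Return the width in bits of the calling process's Mems_allowed mask,
 * or -1 if /proc/self/status cannot be read or the line is missing.
 * Assumption: the whole line fits in the 4096-byte buffer.
 */
static int mems_allowed_width(void)
{
	static const char prefix[] = "Mems_allowed:";
	char line[4096];
	int bits = -1;
	FILE *fp = fopen("/proc/self/status", "r");

	if (fp == NULL)
		return -1;
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* The trailing ':' keeps "Mems_allowed_list:" from matching. */
		if (strncmp(line, prefix, sizeof(prefix) - 1) == 0) {
			bits = mask_width_bits(line + sizeof(prefix) - 1);
			break;
		}
	}
	fclose(fp);
	return bits;
}
```

With the 5.0.0-rc5 output quoted above, mask_width_bits("00000000,00000001")
yields 64, which matches 1 << CONFIG_NODES_SHIFT for CONFIG_NODES_SHIFT=6.
With the 5.0.0-rc5-mm1 output "1" it yields only 4, which is why the format
change breaks this kind of parser.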
> Thanks,
> Florian
>