* No system call to determine MAX_NUMNODES?
@ 2019-02-06 23:13 Ralph Campbell
From: Ralph Campbell @ 2019-02-06 23:13 UTC (permalink / raw)
To: Linux MM
I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
a new issue compared to 5.0.0-rc5.
It looks like there is no convenient way to query the kernel's value for
MAX_NUMNODES, yet kernel_get_mempolicy() uses it to validate the 'maxnode'
parameter to the get_mempolicy(2) system call: if a node mask is requested
and 'maxnode' is smaller than MAX_NUMNODES, EINVAL is returned.
Searching the internet for get_mempolicy yields some references that
recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
Running "cat /proc/self/status | grep Mems_allowed:" I get:
With 5.0.0-rc5:
Mems_allowed: 00000000,00000001
With 5.0.0-rc5-mm1:
Mems_allowed: 1
(both kernels were config'ed with CONFIG_NODES_SHIFT=6)
Clearly, there should be a better way to query MAX_NUMNODES, such as
sysconf(), sysctl(), or libnuma.
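
To make the difference between the two outputs concrete, here is a rough
Python sketch. It is only a model of the situation described above, not
kernel or libnuma code: the helper names are made up, the width estimate
(4 bits per hex digit) is an assumption about how a parser might size the
mask, and check_maxnode() merely mimics the 'maxnode' < MAX_NUMNODES test.

```python
import errno

NODES_SHIFT = 6                   # CONFIG_NODES_SHIFT from the report above
MAX_NUMNODES = 1 << NODES_SHIFT   # 64 nodes at most for this build


def mask_bits_from_status_line(line: str) -> int:
    """Estimate the mask width from a 'Mems_allowed:' line.

    Assumption: every hex digit printed stands for 4 mask bits and the
    commas are only separators.
    """
    value = line.split(":", 1)[1].strip()
    hex_digits = value.replace(",", "")
    return 4 * len(hex_digits)


def check_maxnode(maxnode: int) -> int:
    """Hypothetical model of the validation: 0 if accepted, -EINVAL if not."""
    return -errno.EINVAL if maxnode < MAX_NUMNODES else 0


# Zero-padded format (5.0.0-rc5) versus the shortened one (5.0.0-rc5-mm1).
old_bits = mask_bits_from_status_line("Mems_allowed:\t00000000,00000001")
new_bits = mask_bits_from_status_line("Mems_allowed:\t1")

print(old_bits, check_maxnode(old_bits))
print(new_bits, check_maxnode(new_bits))
```

With the padded format the estimated width matches MAX_NUMNODES and the
modeled check passes; with the shortened format the estimate is far too
small and the modeled check fails with EINVAL.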
I searched for the patch that changed /proc/self/status but didn't find it.
* Re: No system call to determine MAX_NUMNODES?
@ 2019-02-07 0:27 ` Alexander Duyck
From: Alexander Duyck @ 2019-02-07 0:27 UTC (permalink / raw)
To: Ralph Campbell; +Cc: Linux MM, longman
On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
> a new issue compared to 5.0.0-rc5.
>
> It looks like there is no convenient way to query the kernel's value for
> MAX_NUMNODES yet this is used in kernel_get_mempolicy() to validate the
> 'maxnode' parameter to the GET_MEMPOLICY(2) system call.
> Otherwise, EINVAL is returned.
>
> Searching the internet for get_mempolicy yields some references that
> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>
> Running "cat /proc/self/status | grep Mems_allowed:" I get:
> With 5.0.0-rc5:
> Mems_allowed: 00000000,00000001
> With 5.0.0-rc5-mm1:
> Mems_allowed: 1
> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>
> Clearly, there should be a better way to query MAX_NUMNODES like
> sysconf(), sysctl(), or libnuma.
Really, we shouldn't need to know that. It only tells us how the kernel
was built; it doesn't provide any information about the layout of the
system.
> I searched for the patch that changed /proc/self/status but didn't find it.
The patch you are looking for is located at:
http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@redhat.com
I wonder if we shouldn't look at modifying kernel_get_mempolicy() and
the compat call to test against nr_node_ids instead of MAX_NUMNODES, since
the mask bits beyond nr_node_ids would be useless anyway.
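
As a rough illustration of what user space could do if the check were
against nr_node_ids, the Python sketch below derives "highest possible
node ID + 1" from a node list such as the one in
/sys/devices/system/node/possible. The helper names are hypothetical, and
treating that value as equivalent to the kernel-internal nr_node_ids is an
assumption, not something this thread establishes.

```python
def highest_node_plus_one(node_list: str) -> int:
    """Return the highest node ID + 1 from a list like '0-3' or '0-1,4-7'."""
    highest = -1
    for part in node_list.strip().split(","):
        if not part:
            continue
        # A part is either a single ID ('5') or a range ('0-3'); the last
        # number of the part is its upper bound in both cases.
        highest = max(highest, int(part.split("-")[-1]))
    return highest + 1


def read_possible_nodes(path: str = "/sys/devices/system/node/possible") -> int:
    """Read the sysfs node list (assumes sysfs is mounted and NUMA is exposed)."""
    with open(path) as f:
        return highest_node_plus_one(f.read())
```

A caller would pass a 'maxnode' of at least this value and size its mask
buffer accordingly, without needing to know how the kernel was built.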
* Re: No system call to determine MAX_NUMNODES?
@ 2019-02-13 9:26 ` Vlastimil Babka
From: Vlastimil Babka @ 2019-02-13 9:26 UTC (permalink / raw)
To: Alexander Duyck, Ralph Campbell; +Cc: Linux MM, longman, Linux API, Andi Kleen
On 2/7/19 1:27 AM, Alexander Duyck wrote:
> On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>>
>> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
>> a new issue compared to 5.0.0-rc5.
>>
>> It looks like there is no convenient way to query the kernel's value for
>> MAX_NUMNODES yet this is used in kernel_get_mempolicy() to validate the
>> 'maxnode' parameter to the GET_MEMPOLICY(2) system call.
>> Otherwise, EINVAL is returned.
>>
>> Searching the internet for get_mempolicy yields some references that
>> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>>
>> Running "cat /proc/self/status | grep Mems_allowed:" I get:
>> With 5.0.0-rc5:
>> Mems_allowed: 00000000,00000001
>> With 5.0.0-rc5-mm1:
>> Mems_allowed: 1
>> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>>
>> Clearly, there should be a better way to query MAX_NUMNODES like
>> sysconf(), sysctl(), or libnuma.
>
> Really we shouldn't need to know that. That just tells us about how
> the kernel was built, it doesn't really provide any information about
> the layout of the system.
>
>> I searched for the patch that changed /proc/self/status but didn't find it.
>
> The patch you are looking for is located at:
> http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@redhat.com
Hmm, it looks like libnuma [1] uses that /proc/self/status parsing approach
for numa_num_possible_nodes(). It's also mentioned in man numa(3), and a
comment in the code mentions that libcpuset does the same. I'm afraid we
can't just break this.
> I wonder if we shouldn't look at modifying kernel_get_mempolicy and
> the compat call to test for nr_node_ids instead of MAX_NUMNODES since
> the rest of the data would be useless anyway.
>
[1] https://github.com/numactl/numactl/blob/master/libnuma.c
* Re: No system call to determine MAX_NUMNODES?
@ 2019-02-13 14:25 ` Florian Weimer
From: Florian Weimer @ 2019-02-13 14:25 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Alexander Duyck, Ralph Campbell, Linux MM, longman, Linux API,
Andi Kleen
* Vlastimil Babka:
> On 2/7/19 1:27 AM, Alexander Duyck wrote:
>> On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>>>
>>> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
>>> a new issue compared to 5.0.0-rc5.
>>>
>>> It looks like there is no convenient way to query the kernel's value for
>>> MAX_NUMNODES yet this is used in kernel_get_mempolicy() to validate the
>>> 'maxnode' parameter to the GET_MEMPOLICY(2) system call.
>>> Otherwise, EINVAL is returned.
>>>
>>> Searching the internet for get_mempolicy yields some references that
>>> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>>>
>>> Running "cat /proc/self/status | grep Mems_allowed:" I get:
>>> With 5.0.0-rc5:
>>> Mems_allowed: 00000000,00000001
>>> With 5.0.0-rc5-mm1:
>>> Mems_allowed: 1
>>> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>>>
>>> Clearly, there should be a better way to query MAX_NUMNODES like
>>> sysconf(), sysctl(), or libnuma.
>>
>> Really we shouldn't need to know that. That just tells us about how
>> the kernel was built, it doesn't really provide any information about
>> the layout of the system.
>>
>>> I searched for the patch that changed /proc/self/status but didn't find it.
>>
>> The patch you are looking for is located at:
>> http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@redhat.com
>
> Hmm looks like libnuma [1] uses that /proc/self/status parsing approach for
> numa_num_possible_nodes() and it's also mentioned in man numa(3), and comment in
> code mentions that libcpuset does that as well. I'm afraid we can't just break this.
Oh-oh. This looks utterly broken to me in the face of process
migration.
Is this used for anything important? Perhaps sizing data structures in
user space?
Thanks,
Florian
* Re: No system call to determine MAX_NUMNODES?
@ 2019-02-13 14:48 ` Vlastimil Babka
From: Vlastimil Babka @ 2019-02-13 14:48 UTC (permalink / raw)
To: Florian Weimer
Cc: Alexander Duyck, Ralph Campbell, Linux MM, longman, Linux API,
Andi Kleen
On 2/13/19 3:25 PM, Florian Weimer wrote:
> * Vlastimil Babka:
>
>> On 2/7/19 1:27 AM, Alexander Duyck wrote:
>>> On Wed, Feb 6, 2019 at 3:13 PM Ralph Campbell <rcampbell@nvidia.com> wrote:
>>>>
>>>> I was using the latest git://git.cmpxchg.org/linux-mmotm.git and noticed
>>>> a new issue compared to 5.0.0-rc5.
>>>>
>>>> It looks like there is no convenient way to query the kernel's value for
>>>> MAX_NUMNODES yet this is used in kernel_get_mempolicy() to validate the
>>>> 'maxnode' parameter to the GET_MEMPOLICY(2) system call.
>>>> Otherwise, EINVAL is returned.
>>>>
>>>> Searching the internet for get_mempolicy yields some references that
>>>> recommend reading /proc/<pid>/status and parsing the line "Mems_allowed:".
>>>>
>>>> Running "cat /proc/self/status | grep Mems_allowed:" I get:
>>>> With 5.0.0-rc5:
>>>> Mems_allowed: 00000000,00000001
>>>> With 5.0.0-rc5-mm1:
>>>> Mems_allowed: 1
>>>> (both kernels were config'ed with CONFIG_NODES_SHIFT=6)
>>>>
>>>> Clearly, there should be a better way to query MAX_NUMNODES like
>>>> sysconf(), sysctl(), or libnuma.
>>>
>>> Really we shouldn't need to know that. That just tells us about how
>>> the kernel was built, it doesn't really provide any information about
>>> the layout of the system.
>>>
>>>> I searched for the patch that changed /proc/self/status but didn't find it.
>>>
>>> The patch you are looking for is located at:
>>> http://lkml.kernel.org/r/1545405631-6808-1-git-send-email-longman@redhat.com
>>
>> Hmm looks like libnuma [1] uses that /proc/self/status parsing approach for
>> numa_num_possible_nodes() and it's also mentioned in man numa(3), and comment in
>> code mentions that libcpuset does that as well. I'm afraid we can't just break this.
>
> Oh-oh. This looks utterly broken to me in the face of process
> migration.
MAX_NUMNODES, and thus the layout of /proc/self/status, is a build-time
constant of the kernel, so it won't change after migration between VMs, if
that's what you're asking. CRIU might be affected if the restore is done on
a kernel with a different MAX_NUMNODES.
> Is this used for anything important? Perhaps sizing data structures in
> user space?
libnuma seems to parse it only once and then remember the result for
everything else, so there shouldn't be, e.g., a mismatch between allocating
a buffer and writing to it.
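
The parse-once pattern can be sketched in Python as follows. This is not
libnuma's code; the function names are invented for illustration, and the
width estimate (4 bits per hex digit) is an assumption. The point is only
that every later buffer allocation reuses the first parsed value, so the
sizes stay consistent within one process.

```python
import functools


@functools.lru_cache(maxsize=None)
def cached_mask_bits(status_path: str = "/proc/self/status") -> int:
    """Parse the 'Mems_allowed:' width once per path and remember it."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("Mems_allowed:"):
                value = line.split(":", 1)[1].strip()
                # Assumption: 4 mask bits per hex digit, commas ignored.
                return 4 * len(value.replace(",", ""))
    raise LookupError("no Mems_allowed line found")


def alloc_mask_buffer(status_path: str = "/proc/self/status") -> bytearray:
    """Allocate a zeroed node mask sized from the cached width."""
    bits = cached_mask_bits(status_path)
    return bytearray((bits + 7) // 8)
```

Because the result is cached, a later change in the file's format would
not be noticed by the running process, which is also why a restore on a
kernel with a different layout could be a problem.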
> Thanks,
> Florian
>