From: David Hildenbrand <david@redhat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Usama Arif <usamaarif642@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, hannes@cmpxchg.org, shakeel.butt@linux.dev,
	riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com,
	baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH 1/6] prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the process
Date: Fri, 16 May 2025 09:45:17 +0200
Message-ID: <8f0a22c2-3176-4942-994d-58d940901ecf@redhat.com>
In-Reply-To: <cbc95f9b-1c13-45ec-8d34-38544d3f2dd3@lucifer.local>
On 15.05.25 22:35, Lorenzo Stoakes wrote:
> On Thu, May 15, 2025 at 09:12:13PM +0200, David Hildenbrand wrote:
>> On 15.05.25 20:08, Lorenzo Stoakes wrote:
>>> On Thu, May 15, 2025 at 06:11:55PM +0200, David Hildenbrand wrote:
>>>>>>> So if you're not overriding VM_NOHUGEPAGE, the whole point of this exercise
>>>>>>> is to override global 'never'?
>>>>>>>
>>>>>>
>>>>>> Again, I am not overriding never.
>>>>>>
>>>>>> hugepage_global_always and hugepage_global_enabled will evaluate to false
>>>>>> and you will not get a hugepage.
>>>>>
>>>>> Yeah, again ack, but I kind of hate that we set VM_HUGEPAGE everywhere even
>>>>> if the policy is never.
>>>>
>>>> I think it should behave just as if someone does manually an madvise(). So
>>>> whatever we do here during an madvise, we should try to do the same thing
>>>> here.
>>>
>>> Ack I agree with this.
>>>
>>> It actually simplifies things a LOT to view it this way - we're saying 'by
>>> default apply madvise(...) to new VMAs'.
>>>
>>> Hm I wonder if we could have a more generic version of this...
>>>
>>> Note though that we're not _quite_ doing this.
>>>
>>> So in hugepage_madvise():
>>>
>>> int hugepage_madvise(struct vm_area_struct *vma,
>>> 		     unsigned long *vm_flags, int advice)
>>> {
>>> 	...
>>>
>>> 	switch (advice) {
>>> 	case MADV_HUGEPAGE:
>>> 		*vm_flags &= ~VM_NOHUGEPAGE;
>>> 		*vm_flags |= VM_HUGEPAGE;
>>>
>>> 		...
>>>
>>> 		break;
>>>
>>> 	...
>>> 	}
>>>
>>> 	...
>>> }
>>>
>>> So here we're actually clearing VM_NOHUGEPAGE and overriding it, but in the
>>> proposed code we're not.
>>
>> Yeah, I think I suggested that, but probably we should just do exactly what
>> madvise() does.
>
> Yes, agreed.
>
> Usama - do you have any issue with us switching to how madvise() does it?
>
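
To make the difference concrete, here is a minimal userspace sketch of the
two flag updates. It is a model, not kernel code: the SKETCH_* constants and
both helper names are hypothetical, the bit values are illustrative only, and
the "no override" helper is merely an assumption about how a default that
does not clear VM_NOHUGEPAGE might behave.

```c
#include <stdio.h>

/*
 * Hypothetical stand-ins for the kernel's VM_HUGEPAGE / VM_NOHUGEPAGE
 * bits. The values are illustrative and do not match the kernel's.
 */
#define SKETCH_VM_HUGEPAGE	0x1UL
#define SKETCH_VM_NOHUGEPAGE	0x2UL

/*
 * Mirrors the MADV_HUGEPAGE case quoted above: clear VM_NOHUGEPAGE,
 * then set VM_HUGEPAGE, so an earlier "no hugepage" mark is overridden.
 */
static unsigned long sketch_madvise_hugepage(unsigned long vm_flags)
{
	vm_flags &= ~SKETCH_VM_NOHUGEPAGE;
	vm_flags |= SKETCH_VM_HUGEPAGE;
	return vm_flags;
}

/*
 * Assumed model of a default that does NOT override VM_NOHUGEPAGE:
 * a VMA already marked "no hugepage" is left untouched.
 */
static unsigned long sketch_default_no_override(unsigned long vm_flags)
{
	if (!(vm_flags & SKETCH_VM_NOHUGEPAGE))
		vm_flags |= SKETCH_VM_HUGEPAGE;
	return vm_flags;
}
```

Feeding SKETCH_VM_NOHUGEPAGE to both helpers shows the divergence: the
madvise-style helper returns only the HUGEPAGE bit, while the no-override
helper returns the input unchanged.
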
>>
>>>
>>> So we're back into confusing territory again :)
>>>
>>> I wonder if we could...
>>>
>>> 1. Add an MADV_xxx that mimics the desired behaviour here.
>>>
>>> 2. Add a generic 'madvise() by default' thing at a process level?
>>>
>>> Is this crazy?
>>
>> I think that's what I had in mind, just a bit twisted.
>>
>> What could work is
>>
>> 1) prctl to set the default
>>
>> 2) madvise() to adjust all existing VMAs
>>
>>
>> We might have to teach 2) to ignore non-compatible VMAs / holes. Maybe not,
>> worth an investigation.
>
> Yeah, I think it'd _probably_ be ok except on s390 (which can fail, and so
> we'd have to be able to say - skip on error, carry on).
>
> We'll just get an -ENOMEM at the end for the gaps (god how I hate
> that). Otherwise I don't think MADV_HUGEPAGE actually is really that
> restrictive.
>
> That would simplify :)
>
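
For illustration, step 2) can be approximated from userspace today. The
sketch below is only an assumption of what "adjust all existing VMAs, skip
on error, carry on" could look like: it walks /proc/self/maps and calls
madvise(MADV_HUGEPAGE) per mapping rather than issuing one call over the
whole address space, so holes never come into play. The helper name is
hypothetical, and step 1) is not shown because the prctl is only proposed
in this series.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/*
 * Apply MADV_HUGEPAGE to every mapping listed in /proc/self/maps.
 * Ranges that reject the advice are skipped instead of aborting the
 * walk. Returns the number of ranges that accepted the advice, or -1
 * if the maps file cannot be opened.
 */
static int sketch_advise_all_vmas(void)
{
	FILE *maps = fopen("/proc/self/maps", "r");
	char line[512];
	int advised = 0;

	if (!maps)
		return -1;

	while (fgets(line, sizeof(line), maps)) {
		unsigned long start, end;

		/* Each entry starts with "start-end" in hexadecimal. */
		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;

		/* Skip on error, carry on. */
		if (madvise((void *)start, end - start, MADV_HUGEPAGE) == 0)
			advised++;
	}

	fclose(maps);
	return advised;
}
```

A caller would invoke sketch_advise_all_vmas() once after setting the
process default. Doing this in the kernel would avoid the race with
concurrently created mappings that a /proc walk cannot close.
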
> But I still so hate using prctl()... this might be one of those cases where
> we simply figure out we have no other choice.
>
> But when you put it as simply as this maybe it's not so bad. With the
> flags2 gone by fixing this stupid 32-bit limit it's less awful.
>
> Perhaps worth seeing what an improved RFC of this series looks like with
> all the various bits fixed to give an idea.

Yes.

>
> But you do then wonder if we could make this _generic_ for _any_ madvise(),
> and how _that_ would look.
>
> But perhaps that's insane because many VMAs would simply not be suited to
> having certain madvise flags set hmm.

Same thinking. I think this is rather special.

In a perfect world not even the madvise(*HUGEPAGE) would exist.

But here we are ... 14 years (wow!) after

commit 0af4e98b6b095c74588af04872f83d333c958c32
Author: Andrea Arcangeli <aarcange@redhat.com>
Date:   Thu Jan 13 15:46:55 2011 -0800

    thp: madvise(MADV_HUGEPAGE)

(I'm surprised you don't complain about madvise(). IMHO, prctl() is an even
better interface than the catch-all madvise(): a syscall where advice
might not be advice. I saw some funny rants about MADV_DONTNEED on
reddit at some point ... :) mctrl() would have been clearer, at least
for me :D )

--
Cheers,
David / dhildenb