From: David Hildenbrand <david@redhat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	akpm@linux-foundation.org, ziy@nvidia.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, zokeefe@google.com,
	shy828301@gmail.com, usamaarif642@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
Date: Wed, 25 Jun 2025 10:40:23 +0200
Message-ID: <a027fe94-e6c2-46d0-8768-6acd8e801cc3@redhat.com>
In-Reply-To: <f36e64f2-f3d1-407e-862f-ceccc89ac9a8@lucifer.local>

On 25.06.25 10:22, Lorenzo Stoakes wrote:
> On Wed, Jun 25, 2025 at 10:16:46AM +0200, David Hildenbrand wrote:
>> On 25.06.25 09:49, David Hildenbrand wrote:
>>> I think the whole use case of using MADV_COLLAPSE to completely control
>>> THP allocation in a system is otherwise pretty hard to achieve, if there
>>> is no other way to tame THP allocation through page faults+khugepaged.
>>
>> Just want to add: for an app itself, it's doable in "madvise" mode perfectly
>> fine.
>>
>> If your app does a MADV_HUGEPAGE, it can get a THP during page-fault +
>> khugepaged.
>>
>> If your app does not do a MADV_HUGEPAGE, it can get a THP through
>> MADV_COLLAPSE.
>>
>> So the "madvise" mode actually works.
>
> Right, but for me MADV_COLLAPSE is more about 'I want THPs _now_ (if available),
> not when khugepaged decides to give me some'.
>
> So we have multiple semantics at work here, unfortunately.
>
>>
>> The problem appears as soon as we want to control other processes that might
>> be setting MADV_HUGEPAGE, and we actually want to control the behavior using
>> process_madvise(MADV_COLLAPSE), to say "well, the MADV_HUGEPAGE should be
>> ignored".
>
> This is a _very_ specialist use.
>
> I'd argue for a 'manual' mode to be added to sysfs to cover this case, with
> 'never' having the 'actually means never' semantics.
>
> You might argue that could confuse things, but it'd retain the 'de facto'
> understanding nearly everybody has about what these flags mean, but give
> whatever user is out there that needs this the ability to continue doing what
> they want.
>
> And we get into philosophy about not 'breaking' userland, not sure we have a
> TLB/page fault/folio allocation efficiency contract with userland :)
>
> No program will break with this patch applied. Just potentially get performance
> degradation in a very, very specialist case.
>
>>
>> Then, you configure "never" system-wide and use
>> process_madvise(MADV_COLLAPSE) to drive it all manually.
>>
>> Curious to learn if there is such a user out there.
>
> Oh me too :)

I just looked at the original use cases [1]; such a use case is not
mentioned there.

But the series did add process_madvise(MADV_COLLAPSE) in commit
876b4a1896646cc85ec6b1fc1c9270928b7e0831, where we document:

"
This is useful for the development of userspace agents that seek to
optimize THP utilization system-wide by using userspace signals to
prioritize what memory is most deserving of being THP-backed.
"

The "prioritize" might indicate that this is meant to be used in
combination with "madvise", not with "never".

So yeah, it all boils down to:

(1) If there is no such use case, "never can mean never", because there
    is nothing to break, really.

(2) If there is such a use case, we might be breaking it.
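
As an aside, for anyone wanting to check which of the system-wide modes
discussed here is currently in effect: the sysfs file lists all values and
marks the active one with square brackets (e.g. "always [madvise] never").
Below is a minimal Python sketch of reading it. The names active_thp_mode()
and read_thp_mode() are hypothetical helpers used only for illustration;
they are not part of the patches or of any kernel interface.

```python
import re
from pathlib import Path
from typing import Optional

# Standard sysfs location of the system-wide THP setting for anonymous memory.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) value from e.g. 'always [madvise] never'.

    Hypothetical helper: it only parses the text, it does not touch sysfs.
    """
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def read_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the active mode from sysfs, or return None if the file is unavailable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        # No THP support in this kernel, or sysfs is not mounted.
        return None


if __name__ == "__main__":
    print("system-wide THP mode:", read_thp_mode())
```

In the "never" scenario above, read_thp_mode() would be expected to return
"never". It says nothing about the per-size or shmem settings, which live in
other sysfs files.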

[1] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/

--
Cheers,
David / dhildenb