From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com,
Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
Date: Sat, 7 Jun 2025 12:55:19 +0100 [thread overview]
Message-ID: <998a069c-9be5-4a10-888c-ba8269eaa333@lucifer.local> (raw)
In-Reply-To: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com>
Not related to your patch at all, but man this whole thing (thp allowed orders)
needs significant improvement, it seems always perversely complicated for a
relatively simple operation.
Overall I LOVE what you're doing here, but I feel we can clarify things a
little while we're at it to make it clear exactly what we're doing.
This is a very important change so forgive my fiddling about here but I'm
hoping we can take the opportunity to make things a little simpler!
On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
you're changing what THP is permitted across the board, I may have missed some
discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
users?
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
OK so it's already confusing that the global settings only impact 'inherit'
settings below, so they're not really global at all, but rather perhaps should
be called 'inherited'.
Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
that'd just add to the confusion :P
OK this is also not your fault just general commentary.
Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
also there's no need to refer to the code, it's actually clearer I think to
refer to the underlying logic:
If no hugepage modes are enabled for the desired orders, nor can we
enable them by inheriting from a 'global' enabled setting - then it
must be the case that all desired orders either specify or inherit
'NEVER' - and we must abort.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
I feel this is compressing a lot of logic in a way that took me several
readings to understand (hey I might not be the smartest cookie in the jar,
but we need to account for all levels of kernel developer ;)
I feel like we can make things a lot clearer here by separating out with a
helper function (means we can drop some indentation too), and also take
advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
exits with 0 early so no need for us to do so ourselves:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
unsigned long always = READ_ONCE(huge_anon_orders_always);
unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
bool inherit_enabled = hugepage_global_enabled();
bool has_madvise = vm_flags & VM_HUGEPAGE;
unsigned long mask = always | madvise;
mask = always | madvise;
if (inherit_enabled)
mask |= inherit;
/* All set to/inherit NEVER - never means never globally, abort. */
if (!(mask & orders))
return 0;
/* Otherwise, we only enforce sysfs settings if asked. */
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
mask = always;
if (has_madvise)
mask |= madvise;
if (hugepage_global_always() || (has_madvise && inherit_enabled))
mask |= inherit;
return orders & mask;
}
...
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
if (vma_is_anonymous(vma))
orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
>
> --
> 2.43.5
>
next prev parent reply other threads:[~2025-06-07 11:55 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Baolin Wang
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-06 16:49 ` Dev Jain
2025-06-06 18:47 ` Dev Jain
2025-06-09 5:57 ` Baolin Wang
2025-06-07 11:55 ` Lorenzo Stoakes [this message]
2025-06-07 12:21 ` Lorenzo Stoakes
2025-06-09 6:18 ` Baolin Wang
2025-06-09 15:12 ` Lorenzo Stoakes
2025-06-09 6:10 ` Baolin Wang
2025-06-09 15:17 ` Lorenzo Stoakes
2025-06-11 6:59 ` Baolin Wang
2025-06-08 18:37 ` Nico Pache
2025-06-09 6:36 ` Baolin Wang
2025-06-11 12:34 ` David Hildenbrand
2025-06-12 7:51 ` Baolin Wang
2025-06-12 8:46 ` Dev Jain
2025-06-12 8:52 ` David Hildenbrand
2025-06-12 8:51 ` David Hildenbrand
2025-06-12 12:45 ` Baolin Wang
2025-06-12 13:05 ` David Hildenbrand
2025-06-12 13:25 ` Baolin Wang
2025-06-12 13:40 ` Baolin Wang
2025-06-12 13:27 ` Lorenzo Stoakes
2025-06-12 13:29 ` Lorenzo Stoakes
2025-06-12 14:13 ` Baolin Wang
2025-06-12 14:16 ` David Hildenbrand
2025-06-12 14:20 ` Lorenzo Stoakes
2025-06-12 14:09 ` David Hildenbrand
2025-06-12 14:49 ` Lorenzo Stoakes
2025-06-13 2:07 ` Baolin Wang
2025-06-13 5:18 ` Lorenzo Stoakes
2025-06-12 13:07 ` Lorenzo Stoakes
2025-06-12 13:13 ` David Hildenbrand
2025-06-12 13:31 ` Lorenzo Stoakes
2025-06-05 8:00 ` [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
2025-06-07 12:14 ` Lorenzo Stoakes
2025-06-07 12:17 ` Lorenzo Stoakes
2025-06-09 6:34 ` Baolin Wang
2025-06-09 19:30 ` Lorenzo Stoakes
2025-06-09 6:31 ` Baolin Wang
2025-06-09 15:33 ` Lorenzo Stoakes
2025-06-11 7:02 ` Baolin Wang
2025-06-07 12:28 ` [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Lorenzo Stoakes
2025-06-11 7:05 ` Baolin Wang
2025-06-13 14:23 ` Usama Arif
2025-06-13 14:29 ` Lorenzo Stoakes
2025-06-13 14:39 ` Usama Arif
2025-06-13 14:42 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=998a069c-9be5-4a10-888c-ba8269eaa333@lucifer.local \
--to=lorenzo.stoakes@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npache@redhat.com \
--cc=ryan.roberts@arm.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox