linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Yafang Shao <laoar.shao@gmail.com>,
	Usama Arif <usamaarif642@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, shakeel.butt@linux.dev, riel@surriel.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH 0/1] prctl: allow overriding system THP policy to always
Date: Sun, 11 May 2025 10:15:53 +0200	[thread overview]
Message-ID: <913bdc9b-a3c2-401c-99d0-18799850db9e@redhat.com> (raw)
In-Reply-To: <8A18FB29-CC41-456F-A80E-807984691F0F@nvidia.com>

On 10.05.25 01:34, Zi Yan wrote:
> On 9 May 2025, at 18:42, David Hildenbrand wrote:
> 
>>>>>> - madvise
>>>>>>     The sysadmin gently encourages the use of THP, but it is only
>>>>>> enabled when explicitly requested by the application.
>>>
>>> And this "user mode" or "manual mode", where applications self-manage
>>> which parts of userspace they want to enroll.
>>>
>>> Both madvise() and unprivileged prctl() should work here as well,
>>> IMO. There is no policy or security difference between them, it's just
>>> about granularity and usability.
>>>
>>>>>> - never
>>>>>>      The sysadmin discourages the use of THP, and "its use is only permitted
>>>>>> with explicit approval" .
>>>
>>> This one I don't quite agree with, and IMO conflicts with what David
>>> is saying as well.
>>
>> Yeah ... "never" does not mean "sometimes" in my reality :)
>>
>>>
>>>>> "never" so far means "no thps, no exceptions". We've had serious THP
>>>>> issues in the past, where our workaround until we sorted out the issue
>>>>> for affected customers was to force-disable THPs on that system during boot.
>>>>
>>>> Right, that reflects the current behavior. What we aim to enhance is
>>>> by adding the requirement that "its use is only permitted with
>>>> explicit approval."
>>>
>>> I think you're conflating a safety issue with a security issue.
>>>
>>> David is saying there can be cases where the kernel is broken, and
>>> "never" is a production escape hatch to disable the feature until a
>>> kernel upgrade for the fix is possible. In such a case, it doesn't
>>> make sense to override this decision based on any sort of workload
>>> policy, privileged or not.
>>>
>>> The way I understand you is that you want enrollment (and/or
>>> self-management) only for blessed applications. Because you don't
>>> generally trust workloads in the wild enough to switch the global
>>> default away from "never", given the semantics of always/madvise.
>>
>> Assuming "never" means "never" and "always" means "always" ( crazy, right? :) ), could be make use of "madvise" mode, which essentially means "VM_HUGEPAGE" takes control?
>>
>> We'd need
>>
>> a) A way to enable THP for a process. Changing the default/vma settings to VM_HUGEPAGE as discussed using a prctl could work.
>>
>> b) A way to ignore VM_HUGEPAGE for processes. Maybe the existing prctl to force-disable THPs could work?
> 
> This means process level control overrides VMA level control, which
> overrides global control, right?
> 
> Intuitively, it should be that VMA level control overrides process level
> control, which overrides global control, namely finer granularly control
> precedes coarse one. But some apps might not use VMA level control
> (e.g., madvise) carefully, we want to override that. Maybe ignoring VMA
> level control is what we want?

Let's take a step back:

Current behavior is

1) If anybody (global / process / VM) says "never" 
(never/PR_SET_THP_DISABLE/VM_NOHUGEPAGE), the behavior is "never".

2) In "madvise" mode, only "VM_HUGEPAGE" gets THP unless 
PR_SET_THP_DISABLE is set. So per-process overrides per-VMA.

3) In "always" mode, everything gets THP unless per-VMA (VM_NOHUGEPAGE) 
or per-process (PR_SET_THP_DISABLE) disables it.


Interestingly, PR_SET_THP_DISABLE used to mimic exactly what I proposed 
for the other direction (change default of VM_HUGEPAGE), except that it 
wouldn't modify already existing mappings. Worth looking at 
1860033237d4. Not sure if that commit was the right call, but it's the 
semantics we have today.

That commit notes:

"It should be noted, that the new implementation makes 
PR_SET_THP_DISABLE master override to any per-VMA setting, which was not 
the case previously."


Whatever we do, we have to be careful to not create more mess or 
inconsistency.

Especially, if anybody sets VM_NOHUGEPAGE or PR_SET_THP_DISABLE, we must 
not use THPs, ever.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2025-05-11  8:16 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-07 14:00 Usama Arif
2025-05-07 14:00 ` [PATCH 1/1] prctl: allow overriding system THP policy to always per process Usama Arif
2025-05-07 15:02   ` Usama Arif
2025-05-07 20:14   ` Zi Yan
2025-05-08 10:53     ` Usama Arif
2025-05-08 20:29       ` Zi Yan
2025-05-07 14:57 ` [PATCH 0/1] prctl: allow overriding system THP policy to always Zi Yan
2025-05-07 15:12   ` Usama Arif
2025-05-07 15:57     ` Zi Yan
2025-05-07 16:09       ` Usama Arif
2025-05-08  5:41         ` Yafang Shao
2025-05-08 16:04           ` Usama Arif
2025-05-09  2:15             ` Yafang Shao
2025-05-09  5:13               ` Johannes Weiner
2025-05-09  9:24                 ` Yafang Shao
2025-05-09  9:30                   ` David Hildenbrand
2025-05-09  9:43                     ` Yafang Shao
2025-05-09 16:46                       ` Johannes Weiner
2025-05-09 22:42                         ` David Hildenbrand
2025-05-09 23:34                           ` Zi Yan
2025-05-11  8:15                             ` David Hildenbrand [this message]
2025-05-11 14:08                               ` Usama Arif
2025-05-13 11:43                                 ` Yafang Shao
2025-05-13 12:04                                 ` David Hildenbrand
2025-05-11  2:08                         ` Yafang Shao
2025-05-08 11:06 ` David Hildenbrand
2025-05-08 16:35   ` Usama Arif
2025-05-08 17:39     ` David Hildenbrand
2025-05-08 18:05       ` Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=913bdc9b-a3c2-401c-99d0-18799850db9e@redhat.com \
    --to=david@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=laoar.shao@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox