From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Johannes Weiner <hannes@cmpxchg.org>, peterz@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: CONFIG_PT_RECLAIM
Date: Thu, 27 Feb 2025 14:58:50 +0800
Message-ID: <6281ffc9-398e-44b9-a95c-2527004e09b7@bytedance.com>
In-Reply-To: <20250227060820.GC110982@cmpxchg.org>
Hi Johannes,
On 2/27/25 2:08 PM, Johannes Weiner wrote:
> On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote:
>> Hi Johannes,
>>
>> On 2/27/25 2:30 AM, Johannes Weiner wrote:
>>> Does PT_RECLAIM need to be configurable by the user?
>>
>> PT_RECLAIM selects MMU_GATHER_RCU_TABLE_FREE, but not all archs
>> support MMU_GATHER_RCU_TABLE_FREE, and before Rik's a37259732a7dc
>> ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), even x86 only
>> supported MMU_GATHER_RCU_TABLE_FREE in the PARAVIRT case.
>>
>> Therefore, enabling PT_RECLAIM also implies enabling
>> MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. I also
>> thought that, as a new feature, it would be better to give users the
>> ability to turn it on and off.
>
> New *features*, yes - something that has a significant enough cost
> that clearly not all users want to pay for the benefits.
Got it.
>
> But it's hard to imagine anybody would WANT to keep the page tables
> around if they madvised away all the pages inside of them. It's a
> great optimization, what would be a reason to opt out?
OK, now I think it makes sense to change it to 'def_bool y'.
>
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 2761098dbc1a..99383c93db33 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM
>>>  	def_bool n
>>>
>>>  config PT_RECLAIM
>>> -	bool "reclaim empty user page table pages"
>>> -	default y
>>> +	def_bool y
>>>  	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>  	select MMU_GATHER_RCU_TABLE_FREE
>>> -	help
>>> -	  Try to reclaim empty user page table pages in paths other than munmap
>>> -	  and exit_mmap path.
>>> -
>>> -	  Note: now only empty user PTE page table pages will be reclaimed.
>>> -
>>
>> Maybe keep the help information?
>
> I don't find it very helpful :( Which "other paths?" It doesn't
> explain any pros and cons, and why anybody might choose to enable or
> disable it. The Note repeats what's in the sentence before it.
Sorry about that. :(
>
> Maybe I'm missing something. Could this not just be an #ifdef block
> inside mm/madvise.c, instead of living inside a new file with two new
> config symbols?
>
> #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> ...
> #endif
>
> Is there an arch-specific feature that it requires besides
> MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now?
No, it only needs MMU_GATHER_RCU_TABLE_FREE.
>
> And why *does* it require MMU_GATHER_RCU_TABLE_FREE?
Because the madvise(MADV_DONTNEED) path already uses mmu_gather to batch
TLB flushes and free physical pages, so it is a better choice to free
PTE pages in the same way.
And because PT_RECLAIM needs RCU, we need MMU_GATHER_RCU_TABLE_FREE to
make pte_free_tlb() free PTE pages through RCU. Of course, we also need
to modify __tlb_remove_table_one() to make it use RCU as well.
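To make the batching argument concrete, here is a toy user-space sketch
in C. It is not kernel code: struct toy_gather, toy_remove_table(),
toy_flush() and toy_demo() are made-up names that only mimic the shape
of mmu_gather, and the RCU grace period is reduced to a comment. It only
shows that queued tables share one flush per batch instead of paying for
one flush each.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_BATCH_MAX 4

/* Hypothetical stand-in for struct mmu_gather: collects tables to free. */
struct toy_gather {
	void *tables[TOY_BATCH_MAX];
	int nr;		/* tables currently queued */
	int flushes;	/* simulated TLB flushes so far */
	int freed;	/* tables actually freed so far */
};

/* One simulated flush covers every queued table, then the batch is freed. */
static void toy_flush(struct toy_gather *tlb)
{
	if (!tlb->nr)
		return;
	tlb->flushes++;
	/* A real implementation would defer the frees past an RCU grace period. */
	for (int i = 0; i < tlb->nr; i++) {
		free(tlb->tables[i]);
		tlb->freed++;
	}
	tlb->nr = 0;
}

/* Queue one table; flush only when the batch is full. */
static void toy_remove_table(struct toy_gather *tlb, void *table)
{
	tlb->tables[tlb->nr++] = table;
	if (tlb->nr == TOY_BATCH_MAX)
		toy_flush(tlb);
}

/* Free n dummy tables through the gather; returns the number of flushes. */
int toy_demo(int n)
{
	struct toy_gather tlb = { .nr = 0, .flushes = 0, .freed = 0 };
	int queued = 0;

	for (int i = 0; i < n; i++) {
		void *table = malloc(64);

		if (!table)
			break;
		toy_remove_table(&tlb, table);
		queued++;
	}
	toy_flush(&tlb);	/* final flush, analogous to tlb_finish_mmu() */
	assert(tlb.freed == queued);
	return tlb.flushes;
}
```

With a batch size of 4, toy_demo(10) performs 3 flushes rather than 10,
which is the reason for preferring the mmu_gather path over freeing each
PTE page on its own.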
>
> Documentation/mm/process_addrs.rst explains why you need rcu, but
> there is pte_free_defer() that THP was using long before x86 needed
> MMU_GATHER_RCU_TABLE_FREE. It seems to me if you could use that, this
> feature would also work fine on architectures that do not generally
> need RCU for flush & frees otherwise. So is the main issue that there
> just isn't an explicitly deferred variant of pte_free_tlb()?
As mentioned above, we want to flush and free in batches, so we don't
use pte_free_defer().
pte_free_defer() seems to have been adapted for all archs, so I wonder
whether all archs could support MMU_GATHER_RCU_TABLE_FREE, so that
pte_free_tlb() would always use RCU to free PTE pages.
Maybe I missed something.
+Peter.
>
> If so, this is a fairly non-obvious dependency that should be
> documented. It would help somebody trying to port this to a !RCU
> mmu_gather arch.
>
> And I apologize if all this was discussed before. But if it was, the
> conclusions should be in the changelog or in code comments. This is a
> very delicate synchronization scheme that I think deserves explicit
> documentation somewhere.