linux-mm.kvack.org archive mirror
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Johannes Weiner <hannes@cmpxchg.org>, peterz@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: CONFIG_PT_RECLAIM
Date: Thu, 27 Feb 2025 14:58:50 +0800
Message-ID: <6281ffc9-398e-44b9-a95c-2527004e09b7@bytedance.com>
In-Reply-To: <20250227060820.GC110982@cmpxchg.org>

Hi Johannes,

On 2/27/25 2:08 PM, Johannes Weiner wrote:
> On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote:
>> Hi Johannes,
>>
>> On 2/27/25 2:30 AM, Johannes Weiner wrote:
>>> Does PT_RECLAIM need to be configurable by the user?
>>
>> PT_RECLAIM selects MMU_GATHER_RCU_TABLE_FREE, but not all archs
>> support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc
>> ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only
>> supported MMU_GATHER_RCU_TABLE_FREE in the PARAVIRT case.
>>
>> Therefore, enabling PT_RECLAIM also implies enabling
>> MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. I also
>> thought that, as a new feature, it would be better to give users the
>> ability to turn it on and off.
> 
> New *features*, yes - something that has a significant enough cost
> that clearly not all users want to pay for the benefits.

Got it.

> 
> But it's hard to imagine anybody would WANT to keep the page tables
> around if they madvised away all the pages inside of them. It's a
> great optimization, what would be a reason to opt out?

OK, now I think it makes sense to change it to 'def_bool y'.

> 
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 2761098dbc1a..99383c93db33 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM
>>>    	def_bool n
>>>    
>>>    config PT_RECLAIM
>>> -	bool "reclaim empty user page table pages"
>>> -	default y
>>> +	def_bool y
>>>    	depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP
>>>    	select MMU_GATHER_RCU_TABLE_FREE
>>> -	help
>>> -	  Try to reclaim empty user page table pages in paths other than munmap
>>> -	  and exit_mmap path.
>>> -
>>> -	  Note: now only empty user PTE page table pages will be reclaimed.
>>> -
>>
>> Maybe keep the help information?
> 
> I don't find it very helpful :( Which "other paths?" It doesn't
> explain any pros and cons, and why anybody might choose to enable or
> disable it. The Note repeats what's in the sentence before it.

Sorry about that. :(

> 
> Maybe I'm missing something. Could this not just be an #ifdef block
> inside mm/madvise.c, instead of living inside a new file with two new
> config symbols?
> 
> #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
> ...
> #endif
> 
> Is there an arch-specific feature that it requires besides
> MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now?

No, it only needs MMU_GATHER_RCU_TABLE_FREE.
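A minimal sketch of the #ifdef approach discussed above, written as a self-contained userspace model rather than kernel code. struct pte_table, try_reclaim_empty_pte_table() and zap_range() are hypothetical names; the only real identifier is CONFIG_MMU_GATHER_RCU_TABLE_FREE, which is defined by hand here to stand in for the Kconfig symbol.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: defined by hand to stand in for the Kconfig symbol.
 * Remove this line to model an arch without RCU table freeing. */
#define CONFIG_MMU_GATHER_RCU_TABLE_FREE 1

/* Hypothetical model of one PTE page table: number of mapped entries. */
struct pte_table {
	int nr_present;
};

/* Hypothetical helper: returns true if the now-empty table would be
 * handed over for freeing. Without RCU table freeing the table is
 * always kept, which is the behaviour before the feature existed. */
static bool try_reclaim_empty_pte_table(const struct pte_table *pt)
{
#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE
	return pt->nr_present == 0;
#else
	(void)pt;
	return false;
#endif
}

/* Hypothetical stand-in for a MADV_DONTNEED zap of part of one table:
 * clear nr_to_zap entries, then try to reclaim the table if it is empty. */
static bool zap_range(struct pte_table *pt, int nr_to_zap)
{
	assert(nr_to_zap >= 0 && nr_to_zap <= pt->nr_present);
	pt->nr_present -= nr_to_zap;
	return try_reclaim_empty_pte_table(pt);
}
```

With the #define removed, zap_range() still clears the entries but never reports the table as reclaimable, so the only per-arch knob in this model is the RCU table-freeing symbol itself.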

> 
> And why *does* it require MMU_GATHER_RCU_TABLE_FREE?

Because in the madvise(MADV_DONTNEED) path, mmu_gather is already used
to batch TLB flushes and the freeing of physical pages, it is a better
choice to free PTE pages the same way.

And because PT_RECLAIM relies on RCU, we need MMU_GATHER_RCU_TABLE_FREE
so that pte_free_tlb() frees PTE pages via RCU. We also need to modify
__tlb_remove_table_one() so that it uses RCU as well.
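A minimal userspace model of the batching described above, assuming a fixed batch size: page-table pages are queued, one simulated TLB flush covers the whole batch, and the batch is then released once after a simulated RCU grace period. struct gather, gather_remove_table(), gather_flush(), gather_finish() and TABLE_BATCH_MAX are hypothetical names only loosely modelled on mmu_gather; no real RCU is involved, the pages are simply freed at the point where a grace period would have ended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical batch size; the kernel sizes its batches differently. */
#define TABLE_BATCH_MAX 8

struct table_batch {
	size_t nr;
	void *tables[TABLE_BATCH_MAX];
};

struct gather {
	struct table_batch batch;  /* tables queued but not yet flushed */
	size_t tlb_flushes;        /* simulated TLB flushes issued */
	size_t deferred_callbacks; /* simulated deferred-free callbacks queued */
	size_t tables_freed;       /* tables released after the "grace period" */
};

/* Stand-in for the end of an RCU grace period. In the kernel this would
 * run from a callback once lockless page-table walkers are done; here
 * the pages are freed immediately. */
static void grace_period_free(struct gather *g, void **tables, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		free(tables[i]);
		g->tables_freed++;
	}
}

/* One TLB flush and one deferred callback cover every queued table. */
static void gather_flush(struct gather *g)
{
	if (g->batch.nr == 0)
		return;
	g->tlb_flushes++;
	g->deferred_callbacks++;
	grace_period_free(g, g->batch.tables, g->batch.nr);
	g->batch.nr = 0;
}

/* Queue a page-table page, flushing first if the batch is full. */
static void gather_remove_table(struct gather *g, void *table)
{
	assert(table != NULL);
	if (g->batch.nr == TABLE_BATCH_MAX)
		gather_flush(g);
	g->batch.tables[g->batch.nr++] = table;
}

/* Flush whatever is left at the end of the operation. */
static void gather_finish(struct gather *g)
{
	gather_flush(g);
}
```

Queuing 20 tables with a batch of 8 in this model costs 3 flushes and 3 deferred callbacks instead of 20 of each.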

> 
> Documentation/mm/process_addrs.rst explains why you need rcu, but
> there is pte_free_defer() that THP was using long before x86 needed
> MMU_GATHER_RCU_TABLE_FREE. It seems to me if you could use that, this
> feature would also work fine on architectures that do not generally
> need RCU for flush & frees otherwise. So is the main issue that there

As mentioned above, we want to batch the TLB flushes and frees, so we
don't use pte_free_defer().

> just isn't an explicitly deferred variant of pte_free_tlb()?

pte_free_defer() seems to have been implemented on all archs, so I
wonder whether all archs could support MMU_GATHER_RCU_TABLE_FREE, so
that pte_free_tlb() would always free PTE pages via RCU.
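To put rough numbers on the batching argument, a small model that only counts deferred-free callbacks: a pte_free_defer()-style API defers each PTE page on its own, while a batched path defers once per batch. Both functions are hypothetical, and batch_size is a free parameter of this sketch, not the kernel's actual batch size.

```c
#include <assert.h>
#include <stddef.h>

/* Per-page deferral: one callback for every PTE page. */
static size_t callbacks_per_page(size_t nr_pte_pages)
{
	return nr_pte_pages;
}

/* Batched deferral: one callback per batch of batch_size pages,
 * i.e. ceil(nr_pte_pages / batch_size). */
static size_t callbacks_batched(size_t nr_pte_pages, size_t batch_size)
{
	assert(batch_size > 0);
	return (nr_pte_pages + batch_size - 1) / batch_size;
}
```

For example, 1000 PTE pages with a batch of 500 give 1000 callbacks per page versus 2 batched in this model.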

Maybe I missed something.

+Peter.

> 
> If so, this is a fairly non-obvious dependency that should be
> documented. It would help somebody trying to port this to a !RCU
> mmu_gather arch.
> 
> And I apologize if all this was discussed before. But if it was, the
> conclusions should be in the changelog or in code comments. This is a
> very delicate synchronization scheme that I think deserves explicit
> documentation somewhere.




Thread overview: 6+ messages
2025-02-26 18:30 CONFIG_PT_RECLAIM Johannes Weiner
2025-02-27  3:04 ` CONFIG_PT_RECLAIM Qi Zheng
2025-02-27  6:08   ` CONFIG_PT_RECLAIM Johannes Weiner
2025-02-27  6:58     ` Qi Zheng [this message]
2025-02-27  7:40       ` CONFIG_PT_RECLAIM Qi Zheng
2025-02-27  9:54 ` CONFIG_PT_RECLAIM David Hildenbrand
