linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: hughd@google.com, willy@infradead.org, mgorman@suse.de,
	muchun.song@linux.dev, vbabka@kernel.org,
	akpm@linux-foundation.org, zokeefe@google.com,
	rientjes@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval
Date: Fri, 16 Aug 2024 10:59:01 +0200
Message-ID: <ebb35909-1c12-48e0-8788-824c5f7f629e@redhat.com>
In-Reply-To: <3e8253c4-9181-4027-84ee-28e1fc488f61@bytedance.com>

On 12.08.24 08:21, Qi Zheng wrote:
> Hi David,
> 
> On 2024/8/10 00:54, David Hildenbrand wrote:
>> On 07.08.24 05:08, Qi Zheng wrote:
>>> Hi David,
>>>
>>> On 2024/8/6 22:16, David Hildenbrand wrote:
>>>> On 06.08.24 04:40, Qi Zheng wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2024/8/5 22:43, David Hildenbrand wrote:
>>>>>> On 05.08.24 14:55, Qi Zheng wrote:
>>>>>>> Make pte_offset_map_nolock() return pmdval so that we can recheck the
>>>>>>> *pmd once the lock is taken. This is a preparation for freeing empty
>>>>>>> PTE pages, no functional changes are expected.
>>>>>>
>>>>>> Skimming the patches, only patch #4 updates one of the callsites
>>>>>> (collapse_pte_mapped_thp).
>>>>>
>>>>> In addition, retract_page_tables() and reclaim_pgtables_pmd_entry()
>>>>> also used the pmdval returned by pte_offset_map_nolock().
>>>>
>>>> Right, and I am questioning if only touching these two is sufficient,
>>>> and how we can make it clearer when someone actually has to recheck the
>>>> PMD.
>>>>
>>>>>
>>>>>>
>>>>>> Wouldn't we have to recheck if the PMD val changed in more cases after
>>>>>> taking the PTL?
>>>>>>
>>>>>> If not, would it make sense to have a separate function that returns
>>>>>> the pmdval and we won't have to update each and every callsite?
>>>>>
>>>>> pte_offset_map_nolock() had already obtained the pmdval previously, just
>>>>> hadn't returned it. And updating those callsites is simple, so I think
>>>>> there may not be a need to add a separate function.
>>>>
>>>> Let me ask this way: why are retract_page_tables() and
>>>> reclaim_pgtables_pmd_entry() different from the other ones, and how would
>>>> someone using pte_offset_map_nolock() know what to do here?
>>>
>>> If we acquire the PTL (PTE or PMD lock) after calling
>>> pte_offset_map_nolock(), it means we may be modifying the corresponding
>>> pte or pmd entry. In that case, we need to perform a pmd_same() check
>>> after holding the PTL, just like in pte_offset_map_lock(), to prevent
>>> the possibility of the PTE page being reclaimed at that time.
>>
>> Okay, what I thought.
>>
>>>
>>> If we call pte_offset_map_nolock() and do not need to acquire the PTL
>>> afterwards, it means we are only reading the PTE page. In this case, the
>>> rcu_read_lock() in pte_offset_map_nolock() will ensure that the PTE page
>>> cannot be reclaimed.
>>>
>>>>
>>>> IIUC, we must check the PMDVAL after taking the PTL in case
>>>>
>>>> (a) we want to modify the page table to turn pte_none() entries to
>>>>        !pte_none(). Because it could be that the page table was
>>>>        removed and now is all pte_none()
>>>>
>>>> (b) we want to remove the page table ourselves and want to check if it
>>>>        has already been removed.
>>>>
>>>> Is that it?
>>>
>>> Yes.
>>>
>>>>
>>>> So my thinking is if another function variant can make that clearer.
>>>
>>> OK, how about naming it pte_offset_map_before_lock?
>>
>> That's the issue with some of the code: for example in
>> filemap_fault_recheck_pte_none() we'll call pte_offset_map_nolock() and
>> conditionally take the PTL. But we won't be modifying the page tables.
>>
>> Maybe something like:
>>
>> pte_offset_map_readonly_nolock()
>>
>> and
>>
>> pte_offset_map_maywrite_nolock()
>>
>> The latter would require you to pass the PMD pointer such that you have
>> to really mess up to ignore what to do with it (check PMD same or not
>> check PMD same if you really know what you are doing).
>>
>> The first would not take a PMD pointer at all, because there is no need to.
> 
> These two function names LGTM. Will do in the next version.

That is probably something you can send as a separate patch independent 
of this full series.

Then we might also get more review+thoughts from other folks!
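
To make the split discussed above a bit more concrete, here is a rough
userspace sketch of how the two variants could differ. This is only a toy
model, not kernel code: every toy_ name, the table layout, and the
racing_teardown flag are assumptions made up for illustration, and the PTL
itself is elided (the model is single-threaded). It only shows why the
may-write variant hands back the PMD value for a pmd_same()-style recheck
while the read-only variant does not need to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for kernel types; NOT the real kernel definitions. */
typedef struct { uintptr_t val; } pmd_t;   /* 0 means "no PTE table" */
typedef struct { uint64_t val; } pte_t;    /* 0 means pte_none()     */

#define TOY_PTRS_PER_PTE 4

struct toy_pte_table { pte_t pte[TOY_PTRS_PER_PTE]; };

/* Toy equivalent of pmd_same(): compare two PMD values. */
bool toy_pmd_same(pmd_t a, pmd_t b)
{
	return a.val == b.val;
}

/* Read-only variant: no PMD value is returned, the caller never rechecks. */
pte_t *toy_pte_offset_map_readonly_nolock(const pmd_t *pmd, size_t idx)
{
	pmd_t pmdval = *pmd;	/* one snapshot of the PMD entry */

	if (pmdval.val == 0)
		return NULL;
	return &((struct toy_pte_table *)pmdval.val)->pte[idx];
}

/* May-write variant: the caller is forced to receive the PMD snapshot. */
pte_t *toy_pte_offset_map_maywrite_nolock(const pmd_t *pmd, size_t idx,
					  pmd_t *pmdvalp)
{
	pmd_t pmdval = *pmd;

	*pmdvalp = pmdval;
	if (pmdval.val == 0)
		return NULL;
	return &((struct toy_pte_table *)pmdval.val)->pte[idx];
}

/*
 * Case (a) from the discussion: turn a pte_none() entry into a present one.
 * racing_teardown simulates the PTE table being detached from the PMD
 * between the lockless lookup and the point where the PTL would be held.
 * Returns true only if the new entry was installed.
 */
bool toy_install_pte(pmd_t *pmd, size_t idx, pte_t newpte,
		     bool racing_teardown)
{
	pmd_t pmdval;
	pte_t *pte = toy_pte_offset_map_maywrite_nolock(pmd, idx, &pmdval);

	if (!pte)
		return false;

	if (racing_teardown)
		pmd->val = 0;	/* someone else removed the table */

	/* ... the PTL would be taken here ... */

	if (!toy_pmd_same(pmdval, *pmd))
		return false;	/* table no longer attached: back off */
	if (pte->val != 0)
		return false;	/* only none -> present in this sketch */

	*pte = newpte;
	return true;
}
```

With a table attached, toy_install_pte() with racing_teardown == false
installs the entry; with racing_teardown == true the recheck fails and the
detached table is left untouched.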

-- 
Cheers,

David / dhildenb




Thread overview: 24+ messages
2024-08-05 12:55 [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 1/7] mm: pgtable: make pte_offset_map_nolock() return pmdval Qi Zheng
2024-08-05 14:43   ` David Hildenbrand
2024-08-06  2:40     ` Qi Zheng
2024-08-06 14:16       ` David Hildenbrand
     [not found]         ` <f6c05526-5ac9-4597-9e80-099ea22fa0ae@bytedance.com>
2024-08-09 16:54           ` David Hildenbrand
2024-08-12  6:21             ` Qi Zheng
2024-08-16  8:59               ` David Hildenbrand [this message]
2024-08-16  9:21                 ` Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 2/7] mm: introduce CONFIG_PT_RECLAIM Qi Zheng
2024-08-06 14:25   ` David Hildenbrand
2024-08-05 12:55 ` [RFC PATCH v2 3/7] mm: pass address information to pmd_install() Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 4/7] mm: pgtable: try to reclaim empty PTE pages in zap_page_range_single() Qi Zheng
2024-08-06 14:40   ` David Hildenbrand
     [not found]     ` <42942b4d-153e-43e2-bfb1-43db49f87e50@bytedance.com>
2024-08-16  9:22       ` David Hildenbrand
2024-08-16 10:01         ` Qi Zheng
2024-08-16 10:03           ` David Hildenbrand
2024-08-16 10:07             ` Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 5/7] x86: mm: free page table pages by RCU instead of semi RCU Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 6/7] x86: mm: define arch_flush_tlb_before_set_huge_page Qi Zheng
2024-08-05 12:55 ` [RFC PATCH v2 7/7] x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64 Qi Zheng
2024-08-05 13:14 ` [RFC PATCH v2 0/7] synchronously scan and reclaim empty user PTE pages Qi Zheng
2024-08-06  3:31 ` Qi Zheng
2024-08-16  2:55   ` Qi Zheng
