From: Qi Zheng <zhengqi.arch@bytedance.com>
To: David Hildenbrand <david@redhat.com>
Cc: jannh@google.com, hughd@google.com, willy@infradead.org,
muchun.song@linux.dev, vbabka@kernel.org,
akpm@linux-foundation.org, peterx@redhat.com, mgorman@suse.de,
catalin.marinas@arm.com, will@kernel.org,
dave.hansen@linux.intel.com, luto@kernel.org,
peterz@infradead.org, x86@kernel.org, lorenzo.stoakes@oracle.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
zokeefe@google.com, rientjes@google.com
Subject: Re: [PATCH v3 4/9] mm: introduce skip_none_ptes()
Date: Mon, 18 Nov 2024 11:35:15 +0800
Message-ID: <2b48d313-4f66-47c8-98d8-8aa78db62b1b@bytedance.com>
In-Reply-To: <4edccc1a-2761-4a5a-89a6-7869c1b6b08a@redhat.com>
On 2024/11/15 22:59, David Hildenbrand wrote:
> On 15.11.24 15:41, Qi Zheng wrote:
>>
>>
>> On 2024/11/15 18:22, David Hildenbrand wrote:
>>>>>> *nr_skip = nr;
>>>>>>
>>>>>> and then:
>>>>>>
>>>>>> zap_pte_range
>>>>>> --> nr = do_zap_pte_range(tlb, vma, pte, addr, end, details,
>>>>>> &nr_skip,
>>>>>> rss, &force_flush, &force_break);
>>>>>> if (can_reclaim_pt) {
>>>>>> none_nr += count_pte_none(pte, nr);
>>>>>> none_nr += nr_skip;
>>>>>> }
>>>>>>
>>>>>> Right?
>>>>>
>>>>> Yes. I did not look closely at the patch that adds the counting of
>>>>
>>>> Got it.
>>>>
>>>>> pte_none though (to digest why it is required :) ).
>>>>
>>>> Because 'none_nr == PTRS_PER_PTE' is used in patch #7 to detect
>>>> empty PTE page.
>>>
>>> Okay, so the problem is that "nr" would be "all processed entries" but
>>> there are cases where we "process an entry but not zap it".
>>>
>>> What you really only want to know is "was any entry not zapped", which
>>> could be a simple input boolean variable passed into do_zap_pte_range?
>>>
>>> Because as soon as any entry was processed but not zapped, you can
>>> immediately give up on reclaiming that table.
>>
>> Yes, we can set can_reclaim_pt to false when a !pte_none() entry is
>> found in count_pte_none().
>
> I'm not sure if we'll need count_pte_none(), but I'll have to take a look
> at your new patch to see how this fits together with doing the pte_none
> detection+skipping in do_zap_pte_range().
>
> I was wondering if you cannot simply avoid the additional scanning and
> simply set "can_reclaim_pt" if you skip a zap.

Maybe we can return whether a zap was skipped from zap_present_ptes()
and zap_nonpresent_ptes() through parameters, like I did in
[PATCH v1 3/7] and [PATCH v1 4/7].

In theory, we can detect empty PTE pages in either of the following two
ways:

1) If no zap is skipped, all pte entries have been zapped, so the PTE
   page must be empty.

2) If all pte entries are detected to be none, then the PTE page is
   empty.

If the detection goes wrong, 1) may cause a non-empty PTE page to be
reclaimed (which is unacceptable), while 2) will at worst cause an
empty PTE page to not be reclaimed.

So the most reliable and efficient method may be:

a. If any zap is skipped, stop scanning and do not reclaim the PTE
   page;

b. Otherwise, as now, detect the empty PTE page through
   count_pte_none().
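
To make a. and b. concrete, here is a rough user-space sketch of the combined check. It is not the kernel code: toy_pte_t, toy_zap_is_skipped(), toy_count_pte_none() and toy_zap_and_can_reclaim() are made-up names, the "bit 0 means skip" policy is purely illustrative, and PTRS_PER_PTE is hard-coded to 512 as an assumption. It also reads "stop scanning" in a. as "do not run the none-counting pass"; the zap loop itself still visits the remaining entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Entries per PTE page; 512 matches x86-64 with 4K pages (assumed here). */
#define PTRS_PER_PTE 512

/* Toy stand-in for a PTE: 0 means "none", anything else is populated. */
typedef unsigned long toy_pte_t;

static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

/* Hypothetical skip policy: an entry with bit 0 set must not be zapped. */
static bool toy_zap_is_skipped(toy_pte_t pte)
{
	return (pte & 1UL) != 0;
}

/* Toy counterpart of count_pte_none(): count none entries in the table. */
static size_t toy_count_pte_none(const toy_pte_t *table, size_t nr)
{
	size_t none_nr = 0;

	for (size_t i = 0; i < nr; i++)
		if (toy_pte_none(table[i]))
			none_nr++;
	return none_nr;
}

/*
 * Zap the whole table and report whether it may be reclaimed.
 *
 * a. If any zap was skipped, give up on reclaim without counting.
 * b. Otherwise, confirm emptiness by counting none entries, so that a
 *    mistake can only lose a reclaim and never free a populated table.
 */
static bool toy_zap_and_can_reclaim(toy_pte_t *table)
{
	bool any_skipped = false;

	assert(table != NULL);

	for (size_t i = 0; i < PTRS_PER_PTE; i++) {
		if (toy_pte_none(table[i]))
			continue;
		if (toy_zap_is_skipped(table[i])) {
			any_skipped = true;	/* entry stays populated */
			continue;
		}
		table[i] = 0;			/* zap the entry */
	}

	if (any_skipped)
		return false;			/* step a */

	return toy_count_pte_none(table, PTRS_PER_PTE) == PTRS_PER_PTE; /* step b */
}
```

In this sketch the flag only ever vetoes reclaim; a positive answer still comes from the count, which keeps the failure mode on the safe side described for 2).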