From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
hanchuanhua@oppo.com, v-songbaohua@oppo.com
Subject: Re: [PATCH 1/2] mm: bail out do_swap_page() when no PTE table exist
Date: Sun, 21 Dec 2025 10:40:33 +0100 [thread overview]
Message-ID: <34fafda2-9f54-408a-be9f-d54c39b72878@kernel.org> (raw)
In-Reply-To: <20251220032407.xeszwxus664jp7tq@master>
On 12/20/25 04:24, Wei Yang wrote:
> On Fri, Dec 19, 2025 at 09:42:16AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/16/25 08:59, Wei Yang wrote:
>>> The alloc_swap_folio() function scans the PTE table to determine the
>>> potential size (order) of the folio content to be swapped in.
>>>
>>> Currently, if the call to pte_offset_map_lock() returns NULL, it
>>> indicates that no PTE table exists for that range. Despite this, the
>>> code proceeds to allocate an order-0 folio and continues the swap-in
>>> process. This is unnecessary if the required table is absent.
>>>
>>> This commit modifies the logic to immediately bail out of the swap-in
>>> process when the PTE table is missing (i.e., pte_offset_map_lock()
>>> returns NULL). This ensures we do not attempt to continue swapping when
>>> the page table structure is incomplete or changed, preventing
>>> unnecessary work.
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Chuanhua Han <hanchuanhua@oppo.com>
>>> Cc: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>> mm/memory.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 2a55edc48a65..1b8ef4f0ea60 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4566,7 +4566,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>> pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>> vmf->address & PMD_MASK, &ptl);
>>> if (unlikely(!pte))
>>> - goto fallback;
>>> + return ERR_PTR(-EAGAIN);
>>> /*
>>> * For do_swap_page, find the highest order where the aligned range is
>>> @@ -4709,6 +4709,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>> __swap_count(entry) == 1) {
>>> /* skip swapcache */
>>> folio = alloc_swap_folio(vmf);
>>> + if (IS_ERR(folio))
>>> + goto out;
>>> if (folio) {
>>> __folio_set_locked(folio);
>>> __folio_set_swapbacked(folio);
>>
>> How would we be able to even trigger this?
>
> To be honest, I haven't thought about it. Thanks for the question.
Always ask yourself that question when trying to optimize something :)
Optimizing out unnecessary work in something that doesn't happen all that
frequently is not particularly helpful.
>
>>
>> Trigger a swap fault with concurrent MADV_DONTNEED and concurrent page table
>> reclaim.
>>
>
> Let me try to catch up with you.
>
> A swap fault is triggered because the user is accessing this range.
And there was a page table with a swap entry.
> MADV_DONTNEED is also triggered by user and means this range is not necessary.
Right, another thread could be zapping that range.
>
> So, we don't expect the user to do these two contrary things at the same time.
> This is your point, right?
They could. And it's valid. It just likely doesn't make a lot of sense :)
>
>> Is that really something we should be worrying about?
But for the page table to vanish you'd actually need page table reclaim
(as triggered by MADV_DONTNEED) to zap the whole page table.
>>
>
> Now I question myself why alloc_anon_folio() needs to bail out like this.
Your patch adds more complication by making it valid for
alloc_swap_folio() to return an error pointer. I don't like that when
there is no evidence that we frequently trigger that.
Also, there is no real difference between not finding a page table
(reclaimed) and finding that the PTE changed (as handled by can_swapin_thp()).
In fact, the latter (PTE changed) is even much more likely than a
reclaimed page table.
--
Cheers
David
Thread overview: 9+ messages
2025-12-16 7:59 [PATCH 0/3] mm/memory: align alloc_swap_folio() logic with alloc_anon_folio() Wei Yang
2025-12-16 7:59 ` [PATCH 1/2] mm: bail out do_swap_page() when no PTE table exist Wei Yang
2025-12-19 8:42 ` David Hildenbrand (Red Hat)
2025-12-20 3:24 ` Wei Yang
2025-12-21 9:40 ` David Hildenbrand (Red Hat) [this message]
2025-12-16 7:59 ` [PATCH 2/2] mm: avoid unnecessary PTE table lock during initial swap folio scan Wei Yang
2025-12-19 8:47 ` David Hildenbrand (Red Hat)
2025-12-20 3:36 ` Wei Yang
2025-12-21 9:47 ` David Hildenbrand (Red Hat)