From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
	hanchuanhua@oppo.com, v-songbaohua@oppo.com
Subject: Re: [PATCH 1/2] mm: bail out do_swap_page() when no PTE table exist
Date: Sun, 21 Dec 2025 10:40:33 +0100
Message-ID: <34fafda2-9f54-408a-be9f-d54c39b72878@kernel.org>
In-Reply-To: <20251220032407.xeszwxus664jp7tq@master>

On 12/20/25 04:24, Wei Yang wrote:
> On Fri, Dec 19, 2025 at 09:42:16AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/16/25 08:59, Wei Yang wrote:
>>> The alloc_swap_folio() function scans the PTE table to determine the
>>> potential size (order) of the folio content to be swapped in.
>>>
>>> Currently, if the call to pte_offset_map_lock() returns NULL, it
>>> indicates that no PTE table exists for that range. Despite this, the
>>> code proceeds to allocate an order-0 folio and continues the swap-in
>>> process. This is unnecessary if the required table is absent.
>>>
>>> This commit modifies the logic to immediately bail out of the swap-in
>>> process when the PTE table is missing (i.e., pte_offset_map_lock()
>>> returns NULL). This ensures we do not attempt to continue swapping when
>>> the page table structure is incomplete or changed, preventing
>>> unnecessary work.
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Chuanhua Han <hanchuanhua@oppo.com>
>>> Cc: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>    mm/memory.c | 4 +++-
>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 2a55edc48a65..1b8ef4f0ea60 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4566,7 +4566,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>>    	pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>>    				  vmf->address & PMD_MASK, &ptl);
>>>    	if (unlikely(!pte))
>>> -		goto fallback;
>>> +		return ERR_PTR(-EAGAIN);
>>>    	/*
>>>    	 * For do_swap_page, find the highest order where the aligned range is
>>> @@ -4709,6 +4709,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>    		    __swap_count(entry) == 1) {
>>>    			/* skip swapcache */
>>>    			folio = alloc_swap_folio(vmf);
>>> +			if (IS_ERR(folio))
>>> +				goto out;
>>>    			if (folio) {
>>>    				__folio_set_locked(folio);
>>>    				__folio_set_swapbacked(folio);
>>
>> How would we be able to even trigger this?
> 
> To be honest, I haven't thought about it. Thanks for the question.

Always ask yourself that question when trying to optimize something :)

Optimizing out unnecessary work in something that doesn't happen all that
frequently is not particularly helpful.

> 
>>
>> Trigger a swap fault with concurrent MADV_DONTNEED and concurrent page table
>> reclaim.
>>
> 
> Let me try to catch up with you.
> 
>    swap fault is triggered because user is accessing this range.

And there was a page table with a swap entry.

>    MADV_DONTNEED is also triggered by user and means this range is not necessary.

Right, another thread could be zapping that range.

> 
> So, we don't expect the user to do these two contradictory things at the same time.
> This is your point, right?

They could. And it's valid. It just likely doesn't make a lot of sense :)

> 
>> Is that really something we should be worrying about?

But for the page table to vanish you'd actually need page table reclaim 
(as triggered by MADV_DONTNEED) to zap the whole page table.
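
To make "zapping" concrete, here is a minimal single-threaded userspace
sketch. It is not the race itself and it involves neither swap nor page
table reclaim: it only shows that MADV_DONTNEED drops the mapped page of
a private anonymous range, so that the next access faults and sees zeroes.
The helper name zap_and_refault() is made up for illustration. Whether a
concurrent swap fault would ever observe a missing PTE table additionally
depends on kernel configuration and timing, which this sketch does not
attempt to reproduce.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write to a private anonymous page, zap it with
 * MADV_DONTNEED, and return the byte read back afterwards.
 * Returns -1 if mmap() or madvise() fails.
 */
static int zap_and_refault(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int value;

	assert(pagesize > 0);

	p = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	p[0] = 0xAB;	/* first touch: the fault installs a PTE for this page */

	/* Zap the range: the page and its PTE are dropped. */
	if (madvise(p, (size_t)pagesize, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)pagesize);
		return -1;
	}

	value = p[0];	/* faults again; anonymous memory comes back zero-filled */

	munmap(p, (size_t)pagesize);
	return value;
}
```

Calling zap_and_refault() is expected to return 0 rather than 0xAB: the
old contents are gone once the range has been zapped.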

>>
> 
> Now I question myself why alloc_anon_folio() need to bail out like this.

Your patch adds more complication by making it valid for
alloc_swap_folio() to return an error pointer. I don't like that when
there is no evidence that we hit this case frequently.

Also, there is no real difference between not finding a page table
(because it was reclaimed) and finding that the PTE changed (as handled
by can_swapin_thp()).

In fact, the latter (a changed PTE) is much more likely than a
reclaimed page table.
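
The extra complication can be sketched in plain userspace C. The
ERR_PTR()/PTR_ERR()/IS_ERR() helpers below are simplified stand-ins for
the ones in include/linux/err.h, and struct folio, enum swapin_action and
both handle_*() functions are hypothetical names invented for this model.
They are not kernel code. The point is only that a caller written for
"folio or NULL" silently treats an error pointer as a valid folio, so
every caller has to learn the third state.

```c
#include <assert.h>	/* used by the usage example that follows */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical model type; not the kernel's struct folio. */
struct folio {
	int order;
};

/* Hypothetical outcomes a caller has to distinguish. */
enum swapin_action {
	SWAPIN_USE_FOLIO,
	SWAPIN_NO_MEMORY,
	SWAPIN_RETRY_FAULT,
};

/* Caller written for the "folio or NULL" contract: two cases. */
static enum swapin_action handle_two_state(struct folio *folio)
{
	return folio ? SWAPIN_USE_FOLIO : SWAPIN_NO_MEMORY;
}

/*
 * Caller for the "folio, NULL, or error pointer" contract: IS_ERR() has
 * to be checked first, because an error pointer is non-NULL.
 */
static enum swapin_action handle_three_state(struct folio *folio)
{
	if (IS_ERR(folio))
		return SWAPIN_RETRY_FAULT;
	if (!folio)
		return SWAPIN_NO_MEMORY;
	return SWAPIN_USE_FOLIO;
}
```

With a dummy folio f, handle_two_state(&f) and handle_three_state(&f)
both yield SWAPIN_USE_FOLIO. For ERR_PTR(-EAGAIN), handle_three_state()
yields SWAPIN_RETRY_FAULT, while handle_two_state() wrongly yields
SWAPIN_USE_FOLIO, which is the kind of hazard a new error-pointer return
introduces for a path that is rarely taken.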

-- 
Cheers

David



Thread overview: 9+ messages
2025-12-16  7:59 [PATCH 0/3] mm/memory: align alloc_swap_folio() logic with alloc_anon_folio() Wei Yang
2025-12-16  7:59 ` [PATCH 1/2] mm: bail out do_swap_page() when no PTE table exist Wei Yang
2025-12-19  8:42   ` David Hildenbrand (Red Hat)
2025-12-20  3:24     ` Wei Yang
2025-12-21  9:40       ` David Hildenbrand (Red Hat) [this message]
2025-12-16  7:59 ` [PATCH 2/2] mm: avoid unnecessary PTE table lock during initial swap folio scan Wei Yang
2025-12-19  8:47   ` David Hildenbrand (Red Hat)
2025-12-20  3:36     ` Wei Yang
2025-12-21  9:47       ` David Hildenbrand (Red Hat)
