linux-mm.kvack.org archive mirror
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
	hanchuanhua@oppo.com, v-songbaohua@oppo.com
Subject: Re: [PATCH 2/2] mm: avoid unnecessary PTE table lock during initial swap folio scan
Date: Sun, 21 Dec 2025 10:47:19 +0100	[thread overview]
Message-ID: <d8854406-7b5a-4c54-b93b-5362cf4d3ad4@kernel.org> (raw)
In-Reply-To: <20251220033627.xy6yralcx76vucs7@master>

On 12/20/25 04:36, Wei Yang wrote:
> On Fri, Dec 19, 2025 at 09:47:17AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/16/25 08:59, Wei Yang wrote:
>>> The alloc_swap_folio() function performs an initial scan of the PTE
>>> table solely to determine the potential size (order) of the folio
>>> content that needs to be swapped in.
>>>
>>> Locking the PTE table during this initial read is unnecessary for two
>>> reasons:
>>>
>>>     * We are not writing to the PTE table at this stage.
>>>
>>>     * The code will re-check and lock the table again immediately before
>>>       any actual modification is attempted.
>>>
>>> This commit refactors the initial scan to map the PTE table without
>>> acquiring the lock. This reduces contention and overhead, improving
>>> performance of the swap-in path.
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Chuanhua Han <hanchuanhua@oppo.com>
>>> Cc: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>    mm/memory.c | 6 ++----
>>>    1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 1b8ef4f0ea60..f8d6adfa83d7 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4529,7 +4529,6 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>>    	struct folio *folio;
>>>    	unsigned long addr;
>>>    	softleaf_t entry;
>>> -	spinlock_t *ptl;
>>>    	pte_t *pte;
>>>    	gfp_t gfp;
>>>    	int order;
>>> @@ -4563,8 +4562,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>>    	if (!orders)
>>>    		goto fallback;
>>> -	pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>> -				  vmf->address & PMD_MASK, &ptl);
>>> +	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
>>>    	if (unlikely(!pte))
>>
>> Can can_swapin_thp() deal with concurrent unmap and possible freeing of
>> pages+swap?
>>
>> We have some code that depends on swap entries stabilizing the swap device
>> etc.; the moment you allow that to go away concurrently, you open a can of
>> worms.
>>
> 
> Sorry I don't follow you.
> 
> You mean some swap entry would be unmapped and cleared?

We could concurrently be zapping the page table. That means, after we 
read a swap-PTE, a different thread could concurrently free the swap 
entry.

So the moment you follow a PTE into something in the swap subsystem, you 
might be in trouble.

swap_pte_batch() does things like lookup_swap_cgroup_id(), and 
can_swapin_thp() does things like swap_zeromap_batch() and 
non_swapcache_batch().

I don't know what happens if we can have concurrent zap+freeing of swap 
entries there, and whether we could trigger some undefined behavior.

Therefore, we have to be a bit more careful here.

Because I assume this is the first time that we walk swap entries 
without the PTE lock held?

-- 
Cheers

David


Thread overview: 9+ messages
2025-12-16  7:59 [PATCH 0/3] mm/memory: align alloc_swap_folio() logic with alloc_anon_folio() Wei Yang
2025-12-16  7:59 ` [PATCH 1/2] mm: bail out do_swap_page() when no PTE table exist Wei Yang
2025-12-19  8:42   ` David Hildenbrand (Red Hat)
2025-12-20  3:24     ` Wei Yang
2025-12-21  9:40       ` David Hildenbrand (Red Hat)
2025-12-16  7:59 ` [PATCH 2/2] mm: avoid unnecessary PTE table lock during initial swap folio scan Wei Yang
2025-12-19  8:47   ` David Hildenbrand (Red Hat)
2025-12-20  3:36     ` Wei Yang
2025-12-21  9:47       ` David Hildenbrand (Red Hat) [this message]
