From: "David Hildenbrand (Red Hat)" <david@kernel.org>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
hanchuanhua@oppo.com, v-songbaohua@oppo.com
Subject: Re: [PATCH 2/2] mm: avoid unnecessary PTE table lock during initial swap folio scan
Date: Sun, 21 Dec 2025 10:47:19 +0100
Message-ID: <d8854406-7b5a-4c54-b93b-5362cf4d3ad4@kernel.org>
In-Reply-To: <20251220033627.xy6yralcx76vucs7@master>

On 12/20/25 04:36, Wei Yang wrote:
> On Fri, Dec 19, 2025 at 09:47:17AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/16/25 08:59, Wei Yang wrote:
>>> The alloc_swap_folio() function performs an initial scan of the PTE
>>> table solely to determine the potential size (order) of the folio
>>> content that needs to be swapped in.
>>>
>>> Locking the PTE table during this initial read is unnecessary for two
>>> reasons:
>>>
>>> * We are not writing to the PTE table at this stage.
>>>
>>> * The code will re-check and lock the table again immediately before
>>> any actual modification is attempted.
>>>
>>> This commit refactors the initial scan to map the PTE table without
>>> acquiring the lock. This reduces contention and overhead, improving
>>> performance of the swap-in path.
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Chuanhua Han <hanchuanhua@oppo.com>
>>> Cc: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>> mm/memory.c | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 1b8ef4f0ea60..f8d6adfa83d7 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4529,7 +4529,6 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>> struct folio *folio;
>>> unsigned long addr;
>>> softleaf_t entry;
>>> - spinlock_t *ptl;
>>> pte_t *pte;
>>> gfp_t gfp;
>>> int order;
>>> @@ -4563,8 +4562,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>> if (!orders)
>>> goto fallback;
>>> - pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>> - vmf->address & PMD_MASK, &ptl);
>>> + pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
>>> if (unlikely(!pte))
>>
>> Can can_swapin_thp() deal with concurrent unmap and possible freeing of
>> pages+swap?
>>
>> We have some code that depends on swap entries stabilizing the swap device
>> etc; the moment you allow for that concurrently to go away you open a can of
>> worms.
>>
>
> Sorry I don't follow you.
>
> You mean some swap entry would be unmapped and cleared?

We could concurrently be zapping the page table. That means, after we
read a swap-PTE, we could be concurrently freeing the swap entry from a
different thread.

So the moment you depend on something that goes from PTE to something in
the swap subsystem you might be in trouble.

swap_pte_batch() does things like lookup_swap_cgroup_id(), and
can_swapin_thp() does things like swap_zeromap_batch() and
non_swapcache_batch().

I don't know what happens if we can have concurrent zap+freeing of swap
entries there, and if we could trigger some undefined behavior.

Therefore, we have to be a bit more careful here.

Because I assume this is the first time that we walk swap entries
without the PTE lock held?
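
To make the concern concrete, here is a rough userspace analogy, not kernel
code. Every name in it (struct table, struct backing, table_zap(),
table_batch_locked()) is made up for illustration, and a pthread mutex merely
stands in for the PTE table lock. The scan follows each slot to the object
behind it, much as a batch helper goes from a swap PTE to state in the swap
subsystem. That dereference is only safe because the same lock keeps
table_zap() from freeing the object in the middle of the scan.

```c
/* Userspace analogy only: none of these names exist in the kernel. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_SLOTS 8

/* Stands in for per-entry state owned by another subsystem. */
struct backing {
	int cgroup_id;
};

struct table {
	pthread_mutex_t lock;            /* stands in for the PTE table lock */
	struct backing *slot[NR_SLOTS];  /* NULL means "nothing mapped here" */
};

static void table_init(struct table *t, int cgroup_id)
{
	pthread_mutex_init(&t->lock, NULL);
	for (int i = 0; i < NR_SLOTS; i++) {
		t->slot[i] = malloc(sizeof(*t->slot[i]));
		if (!t->slot[i])
			abort();
		t->slot[i]->cgroup_id = cgroup_id;
	}
}

/* "Zap": clear one slot and free what it pointed to, under the lock. */
static void table_zap(struct table *t, int idx)
{
	assert(idx >= 0 && idx < NR_SLOTS);
	pthread_mutex_lock(&t->lock);
	free(t->slot[idx]);
	t->slot[idx] = NULL;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Count the consecutive populated slots, starting at 'start', that share
 * the first slot's cgroup_id. Each slot is dereferenced, so the lock must
 * be held for the whole scan; without it, table_zap() could free an
 * object between the NULL check and the read of cgroup_id.
 */
static int table_batch_locked(struct table *t, int start)
{
	int n = 0;

	assert(start >= 0 && start < NR_SLOTS);
	pthread_mutex_lock(&t->lock);
	if (t->slot[start]) {
		int id = t->slot[start]->cgroup_id;

		for (int i = start; i < NR_SLOTS; i++) {
			if (!t->slot[i] || t->slot[i]->cgroup_id != id)
				break;
			n++;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return n;
}

/* Free whatever is still populated and tear down the lock. */
static void table_destroy(struct table *t)
{
	for (int i = 0; i < NR_SLOTS; i++)
		free(t->slot[i]);
	pthread_mutex_destroy(&t->lock);
}
```

Dropping the mutex from table_batch_locked() while keeping it in table_zap()
would turn the read of cgroup_id into a potential use-after-free in this toy
model. Whether the real helpers have an equivalent problem is exactly the
open question above; the sketch does not answer it.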
--
Cheers
David