From: Nico Pache <npache@redhat.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
aarcange@redhat.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, byungchul@sk.com,
catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
dave.hansen@linux.intel.com, david@kernel.org, dev.jain@arm.com,
gourry@gourry.net, hannes@cmpxchg.org, hughd@google.com,
jackmanb@google.com, jack@suse.cz, jannh@google.com,
jglisse@google.com, joshua.hahnjy@gmail.com, kas@kernel.org,
lance.yang@linux.dev, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, mathieu.desnoyers@efficios.com,
matthew.brost@intel.com, mhiramat@kernel.org, mhocko@suse.com,
peterx@redhat.com, pfalcato@suse.de, rakie.kim@sk.com,
raquini@redhat.com, rdunlap@infradead.org,
richard.weiyang@gmail.com, rientjes@google.com,
rostedt@goodmis.org, rppt@kernel.org, ryan.roberts@arm.com,
shivankg@amd.com, sunnanyong@huawei.com, surenb@google.com,
thomas.hellstrom@linux.intel.com, tiwai@suse.de,
usamaarif642@gmail.com, vbabka@suse.cz, vishal.moola@gmail.com,
wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org,
yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
ziy@nvidia.com, zokeefe@google.com
Subject: Re: [PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers
Date: Wed, 18 Mar 2026 10:48:55 -0600
Message-ID: <42341241-23dc-439e-b32b-da011743adce@redhat.com>
In-Reply-To: <5eae219d-62b8-4aea-905d-7a9b59b9892b@lucifer.local>

On 3/16/26 12:17 PM, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 11, 2026 at 03:13:11PM -0600, Nico Pache wrote:
>> The anonymous page fault handler in do_anonymous_page() open-codes the
>> sequence to map a newly allocated anonymous folio at the PTE level:
>> - construct the PTE entry
>> - add rmap
>> - add to LRU
>> - set the PTEs
>> - update the MMU cache.
>
> Yikes yeah this all needs work. Thanks for looking at this!
np! I believe it looks much cleaner now, and it has the added benefit of
cleaning up some of the mTHP patchset.
I also believe you suggested this, so I'll add your Suggested-by when I add
your Reviewed-by tag :)
>
>>
>> Introduce a two helpers to consolidate this duplicated logic, mirroring the
>
> NIT: 'Introduce a two helpers' -> "introduce two helpers'
ack!
>
>> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>>
>> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
>> references, adds anon rmap and LRU. This function also handles the
>> uffd_wp that can occur in the pf variant.
>>
>> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
>> counter updates, and mTHP fault allocation statistics for the page fault
>> path.
>
> MEGA nit, not sure why you're not just putting this in a bullet list, just
> weird to see not-code indented here :)
I'll drop the indentation!
>
>>
>> The zero-page read path in do_anonymous_page() is also untangled from the
>> shared setpte label, since it does not allocate a folio and should not
>> share the same mapping sequence as the write path. Make nr_pages = 1
>> rather than relying on the variable. This makes it more clear that we
>> are operating on the zero page only.
>>
>> This refactoring will also help reduce code duplication between mm/memory.c
>> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
>> mapping that can be reused by future callers.
>
> Maybe worth mentioning the subsequent patches that will use what you set up
> here?
Ok, I'll add something like "... that will be used when adding mTHP support to
khugepaged, and may find future reuse by other callers." Speaking of which, I
tried to leverage this elsewhere, and I believe that will take extra focus and
time, but it may be doable.
>
> Also you split things out into _nopf() and _pf() variants, it might be
> worth saying exactly why you're doing that or what you are preparing to do?
Ok, I'll expand on this in the "bullet list" you reference above.
>
>>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> Signed-off-by: Nico Pache <npache@redhat.com>
>
> There's nits above and below, but overall the logic looks good, so with
> nits addressed/reasonably responded to:
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Thanks :) I'll take care of those.
>
>> ---
>> include/linux/mm.h | 4 ++++
>> mm/memory.c | 60 +++++++++++++++++++++++++++++++---------------
>> 2 files changed, 45 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 4c4fd55fc823..9fea354bd17f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4903,4 +4903,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>>
>> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool uffd_wp);
>
> How I hate how uffd infiltrates all our code like this.
>
> Not your fault :)
>
>> +
>> #endif /* _LINUX_MM_H */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6aa0ea4af1fc..5c8bf1eb55f5 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5197,6 +5197,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
>> }
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr,
>> + bool uffd_wp)
>> +{
>> + unsigned int nr_pages = folio_nr_pages(folio);
>
> const would be good
>
>> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
>> +
>> + entry = pte_sw_mkyoung(entry);
>> +
>> + if (vma->vm_flags & VM_WRITE)
>> + entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> + if (uffd_wp)
>> + entry = pte_mkuffd_wp(entry);
>> +
>> + folio_ref_add(folio, nr_pages - 1);
>> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> + folio_add_lru_vma(folio, vma);
>> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
>> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>> +}
>> +
>> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
>> + struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
>> +{
>> + unsigned int order = folio_order(folio);
>
> const would be good here also!
Ack on the consts!
>
>> +
>> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
>> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
>
> Is 1 << order strictly right here? This field is a long value, so 1L <<
> order maybe? I get nervous about these shifts...
>
> Note that folio_large_nr_pages() uses 1L << order so that does seem
> preferable.
Ok, sounds good, thanks!
>
>> + count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
>> +}
>> +
>> /*
>> * We enter with non-exclusive mmap_lock (to exclude vma changes,
>> * but allow concurrent faults), and pte mapped but not yet locked.
>> @@ -5243,7 +5274,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>> return handle_userfault(vmf, VM_UFFD_MISSING);
>> }
>> - goto setpte;
>> + if (vmf_orig_pte_uffd_wp(vmf))
>> + entry = pte_mkuffd_wp(entry);
>> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>
> How I _despise_ how uffd is implemented in mm. Feels like open coded
> nonsense spills out everywhere.
>
> Not your fault obviously :)
>
>> +
>> + /* No need to invalidate - it was non-present before */
>> + update_mmu_cache_range(vmf, vma, addr, vmf->pte,
>> + /*nr_pages=*/ 1);
>
> Is there any point in passing vmf here given you pass NULL above, and it
> appears that nobody actually uses this? I guess it doesn't matter but
> seeing this immediately made me question why you set it in one, and not the
> other?
>
> Maybe I'm mistaken and some arch uses it? Don't think so though.
>
> Also can't we then just use update_mmu_cache() which is the single-page
> wrapper of this AFAICT? That'd make it even simpler.
>
> Having done this, is there any reason to keep the annoying and confusing
> initial assignment of nr_pages = 1 at declaration time?
>
> It seems that nr_pages is unconditionally assigned before it's used
> anywhere now at line 5298:
>
> nr_pages = folio_nr_pages(folio);
> addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> ...
>
> It's kinda weird to use nr_pages again after you go out of your way to
> avoid it using folio_nr_pages() in map_anon_folio_pte_nopf() and
> folio_order() in map_anon_folio_pte_pf().
Yes, I believe so; thank you, that is cleaner! In the past I had nr_pages as a
variable, but David suggested being explicit with the "1" so it would be obvious
it's a single page... update_mmu_cache() should indicate the same thing.
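For concreteness, the simplified zero-page tail would then read something like
this (untested sketch, assuming the switch to the single-page
update_mmu_cache() wrapper suggested above; names taken from the patch):

```c
	if (vmf_orig_pte_uffd_wp(vmf))
		entry = pte_mkuffd_wp(entry);
	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(vma, addr, vmf->pte);
	goto unlock;
```

That drops both the explicit nr_pages=1 and the vmf argument in one go.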
>
> But yeah, ok we align the address and it's yucky maybe leave for now (but
> we can definitely stop defaulting nr_pages to 1 :)
>
>> + goto unlock;
>> }
>>
>> /* Allocate our own private page. */
>> @@ -5267,11 +5305,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> */
>> __folio_mark_uptodate(folio);
>>
>> - entry = folio_mk_pte(folio, vma->vm_page_prot);
>> - entry = pte_sw_mkyoung(entry);
>> - if (vma->vm_flags & VM_WRITE)
>> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> -
>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>> if (!vmf->pte)
>> goto release;
>> @@ -5293,19 +5326,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>> folio_put(folio);
>> return handle_userfault(vmf, VM_UFFD_MISSING);
>> }
>> -
>> - folio_ref_add(folio, nr_pages - 1);
>> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
>> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> - folio_add_lru_vma(folio, vma);
>> -setpte:
>> - if (vmf_orig_pte_uffd_wp(vmf))
>> - entry = pte_mkuffd_wp(entry);
>> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
>> -
>> - /* No need to invalidate - it was non-present before */
>> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
>> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
>> + vmf_orig_pte_uffd_wp(vmf));
>
> So we're going from:
>
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> entry = pte_mkwrite(also dirty...)
> folio_ref_add(nr_pages - 1)
> add_mm_counter(... MM_ANON_PAGES, nr_pages)
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> if (vmf_orig_pte_uffd_wp(vmf))
> entry = pte_mkuffd_wp(entry)
> set_ptes(mm, addr, vmf->pte, entry, nr_pages)
> update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages)
>
> To:
>
> <map_anon_folio_pte_nopf>
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> entry = pte_mkwrite(also dirty...)
> if (vmf_orig_pte_uffd_wp(vmf) <passed in via uffd_wp>) <-- reordered
> entry = pte_mkuffd_wp(entry)
> folio_ref_add(nr_pages - 1)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> set_ptes(mm, addr, pte, entry, nr_pages)
> update_mmu_cache_range(NULL, vma, addr, pte, nr_pages)
>
> <map_anon_folio_pte_pf>
> add_mm_counter(... MM_ANON_PAGES, nr_pages) <-- reordered
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC) <-- reordered
>
> But the reorderings seem fine, and it is achieving the same thing.
>
> All the parameters being passed seem correct too.
Thanks for the review and for verifying the logic :) I was particularly wary of
changing the page fault handler, so I'm glad multiple people have confirmed this
seems fine.
Cheers,
-- Nico
>
>> unlock:
>> if (vmf->pte)
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>> --
>> 2.53.0
>>
>
> Cheers, Lorenzo
>
Thread overview: 20+ messages
2026-03-11 21:13 [PATCH mm-unstable v3 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
2026-03-11 21:13 ` [PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-03-16 18:17 ` Lorenzo Stoakes (Oracle)
2026-03-18 16:48 ` Nico Pache [this message]
2026-03-11 21:13 ` [PATCH mm-unstable v3 2/5] mm: introduce is_pmd_order helper Nico Pache
2026-03-11 21:13 ` [PATCH mm-unstable v3 3/5] mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
2026-03-16 18:18 ` Lorenzo Stoakes (Oracle)
2026-03-18 16:50 ` Nico Pache
2026-03-11 21:13 ` [PATCH mm-unstable v3 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2026-03-11 21:13 ` [PATCH mm-unstable v3 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-03-12 2:04 ` Wei Yang
2026-03-18 16:54 ` Nico Pache
2026-03-20 7:53 ` Wei Yang
2026-03-12 9:27 ` David Hildenbrand (Arm)
2026-03-13 8:33 ` Baolin Wang
2026-03-15 15:16 ` Lance Yang
2026-03-16 18:54 ` Lorenzo Stoakes (Oracle)
2026-03-18 17:22 ` Nico Pache
2026-03-19 16:01 ` Lorenzo Stoakes (Oracle)
2026-03-11 21:34 ` [PATCH mm-unstable v3 0/5] mm: khugepaged cleanups and mTHP prerequisites Andrew Morton