From: Nico Pache <npache@redhat.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
aarcange@redhat.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, apopple@nvidia.com,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
byungchul@sk.com, catalin.marinas@arm.com, cl@gentwo.org,
corbet@lwn.net, dave.hansen@linux.intel.com, david@kernel.org,
dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
hughd@google.com, jackmanb@google.com, jack@suse.cz,
jannh@google.com, jglisse@google.com, joshua.hahnjy@gmail.com,
kas@kernel.org, lance.yang@linux.dev, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, mathieu.desnoyers@efficios.com,
matthew.brost@intel.com, mhiramat@kernel.org, mhocko@suse.com,
peterx@redhat.com, pfalcato@suse.de, rakie.kim@sk.com,
raquini@redhat.com, rdunlap@infradead.org,
richard.weiyang@gmail.com, rientjes@google.com,
rostedt@goodmis.org, rppt@kernel.org, ryan.roberts@arm.com,
shivankg@amd.com, sunnanyong@huawei.com, surenb@google.com,
thomas.hellstrom@linux.intel.com, tiwai@suse.de,
usamaarif642@gmail.com, vbabka@suse.cz, vishal.moola@gmail.com,
wangkefeng.wang@huawei.com, will@kernel.org,
willy@infradead.org, yang@os.amperecomputing.com,
ying.huang@linux.alibaba.com, zokeefe@google.com
Subject: Re: [PATCH mm-unstable v1 1/5] mm: consolidate anonymous folio PTE mapping into helpers
Date: Thu, 12 Feb 2026 12:45:10 -0700
Message-ID: <CAA1CXcDNX6g8c09q=oaW6RdPYrpEfZP-Hk2rXo-2XqREm68Twg@mail.gmail.com>
In-Reply-To: <048C7077-3E54-4DFE-A25D-05D3CCB132D6@nvidia.com>
On Thu, Feb 12, 2026 at 9:09 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 11 Feb 2026, at 21:18, Nico Pache wrote:
>
> > The anonymous page fault handler in do_anonymous_page() open-codes the
> > sequence to map a newly allocated anonymous folio at the PTE level:
> > - construct the PTE entry
> > - add rmap
> > - add to LRU
> > - set the PTEs
> > - update the MMU cache.
> >
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> > existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
> >
> > map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> > references, adds the anon rmap and LRU entries, sets the PTEs, and
> > updates the MMU cache. It also handles the uffd-wp marking that can
> > occur in the pf variant.
> >
> > map_anon_folio_pte_pf(): extends the nopf variant to also update the
> > MM_ANONPAGES counter and the mTHP fault allocation statistics for the
> > page fault path.
> >
> > The zero-page read path in do_anonymous_page() is also untangled from the
> > shared setpte label, since it does not allocate a folio and should not
> > share the same mapping sequence as the write path. Pass nr_pages = 1
> > explicitly rather than relying on the variable, which makes it clearer
> > that this path operates on the zero page only.
> >
> > This refactoring will also help reduce code duplication between mm/memory.c
> > and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> > mapping that can be reused by future callers.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/mm.h | 4 ++++
> > mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> > 2 files changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f8a8fd47399c..c3aa1f51e020 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
> >
> > void snapshot_page(struct page_snapshot *ps, const struct page *page);
> >
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8c19af97f0a0..61c2277c9d9f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> > }
> >
> > +
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp)
> > +{
> > + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> > + unsigned int nr_pages = folio_nr_pages(folio);
> > +
> > + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > + if (uffd_wp)
> > + entry = pte_mkuffd_wp(entry);
> > +
> > + folio_ref_add(folio, nr_pages - 1);
> > + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > + folio_add_lru_vma(folio, vma);
> > + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> > + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>
> Copy the comment
> /* No need to invalidate - it was non-present before */
> above it please.
Good call, thank you!
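So in v2 the tail of map_anon_folio_pte_nopf() would read roughly like this
(sketch only; same code as the diff above, just with the comment copied over):

	folio_ref_add(folio, nr_pages - 1);
	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
	folio_add_lru_vma(folio, vma);
	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
	/* No need to invalidate - it was non-present before */
	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);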
>
> > +}
> > +
> > +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + unsigned int nr_pages, bool uffd_wp)
> > +{
> > + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> > + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > +}
> > +
> > +
> > /*
> > * We enter with non-exclusive mmap_lock (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > - goto setpte;
> > + if (vmf_orig_pte_uffd_wp(vmf))
> > + entry = pte_mkuffd_wp(entry);
> > + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>
> entry is only used in this if statement, you can move its declaration inside.
Ack!
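For v2 I'm thinking of something along these lines for the zero-page branch
(rough sketch, untested; I'm assuming the existing my_zero_pfn() entry
construction simply moves inside the branch, and the elided lines are the
pte_offset_map_lock()/uffd checks that are already there):

	/* Use the zero-page for reads */
	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
			!mm_forbids_zeropage(vma->vm_mm)) {
		/* entry is only needed in this branch now */
		pte_t entry = pte_mkspecial(pfn_pte(my_zero_pfn(addr),
						    vma->vm_page_prot));
		...
		if (vmf_orig_pte_uffd_wp(vmf))
			entry = pte_mkuffd_wp(entry);
		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);

		/* No need to invalidate - it was non-present before */
		update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
		goto unlock;
	}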
>
> > +
> > + /* No need to invalidate - it was non-present before */
> > + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
> > + goto unlock;
> > }
> >
> > /* Allocate our own private page. */
> > @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > */
> > __folio_mark_uptodate(folio);
> >
> > - entry = folio_mk_pte(folio, vma->vm_page_prot);
> > - entry = pte_sw_mkyoung(entry);
>
> It is removed, can you explain why?
Thanks for catching that (others have noticed it too); I will add it back and
rerun my testing to make sure everything is still OK. As Joshua pointed out,
it may only affect MIPS, which would explain why my testing showed no issues.
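Concretely, I'd restore it where the entry is built in the new helper, e.g.
(untested sketch; this assumes the later khugepaged caller is also fine with
the software-young bit being set here):

	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
	unsigned int nr_pages = folio_nr_pages(folio);

	/* re-added: mark the pte young in software, as the old fault path did */
	entry = pte_sw_mkyoung(entry);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
	if (uffd_wp)
		entry = pte_mkuffd_wp(entry);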
>
> > - if (vma->vm_flags & VM_WRITE)
> > - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>
> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
Yes, upon further investigation this does slightly change the behavior:
pte_mkdirty() is now called unconditionally rather than being gated on the
VM_WRITE flag. I noticed other callers in the kernel doing the same.
Is it OK to leave the unconditional pte_mkdirty(), or should I go back to
pte_mkwrite() with the conditional guarding both mkwrite and mkdirty?
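For clarity, the before/after I'm asking about is:

	/* before: both the write and dirty bits were gated on VM_WRITE */
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry), vma);

	/*
	 * after: dirty is set unconditionally; only the write bit is still
	 * gated on VM_WRITE inside maybe_mkwrite()
	 */
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);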
>
> > -
>
> The above code is moved into map_anon_folio_pte_nopf(), thus executed
> later than before the change. folio, vma->vm_flags, and vma->vm_page_prot
> are not changed between, so there should be no functional change.
> But it is better to explain it in the commit message to make review easier.
Will do! Thank you for confirming :) I am pretty sure we can make this
move without any functional change.
>
> > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> > if (!vmf->pte)
> > goto release;
> > @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > folio_put(folio);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > -
> > - folio_ref_add(folio, nr_pages - 1);
>
> > - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>
> These counter updates are moved after folio_add_new_anon_rmap(),
> mirroring map_anon_folio_pmd_pf()’s order. Looks good to me.
>
> > - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > - folio_add_lru_vma(folio, vma);
> > -setpte:
>
> > - if (vmf_orig_pte_uffd_wp(vmf))
> > - entry = pte_mkuffd_wp(entry);
>
> This is moved above folio_ref_add() in map_anon_folio_pte_nopf(), but
> no functional change is expected.
>
> > - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> > -
> > - /* No need to invalidate - it was non-present before */
> > - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> > + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
> > unlock:
> > if (vmf->pte)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > --
> > 2.53.0
>
> 3 things:
> 1. Copy the comment for update_mmu_cache_range() in map_anon_folio_pte_nopf().
> 2. Make pte_t entry local in zero-page handling.
> 3. Explain why entry = pte_sw_mkyoung(entry) is removed.
>
> Thanks.
Thanks for the review :) I'll fix the issues stated above!
-- Nico
>
>
> Best Regards,
> Yan, Zi
>