From: Mike Rapoport <rppt@kernel.org>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>,
James Houghton <jthoughton@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Hocko <mhocko@suse.com>,
Muchun Song <muchun.song@linux.dev>,
Nikita Kalyazin <kalyazin@amazon.com>,
Oscar Salvador <osalvador@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Shuah Khan <shuah@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC 10/17] shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops
Date: Sun, 8 Feb 2026 12:35:43 +0200 [thread overview]
Message-ID: <aYhm_4difwN5XXxe@kernel.org> (raw)
In-Reply-To: <aYIzCuh8cjd09zrP@x1.local>
On Tue, Feb 03, 2026 at 12:40:26PM -0500, Peter Xu wrote:
> On Tue, Jan 27, 2026 at 09:29:29PM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use
> > them in __mfill_atomic_pte() to add shmem folios to page cache and
> > remove them in case of error.
> >
> > Implement these methods in shmem along with vm_uffd_ops->alloc_folio()
> > and drop shmem_mfill_atomic_pte().
> >
> > Since userfaultfd now does not reference any functions from shmem, drop
> > include if linux/shmem_fs.h from mm/userfaultfd.c
> >
> > mfill_atomic_install_pte() is not used anywhere outside of
> > mm/userfaultfd, make it static.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> This patch looks like a real nice cleanup on its own, thanks Mike!
>
> I guess I never tried to read into shmem accountings, now after I read some
> of the codes I don't see any issue with your change. We can also wait for
> some shmem developers double check those. Comments inline below on
> something I spot.
>
> >
> > fixup
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> [unexpected lines can be removed here]
Sure :)
> > ---
> > include/linux/shmem_fs.h | 14 ----
> > include/linux/userfaultfd_k.h | 20 +++--
> > mm/shmem.c | 148 ++++++++++++----------------------
> > mm/userfaultfd.c | 79 +++++++++---------
> > 4 files changed, 106 insertions(+), 155 deletions(-)
> >
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index e2069b3179c4..754f17e5b53c 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -97,6 +97,21 @@ struct vm_uffd_ops {
> > */
> > struct folio *(*alloc_folio)(struct vm_area_struct *vma,
> > unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request.
> > + * Should lock the folio and add it to VMA's page cache.
> > + * Returns 0 on success, error code on failre.
>
> failure
Thanks, will fix.
> > + */
> > + int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
> > + unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request on the error
> > + * handling path.
> > + * Should revert the operation of ->filemap_add().
> > + * The folio should be unlocked, but the reference to it should not be
> > + * dropped.
>
> Might be slightly misleading to explicitly mention this? As page cache
> also holds references and IIUC they need to be dropped there. But I get
> your point, on keeping the last refcount due to allocation.
>
> IMHO the "should revert the operation of ->filemap_add()" is good enough
> and accurately describes it.
Yeah, sounds good.
> > + */
> > + void (*filemap_remove)(struct folio *folio, struct vm_area_struct *vma);
> > };
> >
> > /* A combined operation mode + behavior flags. */
...
> > +static int shmem_mfill_filemap_add(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > + struct address_space *mapping = inode->i_mapping;
> > + pgoff_t pgoff = linear_page_index(vma, addr);
> > + gfp_t gfp = mapping_gfp_mask(mapping);
> > + int err;
> > +
> > __folio_set_locked(folio);
> > __folio_set_swapbacked(folio);
> > - __folio_mark_uptodate(folio);
> > -
> > - ret = -EFAULT;
> > - max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > - if (unlikely(pgoff >= max_off))
> > - goto out_release;
> >
> > - ret = mem_cgroup_charge(folio, dst_vma->vm_mm, gfp);
> > - if (ret)
> > - goto out_release;
> > - ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > - if (ret)
> > - goto out_release;
> > + err = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > + if (err)
> > + goto err_unlock;
> >
> > - ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> > - &folio->page, true, flags);
> > - if (ret)
> > - goto out_delete_from_cache;
> > + if (shmem_inode_acct_blocks(inode, 1)) {
>
> We used to do this early before allocation, IOW, I think we still have an
> option to leave this to alloc_folio() hook. However I don't see an issue
> either keeping it in filemap_add(). Maybe this movement should better be
> spelled out in the commit message anyway on how this decision is made.
>
> IIUC it's indeed safe we move this acct_blocks() here, I even see Hugh
> mentioned such in an older commit 3022fd7af96, but Hugh left uffd alone at
> that time:
>
> Userfaultfd is a foreign country: they do things differently there, and
> for good reason - to avoid mmap_lock deadlock. Leave ordering in
> shmem_mfill_atomic_pte() untouched for now, but I would rather like to
> mesh it better with shmem_get_folio_gfp() in the future.
>
> I'm not sure if that's also what you wanted to do - to make userfaultfd
> code work similarly like what shmem_alloc_and_add_folio() does right now.
> Maybe you want to mention that too somewhere in the commit log when posting
> a formal patch.
>
> One thing not directly relevant is, shmem_alloc_and_add_folio() also does
> proper recalc of inode allocation info when acct_blocks() fails here. But
> if that's a problem, that's pre-existing for userfaultfd, so IIUC we can
> also leave it alone until someone (maybe quota user) complains about shmem
> allocation failures on UFFDIO_COPY.. It's just that it looks similar
> problem here in userfaultfd path.
I actually wanted to have ordering as close as possible to
shmem_alloc_and_add_folio(), that's the first reason on moving acct_blocks
to ->filemap_add().
Another reason, is that it simplifies rollback in case of a failure, as
shmem_recalc_inode(inode, 0, 0); in ->filemap_remove() takes care of the
block accounting as well.
> > + err = -ENOMEM;
> > + goto err_delete_from_cache;
> > + }
> >
> > + folio_add_lru(folio);
>
> This change is pretty separate from the work, but looks correct to me: IIUC
> we moved the lru add earlier now, and it should be safe as long as we're
> holding folio lock all through the process, and folio_put() (ultimately,
> __page_cache_release()) will always properly undo the lru change. Please
> help double check if my understanding is correct.
This follows shmem_alloc_and_add_folio(), and my understanding as well that
this is safe as long as we hold folio lock.
> > +static void shmem_mfill_filemap_remove(struct folio *folio,
> > + struct vm_area_struct *vma)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > +
> > + filemap_remove_folio(folio);
> > + shmem_recalc_inode(inode, 0, 0);
> > folio_unlock(folio);
> > - folio_put(folio);
> > -out_unacct_blocks:
> > - shmem_inode_unacct_blocks(inode, 1);
>
> This looks wrong, or maybe I miss somewhere we did the unacct_blocks()?
This is handled by shmem_recalc_inode(inode, 0, 0).
> > @@ -401,6 +397,9 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> >
> > set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> >
> > + if (page_in_cache)
> > + folio_unlock(folio);
>
> Nitpick: another small change that looks correct, but IMHO would be nice to
> either make it a small separate patch, or mention in the commit message.
I'll address this in the commit log,
> > +
> > /* No need to invalidate - it was non-present before */
> > update_mmu_cache(dst_vma, dst_addr, dst_pte);
> > ret = 0;
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2026-02-08 10:35 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-27 19:29 [PATCH RFC 00/17] mm, kvm: allow uffd suppot in guest_memfd Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 01/17] userfaultfd: introduce mfill_copy_folio_locked() helper Mike Rapoport
2026-02-03 17:45 ` Peter Xu
2026-02-08 9:49 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 02/17] userfaultfd: introduce struct mfill_state Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 03/17] userfaultfd: introduce mfill_get_pmd() helper Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 04/17] userfaultfd: introduce mfill_get_vma() and mfill_put_vma() Mike Rapoport
2026-02-02 21:49 ` Peter Xu
2026-02-08 9:54 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 05/17] userfaultfd: retry copying with locks dropped in mfill_atomic_pte_copy() Mike Rapoport
2026-02-02 21:23 ` Peter Xu
2026-02-08 10:01 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 06/17] userfaultfd: move vma_can_userfault out of line Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 07/17] userfaultfd: introduce vm_uffd_ops Mike Rapoport
2026-02-02 21:36 ` Peter Xu
2026-02-08 10:13 ` Mike Rapoport
2026-02-11 19:35 ` Peter Xu
2026-02-15 17:47 ` Mike Rapoport
2026-02-18 21:34 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 08/17] userfaultfd, shmem: use a VMA callback to handle UFFDIO_CONTINUE Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 09/17] userfaultfd: introduce vm_uffd_ops->alloc_folio() Mike Rapoport
2026-02-02 22:13 ` Peter Xu
2026-02-08 10:22 ` Mike Rapoport
2026-02-11 19:37 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 10/17] shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops Mike Rapoport
2026-02-03 17:40 ` Peter Xu
2026-02-08 10:35 ` Mike Rapoport [this message]
2026-02-11 20:00 ` Peter Xu
2026-02-15 17:45 ` Mike Rapoport
2026-02-18 21:45 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 11/17] userfaultfd: mfill_atomic() remove retry logic Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 12/17] mm: introduce VM_FAULT_UFFD_MINOR fault reason Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 13/17] mm: introduce VM_FAULT_UFFD_MISSING " Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 14/17] KVM: guest_memfd: implement userfaultfd minor mode Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 15/17] KVM: guest_memfd: implement userfaultfd missing mode Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 16/17] KVM: selftests: test userfaultfd minor for guest_memfd Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 17/17] KVM: selftests: test userfaultfd missing " Mike Rapoport
2026-02-03 20:56 ` [PATCH RFC 00/17] mm, kvm: allow uffd suppot in guest_memfd Peter Xu
2026-02-09 15:35 ` David Hildenbrand (Arm)
2026-02-11 6:04 ` Mike Rapoport
2026-02-11 9:52 ` David Hildenbrand (Arm)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYhm_4difwN5XXxe@kernel.org \
--to=rppt@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox