linux-mm.kvack.org archive mirror
From: Alex Williamson <alex.williamson@redhat.com>
To: lizhe.67@bytedance.com
Cc: akpm@linux-foundation.org, david@redhat.com,
	farman@linux.ibm.com, jgg@nvidia.com, jgg@ziepe.ca,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, willy@infradead.org
Subject: Re: [PATCH v5 1/5] mm: introduce num_pages_contiguous()
Date: Mon, 29 Sep 2025 14:19:33 -0600	[thread overview]
Message-ID: <20250929141933.2c9c78fc.alex.williamson@redhat.com> (raw)
In-Reply-To: <20250929032107.7512-1-lizhe.67@bytedance.com>

On Mon, 29 Sep 2025 11:21:07 +0800
lizhe.67@bytedance.com wrote:

> On Mon, 1 Sep 2025 11:25:32 +0800, lizhe.67@bytedance.com wrote:
> 
> > On Wed, 27 Aug 2025 12:10:55 -0600, alex.williamson@redhat.com wrote:
> >   
> > > On Thu, 14 Aug 2025 14:47:10 +0800
> > > lizhe.67@bytedance.com wrote:
> > >   
> > > > From: Li Zhe <lizhe.67@bytedance.com>
> > > > 
> > > > Let's add a simple helper for determining the number of contiguous pages
> > > > that represent contiguous PFNs.
> > > > 
> > > > In an ideal world, this helper would be simpler or not even required.
> > > > Unfortunately, on some configs we still have to maintain (SPARSEMEM
> > > > without VMEMMAP), the memmap is allocated per memory section, and we might
> > > > run into weird corner cases of false positives when blindly testing for
> > > > contiguous pages only.
> > > > 
> > > > One example of such false positives would be a memory section-sized hole
> > > > that does not have a memmap. The surrounding memory sections might get
> > > > "struct pages" that are contiguous, but the PFNs are actually not.
> > > > 
> > > > This helper will, for example, be useful for determining contiguous PFNs
> > > > in a GUP result, to batch further operations across returned "struct
> > > > page"s. VFIO will utilize this interface to accelerate the VFIO DMA map
> > > > process.
> > > > 
> > > > Implementation based on Linus' suggestions to avoid new usage of
> > > > nth_page() where avoidable.
> > > > 
> > > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > > > Co-developed-by: David Hildenbrand <david@redhat.com>
> > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > ---
> > > >  include/linux/mm.h        |  7 ++++++-
> > > >  include/linux/mm_inline.h | 35 +++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 41 insertions(+), 1 deletion(-)  
> > > 
> > > 
> > > Does this need any re-evaluation after Willy's series?[1]  Patch 2/
> > > changes page_to_section() to memdesc_section() which takes a new
> > > memdesc_flags_t, ie. page->flags.  The conversion appears trivial, but
> > > mm has many subtleties.
> > > 
> > > Ideally we could also avoid merge-time fixups for linux-next and
> > > mainline.  
> > 
> > Thank you for your reminder.
> > 
> > In my view, if Willy's series is integrated, this patch will need to
> > be revised as follows. Please correct me if I'm wrong.
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ab4d979f4eec..bad0373099ad 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1763,7 +1763,12 @@ static inline unsigned long memdesc_section(memdesc_flags_t mdf)
> >  {
> >  	return (mdf.f >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
> >  }
> > -#endif
> > +#else /* !SECTION_IN_PAGE_FLAGS */
> > +static inline unsigned long memdesc_section(memdesc_flags_t mdf)
> > +{
> > +	return 0;
> > +}
> > +#endif /* SECTION_IN_PAGE_FLAGS */
> >  
> >  /**
> >   * folio_pfn - Return the Page Frame Number of a folio.
> > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > index 150302b4a905..bb23496d465b 100644
> > --- a/include/linux/mm_inline.h
> > +++ b/include/linux/mm_inline.h
> > @@ -616,4 +616,40 @@ static inline bool vma_has_recency(struct vm_area_struct *vma)
> >  	return true;
> >  }
> >  
> > +/**
> > + * num_pages_contiguous() - determine the number of contiguous pages
> > + *			    that represent contiguous PFNs
> > + * @pages: an array of page pointers
> > + * @nr_pages: length of the array, at least 1
> > + *
> > + * Determine the number of contiguous pages that represent contiguous PFNs
> > + * in @pages, starting from the first page.
> > + *
> > + * In some kernel configs, contiguous PFNs will not have contiguous
> > + * struct pages. In these configurations, num_pages_contiguous() may
> > + * return a number smaller than the actual run of contiguous PFNs; the
> > + * caller should re-check PFN contiguity after each call.
> > + *
> > + * Returns the number of contiguous pages.
> > + */
> > +static inline size_t num_pages_contiguous(struct page **pages, size_t nr_pages)
> > +{
> > +	struct page *cur_page = pages[0];
> > +	unsigned long section = memdesc_section(cur_page->flags);
> > +	size_t i;
> > +
> > +	for (i = 1; i < nr_pages; i++) {
> > +		if (++cur_page != pages[i])
> > +			break;
> > +		/*
> > +		 * In unproblematic kernel configs, memdesc_section() == 0 and
> > +		 * the whole check will get optimized out.
> > +		 */
> > +		if (memdesc_section(cur_page->flags) != section)
> > +			break;
> > +	}
> > +
> > +	return i;
> > +}
> > +
> >  #endif  
> 
> Hi Alex,
> 
> I noticed that Willy's series has been merged into the mm-stable
> branch. Could you please let me know if this vfio optimization
> series is also ready to be merged?

I was hoping for a shared branch here; it doesn't seem like a good idea
to merge mm-stable into my next branch.  My current plan is to send a
pull request without this series.  If there are no objections, we
could try for a second pull request, once mm-stable is merged, that
would include just this series.  Otherwise it would need to wait one
more cycle, which I know would be frustrating for something we tried to
include in the previous merge window.  Thanks,

Alex



Thread overview: 14+ messages
2025-08-14  6:47 [PATCH v5 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
2025-08-14  6:47 ` [PATCH v5 1/5] mm: introduce num_pages_contiguous() lizhe.67
2025-08-14  6:54   ` David Hildenbrand
2025-08-14  7:58     ` lizhe.67
2025-08-27 18:10   ` Alex Williamson
2025-09-01  3:25     ` lizhe.67
2025-09-29  3:21       ` lizhe.67
2025-09-29 20:19         ` Alex Williamson [this message]
2025-09-30  3:36           ` lizhe.67
2025-08-14  6:47 ` [PATCH v5 2/5] vfio/type1: optimize vfio_pin_pages_remote() lizhe.67
2025-08-14  6:47 ` [PATCH v5 3/5] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
2025-08-14  6:47 ` [PATCH v5 4/5] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
2025-08-14  6:47 ` [PATCH v5 5/5] vfio/type1: optimize vfio_unpin_pages_remote() lizhe.67
2025-10-06 19:44 ` [PATCH v5 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() Alex Williamson
