From: "Kasireddy, Vivek" <vivek.kasireddy@intel.com>
To: David Hildenbrand <david@redhat.com>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Christoph Hellwig <hch@infradead.org>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Mike Kravetz <mike.kravetz@oracle.com>,
"Hugh Dickins" <hughd@google.com>, Peter Xu <peterx@redhat.com>,
Gerd Hoffmann <kraxel@redhat.com>,
"Kim, Dongwon" <dongwon.kim@intel.com>,
"Chang, Junxiao" <junxiao.chang@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: RE: [PATCH v6 3/5] mm/gup: Introduce memfd_pin_user_pages() for pinning memfd pages (v6)
Date: Thu, 7 Dec 2023 05:09:20 +0000 [thread overview]
Message-ID: <IA0PR11MB718520192D5885CFCCEE320FF88BA@IA0PR11MB7185.namprd11.prod.outlook.com> (raw)
In-Reply-To: <5ffe2ea3-83da-4b5f-adc8-af9cd9a57cd2@redhat.com>
Hi David,
> On 05.12.23 06:35, Vivek Kasireddy wrote:
> > For drivers that would like to longterm-pin the pages associated
> > with a memfd, the pin_user_pages_fd() API provides an option to
> > not only pin the pages via FOLL_PIN but also to check and migrate
> > them if they reside in movable zone or CMA block. This API
> > currently works with memfds but it should work with any files
> > that belong to either shmemfs or hugetlbfs. Files belonging to
> > other filesystems are rejected for now.
> >
> > The pages need to be located first before pinning them via FOLL_PIN.
> > If they are found in the page cache, they can be immediately pinned.
> > Otherwise, they need to be allocated using the filesystem specific
> > APIs and then pinned.
> >
> > v2:
> > - Drop gup_flags and improve comments and commit message (David)
> > - Allocate a page if we cannot find in page cache for the hugetlbfs
> > case as well (David)
> > - Don't unpin pages if there is a migration related failure (David)
> > - Drop the unnecessary nr_pages <= 0 check (Jason)
> > - Have the caller of the API pass in file * instead of fd (Jason)
> >
> > v3: (David)
> > - Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE
> >   (Build error reported by kernel test robot <lkp@intel.com>)
> > - Don't forget memalloc_pin_restore() on non-migration related errors
> > - Improve the readability of the cleanup code associated with
> > non-migration related errors
> > - Augment the comments by describing FOLL_LONGTERM like behavior
> > - Include the R-b tag from Jason
> >
> > v4:
> > - Remove the local variable "page" and instead use 3 return statements
> > in alloc_file_page() (David)
> > - Add the R-b tag from David
> >
> > v5: (David)
> > - For hugetlb case, ensure that we only obtain head pages from the
> > mapping by using __filemap_get_folio() instead of find_get_page_flags()
> > - Handle -EEXIST when two or more potential users try to simultaneously
> > add a huge page to the mapping by forcing them to retry on failure
> >
> > v6: (Christoph)
> > - Rename this API to memfd_pin_user_pages() to make it clear that it
> > is intended for memfds
> > - Move the memfd page allocation helper from gup.c to memfd.c
> > - Fix indentation errors in memfd_pin_user_pages()
> > - For contiguous ranges of folios, use a helper such as
> > filemap_get_folios_contig() to lookup the page cache in batches
> >
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Dongwon Kim <dongwon.kim@intel.com>
> > Cc: Junxiao Chang <junxiao.chang@intel.com>
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (v2)
> > Reviewed-by: David Hildenbrand <david@redhat.com> (v3)
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > ---
> > include/linux/memfd.h | 5 +++
> > include/linux/mm.h | 2 +
> > mm/gup.c | 102 ++++++++++++++++++++++++++++++++++++++++++
> > mm/memfd.c | 34 ++++++++++++++
> > 4 files changed, 143 insertions(+)
> >
> > diff --git a/include/linux/memfd.h b/include/linux/memfd.h
> > index e7abf6fa4c52..6fc0d1282151 100644
> > --- a/include/linux/memfd.h
> > +++ b/include/linux/memfd.h
> > @@ -6,11 +6,16 @@
> >
> > #ifdef CONFIG_MEMFD_CREATE
> > extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg);
> > +extern struct page *memfd_alloc_page(struct file *memfd, pgoff_t idx);
> > #else
> > static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
> > {
> > return -EINVAL;
> > }
> > +static inline struct page *memfd_alloc_page(struct file *memfd, pgoff_t idx)
> > +{
> > + return ERR_PTR(-EINVAL);
> > +}
> > #endif
> >
> > #endif /* __LINUX_MEMFD_H */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 418d26608ece..ac69db45509f 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> > struct page **pages, unsigned int gup_flags);
> > long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> > struct page **pages, unsigned int gup_flags);
> > +long memfd_pin_user_pages(struct file *file, pgoff_t start,
> > + unsigned long nr_pages, struct page **pages);
> >
> > int get_user_pages_fast(unsigned long start, int nr_pages,
> > unsigned int gup_flags, struct page **pages);
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 231711efa390..eb93d1ec9dc6 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -5,6 +5,7 @@
> > #include <linux/spinlock.h>
> >
> > #include <linux/mm.h>
> > +#include <linux/memfd.h>
> > #include <linux/memremap.h>
> > #include <linux/pagemap.h>
> > #include <linux/rmap.h>
> > @@ -17,6 +18,7 @@
> > #include <linux/hugetlb.h>
> > #include <linux/migrate.h>
> > #include <linux/mm_inline.h>
> > +#include <linux/pagevec.h>
> > #include <linux/sched/mm.h>
> > #include <linux/shmem_fs.h>
> >
> > @@ -3410,3 +3412,103 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> > &locked, gup_flags);
> > }
> > EXPORT_SYMBOL(pin_user_pages_unlocked);
> > +
> > +/**
> > + * memfd_pin_user_pages() - pin user pages associated with a memfd
> > + * @memfd: the memfd whose pages are to be pinned
> > + * @start: starting memfd offset
> > + * @nr_pages: number of pages from start to pin
> > + * @pages: array that receives pointers to the pages pinned.
> > + * Should be at-least nr_pages long.
> > + *
> > + * Attempt to pin pages associated with a memfd; given that a memfd is either
> > + * backed by shmem or hugetlb, the pages can either be found in the page cache
> > + * or need to be allocated if necessary. Once the pages are located, they are
> > + * all pinned via FOLL_PIN. And, these pinned pages need to be released either
> > + * using unpin_user_pages() or unpin_user_page().
> > + *
> > + * It must be noted that the pages may be pinned for an indefinite amount
> > + * of time. And, in most cases, the duration of time they may stay pinned
> > + * would be controlled by the userspace. This behavior is effectively the
> > + * same as using FOLL_LONGTERM with other GUP APIs.
> > + *
> > + * Returns number of pages pinned. This would be equal to the number of
> > + * pages requested. If no pages were pinned, it returns -errno.
> > + */
> > +long memfd_pin_user_pages(struct file *memfd, pgoff_t start,
> > + unsigned long nr_pages, struct page **pages)
> > +{
> > + pgoff_t start_idx, end_idx = start + nr_pages - 1;
> > + unsigned int flags, nr_folios, i, j;
> > + struct folio_batch fbatch;
> > + struct page *page = NULL;
> > + struct folio *folio;
> > + long ret;
> > +
> > + if (!nr_pages)
> > + return -EINVAL;
> > +
> > + if (!memfd)
> > + return -EINVAL;
> > +
> > + if (!shmem_file(memfd) && !is_file_hugepages(memfd))
> > + return -EINVAL;
> > +
> > + flags = memalloc_pin_save();
> > + do {
> > + folio_batch_init(&fbatch);
> > + start_idx = start;
> > + i = 0;
> > +
> > + while (start_idx <= end_idx) {
> > + /*
> > + * In most cases, we should be able to find the page
> > + * in the page cache. If we cannot find it for some
> > + * reason, we try to allocate one and add it to the
> > + * page cache.
> > + */
> > + nr_folios = filemap_get_folios_contig(memfd->f_mapping,
> > + &start_idx,
> > + end_idx,
> > + &fbatch);
> > + if (page) {
> > + put_page(page);
> > + page = NULL;
> > + }
> > + for (j = 0; j < nr_folios; j++) {
> > + folio = fbatch.folios[j];
> > + ret = try_grab_page(&folio->page, FOLL_PIN);
> > + if (unlikely(ret)) {
> > + folio_batch_release(&fbatch);
> > + goto err;
> > + }
> > +
> > + pages[i++] = &folio->page;
> > + }
>
> I might be wrong, but that interface is still inconsistent. I think your
> intention is to always return folios (head pages), but why are we
> returning pages from this interface then?
>
> It would be more consistent with the other GUP interfaces to return
> the actual tail pages that fit the given "pgoff_t start". So if you
> punch in "nr_pages" you expect to get "nr_pages" pages, and not some
> other number of folios.
>
> Otherwise, this interface is highly confusing.
>
> If you always want to return folios, then better name it
> "memfd_pin_user_folios" (or just "memfd_pin_folios") and pass in a range
> (instead of a nr_pages parameter), and somehow indicate to the caller
> how many folios were in that range, and if that range was fully covered.
I think it makes sense to return folios from this interface. And, considering my
use-case, I'd like to have this API return an error if it cannot pin (or allocate)
the exact number of folios the caller requested.
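
Just to make sure we are talking about the same thing, here is a rough sketch of
how I imagine a caller would use such a folio-based variant. This is only an
untested illustration; memfd_pin_folios() does not exist yet, and its name,
signature, MAX_FOLIOS and the unpin step below are assumptions on my part:

    /*
     * Hypothetical sketch: pin all folios backing the memfd range
     * [start, end]; fail outright if the range cannot be fully covered.
     */
    struct folio *folios[MAX_FOLIOS];
    long nr_folios, i;

    nr_folios = memfd_pin_folios(memfd, start, end, folios, MAX_FOLIOS);
    if (nr_folios < 0)
            return nr_folios;       /* could not pin the whole range */

    /* ... use folios[0..nr_folios - 1] ... */

    /* Release the pins; unpin_user_page() is given the head pages here. */
    for (i = 0; i < nr_folios; i++)
            unpin_user_page(&folios[i]->page);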
>
> Or am I missing something?
I can make the udmabuf driver use folios instead of pages too, but the function
check_and_migrate_movable_pages() in GUP still takes a list of pages. Do you
think it is ok to use a local variable to collect all the head pages for this?
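For instance, something along these lines (untested sketch; the surrounding
locals and error handling are simplified, and nr_folios/folios[] stand in for
whatever the final interface ends up collecting):

    /*
     * Gather the head page of each pinned folio into a local pages[]
     * array so that check_and_migrate_movable_pages(), which still
     * operates on struct page *, can be used unchanged.
     */
    struct page **pages;
    long i, ret;

    pages = kvmalloc_array(nr_folios, sizeof(*pages), GFP_KERNEL);
    if (!pages)
            return -ENOMEM;

    for (i = 0; i < nr_folios; i++)
            pages[i] = &folios[i]->page;

    ret = check_and_migrate_movable_pages(nr_folios, pages);
    kvfree(pages);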
Thanks,
Vivek
>
> --
> Cheers,
>
> David / dhildenb
>
Thread overview: 16+ messages
2023-12-05 5:35 [PATCH v6 0/5] " Vivek Kasireddy
2023-12-05 5:35 ` [PATCH v6 1/5] udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap Vivek Kasireddy
2023-12-05 5:35 ` [PATCH v6 2/5] udmabuf: Add back support for mapping hugetlb pages (v5) Vivek Kasireddy
2023-12-06 9:16 ` Christoph Hellwig
2023-12-05 5:35 ` [PATCH v6 3/5] mm/gup: Introduce memfd_pin_user_pages() for pinning memfd pages (v6) Vivek Kasireddy
2023-12-06 9:18 ` Christoph Hellwig
2023-12-07 5:06 ` Kasireddy, Vivek
2023-12-06 11:19 ` David Hildenbrand
2023-12-07 5:09 ` Kasireddy, Vivek [this message]
2023-12-07 9:44 ` David Hildenbrand
2023-12-07 13:05 ` Jason Gunthorpe
2023-12-07 13:35 ` David Hildenbrand
2023-12-08 7:57 ` Kasireddy, Vivek
2023-12-08 9:53 ` David Hildenbrand
2023-12-05 5:35 ` [PATCH v6 4/5] udmabuf: Pin the pages using memfd_pin_user_pages() API (v4) Vivek Kasireddy
2023-12-05 5:35 ` [PATCH v6 5/5] selftests/dma-buf/udmabuf: Add tests to verify data after page migration Vivek Kasireddy