From: David Hildenbrand <david@redhat.com>
To: Vivek Kasireddy <vivek.kasireddy@intel.com>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	Mike Kravetz <mike.kravetz@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Dongwon Kim <dongwon.kim@intel.com>,
	Junxiao Chang <junxiao.chang@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v1 1/3] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages
Date: Fri, 6 Oct 2023 10:03:33 +0200
Message-ID: <4c272313-d2cd-fa29-3126-496636e14115@redhat.com>
In-Reply-To: <20231003074447.3245729-2-vivek.kasireddy@intel.com>

On 03.10.23 09:44, Vivek Kasireddy wrote:
> For drivers that would like to longterm-pin the pages associated
> with a file, the pin_user_pages_fd() API provides an option to
> not only FOLL_PIN the pages but also to check and migrate them
> if they reside in movable zone or CMA block. For now, this API
> can only work with files belonging to shmem or hugetlbfs given
> that the udmabuf driver is the only user.

Maybe add "Other files are rejected."; that wasn't clear to me until I 
looked into the code.

> 
> It must be noted that the pages associated with hugetlbfs files
> are expected to be found in the page cache. An error is returned
> if they are not found. However, shmem pages can be swapped in or
> allocated if they are not present in the page cache.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Cc: Junxiao Chang <junxiao.chang@intel.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>   include/linux/mm.h |  2 ++
>   mm/gup.c           | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 89 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bf5d0b1b16f4..af2121fb8101 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
>   long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   		    struct page **pages, unsigned int gup_flags);
> +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> +		       unsigned int gup_flags, struct page **pages);
>   
>   int get_user_pages_fast(unsigned long start, int nr_pages,
>   			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index 2f8a2d89fde1..e34b77a15fa8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3400,3 +3400,90 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   				     &locked, gup_flags);
>   }
>   EXPORT_SYMBOL(pin_user_pages_unlocked);
> +

This does look quite neat, nice! Let's take a closer look ...

> +/**
> + * pin_user_pages_fd() - pin user pages associated with a file
> + * @fd:         the fd whose pages are to be pinned
> + * @start:      starting file offset
> + * @nr_pages:   number of pages from start to pin
> + * @gup_flags:  flags modifying pin behaviour

^ I assume we should drop that. At least for now the flags are 
completely unused. And most likely we would want a different set of 
flags later (GUPFD_ ...).

> + * @pages:      array that receives pointers to the pages pinned.
> + *              Should be at least nr_pages long.
> + *
> + * Attempt to pin (and migrate) pages associated with a file belonging to

I'd drop the "and migrate" part, it's more of an implementation detail.

> + * either shmem or hugetlbfs. An error is returned if pages associated with
> + * hugetlbfs files are not present in the page cache. However, shmem pages
> + * are swapped in or allocated if they are not present in the page cache.

Why don't we do the same for hugetlbfs? Would make the interface more 
streamlined.

Certainly add that pinned pages have to be released using 
unpin_user_pages().

> + *
> + * Returns number of pages pinned. This would be equal to the number of
> + * pages requested.
> + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns
> + * -errno.
> + */
> +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> +		       unsigned int gup_flags, struct page **pages)
> +{
> +	struct page *page;
> +	struct file *filep;
> +	unsigned int flags, i;
> +	long ret;
> +
> +	if (nr_pages <= 0)
> +		return 0;

I think we should just forbid that: use a WARN_ON_ONCE() here and 
return -EINVAL. That way we'll never end up returning 0.
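
To illustrate, here is a minimal userspace sketch of that check. It is 
not kernel code: warn_on_once() is a hypothetical stand-in for the 
kernel's WARN_ON_ONCE(), and validate_nr_pages() is a made-up helper 
name. Note that nr_pages is an unsigned long, so the quoted 
"nr_pages <= 0" test can only ever match 0.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON_ONCE(): returns the condition and
 * prints a warning only the first time the condition is true. */
static int warned_once;

static int warn_on_once(int cond, const char *what)
{
	if (cond && !warned_once) {
		warned_once = 1;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical helper: reject nr_pages == 0 instead of returning 0.
 * nr_pages is unsigned, so "== 0" covers everything "<= 0" would. */
static long validate_nr_pages(unsigned long nr_pages)
{
	if (warn_on_once(nr_pages == 0, "nr_pages must not be 0"))
		return -EINVAL;
	return 0;
}
```

With something like this at the top of the function, a caller only has 
to distinguish "all pages pinned" from "-errno".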

> +	if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> +		return 0;
> +
> +	if (start < 0)
> +		return -EINVAL;
> +
> +	filep = fget(fd);
> +	if (!filep)
> +	    return -EINVAL;
> +
> +	if (!shmem_file(filep) && !is_file_hugepages(filep))
> +	    return -EINVAL;
> +
> +	flags = memalloc_pin_save();
> +	do {
> +		for (i = 0; i < nr_pages; i++) {
> +			if (shmem_mapping(filep->f_mapping)) {
> +				page = shmem_read_mapping_page(filep->f_mapping,
> +							       start + i);
> +				if (IS_ERR(page)) {
> +					ret = PTR_ERR(page);
> +					goto err;
> +				}
> +			} else {
> +				page = find_get_page_flags(filep->f_mapping,
> +							   start + i,
> +							   FGP_ACCESSED);
> +				if (!page) {
> +					ret = -EINVAL;
> +					goto err;
> +				}
> +			}
> +			ret = try_grab_page(page, FOLL_PIN);
> +			if (unlikely(ret))
> +				goto err;
> +
> +			pages[i] = page;
> +			put_page(pages[i]);
> +		}
> +
> +		ret = check_and_migrate_movable_pages(nr_pages, pages);
> +	} while (ret == -EAGAIN);
> +
> +err:
> +	memalloc_pin_restore(flags);
> +	fput(filep);
> +	if (!ret)
> +		return nr_pages;
> +
> +	while (i > 0 && pages[--i]) {
> +		unpin_user_page(pages[i]);
> +		pages[i] = NULL;

If migrate_longterm_unpinnable_pages() failed, say with -ENOMEM, the 
pages were already unpinned, but pages[i] has not been cleared, no?
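
A toy userspace model of what I mean, under the assumption that the 
failing migration step drops all pins itself before returning. The 
struct page, pin()/unpin() and migrate_fails() below are hypothetical 
stand-ins, not the kernel's. Clearing pages[] after such a failure is 
only one possible way to avoid the second unpin.

```c
#include <errno.h>
#include <stddef.h>

struct page { int pincount; };	/* toy stand-in, not the kernel struct */

static void pin(struct page *p)   { p->pincount++; }
static void unpin(struct page *p) { p->pincount--; }

/* Models a migration step that fails with -ENOMEM after having
 * dropped every pin itself (the assumption stated above). */
static int migrate_fails(struct page **pages, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++)
		unpin(pages[i]);
	return -ENOMEM;
}

/* Error path shaped like the quoted patch: walk back from i and
 * unpin every entry that is still non-NULL. */
static void cleanup(struct page **pages, unsigned long i)
{
	while (i > 0 && pages[--i]) {
		unpin(pages[i]);
		pages[i] = NULL;
	}
}

/* Pins nr pages, runs the failing migration, then the error path.
 * If clear_on_failure is set, pages[] is cleared first so the error
 * path does not unpin a second time. */
static int pin_all(struct page *store, struct page **pages,
		   unsigned long nr, int clear_on_failure)
{
	unsigned long i;
	int ret;

	for (i = 0; i < nr; i++) {
		pin(&store[i]);
		pages[i] = &store[i];
	}

	ret = migrate_fails(pages, nr);
	if (ret && clear_on_failure) {
		for (unsigned long j = 0; j < nr; j++)
			pages[j] = NULL;
	}

	cleanup(pages, i);
	return ret;
}
```

Without the clearing step, every page in this model ends up with a 
pin count one too low, i.e. it gets unpinned twice.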

> +	}
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(pin_user_pages_fd);
> +

-- 
Cheers,

David / dhildenb


