linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Vivek Kasireddy <vivek.kasireddy@intel.com>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Hugh Dickins <hughd@google.com>, Peter Xu <peterx@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Dongwon Kim <dongwon.kim@intel.com>,
	Junxiao Chang <junxiao.chang@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 3/5] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages (v4)
Date: Mon, 20 Nov 2023 09:48:14 +0100
Message-ID: <22794df8-b1d3-4cb9-846f-dd5afe8b880e@redhat.com>
In-Reply-To: <20231118063233.733523-4-vivek.kasireddy@intel.com>

On 18.11.23 07:32, Vivek Kasireddy wrote:
> For drivers that would like to longterm-pin the pages associated
> with a file, the pin_user_pages_fd() API provides an option to
> not only pin the pages via FOLL_PIN but also to check and migrate
> them if they reside in movable zone or CMA block. This API
> currently works with files that belong to either shmem or hugetlbfs.
> Files belonging to other filesystems are rejected for now.
> 
> The pages need to be located first before pinning them via FOLL_PIN.
> If they are found in the page cache, they can be immediately pinned.
> Otherwise, they need to be allocated using the filesystem specific
> APIs and then pinned.
> 
> v2:
> - Drop gup_flags and improve comments and commit message (David)
> - Allocate a page if we cannot find in page cache for the hugetlbfs
>    case as well (David)
> - Don't unpin pages if there is a migration related failure (David)
> - Drop the unnecessary nr_pages <= 0 check (Jason)
> - Have the caller of the API pass in file * instead of fd (Jason)
> 
> v3: (David)
> - Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE
>    (Build error reported by kernel test robot <lkp@intel.com>)
> - Don't forget memalloc_pin_restore() on non-migration related errors
> - Improve the readability of the cleanup code associated with
>    non-migration related errors
> - Augment the comments by describing FOLL_LONGTERM like behavior
> - Include the R-b tag from Jason
> 
> v4:
> - Remove the local variable "page" and instead use 3 return statements
>    in alloc_file_page() (David)
> - Add the R-b tag from David
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Dongwon Kim <dongwon.kim@intel.com>
> Cc: Junxiao Chang <junxiao.chang@intel.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (v2)
> Reviewed-by: David Hildenbrand <david@redhat.com> (v3)
> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---


[...]


> +static struct page *alloc_file_page(struct file *file, pgoff_t idx)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> +	struct folio *folio;
> +	int err;
> +
> +	if (is_file_hugepages(file)) {
> +		folio = alloc_hugetlb_folio_nodemask(hstate_file(file),
> +						     NUMA_NO_NODE,
> +						     NULL,
> +						     GFP_USER);
> +		if (folio && folio_try_get(folio)) {
> +			err = hugetlb_add_to_page_cache(folio,
> +							file->f_mapping,
> +							idx);
> +			if (err) {
> +				folio_put(folio);
> +				free_huge_folio(folio);
> +				return ERR_PTR(err);
> +			}
> +			return &folio->page;

While looking at the user of pin_user_pages_fd(), I realized something:

Assume idx is not aligned to the hugetlb page size. 
find_get_page_flags() would always return a tail page in that case, but 
you'd be returning the head page here.

See pagecache_get_page()->folio_file_page(folio, index);
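
To illustrate the mismatch, here is a minimal userspace sketch (not kernel
code). It assumes a folio made of nr_pages contiguous subpages, with nr_pages
a power of two, and models the subpage selection that folio_file_page()
performs. The struct definitions and the name model_folio_file_page() are
hypothetical stand-ins, not the kernel's types or API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; these are NOT the kernel's struct page/folio. */
struct page { int unused; };

struct folio {
	struct page *pages;      /* pages[0] models the head page */
	unsigned long nr_pages;  /* power of two, e.g. 512 for 2 MiB / 4 KiB */
};

/*
 * Model of the lookup path: return the subpage backing 'index', i.e. the
 * head page only when 'index' is aligned to the folio size, and a tail
 * page otherwise.
 */
static struct page *model_folio_file_page(const struct folio *folio,
					  unsigned long index)
{
	/* The mask trick below is only valid for power-of-two sizes. */
	assert(folio->nr_pages &&
	       (folio->nr_pages & (folio->nr_pages - 1)) == 0);

	return &folio->pages[index & (folio->nr_pages - 1)];
}
```

With a 512-page folio, index 515 maps to subpage 3 in this model, whereas
returning &folio->page corresponds to subpage 0 regardless of idx. So the
allocation path and the lookup path would hand back different pages for the
same unaligned idx.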

> +		}
> +		return ERR_PTR(-ENOMEM);
> +	}
> +#endif
> +	return shmem_read_mapping_page(file->f_mapping, idx);
> +}
> +
> +/**
> + * pin_user_pages_fd() - pin user pages associated with a file
> + * @file:       the file whose pages are to be pinned
> + * @start:      starting file offset
> + * @nr_pages:   number of pages from start to pin
> + * @pages:      array that receives pointers to the pages pinned.
> + *              Should be at-least nr_pages long.
> + *
> + * Attempt to pin pages associated with a file that belongs to either shmem
> + * or hugetlb. The pages are either found in the page cache or allocated if
> + * necessary. Once the pages are located, they are all pinned via FOLL_PIN.
> + * And, these pinned pages need to be released either using unpin_user_pages()
> + * or unpin_user_page().
> + *
> + * It must be noted that the pages may be pinned for an indefinite amount
> + * of time. And, in most cases, the duration of time they may stay pinned
> + * would be controlled by the userspace. This behavior is effectively the
> + * same as using FOLL_LONGTERM with other GUP APIs.
> + *
> + * Returns number of pages pinned. This would be equal to the number of
> + * pages requested. If no pages were pinned, it returns -errno.
> + */
> +long pin_user_pages_fd(struct file *file, pgoff_t start,
> +		       unsigned long nr_pages, struct page **pages)
> +{
> +	struct page *page;
> +	unsigned int flags, i;
> +	long ret;
> +
> +	if (start < 0)
> +		return -EINVAL;
> +
> +	if (!file)
> +	    return -EINVAL;
> +
> +	if (!shmem_file(file) && !is_file_hugepages(file))
> +	    return -EINVAL;
> +
> +	flags = memalloc_pin_save();
> +	do {
> +		for (i = 0; i < nr_pages; i++) {
> +			/*
> + 			 * In most cases, we should be able to find the page
> +			 * in the page cache. If we cannot find it, we try to
> +			 * allocate one and add it to the page cache.
> +			 */
> +			page = find_get_page_flags(file->f_mapping,
> +						   start + i,
> +						   FGP_ACCESSED);
> +			if (!page) {
> +				page = alloc_file_page(file, start + i);
> +				if (IS_ERR(page)) {
> +					ret = PTR_ERR(page);
> +					goto err;

While looking at the above, I do wonder: what if two parties try to allocate
the page at the same time? I suspect we'd want to handle -EEXIST a bit more
gracefully here, right?
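
One possible shape, as a rough userspace sketch rather than a proposal for
the actual patch: treat -EEXIST from the insertion as "somebody else won the
race" and retry the lookup instead of failing. Everything below (the toy
struct cache, cache_lookup(), cache_add(), find_or_add() and the
racing_insert() hook) is hypothetical and only models the control flow. A
real version would also have to free or reuse the page that lost the race and
deal with reference counts.

```c
#include <errno.h>
#include <stddef.h>

#define CACHE_SLOTS 8

struct page { int id; };

/* Toy page cache: one slot per index, NULL means "not present". */
struct cache { struct page *slot[CACHE_SLOTS]; };

static struct page *cache_lookup(struct cache *c, unsigned long idx)
{
	return idx < CACHE_SLOTS ? c->slot[idx] : NULL;
}

/* Returns 0 on success, -EEXIST if the slot was populated in the meantime. */
static int cache_add(struct cache *c, unsigned long idx, struct page *page)
{
	if (idx >= CACHE_SLOTS)
		return -EINVAL;
	if (c->slot[idx])
		return -EEXIST;
	c->slot[idx] = page;
	return 0;
}

/* Hook that simulates a concurrent party inserting its own page first. */
static struct page racer_page = { 2 };

static void racing_insert(struct cache *c, unsigned long idx)
{
	if (idx < CACHE_SLOTS && !c->slot[idx])
		c->slot[idx] = &racer_page;
}

/*
 * Look up idx; if absent, try to install 'fresh'. 'racer' may be NULL; when
 * set, it runs between lookup and insertion to provoke -EEXIST.
 * Returns the page now in the cache, or NULL on any other error.
 */
static struct page *find_or_add(struct cache *c, unsigned long idx,
				struct page *fresh,
				void (*racer)(struct cache *, unsigned long))
{
	for (;;) {
		struct page *page = cache_lookup(c, idx);
		int err;

		if (page)
			return page;
		if (racer)
			racer(c, idx);
		err = cache_add(c, idx, fresh);
		if (!err)
			return fresh;
		if (err != -EEXIST)
			return NULL;
		/* Lost the race: loop and pick up the winner's page. */
	}
}
```

In this model the loser of the race ends up pinning the winner's page instead
of returning an error to the caller.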


-- 
Cheers,

David / dhildenb


