* [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator [not found] <20230123173007.325544-1-dhowells@redhat.com> @ 2023-01-23 17:29 ` David Howells 2023-01-23 18:21 ` Christoph Hellwig ` (2 more replies) 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells 2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells 2 siblings, 3 replies; 39+ messages in thread From: David Howells @ 2023-01-23 17:29 UTC (permalink / raw) To: Al Viro, Christoph Hellwig Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, David Hildenbrand, linux-mm Add a function, iov_iter_extract_pages(), to extract a list of pages from an iterator. The pages may be returned with a pin added or nothing, depending on the type of iterator. Add a second function, iov_iter_extract_mode(), to determine how the cleanup should be done. There are two cases: (1) ITER_IOVEC or ITER_UBUF iterator. Extracted pages will have pins (FOLL_PIN) obtained on them so that a concurrent fork() will forcibly copy the page so that DMA is done to/from the parent's buffer and is unavailable to/unaffected by the child process. iov_iter_extract_mode() will return FOLL_PIN for this case. The caller should use something like folio_put_unpin() to dispose of the page. (2) Any other sort of iterator. No refs or pins are obtained on the page, the assumption is made that the caller will manage page retention. iov_iter_extract_mode() will return 0. The pages don't need additional disposal. Signed-off-by: David Howells <dhowells@redhat.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Christoph Hellwig <hch@lst.de> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org --- Notes: ver #8) - It seems that all DIO is supposed to be done under FOLL_PIN now, and not FOLL_GET, so switch to only using pin_user_pages() for user-backed iters. - Wrap an argument in brackets in the iov_iter_extract_mode() macro. - Drop the extract_flags argument to iov_iter_extract_mode() for now [hch]. ver #7) - Switch to passing in iter-specific flags rather than FOLL_* flags. - Drop the direction flags for now. - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA. - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter. - Add support for extraction from KVEC-type iters. - Use iov_iter_advance() rather than open-coding it. - Make BVEC- and KVEC-type skip over initial empty vectors. ver #6) - Add back the function to indicate the cleanup mode. - Drop the cleanup_mode return arg to iov_iter_extract_pages(). - Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter data_source. ver #4) - Use ITER_SOURCE/DEST instead of WRITE/READ. - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in. ver #3) - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access to get/pin_user_pages_fast()[1]. include/linux/uio.h | 22 +++ lib/iov_iter.c | 320 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 342 insertions(+) diff --git a/include/linux/uio.h b/include/linux/uio.h index 46d5080314c6..a8165335f8da 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -363,4 +363,26 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, /* Flags for iov_iter_get/extract_pages*() */ #define ITER_ALLOW_P2PDMA 0x01 /* Allow P2PDMA on the extracted pages */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, + size_t maxsize, unsigned int maxpages, + unsigned int extract_flags, size_t *offset0); + +/** + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained + * @iter: The iterator + * + * Examine the iterator and indicate by returning FOLL_PIN or 0 as to how, if + * at all, pages extracted from the iterator will be retained by the extraction + * function. + * + * FOLL_PIN indicates that the pages will have a pin placed in them that the + * caller must unpin. This is must be done for DMA/async DIO to force fork() + * to forcibly copy a page for the child (the parent must retain the original + * page). + * + * 0 indicates that no measures are taken and that it's up to the caller to + * retain the pages. + */ +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0) + #endif diff --git a/lib/iov_iter.c b/lib/iov_iter.c index fb04abe7d746..57f3e9404160 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1916,3 +1916,323 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state) i->iov -= state->nr_segs - i->nr_segs; i->nr_segs = state->nr_segs; } + +/* + * Extract a list of contiguous pages from an ITER_PIPE iterator. This does + * not get references of its own on the pages, nor does it get a pin on them. + * If there's a partial page, it adds that first and will then allocate and add + * pages into the pipe to make up the buffer space to the amount required. + * + * The caller must hold the pipe locked and only transferring into a pipe is + * supported. + */ +static ssize_t iov_iter_extract_pipe_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + unsigned int nr, offset, chunk, j; + struct page **p; + size_t left; + + if (!sanity(i)) + return -EFAULT; + + offset = pipe_npages(i, &nr); + if (!nr) + return -EFAULT; + *offset0 = offset; + + maxpages = min_t(size_t, nr, maxpages); + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + left = maxsize; + for (j = 0; j < maxpages; j++) { + struct page *page = append_pipe(i, left, &offset); + if (!page) + break; + chunk = min_t(size_t, left, PAGE_SIZE - offset); + left -= chunk; + *p++ = page; + } + if (!j) + return -EFAULT; + return maxsize - left; +} + +/* + * Extract a list of contiguous pages from an ITER_XARRAY iterator. This does not + * get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_xarray_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page *page, **p; + unsigned int nr = 0, offset; + loff_t pos = i->xarray_start + i->iov_offset; + pgoff_t index = pos >> PAGE_SHIFT; + XA_STATE(xas, i->xarray, index); + + offset = pos & ~PAGE_MASK; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + p[nr++] = find_subpage(page, xas.xa_index); + if (nr == maxpages) + break; + } + rcu_read_unlock(); + + maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from an ITER_BVEC iterator. This does + * not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_bvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page **p, *page; + size_t skip = i->iov_offset, offset; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->bvec->bv_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->kvec++; + skip = 0; + } + + skip += i->bvec->bv_offset; + page = i->bvec->bv_page + skip / PAGE_SIZE; + offset = skip % PAGE_SIZE; + *offset0 = offset; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + for (k = 0; k < maxpages; k++) + p[k] = page + k; + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of virtually contiguous pages from an ITER_KVEC iterator. + * This does not get references on the pages, nor does it get a pin on them. + */ +static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i, + struct page ***pages, size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + struct page **p, *page; + const void *kaddr; + size_t skip = i->iov_offset, offset, len; + int k; + + for (;;) { + if (i->nr_segs == 0) + return 0; + maxsize = min(maxsize, i->kvec->iov_len - skip); + if (maxsize) + break; + i->iov_offset = 0; + i->nr_segs--; + i->kvec++; + skip = 0; + } + + offset = skip % PAGE_SIZE; + *offset0 = offset; + kaddr = i->kvec->iov_base; + + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + p = *pages; + + kaddr -= offset; + len = offset + maxsize; + for (k = 0; k < maxpages; k++) { + size_t seg = min_t(size_t, len, PAGE_SIZE); + + if (is_vmalloc_or_module_addr(kaddr)) + page = vmalloc_to_page(kaddr); + else + page = virt_to_page(kaddr); + + p[k] = page; + len -= seg; + kaddr += PAGE_SIZE; + } + + maxsize = min_t(size_t, maxsize, maxpages * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/* + * Extract a list of contiguous pages from a user iterator and get a pin on + * each of them. This should only be used if the iterator is user-backed + * (IOBUF/UBUF). + * + * It does not get refs on the pages, but the pages must be unpinned by the + * caller once the transfer is complete. + * + * This is safe to be used where background IO/DMA *is* going to be modifying + * the buffer; using a pin rather than a ref makes forces fork() to give the + * child a copy of the page. + */ +static ssize_t iov_iter_extract_user_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + unsigned long addr; + unsigned int gup_flags = FOLL_PIN; + size_t offset; + int res; + + if (i->data_source == ITER_DEST) + gup_flags |= FOLL_WRITE; + if (extract_flags & ITER_ALLOW_P2PDMA) + gup_flags |= FOLL_PCI_P2PDMA; + if (i->nofault) + gup_flags |= FOLL_NOFAULT; + + addr = first_iovec_segment(i, &maxsize); + *offset0 = offset = addr % PAGE_SIZE; + addr &= PAGE_MASK; + maxpages = want_pages_array(pages, maxsize, offset, maxpages); + if (!maxpages) + return -ENOMEM; + res = pin_user_pages_fast(addr, maxpages, gup_flags, *pages); + if (unlikely(res <= 0)) + return res; + maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - offset); + iov_iter_advance(i, maxsize); + return maxsize; +} + +/** + * iov_iter_extract_pages - Extract a list of contiguous pages from an iterator + * @i: The iterator to extract from + * @pages: Where to return the list of pages + * @maxsize: The maximum amount of iterator to extract + * @maxpages: The maximum size of the list of pages + * @extract_flags: Flags to qualify request + * @offset0: Where to return the starting offset into (*@pages)[0] + * + * Extract a list of contiguous pages from the current point of the iterator, + * advancing the iterator. The maximum number of pages and the maximum amount + * of page contents can be set. + * + * If *@pages is NULL, a page list will be allocated to the required size and + * *@pages will be set to its base. If *@pages is not NULL, it will be assumed + * that the caller allocated a page list at least @maxpages in size and this + * will be filled in. + * + * @extract_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be + * allowed on the pages extracted. + * + * The iov_iter_extract_mode() function can be used to query how cleanup should + * be performed. + * + * Extra refs or pins on the pages may be obtained as follows: + * + * (*) If the iterator is user-backed (ITER_IOVEC/ITER_UBUF), pins will be + * added to the pages, but refs will not be taken. + * iov_iter_extract_mode() will return FOLL_PIN. + * + * (*) If the iterator is ITER_PIPE, this must describe a destination for the + * data. Additional pages may be allocated and added to the pipe (which + * will hold the refs), but pins will not be obtained for the caller. The + * caller must hold the pipe lock. iov_iter_extract_mode() will return 0. + * + * (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are + * merely listed; no extra refs or pins are obtained. + * iov_iter_extract_mode() will return 0. + * + * Note also: + * + * (*) Use with ITER_DISCARD is not supported as that has no content. + * + * On success, the function sets *@pages to the new pagelist, if allocated, and + * sets *offset0 to the offset into the first page. + * + * It may also return -ENOMEM and -EFAULT. + */ +ssize_t iov_iter_extract_pages(struct iov_iter *i, + struct page ***pages, + size_t maxsize, + unsigned int maxpages, + unsigned int extract_flags, + size_t *offset0) +{ + maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT); + if (!maxsize) + return 0; + + if (likely(user_backed_iter(i))) + return iov_iter_extract_user_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_kvec(i)) + return iov_iter_extract_kvec_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_bvec(i)) + return iov_iter_extract_bvec_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_pipe(i)) + return iov_iter_extract_pipe_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + if (iov_iter_is_xarray(i)) + return iov_iter_extract_xarray_pages(i, pages, maxsize, + maxpages, extract_flags, + offset0); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(iov_iter_extract_pages); ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells @ 2023-01-23 18:21 ` Christoph Hellwig 2023-01-24 14:27 ` David Hildenbrand 2023-01-24 14:35 ` David Howells 2 siblings, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2023-01-23 18:21 UTC (permalink / raw) To: David Howells Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, David Hildenbrand, linux-mm Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells 2023-01-23 18:21 ` Christoph Hellwig @ 2023-01-24 14:27 ` David Hildenbrand 2023-01-24 14:35 ` David Howells 2 siblings, 0 replies; 39+ messages in thread From: David Hildenbrand @ 2023-01-24 14:27 UTC (permalink / raw) To: David Howells, Al Viro, Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, linux-mm On 23.01.23 18:29, David Howells wrote: > Add a function, iov_iter_extract_pages(), to extract a list of pages from > an iterator. The pages may be returned with a pin added or nothing, > depending on the type of iterator. > > Add a second function, iov_iter_extract_mode(), to determine how the > cleanup should be done. > > There are two cases: > > (1) ITER_IOVEC or ITER_UBUF iterator. > > Extracted pages will have pins (FOLL_PIN) obtained on them so that a > concurrent fork() will forcibly copy the page so that DMA is done > to/from the parent's buffer and is unavailable to/unaffected by the > child process. > > iov_iter_extract_mode() will return FOLL_PIN for this case. The > caller should use something like folio_put_unpin() to dispose of the > page. > > (2) Any other sort of iterator. > > No refs or pins are obtained on the page, the assumption is made that > the caller will manage page retention. > > iov_iter_extract_mode() will return 0. The pages don't need > additional disposal. > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Al Viro <viro@zeniv.linux.org.uk> > cc: Christoph Hellwig <hch@lst.de> > cc: John Hubbard <jhubbard@nvidia.com> > cc: David Hildenbrand <david@redhat.com> > cc: Matthew Wilcox <willy@infradead.org> > cc: linux-fsdevel@vger.kernel.org > cc: linux-mm@kvack.org > --- > > Notes: > ver #8) > - It seems that all DIO is supposed to be done under FOLL_PIN now, and not > FOLL_GET, so switch to only using pin_user_pages() for user-backed > iters. > - Wrap an argument in brackets in the iov_iter_extract_mode() macro. > - Drop the extract_flags argument to iov_iter_extract_mode() for now > [hch]. > > ver #7) > - Switch to passing in iter-specific flags rather than FOLL_* flags. > - Drop the direction flags for now. > - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA. > - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter. > - Add support for extraction from KVEC-type iters. > - Use iov_iter_advance() rather than open-coding it. > - Make BVEC- and KVEC-type skip over initial empty vectors. > > ver #6) > - Add back the function to indicate the cleanup mode. > - Drop the cleanup_mode return arg to iov_iter_extract_pages(). > - Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter > data_source. > > ver #4) > - Use ITER_SOURCE/DEST instead of WRITE/READ. > - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in. > > ver #3) > - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access > to get/pin_user_pages_fast()[1]. > > include/linux/uio.h | 22 +++ > lib/iov_iter.c | 320 ++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 342 insertions(+) > > diff --git a/include/linux/uio.h b/include/linux/uio.h > index 46d5080314c6..a8165335f8da 100644 > --- a/include/linux/uio.h > +++ b/include/linux/uio.h > @@ -363,4 +363,26 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, > /* Flags for iov_iter_get/extract_pages*() */ > #define ITER_ALLOW_P2PDMA 0x01 /* Allow P2PDMA on the extracted pages */ > > +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages, > + size_t maxsize, unsigned int maxpages, > + unsigned int extract_flags, size_t *offset0); > + > +/** > + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained > + * @iter: The iterator > + * > + * Examine the iterator and indicate by returning FOLL_PIN or 0 as to how, if > + * at all, pages extracted from the iterator will be retained by the extraction > + * function. > + * > + * FOLL_PIN indicates that the pages will have a pin placed in them that the > + * caller must unpin. This is must be done for DMA/async DIO to force fork() > + * to forcibly copy a page for the child (the parent must retain the original > + * page). > + * > + * 0 indicates that no measures are taken and that it's up to the caller to > + * retain the pages. > + */ > +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0) > + Does it make sense to move that to the patch where it is needed? (do we need it at all anymore?) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells 2023-01-23 18:21 ` Christoph Hellwig 2023-01-24 14:27 ` David Hildenbrand @ 2023-01-24 14:35 ` David Howells 2023-01-24 14:37 ` David Hildenbrand 2023-01-24 14:45 ` David Howells 2 siblings, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:35 UTC (permalink / raw) To: David Hildenbrand Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, linux-mm David Hildenbrand <david@redhat.com> wrote: > > +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0) > > + > > Does it make sense to move that to the patch where it is needed? (do we need > it at all anymore?) Yes, we need something. Granted, there are now only two alternatives, not three: either the pages are going to be pinned or they're not going to be pinned, but I would rather have a specific function that tells you than just use, say, user_backed_iter() to make it easier to adjust it later if we need to - and easier to find the places where we need to adjust. But if it's preferred we could require people to use user_backed_iter() instead. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-24 14:35 ` David Howells @ 2023-01-24 14:37 ` David Hildenbrand 2023-01-24 14:45 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: David Hildenbrand @ 2023-01-24 14:37 UTC (permalink / raw) To: David Howells Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, linux-mm On 24.01.23 15:35, David Howells wrote: > David Hildenbrand <david@redhat.com> wrote: > >>> +#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0) >>> + >> >> Does it make sense to move that to the patch where it is needed? (do we need >> it at all anymore?) > > Yes, we need something. Granted, there are now only two alternatives, not > three: either the pages are going to be pinned or they're not going to be > pinned, but I would rather have a specific function that tells you than just > use, say, user_backed_iter() to make it easier to adjust it later if we need > to - and easier to find the places where we need to adjust. > > But if it's preferred we could require people to use user_backed_iter() > instead. At least reduces the occurrences of FOLL_PIN :) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-24 14:35 ` David Howells 2023-01-24 14:37 ` David Hildenbrand @ 2023-01-24 14:45 ` David Howells 2023-01-24 14:52 ` David Hildenbrand 1 sibling, 1 reply; 39+ messages in thread From: David Howells @ 2023-01-24 14:45 UTC (permalink / raw) To: David Hildenbrand Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, linux-mm David Hildenbrand <david@redhat.com> wrote: > At least reduces the occurrences of FOLL_PIN :) I don't see where the problem is in letting people supply FOLL_PIN or FOLL_GET. Why even have pin_user_pages() and get_user_pages() since they end up at the same place. They should be inline wrappers, if separate functions at all. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator 2023-01-24 14:45 ` David Howells @ 2023-01-24 14:52 ` David Hildenbrand 0 siblings, 0 replies; 39+ messages in thread From: David Hildenbrand @ 2023-01-24 14:52 UTC (permalink / raw) To: David Howells Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, John Hubbard, linux-mm On 24.01.23 15:45, David Howells wrote: > David Hildenbrand <david@redhat.com> wrote: > >> At least reduces the occurrences of FOLL_PIN :) > > I don't see where the problem is in letting people supply FOLL_PIN or > FOLL_GET. Why even have pin_user_pages() and get_user_pages() since they end > up at the same place. They should be inline wrappers, if separate functions > at all. There once was a proposed goal of restricting FOLL_GET to core-mm and allowing other drivers etc. to only use FOLL_PIN. Providing pin_user_pages() and the corresponding unpin part seemed very clean to me. To me that makes perfect sense and will prevent any misuse once we converted everything relevant to FOLL_PIN. To be precise, I think we could get away in this patch set by not using FOLL_PIN and FOLL_GET and it wouldn't really reduce readability of the code ... -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page [not found] <20230123173007.325544-1-dhowells@redhat.com> 2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells @ 2023-01-23 17:30 ` David Howells 2023-01-23 18:21 ` Christoph Hellwig ` (3 more replies) 2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells 2 siblings, 4 replies; 39+ messages in thread From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw) To: Al Viro, Christoph Hellwig Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Provide a helper in the get_user_pages code to drop a pin or a ref on a page based on being given FOLL_GET or FOLL_PIN in its flags argument or do nothing if neither is set. Signed-off-by: David Howells <dhowells@redhat.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Christoph Hellwig <hch@lst.de> cc: Matthew Wilcox <willy@infradead.org> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-mm@kvack.org --- include/linux/mm.h | 3 +++ mm/gup.c | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8f857163ac89..3de9d88f8524 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) #define SECTION_IN_PAGE_FLAGS #endif +void folio_put_unpin(struct folio *folio, unsigned int flags); +void page_put_unpin(struct page *page, unsigned int flags); + /* * The identification function is mainly used by the buddy allocator for * determining if two pages could be buddies. We are not really identifying diff --git a/mm/gup.c b/mm/gup.c index f45a3a5be53a..3ee4b4c7e0cb 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) folio_put_refs(folio, refs); } +/** + * folio_put_unpin - Unpin/put a folio as appropriate + * @folio: The folio to release + * @flags: gup flags indicating the mode of release (FOLL_*) + * + * Release a folio according to the flags. If FOLL_GET is set, the folio has a + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left + * unaltered. + */ +void folio_put_unpin(struct folio *folio, unsigned int flags) +{ + if (flags & (FOLL_GET | FOLL_PIN)) + gup_put_folio(folio, 1, flags); +} +EXPORT_SYMBOL_GPL(folio_put_unpin); + +void page_put_unpin(struct page *page, unsigned int flags) +{ + folio_put_unpin(page_folio(page), flags); +} +EXPORT_SYMBOL_GPL(page_put_unpin); + /** * try_grab_page() - elevate a page's refcount by a flag-dependent amount * @page: pointer to page to be grabbed ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells @ 2023-01-23 18:21 ` Christoph Hellwig 2023-01-24 3:03 ` John Hubbard ` (2 subsequent siblings) 3 siblings, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2023-01-23 18:21 UTC (permalink / raw) To: David Howells Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells 2023-01-23 18:21 ` Christoph Hellwig @ 2023-01-24 3:03 ` John Hubbard 2023-01-24 14:28 ` David Hildenbrand 2023-01-24 14:41 ` David Howells 3 siblings, 0 replies; 39+ messages in thread From: John Hubbard @ 2023-01-24 3:03 UTC (permalink / raw) To: David Howells, Al Viro, Christoph Hellwig, Jason Gunthorpe Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On 1/23/23 09:30, David Howells wrote: > Provide a helper in the get_user_pages code to drop a pin or a ref on a > page based on being given FOLL_GET or FOLL_PIN in its flags argument or do > nothing if neither is set. > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Al Viro <viro@zeniv.linux.org.uk> > cc: Christoph Hellwig <hch@lst.de> > cc: Matthew Wilcox <willy@infradead.org> > cc: linux-fsdevel@vger.kernel.org > cc: linux-block@vger.kernel.org > cc: linux-mm@kvack.org > --- > include/linux/mm.h | 3 +++ > mm/gup.c | 22 ++++++++++++++++++++++ > 2 files changed, 25 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 8f857163ac89..3de9d88f8524 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1367,6 +1367,9 @@ static inline bool is_cow_mapping(vm_flags_t flags) > #define SECTION_IN_PAGE_FLAGS > #endif > > +void folio_put_unpin(struct folio *folio, unsigned int flags); > +void page_put_unpin(struct page *page, unsigned int flags); How about these names instead: folio_put_or_unpin() page_put_or_unpin() ? Also, could we please change the name of the flags argument, to gup_flags? > + > /* > * The identification function is mainly used by the buddy allocator for > * determining if two pages could be buddies. We are not really identifying > diff --git a/mm/gup.c b/mm/gup.c > index f45a3a5be53a..3ee4b4c7e0cb 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -191,6 +191,28 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) > folio_put_refs(folio, refs); > } > > +/** > + * folio_put_unpin - Unpin/put a folio as appropriate > + * @folio: The folio to release > + * @flags: gup flags indicating the mode of release (FOLL_*) > + * > + * Release a folio according to the flags. If FOLL_GET is set, the folio has a > + * ref dropped; if FOLL_PIN is set, it is unpinned; otherwise it is left > + * unaltered. > + */ > +void folio_put_unpin(struct folio *folio, unsigned int flags) > +{ > + if (flags & (FOLL_GET | FOLL_PIN)) Another minor complication is that FOLL_PIN is supposed to be an internal-to-mm flag. But here (and in another part of the series), it has leaked into the public API. One approach would be to give up and just admit that, like FOLL_GET, FOLL_PIN has escaped into the wild. Another approach would be to use a new set of flags, such as USE_FOLL_GET and USE_FOLL_PIN. But I'm starting to lean toward the first approach: just let FOLL_PIN be used in this way, treat it as part of the external API at least for this area (not for gup/pup calls, though). So after all that thinking out loud, I think this is OK to use FOLL_PIN. +Cc Jason Gunthorpe, because he is about to split up FOLL_* into public and internal sets. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells 2023-01-23 18:21 ` Christoph Hellwig 2023-01-24 3:03 ` John Hubbard @ 2023-01-24 14:28 ` David Hildenbrand 2023-01-24 14:41 ` David Howells 3 siblings, 0 replies; 39+ messages in thread From: David Hildenbrand @ 2023-01-24 14:28 UTC (permalink / raw) To: David Howells, Al Viro, Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On 23.01.23 18:30, David Howells wrote: > Provide a helper in the get_user_pages code to drop a pin or a ref on a > page based on being given FOLL_GET or FOLL_PIN in its flags argument or do > nothing if neither is set. Does the FOLL_GET part still apply to this patch set? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells ` (2 preceding siblings ...) 2023-01-24 14:28 ` David Hildenbrand @ 2023-01-24 14:41 ` David Howells 2023-01-24 14:52 ` Christoph Hellwig 2023-01-24 15:04 ` David Howells 3 siblings, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:41 UTC (permalink / raw) To: David Hildenbrand Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm David Hildenbrand <david@redhat.com> wrote: > > Provide a helper in the get_user_pages code to drop a pin or a ref on a > > page based on being given FOLL_GET or FOLL_PIN in its flags argument or do > > nothing if neither is set. > > Does the FOLL_GET part still apply to this patch set? Yes. Christoph insisted that the bio conversion patch be split up. That means there's an interval where you can get FOLL_GET from that. However, since Jason wants to hide FOLL_PUT, this is going to have to be removed and the switching done in the bio code for the bio case (until that reduces to just pinning) and the skbuff cleanup code (when that is eventually done - that will have the possibility of skbuffs comprising a mix of ref'd, pinned and unpinned data, albeit in separate fragments; I've posted patches that illustrate this[1]). David https://lore.kernel.org/all/167391047703.2311931.8115712773222260073.stgit@warthog.procyon.org.uk/ [1] Patches 33-34 ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-24 14:41 ` David Howells @ 2023-01-24 14:52 ` Christoph Hellwig 2023-01-24 14:53 ` David Hildenbrand 2023-01-24 15:04 ` David Howells 1 sibling, 1 reply; 39+ messages in thread From: Christoph Hellwig @ 2023-01-24 14:52 UTC (permalink / raw) To: David Howells Cc: David Hildenbrand, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 02:41:33PM +0000, David Howells wrote: > Yes. Christoph insisted that the bio conversion patch be split up. That > means there's an interval where you can get FOLL_GET from that. The only place where we have both is in the block layer. It never gets set by bio_set_cleanup_mode. Instead we can just keep using put_page dirctly for the BIO_PAGE_REFFED case in the callers of bio_release_page and in bio_release_pages itself, and then do away with bio_to_gup_flags and bio_release_page entirely. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-24 14:52 ` Christoph Hellwig @ 2023-01-24 14:53 ` David Hildenbrand 0 siblings, 0 replies; 39+ messages in thread From: David Hildenbrand @ 2023-01-24 14:53 UTC (permalink / raw) To: Christoph Hellwig, David Howells Cc: Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On 24.01.23 15:52, Christoph Hellwig wrote: > On Tue, Jan 24, 2023 at 02:41:33PM +0000, David Howells wrote: >> Yes. Christoph insisted that the bio conversion patch be split up. That >> means there's an interval where you can get FOLL_GET from that. > > The only place where we have both is in the block layer. It never gets > set by bio_set_cleanup_mode. > > Instead we can just keep using put_page dirctly for the BIO_PAGE_REFFED > case in the callers of bio_release_page and in bio_release_pages itself, > and then do away with bio_to_gup_flags and bio_release_page entirely. > Thank you, just what I had in mind ... -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page 2023-01-24 14:41 ` David Howells 2023-01-24 14:52 ` Christoph Hellwig @ 2023-01-24 15:04 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: David Howells @ 2023-01-24 15:04 UTC (permalink / raw) To: Christoph Hellwig Cc: dhowells, David Hildenbrand, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Christoph Hellwig <hch@infradead.org> wrote: > The only place where we have both is in the block layer. It never gets > set by bio_set_cleanup_mode. It gets set directly by patch 6. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down [not found] <20230123173007.325544-1-dhowells@redhat.com> 2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells 2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells @ 2023-01-23 17:30 ` David Howells 2023-01-23 18:25 ` Christoph Hellwig ` (2 more replies) 2 siblings, 3 replies; 39+ messages in thread From: David Howells @ 2023-01-23 17:30 UTC (permalink / raw) To: Al Viro, Christoph Hellwig Cc: David Howells, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and also so that they can be stored in the bottom two bits of a page pointer (something I'm looking at for zerocopy socket fragments). (Note that BIO_PAGE_REFFED should probably be got rid of at some point, hence why FOLL_PIN is at 0.) Also renumber down the other FOLL_* flags to close the gaps. Signed-off-by: David Howells <dhowells@redhat.com> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Christoph Hellwig <hch@lst.de> cc: Matthew Wilcox <willy@infradead.org> cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org --- Notes: ver #8) - Put FOLL_PIN at bit 0 and FOLL_GET at bit 1 to match BIO_PAGE_*. - Renumber the remaining flags down to fill in the gap. include/linux/mm.h | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3de9d88f8524..c95bc4f77e8f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3074,26 +3074,28 @@ static inline vm_fault_t vmf_error(int err) struct page *follow_page(struct vm_area_struct *vma, unsigned long address, unsigned int foll_flags); -#define FOLL_WRITE 0x01 /* check pte is writable */ -#define FOLL_TOUCH 0x02 /* mark page accessed */ -#define FOLL_GET 0x04 /* do get_page on page */ -#define FOLL_DUMP 0x08 /* give error on hole if it would be zero */ -#define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */ -#define FOLL_NOWAIT 0x20 /* if a disk transfer is needed, start the IO +#define FOLL_PIN 0x01 /* pages must be released via unpin_user_page */ +#define FOLL_GET 0x02 /* do get_page on page (equivalent to BIO_FOLL_GET) */ +#define FOLL_WRITE 0x04 /* check pte is writable */ +#define FOLL_TOUCH 0x08 /* mark page accessed */ +#define FOLL_DUMP 0x10 /* give error on hole if it would be zero */ +#define FOLL_FORCE 0x20 /* get_user_pages read/write w/o permission */ +#define FOLL_NOWAIT 0x40 /* if a disk transfer is needed, start the IO * and return without waiting upon it */ #define FOLL_NOFAULT 0x80 /* do not fault in pages */ #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */ -#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */ -#define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */ -#define FOLL_ANON 0x8000 /* don't do file mappings */ -#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ -#define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ -#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ -#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ -#define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */ -#define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */ +#define FOLL_TRIED 0x200 /* a retry, previous pass started an IO */ +#define FOLL_REMOTE 0x400 /* we are working on non-current tsk/mm */ +#define FOLL_ANON 0x800 /* don't do file mappings */ +#define FOLL_LONGTERM 0x1000 /* mapping lifetime is indefinite: see below */ +#define FOLL_SPLIT_PMD 0x2000 /* split huge pmd before returning */ +#define FOLL_FAST_ONLY 0x4000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_PCI_P2PDMA 0x8000 /* allow returning PCI P2PDMA pages */ +#define FOLL_INTERRUPTIBLE 0x10000 /* allow interrupts from generic signals */ /* + * Note that FOLL_PIN is sorted to bit 0 to be coincident with BIO_PAGE_PINNED. + * * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each * other. Here is what they mean, and how to use them: * ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells @ 2023-01-23 18:25 ` Christoph Hellwig 2023-01-24 3:08 ` John Hubbard 2023-01-24 7:05 ` David Howells 2 siblings, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2023-01-23 18:25 UTC (permalink / raw) To: David Howells Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Mon, Jan 23, 2023 at 05:30:07PM +0000, David Howells wrote: > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and > also so that they can be stored in the bottom two bits of a page pointer > (something I'm looking at for zerocopy socket fragments). > > (Note that BIO_PAGE_REFFED should probably be got rid of at some point, > hence why FOLL_PIN is at 0.) > > Also renumber down the other FOLL_* flags to close the gaps. Taking the risk of having a long bikeshed: I much prefer (1U << bitnr) style definition that make it obvious where there are holes, but otherwise: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells 2023-01-23 18:25 ` Christoph Hellwig @ 2023-01-24 3:08 ` John Hubbard 2023-01-24 3:11 ` John Hubbard 2023-01-24 7:05 ` David Howells 2 siblings, 1 reply; 39+ messages in thread From: John Hubbard @ 2023-01-24 3:08 UTC (permalink / raw) To: David Howells, Al Viro, Christoph Hellwig Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On 1/23/23 09:30, David Howells wrote: > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and > also so that they can be stored in the bottom two bits of a page pointer > (something I'm looking at for zerocopy socket fragments). > > (Note that BIO_PAGE_REFFED should probably be got rid of at some point, > hence why FOLL_PIN is at 0.) > > Also renumber down the other FOLL_* flags to close the gaps. Should we also get these sorted into internal-to-mm and public sets? Because Jason (+Cc) again was about to split them apart into mm/internal.h [1] and that might make that a little cleaner. Also, I don't think that there is any large readability difference either way between 0x and <<1, so whatever you and Christophe settle on there seems fine. So either way with those points, Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks, -- John Hubbard NVIDIA > > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Al Viro <viro@zeniv.linux.org.uk> > cc: Christoph Hellwig <hch@lst.de> > cc: Matthew Wilcox <willy@infradead.org> > cc: linux-fsdevel@vger.kernel.org > cc: linux-mm@kvack.org > --- > > Notes: > ver #8) > - Put FOLL_PIN at bit 0 and FOLL_GET at bit 1 to match BIO_PAGE_*. > - Renumber the remaining flags down to fill in the gap. > > include/linux/mm.h | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 3de9d88f8524..c95bc4f77e8f 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -3074,26 +3074,28 @@ static inline vm_fault_t vmf_error(int err) > struct page *follow_page(struct vm_area_struct *vma, unsigned long address, > unsigned int foll_flags); > > -#define FOLL_WRITE 0x01 /* check pte is writable */ > -#define FOLL_TOUCH 0x02 /* mark page accessed */ > -#define FOLL_GET 0x04 /* do get_page on page */ > -#define FOLL_DUMP 0x08 /* give error on hole if it would be zero */ > -#define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */ > -#define FOLL_NOWAIT 0x20 /* if a disk transfer is needed, start the IO > +#define FOLL_PIN 0x01 /* pages must be released via unpin_user_page */ > +#define FOLL_GET 0x02 /* do get_page on page (equivalent to BIO_FOLL_GET) */ > +#define FOLL_WRITE 0x04 /* check pte is writable */ > +#define FOLL_TOUCH 0x08 /* mark page accessed */ > +#define FOLL_DUMP 0x10 /* give error on hole if it would be zero */ > +#define FOLL_FORCE 0x20 /* get_user_pages read/write w/o permission */ > +#define FOLL_NOWAIT 0x40 /* if a disk transfer is needed, start the IO > * and return without waiting upon it */ > #define FOLL_NOFAULT 0x80 /* do not fault in pages */ > #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */ > -#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */ > -#define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */ > -#define FOLL_ANON 0x8000 /* don't do file mappings */ > -#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ > -#define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ > -#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ > -#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ > -#define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */ > -#define FOLL_INTERRUPTIBLE 0x200000 /* allow interrupts from generic signals */ > +#define FOLL_TRIED 0x200 /* a retry, previous pass started an IO */ > +#define FOLL_REMOTE 0x400 /* we are working on non-current tsk/mm */ > +#define FOLL_ANON 0x800 /* don't do file mappings */ > +#define FOLL_LONGTERM 0x1000 /* mapping lifetime is indefinite: see below */ > +#define FOLL_SPLIT_PMD 0x2000 /* split huge pmd before returning */ > +#define FOLL_FAST_ONLY 0x4000 /* gup_fast: prevent fall-back to slow gup */ > +#define FOLL_PCI_P2PDMA 0x8000 /* allow returning PCI P2PDMA pages */ > +#define FOLL_INTERRUPTIBLE 0x10000 /* allow interrupts from generic signals */ > > /* > + * Note that FOLL_PIN is sorted to bit 0 to be coincident with BIO_PAGE_PINNED. > + * > * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each > * other. Here is what they mean, and how to use them: > * > > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 3:08 ` John Hubbard @ 2023-01-24 3:11 ` John Hubbard 2023-01-24 13:13 ` Jason Gunthorpe ` (2 more replies) 0 siblings, 3 replies; 39+ messages in thread From: John Hubbard @ 2023-01-24 3:11 UTC (permalink / raw) To: David Howells, Al Viro, Christoph Hellwig, Jason Gunthorpe Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On 1/23/23 19:08, John Hubbard wrote: > On 1/23/23 09:30, David Howells wrote: >> Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that >> they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and >> also so that they can be stored in the bottom two bits of a page pointer >> (something I'm looking at for zerocopy socket fragments). >> >> (Note that BIO_PAGE_REFFED should probably be got rid of at some point, >> hence why FOLL_PIN is at 0.) >> >> Also renumber down the other FOLL_* flags to close the gaps. > > Should we also get these sorted into internal-to-mm and public sets? > Because Jason (+Cc) again was about to split them apart into > mm/internal.h [1] and that might make that a little cleaner. I see that I omitted both Jason's Cc and the reference, so I'll resend. Sorry for the extra noise. [1] https://lore.kernel.org/all/fcdb3294-3740-a0e0-b115-12842eb0696d@nvidia.com/ > > Also, I don't think that there is any large readability difference > either way between 0x and <<1, so whatever you and Christophe settle on > there seems fine. > > So either way with those points, > > Reviewed-by: John Hubbard <jhubbard@nvidia.com> > > thanks, thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 3:11 ` John Hubbard @ 2023-01-24 13:13 ` Jason Gunthorpe 2023-01-24 13:18 ` Christoph Hellwig 2023-01-24 13:40 ` David Howells 2023-01-24 13:46 ` David Howells 2 siblings, 1 reply; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 13:13 UTC (permalink / raw) To: John Hubbard Cc: David Howells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Mon, Jan 23, 2023 at 07:11:42PM -0800, John Hubbard wrote: > On 1/23/23 19:08, John Hubbard wrote: > > On 1/23/23 09:30, David Howells wrote: > > > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that > > > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and > > > also so that they can be stored in the bottom two bits of a page pointer > > > (something I'm looking at for zerocopy socket fragments). > > > > > > (Note that BIO_PAGE_REFFED should probably be got rid of at some point, > > > hence why FOLL_PIN is at 0.) > > > > > > Also renumber down the other FOLL_* flags to close the gaps. > > > > Should we also get these sorted into internal-to-mm and public sets? > > Because Jason (+Cc) again was about to split them apart into > > mm/internal.h [1] and that might make that a little cleaner. > > I see that I omitted both Jason's Cc and the reference, so I'll resend. > Sorry for the extra noise. > > [1] https://lore.kernel.org/all/fcdb3294-3740-a0e0-b115-12842eb0696d@nvidia.com/ Yeah, I already wrote a similar patch, using the 1<< notation, splitting the internal/external, and rebasing on the move to mm_types.. I can certainly drop that patch if we'd rather do this. Though, I'm not so keen on using FOLL_ internal flags inside the block layer.. Can you stick with the BIO versions of these? Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:13 ` Jason Gunthorpe @ 2023-01-24 13:18 ` Christoph Hellwig 2023-01-24 13:43 ` Jason Gunthorpe 0 siblings, 1 reply; 39+ messages in thread From: Christoph Hellwig @ 2023-01-24 13:18 UTC (permalink / raw) To: Jason Gunthorpe Cc: John Hubbard, David Howells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 09:13:30AM -0400, Jason Gunthorpe wrote: > Yeah, I already wrote a similar patch, using the 1<< notation, > splitting the internal/external, and rebasing on the move to > mm_types.. I can certainly drop that patch if we'd rather do this. Given that you are doing more work in that area it might be best to drop this patch from this series. > Though, I'm not so keen on using FOLL_ internal flags inside the block > layer.. Can you stick with the BIO versions of these? The block layer doesn't really use it - the new helper in iov_iter.c returns it, and the block layer instantly turns it into an internal flag. But maybe it should just return a bool pinned (by reference) now? ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:18 ` Christoph Hellwig @ 2023-01-24 13:43 ` Jason Gunthorpe 0 siblings, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 13:43 UTC (permalink / raw) To: Christoph Hellwig Cc: John Hubbard, David Howells, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 05:18:07AM -0800, Christoph Hellwig wrote: > On Tue, Jan 24, 2023 at 09:13:30AM -0400, Jason Gunthorpe wrote: > > Yeah, I already wrote a similar patch, using the 1<< notation, > > splitting the internal/external, and rebasing on the move to > > mm_types.. I can certainly drop that patch if we'd rather do this. > > Given that you are doing more work in that area it might be best > to drop this patch from this series. > > > Though, I'm not so keen on using FOLL_ internal flags inside the block > > layer.. Can you stick with the BIO versions of these? > > The block layer doesn't really use it - the new helper in iov_iter.c > returns it, and the block layer instantly turns it into an internal > flag. But maybe it should just return a bool pinned (by reference) > now? Yes, a bool flag to the new mm helper instead of a FOLL_* seems good to me Thanks, Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 3:11 ` John Hubbard 2023-01-24 13:13 ` Jason Gunthorpe @ 2023-01-24 13:40 ` David Howells 2023-01-24 13:46 ` David Howells 2 siblings, 0 replies; 39+ messages in thread From: David Howells @ 2023-01-24 13:40 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > Though, I'm not so keen on using FOLL_ internal flags inside the block > layer.. Can you stick with the BIO versions of these? It's used in a new iov_iter function and will be used in a bunch of filesystems including network filesystems. I'm also looking at using it in the sk_buff handling also. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 3:11 ` John Hubbard 2023-01-24 13:13 ` Jason Gunthorpe 2023-01-24 13:40 ` David Howells @ 2023-01-24 13:46 ` David Howells 2023-01-24 13:47 ` Jason Gunthorpe 2023-01-24 13:57 ` David Howells 2 siblings, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 13:46 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > Yeah, I already wrote a similar patch, using the 1<< notation, > splitting the internal/external, and rebasing on the move to > mm_types.. I can certainly drop that patch if we'd rather do this. Note that I've already given Andrew a patch to move FOLL_* to mm_types.h. I'm happy to go with your patch on top of that if you can just renumber FOLL_PIN to 0 and FOLL_GET to 1. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:46 ` David Howells @ 2023-01-24 13:47 ` Jason Gunthorpe 2023-01-24 13:57 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 13:47 UTC (permalink / raw) To: David Howells Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 01:46:23PM +0000, David Howells wrote: > Jason Gunthorpe <jgg@nvidia.com> wrote: > > > Yeah, I already wrote a similar patch, using the 1<< notation, > > splitting the internal/external, and rebasing on the move to > > mm_types.. I can certainly drop that patch if we'd rather do this. > > Note that I've already given Andrew a patch to move FOLL_* to mm_types.h. > > I'm happy to go with your patch on top of that if you can just renumber > FOLL_PIN to 0 and FOLL_GET to 1. I moved FOLL_PIN to internal.h because it is not supposed to be used outside mm/* I would prefer to keep it that way. Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:46 ` David Howells 2023-01-24 13:47 ` Jason Gunthorpe @ 2023-01-24 13:57 ` David Howells 2023-01-24 14:00 ` Jason Gunthorpe ` (3 more replies) 1 sibling, 4 replies; 39+ messages in thread From: David Howells @ 2023-01-24 13:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > I moved FOLL_PIN to internal.h because it is not supposed to be used > outside mm/* > > I would prefer to keep it that way. I'll need to do something else, then, such as creating a new pair of 'cleanup' flags: [include/linux/mm_types.h] #define PAGE_CLEANUP_UNPIN (1U << 0) #define PAGE_CLEANUP_PUT (1U << 0) [mm/gup.h] void folio_put_unpin(struct folio *folio, unsigned int cleanup_flags) { unsigned int gup_flags = 0; cleanup_flags &= PAGE_CLEANUP_UNPIN | PAGE_CLEANUP_PUT; if (!cleanup_flags) return; gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0; gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0; gup_put_folio(folio, 1, flags); } EXPORT_SYMBOL_GPL(folio_put_unpin); David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:57 ` David Howells @ 2023-01-24 14:00 ` Jason Gunthorpe 2023-01-24 14:02 ` Christoph Hellwig ` (2 subsequent siblings) 3 siblings, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 14:00 UTC (permalink / raw) To: David Howells Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote: > Jason Gunthorpe <jgg@nvidia.com> wrote: > > > I moved FOLL_PIN to internal.h because it is not supposed to be used > > outside mm/* > > > > I would prefer to keep it that way. > > I'll need to do something else, then, such as creating a new pair of 'cleanup' > flags: > > [include/linux/mm_types.h] > #define PAGE_CLEANUP_UNPIN (1U << 0) > #define PAGE_CLEANUP_PUT (1U << 0) > > [mm/gup.h] > void folio_put_unpin(struct folio *folio, unsigned int cleanup_flags) > { > unsigned int gup_flags = 0; > > cleanup_flags &= PAGE_CLEANUP_UNPIN | PAGE_CLEANUP_PUT; > if (!cleanup_flags) > return; > gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0; > gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0; > gup_put_folio(folio, 1, flags); > } > EXPORT_SYMBOL_GPL(folio_put_unpin); I suggest: if (cleanup_flags & PAGE_CLEANUP_UNPIN) gup_put_folio(folio, 1, true); else if (cleanup_flags & PAGE_CLEANUP_PUT) gup_put_folio(folio, 1, false); or if you are really counting cycles if (cleanup_flags & PAGE_CLEANUP_NEEDED) gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN) Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:57 ` David Howells 2023-01-24 14:00 ` Jason Gunthorpe @ 2023-01-24 14:02 ` Christoph Hellwig 2023-01-24 14:11 ` David Howells 2023-01-24 14:12 ` David Howells 3 siblings, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2023-01-24 14:02 UTC (permalink / raw) To: David Howells Cc: Jason Gunthorpe, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote: > [include/linux/mm_types.h] > #define PAGE_CLEANUP_UNPIN (1U << 0) > #define PAGE_CLEANUP_PUT (1U << 0) With the latest series we don't need PAGE_CLEANUP_PUT at all. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:57 ` David Howells 2023-01-24 14:00 ` Jason Gunthorpe 2023-01-24 14:02 ` Christoph Hellwig @ 2023-01-24 14:11 ` David Howells 2023-01-24 14:14 ` Jason Gunthorpe 2023-01-24 14:27 ` David Howells 2023-01-24 14:12 ` David Howells 3 siblings, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:11 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > if (cleanup_flags & PAGE_CLEANUP_UNPIN) > gup_put_folio(folio, 1, true); > else if (cleanup_flags & PAGE_CLEANUP_PUT) > gup_put_folio(folio, 1, false); gup_put_folio() doesn't take a bool - it takes FOLL_PIN and FOLL_GET: static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) though I could do: if (cleanup_flags & PAGE_CLEANUP_UNPIN) gup_put_folio(folio, 1, FOLL_PIN); else if (cleanup_flags & PAGE_CLEANUP_PUT) gup_put_folio(folio, 1, FOLL_GET); But the reason I wrote it like this: gup_flags |= cleanup_flags & PAGE_CLEANUP_UNPIN ? FOLL_PIN : 0; gup_flags |= cleanup_flags & PAGE_CLEANUP_PUT ? FOLL_GET : 0; is that if PAGE_CLEANUP_UNPIN == FOLL_PIN and PAGE_CLEANUP_PUT == FOLL_GET, the C compiler optimiser should be able to turn all of that into a single AND instruction. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:11 ` David Howells @ 2023-01-24 14:14 ` Jason Gunthorpe 2023-01-24 14:27 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 14:14 UTC (permalink / raw) To: David Howells Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 02:11:50PM +0000, David Howells wrote: > Jason Gunthorpe <jgg@nvidia.com> wrote: > > > if (cleanup_flags & PAGE_CLEANUP_UNPIN) > > gup_put_folio(folio, 1, true); > > else if (cleanup_flags & PAGE_CLEANUP_PUT) > > gup_put_folio(folio, 1, false); > > gup_put_folio() doesn't take a bool - it takes FOLL_PIN and FOLL_GET: My point was to change that to take a 'bool unpin' because FOLL_PIN is not to be used outside gup.c Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:11 ` David Howells 2023-01-24 14:14 ` Jason Gunthorpe @ 2023-01-24 14:27 ` David Howells 2023-01-24 14:31 ` Jason Gunthorpe 2023-01-24 14:59 ` David Howells 1 sibling, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:27 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > My point was to change that to take a 'bool unpin' because FOLL_PIN is > not to be used outside gup.c I need a 3-state wrapper. But if I can't do it in gup.c, then I'll have to do it elsewhere. As Christoph says, most of the places will only be pinned or not-pinned. The whole point was to avoid generating new constants when existing constants would do. David. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:27 ` David Howells @ 2023-01-24 14:31 ` Jason Gunthorpe 2023-01-24 14:59 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 14:31 UTC (permalink / raw) To: David Howells Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 02:27:53PM +0000, David Howells wrote: > Jason Gunthorpe <jgg@nvidia.com> wrote: > > > My point was to change that to take a 'bool unpin' because FOLL_PIN is > > not to be used outside gup.c > > I need a 3-state wrapper. But if I can't do it in gup.c, then I'll have to do > it elsewhere. As Christoph says, most of the places will only be pinned or > not-pinned. The whole point was to avoid generating new constants when > existing constants would do. What is the 3rd state? Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:27 ` David Howells 2023-01-24 14:31 ` Jason Gunthorpe @ 2023-01-24 14:59 ` David Howells 2023-01-24 15:06 ` Jason Gunthorpe 2023-01-24 15:12 ` David Howells 1 sibling, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > What is the 3rd state? Consider a network filesystem message generated for a direct I/O that the network filesystem does zerocopy on. You may have an sk_buff that has fragments from one or more of three different sources: (1) Fragments consisting of specifically allocated pages, such as the IP/UDP/TCP headers that have refs taken on them. (2) Fragments consisting of zerocopy kernel buffers that has neither refs nor pins belonging to the sk_buff. iov_iter_extract_pages() will not take pins when extracting from, say, an XARRAY-type or KVEC-type iterator. iov_iter_extract_mode() will return 0. (3) Fragments consisting of zerocopy user buffers that have pins taken on them belonging to the sk_buff. iov_iter_extract_pages() will take pins when extracting from, say, a UBUF-type or IOVEC-type iterator. iov_iter_extract_mode() will return FOLL_PIN (at the moment). So you have three states: Ref'd, pinned and no-retention. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:59 ` David Howells @ 2023-01-24 15:06 ` Jason Gunthorpe 2023-01-24 15:12 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: Jason Gunthorpe @ 2023-01-24 15:06 UTC (permalink / raw) To: David Howells Cc: John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 02:59:25PM +0000, David Howells wrote: > Jason Gunthorpe <jgg@nvidia.com> wrote: > > > What is the 3rd state? > > Consider a network filesystem message generated for a direct I/O that the > network filesystem does zerocopy on. You may have an sk_buff that has > fragments from one or more of three different sources: > > (1) Fragments consisting of specifically allocated pages, such as the > IP/UDP/TCP headers that have refs taken on them. > > (2) Fragments consisting of zerocopy kernel buffers that has neither refs nor > pins belonging to the sk_buff. > > iov_iter_extract_pages() will not take pins when extracting from, say, an > XARRAY-type or KVEC-type iterator. iov_iter_extract_mode() will return > 0. > > (3) Fragments consisting of zerocopy user buffers that have pins taken on > them belonging to the sk_buff. > > iov_iter_extract_pages() will take pins when extracting from, say, a > UBUF-type or IOVEC-type iterator. iov_iter_extract_mode() will return > FOLL_PIN (at the moment). > > So you have three states: Ref'd, pinned and no-retention. Isn't that this: if (cleanup_flags & PAGE_CLEANUP_NEEDED) gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN) ? Three states - decr get, decr pinned, do nothing? Jason ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:59 ` David Howells 2023-01-24 15:06 ` Jason Gunthorpe @ 2023-01-24 15:12 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: David Howells @ 2023-01-24 15:12 UTC (permalink / raw) To: Jason Gunthorpe Cc: dhowells, John Hubbard, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Jason Gunthorpe <jgg@nvidia.com> wrote: > Isn't that this: > > if (cleanup_flags & PAGE_CLEANUP_NEEDED) > gup_put_folio(folio, 1, cleanup_flags & PAGE_CLEANUP_UNPIN) > > > ? Yes. As claimed, that would be three states. > Three states - decr get, decr pinned, do nothing? Yes. Don't worry about it. I'm not going to do it in gup.c. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 13:57 ` David Howells ` (2 preceding siblings ...) 2023-01-24 14:11 ` David Howells @ 2023-01-24 14:12 ` David Howells 2023-01-24 14:13 ` Christoph Hellwig 2023-01-24 14:25 ` David Howells 3 siblings, 2 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:12 UTC (permalink / raw) To: Christoph Hellwig Cc: dhowells, Jason Gunthorpe, John Hubbard, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Jan 24, 2023 at 01:57:08PM +0000, David Howells wrote: > > [include/linux/mm_types.h] > > #define PAGE_CLEANUP_UNPIN (1U << 0) > > #define PAGE_CLEANUP_PUT (1U << 0) > > With the latest series we don't need PAGE_CLEANUP_PUT at all. We will when it comes to skbuffs. David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:12 ` David Howells @ 2023-01-24 14:13 ` Christoph Hellwig 2023-01-24 14:25 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: Christoph Hellwig @ 2023-01-24 14:13 UTC (permalink / raw) To: David Howells Cc: Christoph Hellwig, Jason Gunthorpe, John Hubbard, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm On Tue, Jan 24, 2023 at 02:12:25PM +0000, David Howells wrote: > > With the latest series we don't need PAGE_CLEANUP_PUT at all. > > We will when it comes to skbuffs. I'm a little doubtful of that. But IFF skbuffs need it, we can just change the interface at that point. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-24 14:12 ` David Howells 2023-01-24 14:13 ` Christoph Hellwig @ 2023-01-24 14:25 ` David Howells 1 sibling, 0 replies; 39+ messages in thread From: David Howells @ 2023-01-24 14:25 UTC (permalink / raw) To: Christoph Hellwig Cc: dhowells, Jason Gunthorpe, John Hubbard, Al Viro, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm Christoph Hellwig <hch@infradead.org> wrote: > > > With the latest series we don't need PAGE_CLEANUP_PUT at all. > > > > We will when it comes to skbuffs. > > I'm a little doubtful of that. skbuff fragments will be and will have to be a mixture of allocated pages that need putting and pinned or non-ref'd and non-pinned zerocopy stuff. I have posted a patch that works for the limited amount of driverage that I use on my test machine. Think network filesystem messages where you have a mixture of protocol bits generated by the kernel and data provided by direct I/O being sent by zerocopy (libceph, for example). David ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down 2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells 2023-01-23 18:25 ` Christoph Hellwig 2023-01-24 3:08 ` John Hubbard @ 2023-01-24 7:05 ` David Howells 2 siblings, 0 replies; 39+ messages in thread From: David Howells @ 2023-01-24 7:05 UTC (permalink / raw) To: John Hubbard Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe, linux-fsdevel, linux-block, linux-kernel, Christoph Hellwig, linux-mm John Hubbard <jhubbard@nvidia.com> wrote: > > Renumber FOLL_PIN and FOLL_GET down to bit 0 and 1 respectively so that > > they are coincidentally the same as BIO_PAGE_PINNED and BIO_PAGE_REFFED and > > also so that they can be stored in the bottom two bits of a page pointer > > (something I'm looking at for zerocopy socket fragments). > > (Note that BIO_PAGE_REFFED should probably be got rid of at some point, > > hence why FOLL_PIN is at 0.) > > Also renumber down the other FOLL_* flags to close the gaps. > > Should we also get these sorted into internal-to-mm and public sets? > Because Jason (+Cc) again was about to split them apart into > mm/internal.h [1] and that might make that a little cleaner. My plan was to push this patch by itself through akpm since it's only an optimisation and not necessary to the rest of the patches here. David ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2023-01-24 15:12 UTC | newest]
Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20230123173007.325544-1-dhowells@redhat.com>
2023-01-23 17:29 ` [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator David Howells
2023-01-23 18:21 ` Christoph Hellwig
2023-01-24 14:27 ` David Hildenbrand
2023-01-24 14:35 ` David Howells
2023-01-24 14:37 ` David Hildenbrand
2023-01-24 14:45 ` David Howells
2023-01-24 14:52 ` David Hildenbrand
2023-01-23 17:30 ` [PATCH v8 03/10] mm: Provide a helper to drop a pin/ref on a page David Howells
2023-01-23 18:21 ` Christoph Hellwig
2023-01-24 3:03 ` John Hubbard
2023-01-24 14:28 ` David Hildenbrand
2023-01-24 14:41 ` David Howells
2023-01-24 14:52 ` Christoph Hellwig
2023-01-24 14:53 ` David Hildenbrand
2023-01-24 15:04 ` David Howells
2023-01-23 17:30 ` [PATCH v8 10/10] mm: Renumber FOLL_PIN and FOLL_GET down David Howells
2023-01-23 18:25 ` Christoph Hellwig
2023-01-24 3:08 ` John Hubbard
2023-01-24 3:11 ` John Hubbard
2023-01-24 13:13 ` Jason Gunthorpe
2023-01-24 13:18 ` Christoph Hellwig
2023-01-24 13:43 ` Jason Gunthorpe
2023-01-24 13:40 ` David Howells
2023-01-24 13:46 ` David Howells
2023-01-24 13:47 ` Jason Gunthorpe
2023-01-24 13:57 ` David Howells
2023-01-24 14:00 ` Jason Gunthorpe
2023-01-24 14:02 ` Christoph Hellwig
2023-01-24 14:11 ` David Howells
2023-01-24 14:14 ` Jason Gunthorpe
2023-01-24 14:27 ` David Howells
2023-01-24 14:31 ` Jason Gunthorpe
2023-01-24 14:59 ` David Howells
2023-01-24 15:06 ` Jason Gunthorpe
2023-01-24 15:12 ` David Howells
2023-01-24 14:12 ` David Howells
2023-01-24 14:13 ` Christoph Hellwig
2023-01-24 14:25 ` David Howells
2023-01-24 7:05 ` David Howells
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox