* [PATCH V2] mm/gup: folio_split_user_page_pin
From: Steve Sistare @ 2024-09-24 15:05 UTC
To: linux-mm
Cc: Andrew Morton, David Hildenbrand, Jason Gunthorpe, Matthew Wilcox, Steve Sistare

Export a function that repins a high-order folio at small-page granularity.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
folio_split_user_page_pin could be unpinned via unpin_user_page(s).

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

---
In V2 this has been renamed from repin_folio_unhugely, but is
otherwise unchanged from V1.
---
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 13bff7c..b0b572d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 		      struct folio **folios, unsigned int max_folios,
 		      pgoff_t *offset);
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages);

 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index fcd602b..94ee79dd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
 	return ret;
 }
 EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * folio_split_user_page_pin() - split the pin on a high order folio
+ * @folio: the folio to split
+ * @npages: The new number of pages the folio pin reference should hold
+ *
+ * Given a high order folio that is already pinned, adjust the reference
+ * count to allow unpin_user_page_range() and related to be called on
+ * the folio. npages is the number of pages that will be passed to a
+ * future unpin_user_page_range().
+ */
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
+{
+	if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||
+	    npages == 1)
+		return;
+	atomic_add(npages - 1, &folio->_refcount);
+	atomic_add(npages - 1, &folio->_pincount);
+}
+EXPORT_SYMBOL_GPL(folio_split_user_page_pin);
--
1.8.3.1
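
For context, the flow the commit message describes (pin a large folio with
memfd_pin_folios(), re-account the pin with folio_split_user_page_pin(), then
release at small-page granularity) could look roughly like the sketch below;
the memfd setup, error handling, and choice of npages are illustrative
assumptions, not taken from the patch.

	struct folio *folio;
	pgoff_t offset;
	long nr, i;

	/* Pin one high-order folio backing the memfd range. */
	nr = memfd_pin_folios(memfd, start, end, &folio, 1, &offset);
	if (nr <= 0)
		return nr ? nr : -EINVAL;

	/*
	 * Re-account the single folio-level pin as npages per-page pins
	 * (the function adds npages - 1 references and pin counts).
	 */
	folio_split_user_page_pin(folio, npages);

	/* ... later, any of those small pages can be unpinned individually. */
	for (i = 0; i < npages; i++)
		unpin_user_page(folio_page(folio, i));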
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: Jason Gunthorpe @ 2024-09-24 16:55 UTC
To: Steve Sistare
Cc: linux-mm, Andrew Morton, David Hildenbrand, Matthew Wilcox

On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.

This needs to stay in your series since I will need to take it all
together.. But it looks OK to me

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: David Hildenbrand @ 2024-09-27 15:44 UTC
To: Steve Sistare, linux-mm
Cc: Andrew Morton, Jason Gunthorpe, Matthew Wilcox

On 24.09.24 17:05, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.
> ---
> ---
>  include/linux/mm.h |  1 +
>  mm/gup.c           | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 13bff7c..b0b572d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>  long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>  		      struct folio **folios, unsigned int max_folios,
>  		      pgoff_t *offset);
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages);
>
>  int get_user_pages_fast(unsigned long start, int nr_pages,
>  			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index fcd602b..94ee79dd 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(memfd_pin_folios);
> +
> +/**
> + * folio_split_user_page_pin() - split the pin on a high order folio

There really is no such concept of splitting pins :/

> + * @folio: the folio to split

"folio to split": Highly misleading :)

> + * @npages: The new number of pages the folio pin reference should hold
> + *
> + * Given a high order folio that is already pinned, adjust the reference
> + * count to allow unpin_user_page_range() and related to be called on

unpin_user_page_range() does not exist, at least upstream. Did you mean
unpin_user_page_range_dirty_lock()?

> + * the folio. npages is the number of pages that will be passed to a
> + * future unpin_user_page_range().
> + */
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
> +{
> +	if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||

is_huge_zero_folio() is still likely wrong. Just follow the flow in
unpin_user_page_range_dirty_lock() -> gup_put_folio().

Please point me to where in unpin_user_page_range_dirty_lock() ->
gup_put_folio() there is an is_huge_zero_folio() special-casing that
would skip adjusting the refcount and the pincount, so it would be
balanced?

> +	    npages == 1)
> +		return;
> +	atomic_add(npages - 1, &folio->_refcount);
> +	atomic_add(npages - 1, &folio->_pincount);
> +}
> +EXPORT_SYMBOL_GPL(folio_split_user_page_pin);

I can understand why we want to add more pins to a folio. I don't like
this interface. I would suggest a more generic interface:

/**
 * folio_try_add_pins() - add pins to an already-pinned folio
 * @folio: the folio to add more pins to
 *
 * Try to add more pins to an already-pinned folio. The semantics
 * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
 * be changed.
 *
 * This function is helpful when having obtained a pin on a large folio
 * using memfd_pin_folios(), but wanting to logically unpin parts
 * (e.g., individual pages) of the folio later, for example, using
 * unpin_user_page_range_dirty_lock().
 *
 * This is not the right interface to initially pin a folio.
 */
int folio_try_add_pins(struct folio *folio, unsigned int pins)
{
	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));

	return try_grab_folio(folio, pins, FOLL_PIN);
}

We might want to consider adding even better overflow checks in
try_grab_folio(), but that's a different discussion.

The shared zeropage will be taken care of automatically, and the huge
zero folio does currently not need any special care ...

--
Cheers,

David / dhildenb
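
A hypothetical caller of the interface suggested above, assuming it already
holds one pin from memfd_pin_folios() and wants to end up with one pin per
small page; the (npages - 1) count and the use of
unpin_user_page_range_dirty_lock() are assumptions for illustration, not part
of the proposal.

	int ret;

	/* One pin is already held on the large folio; add the remainder. */
	ret = folio_try_add_pins(folio, npages - 1);
	if (ret)
		return ret;	/* adding the extra pins failed */

	/* Later, the range can be unpinned at small-page granularity. */
	unpin_user_page_range_dirty_lock(folio_page(folio, 0), npages, true);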
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: Jason Gunthorpe @ 2024-09-27 15:58 UTC
To: David Hildenbrand
Cc: Steve Sistare, linux-mm, Andrew Morton, Matthew Wilcox

On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
> /**
>  * folio_try_add_pins() - add pins to an already-pinned folio
>  * @folio: the folio to add more pins to
>  *
>  * Try to add more pins to an already-pinned folio. The semantics
>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>  * be changed.
>  *
>  * This function is helpful when having obtained a pin on a large folio
>  * using memfd_pin_folios(), but wanting to logically unpin parts
>  * (e.g., individual pages) of the folio later, for example, using
>  * unpin_user_page_range_dirty_lock().
>  *
>  * This is not the right interface to initially pin a folio.
>  */
> int folio_try_add_pins(struct folio *folio, unsigned int pins)
> {
> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>
> 	return try_grab_folio(folio, pins, FOLL_PIN);
> }

That looks pretty good to me too

Thanks,
Jason
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: Steven Sistare @ 2024-10-01 17:17 UTC
To: Jason Gunthorpe, David Hildenbrand
Cc: linux-mm, Andrew Morton, Matthew Wilcox

On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>> /**
>>  * folio_try_add_pins() - add pins to an already-pinned folio
>>  * @folio: the folio to add more pins to
>>  *
>>  * Try to add more pins to an already-pinned folio. The semantics
>>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>  * be changed.
>>  *
>>  * This function is helpful when having obtained a pin on a large folio
>>  * using memfd_pin_folios(), but wanting to logically unpin parts
>>  * (e.g., individual pages) of the folio later, for example, using
>>  * unpin_user_page_range_dirty_lock().
>>  *
>>  * This is not the right interface to initially pin a folio.
>>  */
>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>> {
>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>
>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>> }
>
> That looks pretty good to me too

Looks good and passes my tests, I will add this version in V3 of the
patch series.

Are you sure you want "try" in the name folio_try_add_pins? Usually try
means that any failure is transient and a future call may succeed, but
the failures in try_grab_folio look permanent to me (suggesting that is
also a misnomer, but at least it is not exported).

- Steve
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: David Hildenbrand @ 2024-10-04 10:04 UTC
To: Steven Sistare, Jason Gunthorpe
Cc: linux-mm, Andrew Morton, Matthew Wilcox

On 01.10.24 19:17, Steven Sistare wrote:
> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>> /**
>>>  * folio_try_add_pins() - add pins to an already-pinned folio
>>>  * @folio: the folio to add more pins to
>>>  *
>>>  * Try to add more pins to an already-pinned folio. The semantics
>>>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>  * be changed.
>>>  *
>>>  * This function is helpful when having obtained a pin on a large folio
>>>  * using memfd_pin_folios(), but wanting to logically unpin parts
>>>  * (e.g., individual pages) of the folio later, for example, using
>>>  * unpin_user_page_range_dirty_lock().
>>>  *
>>>  * This is not the right interface to initially pin a folio.
>>>  */
>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>> {
>>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>
>>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>>> }
>>
>> That looks pretty good to me too
>
> Looks good and passes my tests, I will add this version in V3 of the
> patch series.
>
> Are you sure you want "try" in the name folio_try_add_pins? Usually try
> means that any failure is transient and a future call may succeed

And now I took another look at the codebase and we already do have
folio_add_pin() that adds a single pin, but continues on overflows (not
sure I like that, but at least it can be caught and debugged).

So yes, we could simply turn folio_add_pin() into a wrapper around a
folio_add_pins() that adds multiple pins.

Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the
open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED
accounting correct.

--
Cheers,

David / dhildenb
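
Purely as a sketch of the cleanup mentioned above, and not a proposal made in
this thread, folio_add_pin() could be layered on a multi-pin helper like the
following; it glosses over the overflow behaviour of the current
folio_add_pin(), which does not fail.

int folio_add_pins(struct folio *folio, unsigned int pins)
{
	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));

	return try_grab_folio(folio, pins, FOLL_PIN);
}

/* The existing single-pin helper would become a thin wrapper. */
void folio_add_pin(struct folio *folio)
{
	folio_add_pins(folio, 1);
}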
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: Steven Sistare @ 2024-10-04 17:20 UTC
To: David Hildenbrand, Jason Gunthorpe
Cc: linux-mm, Andrew Morton, Matthew Wilcox

On 10/4/2024 6:04 AM, David Hildenbrand wrote:
> On 01.10.24 19:17, Steven Sistare wrote:
>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>> /**
>>>>  * folio_try_add_pins() - add pins to an already-pinned folio
>>>>  * @folio: the folio to add more pins to
>>>>  *
>>>>  * Try to add more pins to an already-pinned folio. The semantics
>>>>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>>  * be changed.
>>>>  *
>>>>  * This function is helpful when having obtained a pin on a large folio
>>>>  * using memfd_pin_folios(), but wanting to logically unpin parts
>>>>  * (e.g., individual pages) of the folio later, for example, using
>>>>  * unpin_user_page_range_dirty_lock().
>>>>  *
>>>>  * This is not the right interface to initially pin a folio.
>>>>  */
>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>> {
>>>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>
>>>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>>>> }
>>>
>>> That looks pretty good to me too
>>
>> Looks good and passes my tests, I will add this version in V3 of the
>> patch series.
>>
>> Are you sure you want "try" in the name folio_try_add_pins? Usually try
>> means that any failure is transient and a future call may succeed
>
> And now I took another look at the codebase and we already do have
> folio_add_pin() that adds a single pin, but continues on overflows (not
> sure I like that, but at least it can be caught and debugged).
>
> So yes, we could simply turn folio_add_pin() into a wrapper around a
> folio_add_pins() that adds multiple pins.
>
> Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the
> open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED
> accounting correct.

To be clear, I am only suggesting that I use your folio_try_add_pins
implementation, but rename it to folio_add_pins. And not touch the
existing folio_add_pin.

I am ready to send V3 of the iommu_ioas_map_file series, and I would
like to add this patch back to the series as Jason requested.

- Steve
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
From: David Hildenbrand @ 2024-10-04 20:19 UTC
To: Steven Sistare, Jason Gunthorpe
Cc: linux-mm, Andrew Morton, Matthew Wilcox, Nico Pache

On 04.10.24 19:20, Steven Sistare wrote:
> On 10/4/2024 6:04 AM, David Hildenbrand wrote:
>> On 01.10.24 19:17, Steven Sistare wrote:
>>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>>> /**
>>>>>  * folio_try_add_pins() - add pins to an already-pinned folio
>>>>>  * @folio: the folio to add more pins to
>>>>>  *
>>>>>  * Try to add more pins to an already-pinned folio. The semantics
>>>>>  * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>>>  * be changed.
>>>>>  *
>>>>>  * This function is helpful when having obtained a pin on a large folio
>>>>>  * using memfd_pin_folios(), but wanting to logically unpin parts
>>>>>  * (e.g., individual pages) of the folio later, for example, using
>>>>>  * unpin_user_page_range_dirty_lock().
>>>>>  *
>>>>>  * This is not the right interface to initially pin a folio.
>>>>>  */
>>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>>> {
>>>>> 	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>>
>>>>> 	return try_grab_folio(folio, pins, FOLL_PIN);
>>>>> }
>>>>
>>>> That looks pretty good to me too
>>>
>>> Looks good and passes my tests, I will add this version in V3 of the
>>> patch series.
>>>
>>> Are you sure you want "try" in the name folio_try_add_pins? Usually try
>>> means that any failure is transient and a future call may succeed
>>
>> And now I took another look at the codebase and we already do have
>> folio_add_pin() that adds a single pin, but continues on overflows (not
>> sure I like that, but at least it can be caught and debugged).
>>
>> So yes, we could simply turn folio_add_pin() into a wrapper around a
>> folio_add_pins() that adds multiple pins.
>>
>> Looking at folio_add_pin() vs. try_grab_folio() I am not sure if the
>> open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED
>> accounting correct.
>
> To be clear, I am only suggesting that I use your folio_try_add_pins
> implementation, but rename it to folio_add_pins. And not touch the
> existing folio_add_pin.

This should be unified/cleaned up at some point.

@Nico, interested in looking into this? Otherwise, I can add it to my
todo list.

--
Cheers,

David / dhildenb