* [PATCH V2] mm/gup: folio_split_user_page_pin
@ 2024-09-24 15:05 Steve Sistare
2024-09-24 16:55 ` Jason Gunthorpe
2024-09-27 15:44 ` David Hildenbrand
0 siblings, 2 replies; 8+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, David Hildenbrand, Jason Gunthorpe,
Matthew Wilcox, Steve Sistare
Export a function that repins a high-order folio at small-page granularity.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
folio_split_user_page_pin could be unpinned via unpin_user_page(s).
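A rough sketch of the intended flow (illustrative only; memfd, start, end,
first, npages and pages[] are caller-side placeholders, not part of this
patch):

	struct folio *folio;
	pgoff_t offset;
	long nr;

	/* Pin one large folio backing the memfd range. */
	nr = memfd_pin_folios(memfd, start, end, &folio, 1, &offset);
	if (nr < 0)
		return nr;

	/* Repin at small-page granularity. */
	folio_split_user_page_pin(folio, folio_nr_pages(folio));

	/* ... later, unpin any sub-range of the folio's small pages ... */
	unpin_user_pages(&pages[first], npages);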
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
In V2 this has been renamed from repin_folio_unhugely, but is
otherwise unchanged from V1.
---
---
include/linux/mm.h | 1 +
mm/gup.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 13bff7c..b0b572d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
struct folio **folios, unsigned int max_folios,
pgoff_t *offset);
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages);
int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index fcd602b..94ee79dd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
return ret;
}
EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * folio_split_user_page_pin() - split the pin on a high order folio
+ * @folio: the folio to split
+ * @npages: The new number of pages the folio pin reference should hold
+ *
+ * Given a high order folio that is already pinned, adjust the reference
+ * count to allow unpin_user_page_range() and related to be called on a
+ * the folio. npages is the number of pages that will be passed to a
+ * future unpin_user_page_range().
+ */
+void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
+{
+ if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||
+ npages == 1)
+ return;
+ atomic_add(npages - 1, &folio->_refcount);
+ atomic_add(npages - 1, &folio->_pincount);
+}
+EXPORT_SYMBOL_GPL(folio_split_user_page_pin);
--
1.8.3.1
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-09-24 15:05 [PATCH V2] mm/gup: folio_split_user_page_pin Steve Sistare
@ 2024-09-24 16:55 ` Jason Gunthorpe
2024-09-27 15:44 ` David Hildenbrand
1 sibling, 0 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2024-09-24 16:55 UTC (permalink / raw)
To: Steve Sistare; +Cc: linux-mm, Andrew Morton, David Hildenbrand, Matthew Wilcox
On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.
This needs to stay in your series since I will need to take it all
together.
But it looks OK to me.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-09-24 15:05 [PATCH V2] mm/gup: folio_split_user_page_pin Steve Sistare
2024-09-24 16:55 ` Jason Gunthorpe
@ 2024-09-27 15:44 ` David Hildenbrand
2024-09-27 15:58 ` Jason Gunthorpe
1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-09-27 15:44 UTC (permalink / raw)
To: Steve Sistare, linux-mm; +Cc: Andrew Morton, Jason Gunthorpe, Matthew Wilcox
On 24.09.24 17:05, Steve Sistare wrote:
> Export a function that repins a high-order folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> folio_split_user_page_pin could be unpinned via unpin_user_page(s).
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>
> ---
> In V2 this has been renamed from repin_folio_unhugely, but is
> otherwise unchanged from V1.
> ---
> ---
> include/linux/mm.h | 1 +
> mm/gup.c | 20 ++++++++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 13bff7c..b0b572d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2521,6 +2521,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> struct folio **folios, unsigned int max_folios,
> pgoff_t *offset);
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages);
>
> int get_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index fcd602b..94ee79dd 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3733,3 +3733,23 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> return ret;
> }
> EXPORT_SYMBOL_GPL(memfd_pin_folios);
> +
> +/**
> + * folio_split_user_page_pin() - split the pin on a high order folio
There really is no such concept of splitting pins :/
> + * @folio: the folio to split
"folio to split": Highly misleading :)
> + * @npages: The new number of pages the folio pin reference should hold
> + *
> + * Given a high order folio that is already pinned, adjust the reference
> + * count to allow unpin_user_page_range() and related to be called on a
unpin_user_page_range() does not exist, at least upstream. Did you mean
unpin_user_page_range_dirty_lock()?
> + * the folio. npages is the number of pages that will be passed to a
> + * future unpin_user_page_range().
> + */
> +void folio_split_user_page_pin(struct folio *folio, unsigned long npages)
> +{
> + if (!folio_test_large(folio) || is_huge_zero_folio(folio) ||
is_huge_zero_folio() is still likely wrong.
Just follow the flow in unpin_user_page_range_dirty_lock() ->
gup_put_folio().
Please point me to where in unpin_user_page_range_dirty_lock() ->
gup_put_folio() there is an is_huge_zero_folio() special-casing that
would skip adjusting the refcount and the pincount, so that it would be
balanced?
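For reference, the FOLL_PIN release path looks roughly like this (paraphrased
and simplified from memory, so check the current tree for the exact code):

	static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
	{
		if (flags & FOLL_PIN) {
			/* Only the shared zeropage is special-cased here. */
			if (is_zero_folio(folio))
				return;
			node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
			if (folio_test_large(folio))
				atomic_sub(refs, &folio->_pincount);
			else
				refs *= GUP_PIN_COUNTING_BIAS;
		}
		/* devmap and other details omitted */
		folio_put_refs(folio, refs);
	}

There is no is_huge_zero_folio() check on this path, so skipping the refcount
and pincount adjustment on the pin side would leave things unbalanced.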
> + npages == 1)
> + return;
> + atomic_add(npages - 1, &folio->_refcount);
> + atomic_add(npages - 1, &folio->_pincount);
> +}
> +EXPORT_SYMBOL_GPL(folio_split_user_page_pin);
I can understand why we want to add more pins to a folio. I don't like
this interface.
I would suggest a more generic interface:
/**
* folio_try_add_pins() - add pins to an already-pinned folio
* @folio: the folio to add more pins to
*
* Try to add more pins to an already-pinned folio. The semantics
* of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
* be changed.
*
* This function is helpful when having obtained a pin on a large folio
* using memfd_pin_folios(), but wanting to logically unpin parts
* (e.g., individual pages) of the folio later, for example, using
* unpin_user_page_range_dirty_lock().
*
* This is not the right interface to initially pin a folio.
*/
int folio_try_add_pins(struct folio *folio, unsigned int pins)
{
VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
return try_grab_folio(folio, pins, FOLL_PIN);
}
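For the memfd_pin_folios() case, the caller side would then look roughly like
this (illustrative only; ret and the error label are placeholders):

	/* Add enough extra pins to later unpin each small page individually. */
	ret = folio_try_add_pins(folio, folio_nr_pages(folio) - 1);
	if (ret)
		goto out_unpin;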
We might want to consider adding even better overflow checks in
try_grab_folio(), but that's a different discussion.
The shared zeropage will be taken care of automatically, and the huge
zero folio does not currently need any special care ...
--
Cheers,
David / dhildenb
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-09-27 15:44 ` David Hildenbrand
@ 2024-09-27 15:58 ` Jason Gunthorpe
2024-10-01 17:17 ` Steven Sistare
0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2024-09-27 15:58 UTC (permalink / raw)
To: David Hildenbrand; +Cc: Steve Sistare, linux-mm, Andrew Morton, Matthew Wilcox
On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
> /**
> * folio_try_add_pins() - add pins to an already-pinned folio
> * @folio: the folio to add more pins to
> *
> * Try to add more pins to an already-pinned folio. The semantics
> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
> * be changed.
> *
> * This function is helpful when having obtained a pin on a large folio
> * using memfd_pin_folios(), but wanting to logically unpin parts
> * (e.g., individual pages) of the folio later, for example, using
> * unpin_user_page_range_dirty_lock().
> *
> * This is not the right interface to initially pin a folio.
> */
> int folio_try_add_pins(struct folio *folio, unsigned int pins)
> {
> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>
> return try_grab_folio(folio, pins, FOLL_PIN);
> }
That looks pretty good to me too
Thanks,
Jason
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-09-27 15:58 ` Jason Gunthorpe
@ 2024-10-01 17:17 ` Steven Sistare
2024-10-04 10:04 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: Steven Sistare @ 2024-10-01 17:17 UTC (permalink / raw)
To: Jason Gunthorpe, David Hildenbrand
Cc: linux-mm, Andrew Morton, Matthew Wilcox
On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>> /**
>> * folio_try_add_pins() - add pins to an already-pinned folio
>> * @folio: the folio to add more pins to
>> *
>> * Try to add more pins to an already-pinned folio. The semantics
>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>> * be changed.
>> *
>> * This function is helpful when having obtained a pin on a large folio
>> * using memfd_pin_folios(), but wanting to logically unpin parts
>> * (e.g., individual pages) of the folio later, for example, using
>> * unpin_user_page_range_dirty_lock().
>> *
>> * This is not the right interface to initially pin a folio.
>> */
>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>> {
>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>
>> return try_grab_folio(folio, pins, FOLL_PIN);
>> }
>
> That looks pretty good to me too
Looks good and passes my tests, I will add this version in V3 of the patch series.
Are you sure you want "try" in the name folio_try_add_pins? Usually try means
that any failure is transient and a future call may succeed, but the failures in
try_grab_folio look permanent to me (suggesting that is also a misnomer, but at
least it is not exported).
- Steve
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-10-01 17:17 ` Steven Sistare
@ 2024-10-04 10:04 ` David Hildenbrand
2024-10-04 17:20 ` Steven Sistare
0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-10-04 10:04 UTC (permalink / raw)
To: Steven Sistare, Jason Gunthorpe; +Cc: linux-mm, Andrew Morton, Matthew Wilcox
On 01.10.24 19:17, Steven Sistare wrote:
> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>> /**
>>> * folio_try_add_pins() - add pins to an already-pinned folio
>>> * @folio: the folio to add more pins to
>>> *
>>> * Try to add more pins to an already-pinned folio. The semantics
>>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>> * be changed.
>>> *
>>> * This function is helpful when having obtained a pin on a large folio
>>> * using memfd_pin_folios(), but wanting to logically unpin parts
>>> * (e.g., individual pages) of the folio later, for example, using
>>> * unpin_user_page_range_dirty_lock().
>>> *
>>> * This is not the right interface to initially pin a folio.
>>> */
>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>> {
>>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>
>>> return try_grab_folio(folio, pins, FOLL_PIN);
>>> }
>>
>> That looks pretty good to me too
>
> Looks good and passes my tests, I will add this version in V3 of the patch series.
>
> Are you sure you want "try" in the name folio_try_add_pins? Usually try means
> that any failure is transient and a future call may succeed
And now I took another look at the codebase and we already do have
folio_add_pin() that adds a single pin, but continues on overflows (not
sure I like that, but at least it can be caught and debugged).
So yes, we could simply turn folio_add_pin() into a wrapper around a
folio_add_pins() that adds multiple pins.
Looking at folio_add_pin() vs. try_grab_folio(), I am not sure whether the
open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED
accounting correct.
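A rough sketch of that unification (illustrative only; how folio_add_pin()
should react to failure is an open question):

	int folio_add_pins(struct folio *folio, unsigned int pins)
	{
		VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
		return try_grab_folio(folio, pins, FOLL_PIN);
	}

	void folio_add_pin(struct folio *folio)
	{
		/* Existing callers do not check for failure; warn and continue. */
		VM_WARN_ON_ONCE(folio_add_pins(folio, 1) != 0);
	}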
--
Cheers,
David / dhildenb
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-10-04 10:04 ` David Hildenbrand
@ 2024-10-04 17:20 ` Steven Sistare
2024-10-04 20:19 ` David Hildenbrand
0 siblings, 1 reply; 8+ messages in thread
From: Steven Sistare @ 2024-10-04 17:20 UTC (permalink / raw)
To: David Hildenbrand, Jason Gunthorpe
Cc: linux-mm, Andrew Morton, Matthew Wilcox
On 10/4/2024 6:04 AM, David Hildenbrand wrote:
> On 01.10.24 19:17, Steven Sistare wrote:
>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>> /**
>>>> * folio_try_add_pins() - add pins to an already-pinned folio
>>>> * @folio: the folio to add more pins to
>>>> *
>>>> * Try to add more pins to an already-pinned folio. The semantics
>>>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>> * be changed.
>>>> *
>>>> * This function is helpful when having obtained a pin on a large folio
>>>> * using memfd_pin_folios(), but wanting to logically unpin parts
>>>> * (e.g., individual pages) of the folio later, for example, using
>>>> * unpin_user_page_range_dirty_lock().
>>>> *
>>>> * This is not the right interface to initially pin a folio.
>>>> */
>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>> {
>>>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>
>>>> return try_grab_folio(folio, pins, FOLL_PIN);
>>>> }
>>>
>>> That looks pretty good to me too
>>
>> Looks good and passes my tests, I will add this version in V3 of the patch series.
>>
>> Are you sure you want "try" in the name folio_try_add_pins? Usually try means
>> that any failure is transient and a future call may succeed
>
> And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged).
>
> So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins.
>> Looking at folio_add_pin() vs. try_grab_folio(), I am not sure whether the open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct.
To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but
rename it to folio_add_pins, and not touch the existing folio_add_pin.
I am ready to send V3 of the iommu_ioas_map_file series, and I would like to add this
patch back to the series as Jason requested.
- Steve
* Re: [PATCH V2] mm/gup: folio_split_user_page_pin
2024-10-04 17:20 ` Steven Sistare
@ 2024-10-04 20:19 ` David Hildenbrand
0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-10-04 20:19 UTC (permalink / raw)
To: Steven Sistare, Jason Gunthorpe
Cc: linux-mm, Andrew Morton, Matthew Wilcox, Nico Pache
On 04.10.24 19:20, Steven Sistare wrote:
> On 10/4/2024 6:04 AM, David Hildenbrand wrote:
>> On 01.10.24 19:17, Steven Sistare wrote:
>>> On 9/27/2024 11:58 AM, Jason Gunthorpe wrote:
>>>> On Fri, Sep 27, 2024 at 05:44:52PM +0200, David Hildenbrand wrote:
>>>>> /**
>>>>> * folio_try_add_pins() - add pins to an already-pinned folio
>>>>> * @folio: the folio to add more pins to
>>>>> *
>>>>> * Try to add more pins to an already-pinned folio. The semantics
>>>>> * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
>>>>> * be changed.
>>>>> *
>>>>> * This function is helpful when having obtained a pin on a large folio
>>>>> * using memfd_pin_folios(), but wanting to logically unpin parts
>>>>> * (e.g., individual pages) of the folio later, for example, using
>>>>> * unpin_user_page_range_dirty_lock().
>>>>> *
>>>>> * This is not the right interface to initially pin a folio.
>>>>> */
>>>>> int folio_try_add_pins(struct folio *folio, unsigned int pins)
>>>>> {
>>>>> VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
>>>>>
>>>>> return try_grab_folio(folio, pins, FOLL_PIN);
>>>>> }
>>>>
>>>> That looks pretty good to me too
>>>
>>> Looks good and passes my tests, I will add this version in V3 of the patch series.
>>>
>>> Are you sure you want "try" in the name folio_try_add_pins? Usually try means
>>> that any failure is transient and a future call may succeed
>>
>> And now I took another look at the codebase and we already do have folio_add_pin() that adds a single pin, but continues on overflows (not sure I like that, but at least it can be caught and debugged).
>>
>> So yes, we could simply turn folio_add_pin() into a wrapper around a folio_add_pins() that adds multiple pins.
>> Looking at folio_add_pin() vs. try_grab_folio(), I am not sure whether the open-coded logic in folio_add_pin() got the NR_FOLL_PIN_ACQUIRED accounting correct.
>
> To be clear, I am only suggesting that I use your folio_try_add_pins implementation, but
> rename it to folio_add_pins. And not touch the existing folio_add_pin.
This should be unified/cleaned up at some point.
@Nico, interested in looking into this? Otherwise, I can add it to my
todo list.
--
Cheers,
David / dhildenb