linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
       [not found] ` <1726319158-283074-2-git-send-email-steven.sistare@oracle.com>
@ 2024-09-14 13:19   ` Steven Sistare
  2024-09-17 12:25     ` David Hildenbrand
       [not found]   ` <ZudFcANtENlaRJ+r@nvidia.com>
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Sistare @ 2024-09-14 13:19 UTC (permalink / raw)
  To: linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

cc'ing linux-mm for review of this one patch of the series.

This proposes a new KAPI function repin_folio_unhugely(), for use in this
patch of the iommu_ioas_map_file series:

   iommufd: IOMMU_IOAS_MAP_FILE implementation
   https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@oracle.com

- Steve

On 9/14/2024 9:05 AM, Steve Sistare wrote:
> Export a function that repins a huge-page folio at small-page granularity.
> This allows any range of small pages within the folio to be unpinned later.
> For example, pages pinned via memfd_pin_folios and modified by
> repin_folio_unhugely could be unpinned via unpin_user_page(s).
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>   include/linux/mm.h |  1 +
>   mm/gup.c           | 18 ++++++++++++++++++
>   2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1470736..ba8344f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>   long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>   		      struct folio **folios, unsigned int max_folios,
>   		      pgoff_t *offset);
> +void repin_folio_unhugely(struct folio *folio, unsigned long npin);
>   
>   int get_user_pages_fast(unsigned long start, int nr_pages,
>   			unsigned int gup_flags, struct page **pages);
> diff --git a/mm/gup.c b/mm/gup.c
> index 947881ff..f8f3f2a 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(memfd_pin_folios);
> +
> +/**
> + * repin_folio_unhugely() - repin a folio at small page granularity
> + * @folio: the folio to repin
> + * @npin:  the number of pages pinned in the folio
> + *
> + * Given a huge page folio that is already pinned, and the number of small
> + * pages that are pinned in it, adjust the pincount to reflect small-page
> + * granularity.  Each small page can later be unpinned individually.
> + */
> +void repin_folio_unhugely(struct folio *folio, unsigned long npin)
> +{
> +	if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)
> +		return;
> +	atomic_add(npin - 1, &folio->_refcount);
> +	atomic_add(npin - 1, &folio->_pincount);
> +}
> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-14 13:19   ` [PATCH V1 1/9] mm/gup: repin_folio_unhugely Steven Sistare
@ 2024-09-17 12:25     ` David Hildenbrand
  2024-09-18 14:51       ` Steven Sistare
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2024-09-17 12:25 UTC (permalink / raw)
  To: Steven Sistare, linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

On 14.09.24 15:19, Steven Sistare wrote:
> cc'ing linux-mm for review of this one patch of the series.
> 
> This proposes a new KAPI function repin_folio_unhugely(), for use in this
> patch of the iommu_ioas_map_file series:
> 
>     iommufd: IOMMU_IOAS_MAP_FILE implementation
>     https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@oracle.com
> 
> - Steve
> 
> On 9/14/2024 9:05 AM, Steve Sistare wrote:
>> Export a function that repins a huge-page folio at small-page granularity.
>> This allows any range of small pages within the folio to be unpinned later.
>> For example, pages pinned via memfd_pin_folios and modified by
>> repin_folio_unhugely could be unpinned via unpin_user_page(s).
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>    include/linux/mm.h |  1 +
>>    mm/gup.c           | 18 ++++++++++++++++++
>>    2 files changed, 19 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1470736..ba8344f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>>    long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>    		      struct folio **folios, unsigned int max_folios,
>>    		      pgoff_t *offset);
>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin);
>>    
>>    int get_user_pages_fast(unsigned long start, int nr_pages,
>>    			unsigned int gup_flags, struct page **pages);
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 947881ff..f8f3f2a 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>    	return ret;
>>    }
>>    EXPORT_SYMBOL_GPL(memfd_pin_folios);
>> +
>> +/**
>> + * repin_folio_unhugely() - repin a folio at small page granularity
>> + * @folio: the folio to repin
>> + * @npin:  the number of pages pinned in the folio
>> + *
>> + * Given a huge page folio that is already pinned, and the number of small

s/huge page folio/large folio/

>> + * pages that are pinned in it, adjust the pincount to reflect small-page
>> + * granularity.  Each small page can later be unpinned individually.
>> + */
>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin)
>> +{
>> +	if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)

Why not the huge zero folio? That looks very odd here.

>> +		return;
>> +	atomic_add(npin - 1, &folio->_refcount);
>> +	atomic_add(npin - 1, &folio->_pincount);
>> +}
>> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);

Can we ... find a better name? For example, it's "large" folio not "huge"...

And repin is really misleading. We are simply adding more pins to an 
already pinned one ...

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
       [not found]   ` <ZudFcANtENlaRJ+r@nvidia.com>
@ 2024-09-18 14:51     ` Steven Sistare
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Sistare @ 2024-09-18 14:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Nicolin Chen, David Hildenbrand,
	Andrew Morton, linux-mm

On 9/15/2024 4:37 PM, Jason Gunthorpe wrote:
> On Sat, Sep 14, 2024 at 06:05:50AM -0700, Steve Sistare wrote:
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 947881ff..f8f3f2a 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>   	return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(memfd_pin_folios);
>> +
>> +/**
>> + * repin_folio_unhugely() - repin a folio at small page granularity
>> + * @folio: the folio to repin
>> + * @npin:  the number of pages pinned in the folio
>> + *
>> + * Given a huge page folio that is already pinned, and the number of small
>> + * pages that are pinned in it, adjust the pincount to reflect small-page
>> + * granularity.  Each small page can later be unpinned individually.
>> + */
> 
> I think the language choice here is probably not entirely consistent
> with the rest of the code

Yes, a good name was not obvious to me, to I submitted a slightly flippant
name to generate some discussion :)

> 
> Maybe
> 
>   folio_split_user_page_pin(folio, npages)
>   @npages: The new number of pages the folio pin reference should hold
> 
>   Given a high order folio that is already pinned adjust the reference
>   count to allow unpin_user_page_range() and related to be called on a
>   the folio. npages is the number of pages that will be passed to a
>   future unpin_user_page_range().
> 
> Which anchors it in the purpose a little more

Thanks, I will use this if no one objects.

- Steve

> The implementation looked OK to me
> 
> Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-17 12:25     ` David Hildenbrand
@ 2024-09-18 14:51       ` Steven Sistare
  2024-09-19  8:11         ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Sistare @ 2024-09-18 14:51 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

On 9/17/2024 8:25 AM, David Hildenbrand wrote:
> On 14.09.24 15:19, Steven Sistare wrote:
>> cc'ing linux-mm for review of this one patch of the series.
>>
>> This proposes a new KAPI function repin_folio_unhugely(), for use in this
>> patch of the iommu_ioas_map_file series:
>>
>>     iommufd: IOMMU_IOAS_MAP_FILE implementation
>>     https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@oracle.com
>>
>> - Steve
>>
>> On 9/14/2024 9:05 AM, Steve Sistare wrote:
>>> Export a function that repins a huge-page folio at small-page granularity.
>>> This allows any range of small pages within the folio to be unpinned later.
>>> For example, pages pinned via memfd_pin_folios and modified by
>>> repin_folio_unhugely could be unpinned via unpin_user_page(s).
>>>
>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>    include/linux/mm.h |  1 +
>>>    mm/gup.c           | 18 ++++++++++++++++++
>>>    2 files changed, 19 insertions(+)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 1470736..ba8344f 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>>>    long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>                  struct folio **folios, unsigned int max_folios,
>>>                  pgoff_t *offset);
>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin);
>>>    int get_user_pages_fast(unsigned long start, int nr_pages,
>>>                unsigned int gup_flags, struct page **pages);
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 947881ff..f8f3f2a 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>        return ret;
>>>    }
>>>    EXPORT_SYMBOL_GPL(memfd_pin_folios);
>>> +
>>> +/**
>>> + * repin_folio_unhugely() - repin a folio at small page granularity
>>> + * @folio: the folio to repin
>>> + * @npin:  the number of pages pinned in the folio
>>> + *
>>> + * Given a huge page folio that is already pinned, and the number of small
> 
> s/huge page folio/large folio/
> 
>>> + * pages that are pinned in it, adjust the pincount to reflect small-page
>>> + * granularity.  Each small page can later be unpinned individually.
>>> + */
>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin)
>>> +{
>>> +    if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)
> 
> Why not the huge zero folio? That looks very odd here.

The zero page is treated specially here and elsewhere, it can never be deleted so
reference fiddling is skipped.

>>> +        return;
>>> +    atomic_add(npin - 1, &folio->_refcount);
>>> +    atomic_add(npin - 1, &folio->_pincount);
>>> +}
>>> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);
> 
> Can we ... find a better name? For example, it's "large" folio not "huge"...
> 
> And repin is really misleading. We are simply adding more pins to an already pinned one ...

Jason suggests a better name in the other thread.

- Steve


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-18 14:51       ` Steven Sistare
@ 2024-09-19  8:11         ` David Hildenbrand
  2024-09-19 21:06           ` Steven Sistare
  2024-09-20 13:28           ` Jason Gunthorpe
  0 siblings, 2 replies; 12+ messages in thread
From: David Hildenbrand @ 2024-09-19  8:11 UTC (permalink / raw)
  To: Steven Sistare, linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

On 18.09.24 16:51, Steven Sistare wrote:
> On 9/17/2024 8:25 AM, David Hildenbrand wrote:
>> On 14.09.24 15:19, Steven Sistare wrote:
>>> cc'ing linux-mm for review of this one patch of the series.
>>>
>>> This proposes a new KAPI function repin_folio_unhugely(), for use in this
>>> patch of the iommu_ioas_map_file series:
>>>
>>>      iommufd: IOMMU_IOAS_MAP_FILE implementation
>>>      https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@oracle.com
>>>
>>> - Steve
>>>
>>> On 9/14/2024 9:05 AM, Steve Sistare wrote:
>>>> Export a function that repins a huge-page folio at small-page granularity.
>>>> This allows any range of small pages within the folio to be unpinned later.
>>>> For example, pages pinned via memfd_pin_folios and modified by
>>>> repin_folio_unhugely could be unpinned via unpin_user_page(s).
>>>>
>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>>     include/linux/mm.h |  1 +
>>>>     mm/gup.c           | 18 ++++++++++++++++++
>>>>     2 files changed, 19 insertions(+)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index 1470736..ba8344f 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>>>>     long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>>                   struct folio **folios, unsigned int max_folios,
>>>>                   pgoff_t *offset);
>>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin);
>>>>     int get_user_pages_fast(unsigned long start, int nr_pages,
>>>>                 unsigned int gup_flags, struct page **pages);
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 947881ff..f8f3f2a 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>>         return ret;
>>>>     }
>>>>     EXPORT_SYMBOL_GPL(memfd_pin_folios);
>>>> +
>>>> +/**
>>>> + * repin_folio_unhugely() - repin a folio at small page granularity
>>>> + * @folio: the folio to repin
>>>> + * @npin:  the number of pages pinned in the folio
>>>> + *
>>>> + * Given a huge page folio that is already pinned, and the number of small
>>
>> s/huge page folio/large folio/
>>
>>>> + * pages that are pinned in it, adjust the pincount to reflect small-page
>>>> + * granularity.  Each small page can later be unpinned individually.
>>>> + */
>>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin)
>>>> +{
>>>> +    if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)
>>
>> Why not the huge zero folio? That looks very odd here.
> 
> The zero page is treated specially here and elsewhere, it can never be deleted so
> reference fiddling is skipped.

Please point me in mm/gup.c at that handling.

IIRC is_zero_folio() does *not* include the huge zero page.

Yes, we should likely be special-casing the huge zeropage in mm/gup.c, 
but it's not that easy because PINs can outlive MMs ... so *not* 
grabbing a reference could currently be harmful.

But that has do be changed consistently, not with doing things here 
different compared to other gup.c functions.

> 
>>>> +        return;
>>>> +    atomic_add(npin - 1, &folio->_refcount);
>>>> +    atomic_add(npin - 1, &folio->_pincount);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);
>>
>> Can we ... find a better name? For example, it's "large" folio not "huge"...
>>
>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
> 
> Jason suggests a better name in the other thread.

I would prefer something that simply adds more pins to an already pinned 
folio. Much easier to get.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-19  8:11         ` David Hildenbrand
@ 2024-09-19 21:06           ` Steven Sistare
  2024-09-26 11:38             ` David Hildenbrand
  2024-09-20 13:28           ` Jason Gunthorpe
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Sistare @ 2024-09-19 21:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

On 9/19/2024 4:11 AM, David Hildenbrand wrote:
> On 18.09.24 16:51, Steven Sistare wrote:
>> On 9/17/2024 8:25 AM, David Hildenbrand wrote:
>>> On 14.09.24 15:19, Steven Sistare wrote:
>>>> cc'ing linux-mm for review of this one patch of the series.
>>>>
>>>> This proposes a new KAPI function repin_folio_unhugely(), for use in this
>>>> patch of the iommu_ioas_map_file series:
>>>>
>>>>      iommufd: IOMMU_IOAS_MAP_FILE implementation
>>>>      https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@oracle.com
>>>>
>>>> - Steve
>>>>
>>>> On 9/14/2024 9:05 AM, Steve Sistare wrote:
>>>>> Export a function that repins a huge-page folio at small-page granularity.
>>>>> This allows any range of small pages within the folio to be unpinned later.
>>>>> For example, pages pinned via memfd_pin_folios and modified by
>>>>> repin_folio_unhugely could be unpinned via unpin_user_page(s).
>>>>>
>>>>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>>     include/linux/mm.h |  1 +
>>>>>     mm/gup.c           | 18 ++++++++++++++++++
>>>>>     2 files changed, 19 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index 1470736..ba8344f 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>>>>>     long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>>>                   struct folio **folios, unsigned int max_folios,
>>>>>                   pgoff_t *offset);
>>>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin);
>>>>>     int get_user_pages_fast(unsigned long start, int nr_pages,
>>>>>                 unsigned int gup_flags, struct page **pages);
>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>> index 947881ff..f8f3f2a 100644
>>>>> --- a/mm/gup.c
>>>>> +++ b/mm/gup.c
>>>>> @@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>>>         return ret;
>>>>>     }
>>>>>     EXPORT_SYMBOL_GPL(memfd_pin_folios);
>>>>> +
>>>>> +/**
>>>>> + * repin_folio_unhugely() - repin a folio at small page granularity
>>>>> + * @folio: the folio to repin
>>>>> + * @npin:  the number of pages pinned in the folio
>>>>> + *
>>>>> + * Given a huge page folio that is already pinned, and the number of small
>>>
>>> s/huge page folio/large folio/
>>>
>>>>> + * pages that are pinned in it, adjust the pincount to reflect small-page
>>>>> + * granularity.  Each small page can later be unpinned individually.
>>>>> + */
>>>>> +void repin_folio_unhugely(struct folio *folio, unsigned long npin)
>>>>> +{
>>>>> +    if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)
>>>
>>> Why not the huge zero folio? That looks very odd here.
>>
>> The zero page is treated specially here and elsewhere, it can never be deleted so
>> reference fiddling is skipped.
> 
> Please point me in mm/gup.c at that handling.
> 
> IIRC is_zero_folio() does *not* include the huge zero page.
> 
> Yes, we should likely be special-casing the huge zeropage in mm/gup.c, but it's not that easy because PINs can outlive MMs ... so *not* grabbing a reference could currently be harmful.
> 
> But that has do be changed consistently, not with doing things here different compared to other gup.c functions.

folios_put() -> folios_put_refs() -> is_huge_zero_folio()

I will run some tests with huge zero folios to verify the ref and pin
counts behave correctly.

>>>>> +        return;
>>>>> +    atomic_add(npin - 1, &folio->_refcount);
>>>>> +    atomic_add(npin - 1, &folio->_pincount);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);
>>>
>>> Can we ... find a better name? For example, it's "large" folio not "huge"...
>>>
>>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
>>
>> Jason suggests a better name in the other thread.
> 
> I would prefer something that simply adds more pins to an already pinned folio. Much easier to get.

How about folio_add_pins()?

- Steve


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-19  8:11         ` David Hildenbrand
  2024-09-19 21:06           ` Steven Sistare
@ 2024-09-20 13:28           ` Jason Gunthorpe
  2024-09-26 11:32             ` David Hildenbrand
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2024-09-20 13:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Steven Sistare, linux-mm, Kevin Tian, Nicolin Chen, iommu,
	Andrew Morton, Matthew Wilcox

On Thu, Sep 19, 2024 at 10:11:38AM +0200, David Hildenbrand wrote:

> > > And repin is really misleading. We are simply adding more pins to an already pinned one ...
> > 
> > Jason suggests a better name in the other thread.
> 
> I would prefer something that simply adds more pins to an already pinned
> folio. Much easier to get.

Yes, but also nobody should ever want to do that operation, it should
always be part of some kind of "splitting" sort of behavior..

Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-20 13:28           ` Jason Gunthorpe
@ 2024-09-26 11:32             ` David Hildenbrand
  2024-09-26 11:40               ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2024-09-26 11:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steven Sistare, linux-mm, Kevin Tian, Nicolin Chen, iommu,
	Andrew Morton, Matthew Wilcox

On 20.09.24 15:28, Jason Gunthorpe wrote:
> On Thu, Sep 19, 2024 at 10:11:38AM +0200, David Hildenbrand wrote:
> 
>>>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
>>>
>>> Jason suggests a better name in the other thread.
>>
>> I would prefer something that simply adds more pins to an already pinned
>> folio. Much easier to get.
> 
> Yes, but also nobody should ever want to do that operation, it should
> always be part of some kind of "splitting" sort of behavior..

I remember patches from Dave Howells that needed that for O_DIRECT 
handling. Never say never ;)

Adding is much more intuitive than splitting ... just like we add 
references when splitting a THP, using folio_ref_add().

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-19 21:06           ` Steven Sistare
@ 2024-09-26 11:38             ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2024-09-26 11:38 UTC (permalink / raw)
  To: Steven Sistare, linux-mm
  Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, iommu, Andrew Morton,
	Matthew Wilcox

>>> reference fiddling is skipped.
>>
>> Please point me in mm/gup.c at that handling.
>>
>> IIRC is_zero_folio() does *not* include the huge zero page.
>>
>> Yes, we should likely be special-casing the huge zeropage in mm/gup.c, but it's not that easy because PINs can outlive MMs ... so *not* grabbing a reference could currently be harmful.
>>
>> But that has do be changed consistently, not with doing things here different compared to other gup.c functions.
> 

Sorry for the late reply, conferences ...

> folios_put() -> folios_put_refs() -> is_huge_zero_folio()

unpin_user_folio() -> gup_put_folio() will do

1) atomic_sub(refs, &folio->_pincount);
2) folio_put_refs(folio, refs);


What you cite above is for releasing huge folios that are mapped into 
user space.

> 
> I will run some tests with huge zero folios to verify the ref and pin
> counts behave correctly.

Whatever is pinned is supposed to be released via the unpin interface. 
We have to be consistent there, so this patch would be wrong (likely you 
never get the huge zero folio).

> 
>>>>>> +        return;
>>>>>> +    atomic_add(npin - 1, &folio->_refcount);
>>>>>> +    atomic_add(npin - 1, &folio->_pincount);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(repin_folio_unhugely);
>>>>
>>>> Can we ... find a better name? For example, it's "large" folio not "huge"...
>>>>
>>>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
>>>
>>> Jason suggests a better name in the other thread.
>>
>> I would prefer something that simply adds more pins to an already pinned folio. Much easier to get.
> 
> How about folio_add_pins()?

Something like that, but more importantly, via which interface did we 
obtain the pins? That will make a difference and should be documented.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-26 11:32             ` David Hildenbrand
@ 2024-09-26 11:40               ` Jason Gunthorpe
  2024-09-26 12:57                 ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 11:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Steven Sistare, linux-mm, Kevin Tian, Nicolin Chen, iommu,
	Andrew Morton, Matthew Wilcox

On Thu, Sep 26, 2024 at 01:32:09PM +0200, David Hildenbrand wrote:
> On 20.09.24 15:28, Jason Gunthorpe wrote:
> > On Thu, Sep 19, 2024 at 10:11:38AM +0200, David Hildenbrand wrote:
> > 
> > > > > And repin is really misleading. We are simply adding more pins to an already pinned one ...
> > > > 
> > > > Jason suggests a better name in the other thread.
> > > 
> > > I would prefer something that simply adds more pins to an already pinned
> > > folio. Much easier to get.
> > 
> > Yes, but also nobody should ever want to do that operation, it should
> > always be part of some kind of "splitting" sort of behavior..
> 
> I remember patches from Dave Howells that needed that for O_DIRECT handling.
> Never say never ;)

Wouldn't O_DIRECT be the same splitting thing?

> Adding is much more intuitive than splitting ... just like we add references
> when splitting a THP, using folio_ref_add().

Well, sure, it just seems harder to document so people can use it
properly.

Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-26 11:40               ` Jason Gunthorpe
@ 2024-09-26 12:57                 ` David Hildenbrand
  2024-09-26 12:58                   ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2024-09-26 12:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steven Sistare, linux-mm, Kevin Tian, Nicolin Chen, iommu,
	Andrew Morton, Matthew Wilcox

On 26.09.24 13:40, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 01:32:09PM +0200, David Hildenbrand wrote:
>> On 20.09.24 15:28, Jason Gunthorpe wrote:
>>> On Thu, Sep 19, 2024 at 10:11:38AM +0200, David Hildenbrand wrote:
>>>
>>>>>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
>>>>>
>>>>> Jason suggests a better name in the other thread.
>>>>
>>>> I would prefer something that simply adds more pins to an already pinned
>>>> folio. Much easier to get.
>>>
>>> Yes, but also nobody should ever want to do that operation, it should
>>> always be part of some kind of "splitting" sort of behavior..
>>
>> I remember patches from Dave Howells that needed that for O_DIRECT handling.
>> Never say never ;)
> 
> Wouldn't O_DIRECT be the same splitting thing?

No, I recall the implementation had to duplicate pins, not split.

> 
>> Adding is much more intuitive than splitting ... just like we add references
>> when splitting a THP, using folio_ref_add().
> 
> Well, sure, it just seems harder to document so people can use it
> properly.

As soon as we have large GUP and splitting might go to multiple PMDs, 
multiple PTEs or a mixture, just being able to add the number of pins 
you actually need might be cleaner ...

Let me have a look at the latest reincarnation of this patch.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V1 1/9] mm/gup: repin_folio_unhugely
  2024-09-26 12:57                 ` David Hildenbrand
@ 2024-09-26 12:58                   ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2024-09-26 12:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steven Sistare, linux-mm, Kevin Tian, Nicolin Chen, iommu,
	Andrew Morton, Matthew Wilcox

On 26.09.24 14:57, David Hildenbrand wrote:
> On 26.09.24 13:40, Jason Gunthorpe wrote:
>> On Thu, Sep 26, 2024 at 01:32:09PM +0200, David Hildenbrand wrote:
>>> On 20.09.24 15:28, Jason Gunthorpe wrote:
>>>> On Thu, Sep 19, 2024 at 10:11:38AM +0200, David Hildenbrand wrote:
>>>>
>>>>>>> And repin is really misleading. We are simply adding more pins to an already pinned one ...
>>>>>>
>>>>>> Jason suggests a better name in the other thread.
>>>>>
>>>>> I would prefer something that simply adds more pins to an already pinned
>>>>> folio. Much easier to get.
>>>>
>>>> Yes, but also nobody should ever want to do that operation, it should
>>>> always be part of some kind of "splitting" sort of behavior..
>>>
>>> I remember patches from Dave Howells that needed that for O_DIRECT handling.
>>> Never say never ;)
>>
>> Wouldn't O_DIRECT be the same splitting thing?
> 
> No, I recall the implementation had to duplicate pins, not split.
> 
>>
>>> Adding is much more intuitive than splitting ... just like we add references
>>> when splitting a THP, using folio_ref_add().
>>
>> Well, sure, it just seems harder to document so people can use it
>> properly.
> 
> As soon as we have large GUP 

s/GUP/PUD/

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-26 12:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1726319158-283074-1-git-send-email-steven.sistare@oracle.com>
     [not found] ` <1726319158-283074-2-git-send-email-steven.sistare@oracle.com>
2024-09-14 13:19   ` [PATCH V1 1/9] mm/gup: repin_folio_unhugely Steven Sistare
2024-09-17 12:25     ` David Hildenbrand
2024-09-18 14:51       ` Steven Sistare
2024-09-19  8:11         ` David Hildenbrand
2024-09-19 21:06           ` Steven Sistare
2024-09-26 11:38             ` David Hildenbrand
2024-09-20 13:28           ` Jason Gunthorpe
2024-09-26 11:32             ` David Hildenbrand
2024-09-26 11:40               ` Jason Gunthorpe
2024-09-26 12:57                 ` David Hildenbrand
2024-09-26 12:58                   ` David Hildenbrand
     [not found]   ` <ZudFcANtENlaRJ+r@nvidia.com>
2024-09-18 14:51     ` Steven Sistare

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox