[PATCH] mm/gup: stop leaking pinned pages in low memory conditions

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
@ 2024-10-16 20:22 John Hubbard
  2024-10-16 21:57 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: John Hubbard @ 2024-10-16 20:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, linux-mm, John Hubbard, Alistair Popple, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin

If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
family of functions, and requests "too many" pages, then the call will
erroneously leave pages pinned. This is visible in user space as an
actual memory leak.

Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
to exhaust memory.

The root cause of the problem is this sequence, within
__gup_longterm_locked():

    __get_user_pages_locked()
    rc = check_and_migrate_movable_pages()

...which gets retried in a loop. The loop error handling is incomplete,
clearly due to a somewhat unusual and complicated tri-state error API.
But anyway, if -ENOMEM, or in fact, any unexpected error is returned
from check_and_migrate_movable_pages(), then __gup_longterm_locked()
happily returns the error, while leaving the pages pinned.

In the failed case, which is an app that requests (via a device driver)
30720000000 bytes to be pinned, and then exits, I see this:

    $ grep foll /proc/vmstat
        nr_foll_pin_acquired 7502048
        nr_foll_pin_released 2048

And after applying this patch, it returns to balanced pins:

    $ grep foll /proc/vmstat
        nr_foll_pin_acquired 7502048
        nr_foll_pin_released 7502048

Fix this by unpinning the pages that __get_user_pages_locked() has
pinned, in such error cases.

Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Shigeru Yoshida <syoshida@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 mm/gup.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index a82890b46a36..24acf53c8294 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,

 		/* FOLL_LONGTERM implies FOLL_PIN */
 		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
+
+		/*
+		 * The __get_user_pages_locked() call happens before we know
+		 * that whether it's possible to successfully complete the whole
+		 * operation. To compensate for this, if we get an unexpected
+		 * error (such as -ENOMEM) then we must unpin everything, before
+		 * erroring out.
+		 */
+		if (rc != -EAGAIN && rc != 0)
+			unpin_user_pages(pages, nr_pinned_pages);
+
 	} while (rc == -EAGAIN);
 	memalloc_pin_restore(flags);
 	return rc ? rc : nr_pinned_pages;
-- 
2.47.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 20:22 [PATCH] mm/gup: stop leaking pinned pages in low memory conditions John Hubbard
@ 2024-10-16 21:57 ` Andrew Morton
  2024-10-16 22:05   ` John Hubbard
  2024-10-16 22:13 ` Alistair Popple
  2024-10-17  8:51 ` David Hildenbrand
  2 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2024-10-16 21:57 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On Wed, 16 Oct 2024 13:22:42 -0700 John Hubbard <jhubbard@nvidia.com> wrote:

> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
> family of functions, and requests "too many" pages, then the call will
> erroneously leave pages pinned. This is visible in user space as an
> actual memory leak.
> 
> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
> to exhaust memory.
> 
> The root cause of the problem is this sequence, within
> __gup_longterm_locked():
> 
>     __get_user_pages_locked()
>     rc = check_and_migrate_movable_pages()
> 
> ...which gets retried in a loop. The loop error handling is incomplete,
> clearly due to a somewhat unusual and complicated tri-state error API.
> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
> happily returns the error, while leaving the pages pinned.
> 
> In the failed case, which is an app that requests (via a device driver)
> 30720000000 bytes to be pinned, and then exits, I see this:
> 
>     $ grep foll /proc/vmstat
>         nr_foll_pin_acquired 7502048
>         nr_foll_pin_released 2048
> 
> And after applying this patch, it returns to balanced pins:
> 
>     $ grep foll /proc/vmstat
>         nr_foll_pin_acquired 7502048
>         nr_foll_pin_released 7502048
> 
> Fix this by unpinning the pages that __get_user_pages_locked() has
> pinned, in such error cases.

Thanks.

> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")

I'll add this to the -stable backport pile, although this seems a bit
marginal?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 21:57 ` Andrew Morton
@ 2024-10-16 22:05   ` John Hubbard
  2024-10-16 22:41     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2024-10-16 22:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 10/16/24 2:57 PM, Andrew Morton wrote:
> On Wed, 16 Oct 2024 13:22:42 -0700 John Hubbard <jhubbard@nvidia.com> wrote:
...
>> Fix this by unpinning the pages that __get_user_pages_locked() has
>> pinned, in such error cases.
> 
> Thanks.
> 
>> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
> 
> I'll add this to the -stable backport pile, although this seems a bit
> marginal?

I'm on the fence about that. It is marginal: you have to
exhaust memory. On the other hand, a real user reported
this bug to us.

I guess I'd lean toward "correctness in -stable", and
add it to the pile, in the end.


thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 20:22 [PATCH] mm/gup: stop leaking pinned pages in low memory conditions John Hubbard
  2024-10-16 21:57 ` Andrew Morton
@ 2024-10-16 22:13 ` Alistair Popple
  2024-10-16 22:22   ` John Hubbard
  2024-10-17  8:51 ` David Hildenbrand
  2 siblings, 1 reply; 15+ messages in thread
From: Alistair Popple @ 2024-10-16 22:13 UTC (permalink / raw)
  To: John Hubbard
  Cc: Andrew Morton, LKML, linux-mm, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin


John Hubbard <jhubbard@nvidia.com> writes:

> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
> family of functions, and requests "too many" pages, then the call will
> erroneously leave pages pinned. This is visible in user space as an
> actual memory leak.
>
> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
> to exhaust memory.
>
> The root cause of the problem is this sequence, within
> __gup_longterm_locked():
>
>     __get_user_pages_locked()
>     rc = check_and_migrate_movable_pages()
>
> ...which gets retried in a loop. The loop error handling is incomplete,
> clearly due to a somewhat unusual and complicated tri-state error API.
> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
> happily returns the error, while leaving the pages pinned.
>
> In the failed case, which is an app that requests (via a device driver)
> 30720000000 bytes to be pinned, and then exits, I see this:
>
>     $ grep foll /proc/vmstat
>         nr_foll_pin_acquired 7502048
>         nr_foll_pin_released 2048
>
> And after applying this patch, it returns to balanced pins:
>
>     $ grep foll /proc/vmstat
>         nr_foll_pin_acquired 7502048
>         nr_foll_pin_released 7502048
>
> Fix this by unpinning the pages that __get_user_pages_locked() has
> pinned, in such error cases.
>
> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Shigeru Yoshida <syoshida@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/gup.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index a82890b46a36..24acf53c8294 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>  
>  		/* FOLL_LONGTERM implies FOLL_PIN */
>  		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
> +
> +		/*
> +		 * The __get_user_pages_locked() call happens before we know
> +		 * that whether it's possible to successfully complete the whole
> +		 * operation. To compensate for this, if we get an unexpected
> +		 * error (such as -ENOMEM) then we must unpin everything, before
> +		 * erroring out.
> +		 */
> +		if (rc != -EAGAIN && rc != 0)
> +			unpin_user_pages(pages, nr_pinned_pages);

I know this is going to fall out of the loop in the next line but given
this is an error handling case it would be nice if this was made
explicit here with a break statement. It looks correct to me though so:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> +
>  	} while (rc == -EAGAIN);
>  	memalloc_pin_restore(flags);
>  	return rc ? rc : nr_pinned_pages;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 22:13 ` Alistair Popple
@ 2024-10-16 22:22   ` John Hubbard
  0 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2024-10-16 22:22 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, LKML, linux-mm, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 10/16/24 3:13 PM, Alistair Popple wrote:
> John Hubbard <jhubbard@nvidia.com> writes:
...
>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..24acf53c8294 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>>   
>>   		/* FOLL_LONGTERM implies FOLL_PIN */
>>   		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>> +
>> +		/*
>> +		 * The __get_user_pages_locked() call happens before we know
>> +		 * that whether it's possible to successfully complete the whole
>> +		 * operation. To compensate for this, if we get an unexpected
>> +		 * error (such as -ENOMEM) then we must unpin everything, before
>> +		 * erroring out.
>> +		 */
>> +		if (rc != -EAGAIN && rc != 0)
>> +			unpin_user_pages(pages, nr_pinned_pages);
> 
> I know this is going to fall out of the loop in the next line but given
> this is an error handling case it would be nice if this was made
> explicit here with a break statement. It looks correct to me though so:

Yes, I'm not sure that adding another line of code for cosmetics really
makes this better. At this point, smaller is better IMHO, and then the
next step should be something bigger, such as refactoring to avoid the
tri-state return codes.

> 
> Reviewed-by: Alistair Popple <apopple@nvidia.com>

Thanks for the super fast response and the review!


thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 22:05   ` John Hubbard
@ 2024-10-16 22:41     ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2024-10-16 22:41 UTC (permalink / raw)
  To: John Hubbard
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	David Hildenbrand, Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On Wed, 16 Oct 2024 15:05:28 -0700 John Hubbard <jhubbard@nvidia.com> wrote:

> On 10/16/24 2:57 PM, Andrew Morton wrote:
> > On Wed, 16 Oct 2024 13:22:42 -0700 John Hubbard <jhubbard@nvidia.com> wrote:
> ...
> >> Fix this by unpinning the pages that __get_user_pages_locked() has
> >> pinned, in such error cases.
> > 
> > Thanks.
> > 
> >> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
> > 
> > I'll add this to the -stable backport pile, although this seems a bit
> > marginal?
> 
> I'm on the fence about that. It is marginal: you have to
> exhaust memory. On the other hand, a real user reported
> this bug to us.
> 
> I guess I'd lean toward "correctness in -stable", and
> add it to the pile, in the end.

Thanks.  It's a super-simple patch, which helps the decision.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-16 20:22 [PATCH] mm/gup: stop leaking pinned pages in low memory conditions John Hubbard
  2024-10-16 21:57 ` Andrew Morton
  2024-10-16 22:13 ` Alistair Popple
@ 2024-10-17  8:51 ` David Hildenbrand
  2024-10-17  8:53   ` David Hildenbrand
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-10-17  8:51 UTC (permalink / raw)
  To: John Hubbard, Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 16.10.24 22:22, John Hubbard wrote:
> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
> family of functions, and requests "too many" pages, then the call will
> erroneously leave pages pinned. This is visible in user space as an
> actual memory leak.
> 
> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
> to exhaust memory.
> 
> The root cause of the problem is this sequence, within
> __gup_longterm_locked():
> 
>      __get_user_pages_locked()
>      rc = check_and_migrate_movable_pages()
> 
> ...which gets retried in a loop. The loop error handling is incomplete,
> clearly due to a somewhat unusual and complicated tri-state error API.
> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
> happily returns the error, while leaving the pages pinned.
> 
> In the failed case, which is an app that requests (via a device driver)
> 30720000000 bytes to be pinned, and then exits, I see this:
> 
>      $ grep foll /proc/vmstat
>          nr_foll_pin_acquired 7502048
>          nr_foll_pin_released 2048
> 
> And after applying this patch, it returns to balanced pins:
> 
>      $ grep foll /proc/vmstat
>          nr_foll_pin_acquired 7502048
>          nr_foll_pin_released 7502048
> 
> Fix this by unpinning the pages that __get_user_pages_locked() has
> pinned, in such error cases.
> 
> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Shigeru Yoshida <syoshida@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>   mm/gup.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index a82890b46a36..24acf53c8294 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>   
>   		/* FOLL_LONGTERM implies FOLL_PIN */
>   		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
> +
> +		/*
> +		 * The __get_user_pages_locked() call happens before we know
> +		 * that whether it's possible to successfully complete the whole
> +		 * operation. To compensate for this, if we get an unexpected
> +		 * error (such as -ENOMEM) then we must unpin everything, before
> +		 * erroring out.
> +		 */
> +		if (rc != -EAGAIN && rc != 0)
> +			unpin_user_pages(pages, nr_pinned_pages);
> +
>   	} while (rc == -EAGAIN);

Wouldn't it be cleaner to simply have here after the loop (possibly even 
after the memalloc_pin_restore())

if (rc)
	unpin_user_pages(pages, nr_pinned_pages);

But maybe I am missing something.

>   	memalloc_pin_restore(flags);
>   	return rc ? rc : nr_pinned_pages;


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17  8:51 ` David Hildenbrand
@ 2024-10-17  8:53   ` David Hildenbrand
  2024-10-17 17:06     ` John Hubbard
  2024-10-17 16:54   ` John Hubbard
  2024-10-17 21:28   ` Alistair Popple
  2 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-10-17  8:53 UTC (permalink / raw)
  To: John Hubbard, Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 17.10.24 10:51, David Hildenbrand wrote:
> On 16.10.24 22:22, John Hubbard wrote:
>> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
>> family of functions, and requests "too many" pages, then the call will
>> erroneously leave pages pinned. This is visible in user space as an
>> actual memory leak.
>>
>> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls
>> to exhaust memory.
>>
>> The root cause of the problem is this sequence, within
>> __gup_longterm_locked():
>>
>>       __get_user_pages_locked()
>>       rc = check_and_migrate_movable_pages()
>>
>> ...which gets retried in a loop. The loop error handling is incomplete,
>> clearly due to a somewhat unusual and complicated tri-state error API.
>> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
>> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
>> happily returns the error, while leaving the pages pinned.
>>
>> In the failed case, which is an app that requests (via a device driver)
>> 30720000000 bytes to be pinned, and then exits, I see this:
>>
>>       $ grep foll /proc/vmstat
>>           nr_foll_pin_acquired 7502048
>>           nr_foll_pin_released 2048
>>
>> And after applying this patch, it returns to balanced pins:
>>
>>       $ grep foll /proc/vmstat
>>           nr_foll_pin_acquired 7502048
>>           nr_foll_pin_released 7502048
>>
>> Fix this by unpinning the pages that __get_user_pages_locked() has
>> pinned, in such error cases.
>>
>> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Shigeru Yoshida <syoshida@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>    mm/gup.c | 11 +++++++++++
>>    1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..24acf53c8294 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>>    
>>    		/* FOLL_LONGTERM implies FOLL_PIN */
>>    		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>> +
>> +		/*
>> +		 * The __get_user_pages_locked() call happens before we know
>> +		 * that whether it's possible to successfully complete the whole
>> +		 * operation. To compensate for this, if we get an unexpected
>> +		 * error (such as -ENOMEM) then we must unpin everything, before
>> +		 * erroring out.
>> +		 */
>> +		if (rc != -EAGAIN && rc != 0)
>> +			unpin_user_pages(pages, nr_pinned_pages);
>> +
>>    	} while (rc == -EAGAIN);
> 
> Wouldn't it be cleaner to simply have here after the loop (possibly even
> after the memalloc_pin_restore())
> 
> if (rc)
> 	unpin_user_pages(pages, nr_pinned_pages);
> 
> But maybe I am missing something.

And staring at memfd_pin_folios(), don't we have the same issue there if
check_and_migrate_movable_folios() fails?


diff --git a/mm/gup.c b/mm/gup.c
index a82890b46a36..f79974d38608 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3708,12 +3708,10 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
                 ret = check_and_migrate_movable_folios(nr_folios, folios);
         } while (ret == -EAGAIN);
  
-       memalloc_pin_restore(flags);
-       return ret ? ret : nr_folios;
  err:
         memalloc_pin_restore(flags);
-       unpin_folios(folios, nr_folios);
-
-       return ret;
+       if (ret)
+               unpin_folios(folios, nr_folios);
+       return ret ? ret : nr_folios;
  }
  EXPORT_SYMBOL_GPL(memfd_pin_folios);


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17  8:51 ` David Hildenbrand
  2024-10-17  8:53   ` David Hildenbrand
@ 2024-10-17 16:54   ` John Hubbard
  2024-10-17 21:28   ` Alistair Popple
  2 siblings, 0 replies; 15+ messages in thread
From: John Hubbard @ 2024-10-17 16:54 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 10/17/24 1:51 AM, David Hildenbrand wrote:
> On 16.10.24 22:22, John Hubbard wrote:
...
>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..24acf53c8294 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct 
>> mm_struct *mm,
>>           /* FOLL_LONGTERM implies FOLL_PIN */
>>           rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>> +
>> +        /*
>> +         * The __get_user_pages_locked() call happens before we know
>> +         * that whether it's possible to successfully complete the whole

oops, s/that whether/that/

>> +         * operation. To compensate for this, if we get an unexpected
>> +         * error (such as -ENOMEM) then we must unpin everything, before
>> +         * erroring out.
>> +         */
>> +        if (rc != -EAGAIN && rc != 0)
>> +            unpin_user_pages(pages, nr_pinned_pages);
>> +
>>       } while (rc == -EAGAIN);
> 
> Wouldn't it be cleaner to simply have here after the loop (possibly even 
> after the memalloc_pin_restore())
> 
> if (rc)
>      unpin_user_pages(pages, nr_pinned_pages);
> 
> But maybe I am missing something.

Yes, I think you are correct. That is cleaner. Let me retest real quick just
in case, and then send a v2 that also picks up the typo and moves the 
comment
to the new location.

thanks,
-- 
John Hubbard


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17  8:53   ` David Hildenbrand
@ 2024-10-17 17:06     ` John Hubbard
  2024-10-17 17:10       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2024-10-17 17:06 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 10/17/24 1:53 AM, David Hildenbrand wrote:
> On 17.10.24 10:51, David Hildenbrand wrote:
>> On 16.10.24 22:22, John Hubbard wrote:
...
> And staring at memfd_pin_folios(), don't we have the same issue there if
> check_and_migrate_movable_folios() fails?

Yes, it looks very clearly like the exact same bug, in a different location.
This complicated return code is the gift that keeps on giving. Although
likely people are just copying the pattern, which had the problem.


> 
> diff --git a/mm/gup.c b/mm/gup.c
> index a82890b46a36..f79974d38608 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3708,12 +3708,10 @@ long memfd_pin_folios(struct file *memfd, loff_t 
> start, loff_t end,
>                  ret = check_and_migrate_movable_folios(nr_folios, folios);
>          } while (ret == -EAGAIN);
> 
> -       memalloc_pin_restore(flags);
> -       return ret ? ret : nr_folios;
>   err:
>          memalloc_pin_restore(flags);
> -       unpin_folios(folios, nr_folios);
> -
> -       return ret;
> +       if (ret)
> +               unpin_folios(folios, nr_folios);
> +       return ret ? ret : nr_folios;

That looks correct. I can send this out with the other patch as a tiny
2-patch series since they are related. Would you prefer to appear
as a Signed-off-by, or a Suggested-by, or "other"? :)

>   }
>   EXPORT_SYMBOL_GPL(memfd_pin_folios);
> 
> 

thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17 17:06     ` John Hubbard
@ 2024-10-17 17:10       ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-10-17 17:10 UTC (permalink / raw)
  To: John Hubbard, Andrew Morton
  Cc: LKML, linux-mm, Alistair Popple, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 17.10.24 19:06, John Hubbard wrote:
> On 10/17/24 1:53 AM, David Hildenbrand wrote:
>> On 17.10.24 10:51, David Hildenbrand wrote:
>>> On 16.10.24 22:22, John Hubbard wrote:
> ...
>> And staring at memfd_pin_folios(), don't we have the same issue there if
>> check_and_migrate_movable_folios() fails?
> 
> Yes, it looks very clearly like the exact same bug, in a different location.
> This complicated return code is the gift that keeps on giving. Although
> likely people are just copying the pattern, which had the problem.
> 
> 
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..f79974d38608 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3708,12 +3708,10 @@ long memfd_pin_folios(struct file *memfd, loff_t
>> start, loff_t end,
>>                   ret = check_and_migrate_movable_folios(nr_folios, folios);
>>           } while (ret == -EAGAIN);
>>
>> -       memalloc_pin_restore(flags);
>> -       return ret ? ret : nr_folios;
>>    err:
>>           memalloc_pin_restore(flags);
>> -       unpin_folios(folios, nr_folios);
>> -
>> -       return ret;
>> +       if (ret)
>> +               unpin_folios(folios, nr_folios);
>> +       return ret ? ret : nr_folios;
> 
> That looks correct. I can send this out with the other patch as a tiny
> 2-patch series since they are related. Would you prefer to appear
> as a Signed-off-by, or a Suggested-by, or "other"? :)

Suggested-by: please :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17  8:51 ` David Hildenbrand
  2024-10-17  8:53   ` David Hildenbrand
  2024-10-17 16:54   ` John Hubbard
@ 2024-10-17 21:28   ` Alistair Popple
  2024-10-17 21:47     ` David Hildenbrand
  2 siblings, 1 reply; 15+ messages in thread
From: Alistair Popple @ 2024-10-17 21:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: John Hubbard, Andrew Morton, LKML, linux-mm, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin


David Hildenbrand <david@redhat.com> writes:

> On 16.10.24 22:22, John Hubbard wrote:
>> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
>> family of functions, and requests "too many" pages, then the call will
>> erroneously leave pages pinned. This is visible in user space as an
>> actual memory leak.
>> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM)
>> calls
>> to exhaust memory.
>> The root cause of the problem is this sequence, within
>> __gup_longterm_locked():
>>      __get_user_pages_locked()
>>      rc = check_and_migrate_movable_pages()
>> ...which gets retried in a loop. The loop error handling is
>> incomplete,
>> clearly due to a somewhat unusual and complicated tri-state error API.
>> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
>> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
>> happily returns the error, while leaving the pages pinned.
>> In the failed case, which is an app that requests (via a device
>> driver)
>> 30720000000 bytes to be pinned, and then exits, I see this:
>>      $ grep foll /proc/vmstat
>>          nr_foll_pin_acquired 7502048
>>          nr_foll_pin_released 2048
>> And after applying this patch, it returns to balanced pins:
>>      $ grep foll /proc/vmstat
>>          nr_foll_pin_acquired 7502048
>>          nr_foll_pin_released 7502048
>> Fix this by unpinning the pages that __get_user_pages_locked() has
>> pinned, in such error cases.
>> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix
>> check_and_migrate_movable_pages() return codes")
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Shigeru Yoshida <syoshida@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>> ---
>>   mm/gup.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>> diff --git a/mm/gup.c b/mm/gup.c
>> index a82890b46a36..24acf53c8294 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>>     		/* FOLL_LONGTERM implies FOLL_PIN */
>>   		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>> +
>> +		/*
>> +		 * The __get_user_pages_locked() call happens before we know
>> +		 * that whether it's possible to successfully complete the whole
>> +		 * operation. To compensate for this, if we get an unexpected
>> +		 * error (such as -ENOMEM) then we must unpin everything, before
>> +		 * erroring out.
>> +		 */
>> +		if (rc != -EAGAIN && rc != 0)
>> +			unpin_user_pages(pages, nr_pinned_pages);
>> +
>>   	} while (rc == -EAGAIN);
>
> Wouldn't it be cleaner to simply have here after the loop (possibly
> even after the memalloc_pin_restore())
>
> if (rc)
> 	unpin_user_pages(pages, nr_pinned_pages);
>
> But maybe I am missing something.

I initially thought the same thing but I'm not sure it is
correct. Consider what happens when __get_user_pages_locked() fails
earlier in the loop. In this case it will have bailed out of the loop
with rc <= 0 but we shouldn't call unpin_user_pages().

>>   	memalloc_pin_restore(flags);
>>   	return rc ? rc : nr_pinned_pages;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17 21:28   ` Alistair Popple
@ 2024-10-17 21:47     ` David Hildenbrand
  2024-10-17 21:57       ` John Hubbard
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-10-17 21:47 UTC (permalink / raw)
  To: Alistair Popple
  Cc: John Hubbard, Andrew Morton, LKML, linux-mm, Shigeru Yoshida,
	Jason Gunthorpe, Minchan Kim, Pasha Tatashin

On 17.10.24 23:28, Alistair Popple wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 16.10.24 22:22, John Hubbard wrote:
>>> If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM)
>>> family of functions, and requests "too many" pages, then the call will
>>> erroneously leave pages pinned. This is visible in user space as an
>>> actual memory leak.
>>> Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM)
>>> calls
>>> to exhaust memory.
>>> The root cause of the problem is this sequence, within
>>> __gup_longterm_locked():
>>>       __get_user_pages_locked()
>>>       rc = check_and_migrate_movable_pages()
>>> ...which gets retried in a loop. The loop error handling is
>>> incomplete,
>>> clearly due to a somewhat unusual and complicated tri-state error API.
>>> But anyway, if -ENOMEM, or in fact, any unexpected error is returned
>>> from check_and_migrate_movable_pages(), then __gup_longterm_locked()
>>> happily returns the error, while leaving the pages pinned.
>>> In the failed case, which is an app that requests (via a device
>>> driver)
>>> 30720000000 bytes to be pinned, and then exits, I see this:
>>>       $ grep foll /proc/vmstat
>>>           nr_foll_pin_acquired 7502048
>>>           nr_foll_pin_released 2048
>>> And after applying this patch, it returns to balanced pins:
>>>       $ grep foll /proc/vmstat
>>>           nr_foll_pin_acquired 7502048
>>>           nr_foll_pin_released 7502048
>>> Fix this by unpinning the pages that __get_user_pages_locked() has
>>> pinned, in such error cases.
>>> Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix
>>> check_and_migrate_movable_pages() return codes")
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: Shigeru Yoshida <syoshida@redhat.com>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Jason Gunthorpe <jgg@nvidia.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
>>> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>>> ---
>>>    mm/gup.c | 11 +++++++++++
>>>    1 file changed, 11 insertions(+)
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index a82890b46a36..24acf53c8294 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct mm_struct *mm,
>>>      		/* FOLL_LONGTERM implies FOLL_PIN */
>>>    		rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>>> +
>>> +		/*
>>> +		 * The __get_user_pages_locked() call happens before we know
>>> +		 * that whether it's possible to successfully complete the whole
>>> +		 * operation. To compensate for this, if we get an unexpected
>>> +		 * error (such as -ENOMEM) then we must unpin everything, before
>>> +		 * erroring out.
>>> +		 */
>>> +		if (rc != -EAGAIN && rc != 0)
>>> +			unpin_user_pages(pages, nr_pinned_pages);
>>> +
>>>    	} while (rc == -EAGAIN);
>>
>> Wouldn't it be cleaner to simply have here after the loop (possibly
>> even after the memalloc_pin_restore())
>>
>> if (rc)
>> 	unpin_user_pages(pages, nr_pinned_pages);
>>
>> But maybe I am missing something.
> 
> I initially thought the same thing but I'm not sure it is
> correct. Consider what happens when __get_user_pages_locked() fails
> earlier in the loop. In this case it will have bailed out of the loop
> with rc <= 0 but we shouldn't call unpin_user_pages().

Ah, I see what you mean, I primarily only stared at the diff.

We should likely avoid using nr_pinned_pages as a temporary variable that
can hold an error value.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17 21:47     ` David Hildenbrand
@ 2024-10-17 21:57       ` John Hubbard
  2024-10-17 22:03         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2024-10-17 21:57 UTC (permalink / raw)
  To: David Hildenbrand, Alistair Popple
  Cc: Andrew Morton, LKML, linux-mm, Shigeru Yoshida, Jason Gunthorpe,
	Minchan Kim, Pasha Tatashin

On 10/17/24 2:47 PM, David Hildenbrand wrote:
> On 17.10.24 23:28, Alistair Popple wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>> On 16.10.24 22:22, John Hubbard wrote:
...
>>>> +        if (rc != -EAGAIN && rc != 0)
>>>> +            unpin_user_pages(pages, nr_pinned_pages);
>>>> +
>>>>        } while (rc == -EAGAIN);
>>>
>>> Wouldn't it be cleaner to simply have here after the loop (possibly
>>> even after the memalloc_pin_restore())
>>>
>>> if (rc)
>>>     unpin_user_pages(pages, nr_pinned_pages);
>>>
>>> But maybe I am missing something.
>>
>> I initially thought the same thing but I'm not sure it is
>> correct. Consider what happens when __get_user_pages_locked() fails
>> earlier in the loop. In this case it will have bailed out of the loop
>> with rc <= 0 but we shouldn't call unpin_user_pages().

doh. yes. Thanks for catching that, Alistair! I actually considered
it during the first draft, too, but got tunnel vision during the
review, sigh.

> 
> Ah, I see what you mean, I primarily only stared at the diff.
> 
> We should likely avoid using nr_pinned_pages as a temporary variable that
> can hold an error value.
> 

OK, I still want to defer all the pretty refactoring ideas into some
future effort, so for now, let's just leave v1 alone except for fixing
the typo in the comment, yes?

I'll still send out a 2-patch series with that, plus a suitably
modified fix for the memfd case too.


thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/gup: stop leaking pinned pages in low memory conditions
  2024-10-17 21:57       ` John Hubbard
@ 2024-10-17 22:03         ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-10-17 22:03 UTC (permalink / raw)
  To: John Hubbard, Alistair Popple
  Cc: Andrew Morton, LKML, linux-mm, Shigeru Yoshida, Jason Gunthorpe,
	Minchan Kim, Pasha Tatashin

On 17.10.24 23:57, John Hubbard wrote:
> On 10/17/24 2:47 PM, David Hildenbrand wrote:
>> On 17.10.24 23:28, Alistair Popple wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>> On 16.10.24 22:22, John Hubbard wrote:
> ...
>>>>> +        if (rc != -EAGAIN && rc != 0)
>>>>> +            unpin_user_pages(pages, nr_pinned_pages);
>>>>> +
>>>>>         } while (rc == -EAGAIN);
>>>>
>>>> Wouldn't it be cleaner to simply have here after the loop (possibly
>>>> even after the memalloc_pin_restore())
>>>>
>>>> if (rc)
>>>>      unpin_user_pages(pages, nr_pinned_pages);
>>>>
>>>> But maybe I am missing something.
>>>
>>> I initially thought the same thing but I'm not sure it is
>>> correct. Consider what happens when __get_user_pages_locked() fails
>>> earlier in the loop. In this case it will have bailed out of the loop
>>> with rc <= 0 but we shouldn't call unpin_user_pages().
> 
> doh. yes. Thanks for catching that, Alistair! I actually considered
> it during the first draft, too, but got tunnel vision during the
> review, sigh.
> 
>>
>> Ah, I see what you mean, I primarily only stared at the diff.
>>
>> We should likely avoid using nr_pinned_pages as a temporary variable that
>> can hold an error value.
>>
> 
> OK, I still want to defer all the pretty refactoring ideas into some
> future effort, so for now, let's just leave v1 alone except for fixing
> the typo in the comment, yes?

Fine with me!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-10-17 22:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-16 20:22 [PATCH] mm/gup: stop leaking pinned pages in low memory conditions John Hubbard
2024-10-16 21:57 ` Andrew Morton
2024-10-16 22:05   ` John Hubbard
2024-10-16 22:41     ` Andrew Morton
2024-10-16 22:13 ` Alistair Popple
2024-10-16 22:22   ` John Hubbard
2024-10-17  8:51 ` David Hildenbrand
2024-10-17  8:53   ` David Hildenbrand
2024-10-17 17:06     ` John Hubbard
2024-10-17 17:10       ` David Hildenbrand
2024-10-17 16:54   ` John Hubbard
2024-10-17 21:28   ` Alistair Popple
2024-10-17 21:47     ` David Hildenbrand
2024-10-17 21:57       ` John Hubbard
2024-10-17 22:03         ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox