linux-mm.kvack.org archive mirror
* [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
@ 2025-01-20  9:26 zhaoyang.huang
  2025-01-20 19:34 ` John Hubbard
  2025-01-20 20:14 ` David Hildenbrand
  0 siblings, 2 replies; 6+ messages in thread
From: zhaoyang.huang @ 2025-01-20  9:26 UTC (permalink / raw)
  To: Andrew Morton, Alistair Popple, John Hubbard, linux-mm,
	linux-kernel, Zhaoyang Huang, steve.kang

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

An infinite loop within __get_longterm_locked was detected in a unique
usage of pin_user_pages where the VA's pages are all unpinnable (the
vm_ops->fault function allocates pages via cma_alloc for hardware purposes
and leaves them off the LRU). Fix this by having 'collected' reflect the
actual number of pages in movable_folio_list.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/gup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 3b75e631f369..2231ce7221f9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
 		if (folio_is_longterm_pinnable(folio))
 			continue;
 
-		collected++;
-
 		if (folio_is_device_coherent(folio))
 			continue;
 
@@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
 		if (!folio_isolate_lru(folio))
 			continue;
 
+		collected++;
+
 		list_add_tail(&folio->lru, movable_folio_list);
 		node_stat_mod_folio(folio,
 				    NR_ISOLATED_ANON + folio_is_file_lru(folio),
-- 
2.25.1




* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
  2025-01-20  9:26 [PATCH] mm: gup: fix infinite loop within __get_longterm_locked zhaoyang.huang
@ 2025-01-20 19:34 ` John Hubbard
  2025-01-21  1:28   ` Zhaoyang Huang
  2025-01-20 20:14 ` David Hildenbrand
  1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2025-01-20 19:34 UTC (permalink / raw)
  To: zhaoyang.huang, Andrew Morton, Alistair Popple, linux-mm,
	linux-kernel, Zhaoyang Huang, steve.kang

On 1/20/25 1:26 AM, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Infinite loop within __get_longterm_locked detected in an unique usage
> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> function allocate pages via cma_alloc for hardware purpose and leave them
> out of LRU). Fixing this by have 'collected' reflect the actual number
> of pages in movable_folio_list.

The above is rather terse, although perhaps by kernel standards it's OK.

Isn't this missing a Fixes tag?

Fixes: 67e139b02d994 ("mm/gup.c: refactor check_and_migrate_movable_pages()")

> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>   mm/gup.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3b75e631f369..2231ce7221f9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
>   		if (folio_is_longterm_pinnable(folio))
>   			continue;
>   
> -		collected++;
> -
>   		if (folio_is_device_coherent(folio))
>   			continue;
>   
> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
>   		if (!folio_isolate_lru(folio))
>   			continue;
>   
> +		collected++;
> +

Well, this seems correct to me. Somehow I talked myself into believing
that it was OK to do collected++ early, even though later on we skip
actually collecting the folio, thus miscounting things.

But now I believe it was just incorrect all along.
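
To illustrate the miscount, here is a minimal userspace sketch, not kernel
code. It assumes a simplified model in which a folio is only "collected"
if it is unpinnable and sits on the LRU, and in which the caller retries
for as long as the collector returns a non-zero count. The names
model_folio, model_collect() and model_pin_passes() are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two folio properties that matter here. */
struct model_folio {
	bool longterm_pinnable;	/* false, e.g., for a folio in a CMA area */
	bool on_lru;		/* false if it was never added to the LRU */
};

/*
 * One collection pass. With count_early set, the counter is bumped before
 * isolation is attempted (the pre-patch order); otherwise it is bumped only
 * after isolation succeeded (the patched order).
 */
size_t model_collect(struct model_folio *folios, size_t n, bool count_early)
{
	size_t collected = 0;

	for (size_t i = 0; i < n; i++) {
		if (folios[i].longterm_pinnable)
			continue;
		if (count_early)
			collected++;
		if (!folios[i].on_lru)	/* models a failed LRU isolation */
			continue;
		if (!count_early)
			collected++;
		/* Model a successful migration to pinnable memory. */
		folios[i].longterm_pinnable = true;
	}
	return collected;
}

/*
 * Models the retry loop in the caller: keep going while the collector
 * reports work. Returns the pass on which it stopped, or -1 if it was
 * still retrying after max_passes.
 */
int model_pin_passes(struct model_folio *folios, size_t n, bool count_early,
		     int max_passes)
{
	assert(max_passes > 0);
	for (int pass = 1; pass <= max_passes; pass++) {
		if (model_collect(folios, n, count_early) == 0)
			return pass;
	}
	return -1;
}
```

With two unpinnable folios that are both off the LRU, the early-counting
variant never sees a zero count and exhausts its pass budget, while the
late-counting variant stops on the first pass.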


Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard



* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
  2025-01-20  9:26 [PATCH] mm: gup: fix infinite loop within __get_longterm_locked zhaoyang.huang
  2025-01-20 19:34 ` John Hubbard
@ 2025-01-20 20:14 ` David Hildenbrand
  2025-01-21  1:31   ` Zhaoyang Huang
  1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-01-20 20:14 UTC (permalink / raw)
  To: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
	linux-mm, linux-kernel, Zhaoyang Huang, steve.kang

On 20.01.25 10:26, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Infinite loop within __get_longterm_locked detected in an unique usage
> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> function allocate pages via cma_alloc for hardware purpose and leave them
> out of LRU). Fixing this by have 'collected' reflect the actual number
> of pages in movable_folio_list.

Maybe something like:

"
We can run into an infinite loop in __get_longterm_locked() when 
collect_longterm_unpinnable_folios() finds only folios that are isolated 
from the LRU or were never added to the LRU. This can happen when all 
folios to be pinned are never added to the LRU, for example when 
vm_ops->fault allocated pages using cma_alloc() and never added them to 
the LRU.

We incorrectly update the "collected" variable even if nothing was 
collected. Fix it by incrementing "collected" only when we isolated a 
folio and added it to the list of folios to migrate.
"

I assume, long-term these things will not actually be folios, but pages, 
and we'll have to skip them in different code -- or assume they can be 
longterm pinned even on CMA because they are allocated by the CMA-owning 
driver.

> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>   mm/gup.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3b75e631f369..2231ce7221f9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
>   		if (folio_is_longterm_pinnable(folio))
>   			continue;
>   
> -		collected++;
> -
>   		if (folio_is_device_coherent(folio))
>   			continue;
>   
> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
>   		if (!folio_isolate_lru(folio))
>   			continue;
>   
> +		collected++;
> +
>   		list_add_tail(&folio->lru, movable_folio_list);
>   		node_stat_mod_folio(folio,
>   				    NR_ISOLATED_ANON + folio_is_file_lru(folio),

What if folio_isolate_hugetlb() succeeded? The return value can tell us 
if it actually succeeded.
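
A rough userspace sketch of that idea, under the assumption that both
isolation paths report success through a boolean return value. The
model_* names are hypothetical stand-ins and do not reproduce the
signatures of the real helpers; the point is only that the counter and
the list length stay in step on both paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_kind { MODEL_LRU, MODEL_HUGETLB };

/* Hypothetical folio model: its kind and whether isolation would succeed. */
struct model_folio {
	enum model_kind kind;
	bool isolatable;
};

/* Stand-in for a hugetlb isolation helper that also queues the folio. */
bool model_isolate_hugetlb(const struct model_folio *folio, size_t *list_len)
{
	if (!folio->isolatable)
		return false;
	(*list_len)++;
	return true;
}

/* Stand-in for an LRU isolation helper; the caller queues the folio. */
bool model_isolate_lru(const struct model_folio *folio)
{
	return folio->isolatable;
}

/* Count a folio only when its isolation helper reported success. */
size_t model_collect(const struct model_folio *folios, size_t n,
		     size_t *list_len)
{
	size_t collected = 0;

	assert(list_len != NULL);
	*list_len = 0;
	for (size_t i = 0; i < n; i++) {
		if (folios[i].kind == MODEL_HUGETLB) {
			if (model_isolate_hugetlb(&folios[i], list_len))
				collected++;
			continue;
		}
		if (!model_isolate_lru(&folios[i]))
			continue;
		collected++;
		(*list_len)++;	/* models list_add_tail() in the caller */
	}
	return collected;
}
```

In this model the returned count always equals the number of queued
folios, whichever path each folio took.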

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
  2025-01-20 19:34 ` John Hubbard
@ 2025-01-21  1:28   ` Zhaoyang Huang
  0 siblings, 0 replies; 6+ messages in thread
From: Zhaoyang Huang @ 2025-01-21  1:28 UTC (permalink / raw)
  To: John Hubbard
  Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, linux-mm,
	linux-kernel, steve.kang

On Tue, Jan 21, 2025 at 3:34 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 1/20/25 1:26 AM, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Infinite loop within __get_longterm_locked detected in an unique usage
> > of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> > function allocate pages via cma_alloc for hardware purpose and leave them
> > out of LRU). Fixing this by have 'collected' reflect the actual number
> > of pages in movable_folio_list.
>
> The above is rather terse, although perhaps by kernel standards it's OK.
>
> Isn't this missing a Fixes tag?
>
> Fixes: 67e139b02d994 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
OK, will amend in v2.
>
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >   mm/gup.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3b75e631f369..2231ce7221f9 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> >               if (folio_is_longterm_pinnable(folio))
> >                       continue;
> >
> > -             collected++;
> > -
> >               if (folio_is_device_coherent(folio))
> >                       continue;
> >
> > @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> >               if (!folio_isolate_lru(folio))
> >                       continue;
> >
> > +             collected++;
> > +
>
> Well, this seems correct to me. Somehow I talked myself into believing
> that it was OK to do collected++ early, even though later on we skip
> actually collecting the folio, thus miscounting things.
>
> But now I believe it was just incorrect all along.
>
>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks
>
> thanks,
> --
> John Hubbard



* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
  2025-01-20 20:14 ` David Hildenbrand
@ 2025-01-21  1:31   ` Zhaoyang Huang
  2025-01-21  7:49     ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Zhaoyang Huang @ 2025-01-21  1:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
	linux-mm, linux-kernel, steve.kang

On Tue, Jan 21, 2025 at 4:14 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 20.01.25 10:26, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Infinite loop within __get_longterm_locked detected in an unique usage
> > of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> > function allocate pages via cma_alloc for hardware purpose and leave them
> > out of LRU). Fixing this by have 'collected' reflect the actual number
> > of pages in movable_folio_list.
>
> Maybe something like:
>
> "
> We can run into an infinite loop in __get_longterm_locked() when
> collect_longterm_unpinnable_folios() finds only folios that are isolated
> from the LRU or were never added to the LRU. This can happen when all
> folios to be pinned are never added to the LRU, for example when
> vm_ops->fault allocated pages using cma_alloc() and never added them to
> the LRU.
>
> We incorrectly update the "collected" variable even if nothing was
> collected. Fix it by incrementing "collected" only when we isolated a
> folio and added it to the list of folios to migrate.
> "
>
> I assume, long-term these things will not actually be folios, but pages,
> and we'll have to skip them in different code -- or assume they can be
> longterm pinned even on CMA because they are allocated by the CMA-owning
> driver.
Thanks for the commit message. Will update it in v2.
>
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >   mm/gup.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3b75e631f369..2231ce7221f9 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> >               if (folio_is_longterm_pinnable(folio))
> >                       continue;
> >
> > -             collected++;
> > -
> >               if (folio_is_device_coherent(folio))
> >                       continue;
> >
> > @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> >               if (!folio_isolate_lru(folio))
> >                       continue;
> >
> > +             collected++;
> > +
> >               list_add_tail(&folio->lru, movable_folio_list);
> >               node_stat_mod_folio(folio,
> >                                   NR_ISOLATED_ANON + folio_is_file_lru(folio),
>
> What if folio_isolate_hugetlb() succeeded? The return value can tell us
> if it actually succeeded.
How about removing the variable 'collected' and changing the criterion to
if (list_empty(&movable_folio_list))?

>
> --
> Cheers,
>
> David / dhildenb
>



* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
  2025-01-21  1:31   ` Zhaoyang Huang
@ 2025-01-21  7:49     ` David Hildenbrand
  0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-01-21  7:49 UTC (permalink / raw)
  To: Zhaoyang Huang
  Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
	linux-mm, linux-kernel, steve.kang

On 21.01.25 02:31, Zhaoyang Huang wrote:
> On Tue, Jan 21, 2025 at 4:14 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 20.01.25 10:26, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> Infinite loop within __get_longterm_locked detected in an unique usage
>>> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
>>> function allocate pages via cma_alloc for hardware purpose and leave them
>>> out of LRU). Fixing this by have 'collected' reflect the actual number
>>> of pages in movable_folio_list.
>>
>> Maybe something like:
>>
>> "
>> We can run into an infinite loop in __get_longterm_locked() when
>> collect_longterm_unpinnable_folios() finds only folios that are isolated
>> from the LRU or were never added to the LRU. This can happen when all
>> folios to be pinned are never added to the LRU, for example when
>> vm_ops->fault allocated pages using cma_alloc() and never added them to
>> the LRU.
>>
>> We incorrectly update the "collected" variable even if nothing was
>> collected. Fix it by incrementing "collected" only when we isolated a
>> folio and added it to the list of folios to migrate.
>> "
>>
>> I assume, long-term these things will not actually be folios, but pages,
>> and we'll have to skip them in different code -- or assume they can be
>> longterm pinned even on CMA because they are allocated by the CMA-owning
>> driver.
> Thanks for the commit message. will update them to v2
>>
>>>
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>> ---
>>>    mm/gup.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 3b75e631f369..2231ce7221f9 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
>>>                if (folio_is_longterm_pinnable(folio))
>>>                        continue;
>>>
>>> -             collected++;
>>> -
>>>                if (folio_is_device_coherent(folio))
>>>                        continue;
>>>
>>> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
>>>                if (!folio_isolate_lru(folio))
>>>                        continue;
>>>
>>> +             collected++;
>>> +
>>>                list_add_tail(&folio->lru, movable_folio_list);
>>>                node_stat_mod_folio(folio,
>>>                                    NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>
>> What if folio_isolate_hugetlb() succeeded? The return value can tell us
>> if it actually succeeded.
> How about remove the variable 'collected' and change the criteria to
> if(list_empty(&movable_folio_list))

That works if we know that the input list is empty, which is the case.

So let's turn that function into a void function, and simply check 
list_empty() in the caller that prepares the empty list.
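
Roughly, as a userspace sketch rather than a patch: the collector returns
nothing, and the caller, which initialised the list itself, decides based
on whether the list is still empty. The tiny list type and all model_*
names below are hypothetical stand-ins written for this example; they are
not the kernel's list API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the usual idiom. */
struct model_list {
	struct model_list *next, *prev;
};

void model_list_init(struct model_list *head)
{
	head->next = head;
	head->prev = head;
}

bool model_list_empty(const struct model_list *head)
{
	return head->next == head;
}

void model_list_add_tail(struct model_list *node, struct model_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

struct model_folio {
	struct model_list lru;
	bool longterm_pinnable;
	bool on_lru;
};

enum { MODEL_DONE = 0, MODEL_RETRY = 1 };

/* Void collector: its only output is what it appends to the list. */
void model_collect(struct model_list *movable, struct model_folio *folios,
		   size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (folios[i].longterm_pinnable)
			continue;
		if (!folios[i].on_lru)	/* models a failed isolation */
			continue;
		model_list_add_tail(&folios[i].lru, movable);
	}
}

/* The caller owns the list, so an empty list means nothing was collected. */
int model_check_and_migrate(struct model_folio *folios, size_t n)
{
	struct model_list movable;

	model_list_init(&movable);
	model_collect(&movable, folios, n);
	assert(model_list_empty(&movable) || movable.next != NULL);
	if (model_list_empty(&movable))
		return MODEL_DONE;
	/* Migration itself is not modelled; just ask the caller to retry. */
	return MODEL_RETRY;
}
```

Because the list starts out empty in the caller, the emptiness check cannot
disagree with what was actually queued, which is the property the separate
counter failed to guarantee.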

-- 
Cheers,

David / dhildenb



