* [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
@ 2025-01-20 9:26 zhaoyang.huang
2025-01-20 19:34 ` John Hubbard
2025-01-20 20:14 ` David Hildenbrand
0 siblings, 2 replies; 6+ messages in thread
From: zhaoyang.huang @ 2025-01-20 9:26 UTC (permalink / raw)
To: Andrew Morton, Alistair Popple, John Hubbard, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
An infinite loop within __get_longterm_locked was detected in an unusual
usage of pin_user_pages where none of the VA's pages are pinnable (the
vm_ops->fault function allocates pages via cma_alloc for hardware purposes
and leaves them off the LRU). Fix this by having 'collected' reflect the
actual number of pages in movable_folio_list.
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
mm/gup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 3b75e631f369..2231ce7221f9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
 		if (folio_is_longterm_pinnable(folio))
 			continue;
 
-		collected++;
-
 		if (folio_is_device_coherent(folio))
 			continue;
 
@@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
 		if (!folio_isolate_lru(folio))
 			continue;
 
+		collected++;
+
 		list_add_tail(&folio->lru, movable_folio_list);
 		node_stat_mod_folio(folio,
 				    NR_ISOLATED_ANON + folio_is_file_lru(folio),
--
2.25.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
2025-01-20 9:26 [PATCH] mm: gup: fix infinite loop within __get_longterm_locked zhaoyang.huang
@ 2025-01-20 19:34 ` John Hubbard
2025-01-21 1:28 ` Zhaoyang Huang
2025-01-20 20:14 ` David Hildenbrand
1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2025-01-20 19:34 UTC (permalink / raw)
To: zhaoyang.huang, Andrew Morton, Alistair Popple, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang
On 1/20/25 1:26 AM, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Infinite loop within __get_longterm_locked detected in an unique usage
> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> function allocate pages via cma_alloc for hardware purpose and leave them
> out of LRU). Fixing this by have 'collected' reflect the actual number
> of pages in movable_folio_list.
The above is rather terse, although perhaps by kernel standards it's OK.
Isn't this missing a Fixes tag?
Fixes: 67e139b02d994 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> mm/gup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 3b75e631f369..2231ce7221f9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> if (folio_is_longterm_pinnable(folio))
> continue;
>
> - collected++;
> -
> if (folio_is_device_coherent(folio))
> continue;
>
> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> if (!folio_isolate_lru(folio))
> continue;
>
> + collected++;
> +
Well, this seems correct to me. Somehow I talked myself into believing
that it was OK to do collected++ early, even though later on we skip
actually collecting the folio, thus miscounting things.
But now I believe it was just incorrect all along.
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
2025-01-20 19:34 ` John Hubbard
@ 2025-01-21 1:28 ` Zhaoyang Huang
0 siblings, 0 replies; 6+ messages in thread
From: Zhaoyang Huang @ 2025-01-21 1:28 UTC (permalink / raw)
To: John Hubbard
Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, linux-mm,
linux-kernel, steve.kang
On Tue, Jan 21, 2025 at 3:34 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 1/20/25 1:26 AM, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Infinite loop within __get_longterm_locked detected in an unique usage
> > of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> > function allocate pages via cma_alloc for hardware purpose and leave them
> > out of LRU). Fixing this by have 'collected' reflect the actual number
> > of pages in movable_folio_list.
>
> The above is rather terse, although perhaps by kernel standards it's OK.
>
> Isn't this missing a Fixes tag?
>
> Fixes: 67e139b02d994 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
ok. will amend in v2
>
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> > mm/gup.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3b75e631f369..2231ce7221f9 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> > if (folio_is_longterm_pinnable(folio))
> > continue;
> >
> > - collected++;
> > -
> > if (folio_is_device_coherent(folio))
> > continue;
> >
> > @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> > if (!folio_isolate_lru(folio))
> > continue;
> >
> > + collected++;
> > +
>
> Well, this seems correct to me. Somehow I talked myself into believing
> that it was OK to do collected++ early, even though later on we skip
> actually collecting the folio, thus miscounting things.
>
> But now I believe it was just incorrect all along.
>
>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks
>
> thanks,
> --
> John Hubbard
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
2025-01-20 9:26 [PATCH] mm: gup: fix infinite loop within __get_longterm_locked zhaoyang.huang
2025-01-20 19:34 ` John Hubbard
@ 2025-01-20 20:14 ` David Hildenbrand
2025-01-21 1:31 ` Zhaoyang Huang
1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-01-20 20:14 UTC (permalink / raw)
To: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
linux-mm, linux-kernel, Zhaoyang Huang, steve.kang
On 20.01.25 10:26, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Infinite loop within __get_longterm_locked detected in an unique usage
> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> function allocate pages via cma_alloc for hardware purpose and leave them
> out of LRU). Fixing this by have 'collected' reflect the actual number
> of pages in movable_folio_list.
Maybe something like:
"
We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU. This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.
We incorrectly update the "collected" variable even if nothing was
collected. Fix it by incrementing "collected" only when we isolated a
folio and added it to the list of folios to migrate.
"
I assume, long-term these things will not actually be folios, but pages,
and we'll have to skip them in different code -- or assume they can be
longterm pinned even on CMA because they are allocated by the CMA-owning
driver.
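
To make the failure mode described above concrete, here is a small
userspace model, not kernel code. All names (model_folio, model_collect,
model_migrate, model_pin_passes) are hypothetical, the retry loop is only
an approximation of what the caller does, and the loop is capped so the
model itself always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; not the kernel's struct folio. */
struct model_folio {
	bool longterm_pinnable;	/* models folio_is_longterm_pinnable() */
	bool on_lru;		/* models whether folio_isolate_lru() would succeed */
	bool isolated;		/* set once the folio is queued for migration */
};

/*
 * Models collect_longterm_unpinnable_folios(). With count_early set, the
 * counter is bumped before isolation is attempted (the old placement);
 * otherwise only after isolation succeeded (the patched placement).
 */
static unsigned long model_collect(struct model_folio *f, size_t n,
				   bool count_early)
{
	unsigned long collected = 0;

	assert(f != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (f[i].longterm_pinnable)
			continue;
		if (count_early)
			collected++;
		if (!f[i].on_lru)
			continue;	/* isolation fails: nothing to migrate */
		if (!count_early)
			collected++;
		f[i].isolated = true;
	}
	return collected;
}

/* Models migration: every isolated folio ends up somewhere pinnable. */
static void model_migrate(struct model_folio *f, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (f[i].isolated) {
			f[i].isolated = false;
			f[i].longterm_pinnable = true;
		}
	}
}

/*
 * Models the caller's retry loop: keep going while something was
 * "collected". Returns the number of passes, capped at max_passes.
 */
static unsigned int model_pin_passes(struct model_folio *f, size_t n,
				     bool count_early, unsigned int max_passes)
{
	unsigned int passes = 0;

	while (passes < max_passes) {
		passes++;
		if (model_collect(f, n, count_early) == 0)
			break;
		model_migrate(f, n);
	}
	return passes;
}
```

With only off-LRU, unpinnable folios, the early-count variant never makes
progress and runs until the cap, while the late-count variant stops after
one pass. Note that in the late-count case the model simply proceeds with
folios that are still unpinnable, which is the situation the paragraph
above is concerned about.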
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> mm/gup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 3b75e631f369..2231ce7221f9 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> if (folio_is_longterm_pinnable(folio))
> continue;
>
> - collected++;
> -
> if (folio_is_device_coherent(folio))
> continue;
>
> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> if (!folio_isolate_lru(folio))
> continue;
>
> + collected++;
> +
> list_add_tail(&folio->lru, movable_folio_list);
> node_stat_mod_folio(folio,
> NR_ISOLATED_ANON + folio_is_file_lru(folio),
What if folio_isolate_hugetlb() succeeded? The return value can tell us
if it actually succeeded.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
2025-01-20 20:14 ` David Hildenbrand
@ 2025-01-21 1:31 ` Zhaoyang Huang
2025-01-21 7:49 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Zhaoyang Huang @ 2025-01-21 1:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
linux-mm, linux-kernel, steve.kang
On Tue, Jan 21, 2025 at 4:14 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 20.01.25 10:26, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Infinite loop within __get_longterm_locked detected in an unique usage
> > of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
> > function allocate pages via cma_alloc for hardware purpose and leave them
> > out of LRU). Fixing this by have 'collected' reflect the actual number
> > of pages in movable_folio_list.
>
> Maybe something like:
>
> "
> We can run into an infinite loop in __get_longterm_locked() when
> collect_longterm_unpinnable_folios() finds only folios that are isolated
> from the LRU or were never added to the LRU. This can happen when all
> folios to be pinned are never added to the LRU, for example when
> vm_ops->fault allocated pages using cma_alloc() and never added them to
> the LRU.
>
> We incorrectly update the "collected" variable even if nothing was
> collected. Fix it by incrementing "collected" only when we isolated a
> folio and added it to the list of folios to migrate.
> "
>
> I assume, long-term these things will not actually be folios, but pages,
> and we'll have to skip them in different code -- or assume they can be
> longterm pinned even on CMA because they are allocated by the CMA-owning
> driver.
Thanks for the commit message. I will use it in v2.
>
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> > mm/gup.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3b75e631f369..2231ce7221f9 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
> > if (folio_is_longterm_pinnable(folio))
> > continue;
> >
> > - collected++;
> > -
> > if (folio_is_device_coherent(folio))
> > continue;
> >
> > @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
> > if (!folio_isolate_lru(folio))
> > continue;
> >
> > + collected++;
> > +
> > list_add_tail(&folio->lru, movable_folio_list);
> > node_stat_mod_folio(folio,
> > NR_ISOLATED_ANON + folio_is_file_lru(folio),
>
> What if folio_isolate_hugetlb() succeeded? The return value can tell us
> if it actually succeeded.
How about removing the variable 'collected' and changing the check to
if (list_empty(&movable_folio_list))?
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm: gup: fix infinite loop within __get_longterm_locked
2025-01-21 1:31 ` Zhaoyang Huang
@ 2025-01-21 7:49 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-01-21 7:49 UTC (permalink / raw)
To: Zhaoyang Huang
Cc: zhaoyang.huang, Andrew Morton, Alistair Popple, John Hubbard,
linux-mm, linux-kernel, steve.kang
On 21.01.25 02:31, Zhaoyang Huang wrote:
> On Tue, Jan 21, 2025 at 4:14 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 20.01.25 10:26, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>>
>>> Infinite loop within __get_longterm_locked detected in an unique usage
>>> of pin_user_pages where the VA's pages are all unpinnable(vm_ops->fault
>>> function allocate pages via cma_alloc for hardware purpose and leave them
>>> out of LRU). Fixing this by have 'collected' reflect the actual number
>>> of pages in movable_folio_list.
>>
>> Maybe something like:
>>
>> "
>> We can run into an infinite loop in __get_longterm_locked() when
>> collect_longterm_unpinnable_folios() finds only folios that are isolated
>> from the LRU or were never added to the LRU. This can happen when all
>> folios to be pinned are never added to the LRU, for example when
>> vm_ops->fault allocated pages using cma_alloc() and never added them to
>> the LRU.
>>
>> We incorrectly update the "collected" variable even if nothing was
>> collected. Fix it by incrementing "collected" only when we isolated a
>> folio and added it to the list of folios to migrate.
>> "
>>
>> I assume, long-term these things will not actually be folios, but pages,
>> and we'll have to skip them in different code -- or assume they can be
>> longterm pinned even on CMA because they are allocated by the CMA-owning
>> driver.
> Thanks for the commit message. will update them to v2
>>
>>>
>>> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>> ---
>>> mm/gup.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 3b75e631f369..2231ce7221f9 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2341,8 +2341,6 @@ static unsigned long collect_longterm_unpinnable_folios(
>>> if (folio_is_longterm_pinnable(folio))
>>> continue;
>>>
>>> - collected++;
>>> -
>>> if (folio_is_device_coherent(folio))
>>> continue;
>>>
>>> @@ -2359,6 +2357,8 @@ static unsigned long collect_longterm_unpinnable_folios(
>>> if (!folio_isolate_lru(folio))
>>> continue;
>>>
>>> + collected++;
>>> +
>>> list_add_tail(&folio->lru, movable_folio_list);
>>> node_stat_mod_folio(folio,
>>> NR_ISOLATED_ANON + folio_is_file_lru(folio),
>>
>> What if folio_isolate_hugetlb() succeeded? The return value can tell us
>> if it actually succeeded.
> How about removing the variable 'collected' and changing the check to
> if (list_empty(&movable_folio_list))?
That works if we know that the input list is empty, which is the case.
So let's turn that function into a void function, and simply check
list_empty() in the caller that prepares the empty list.
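
Roughly what I have in mind, as a userspace sketch rather than a patch
against mm/gup.c. The list helpers only imitate the kernel's list_head
API, and model_folio, model_collect and model_needs_migration are
hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_append(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct model_folio {
	struct list_node lru;
	bool longterm_pinnable;	/* models folio_is_longterm_pinnable() */
	bool isolatable;	/* models a successful isolation call */
};

/*
 * Returns nothing: there is no counter to get out of sync. The result is
 * simply whatever ended up on the list, which must be empty on entry.
 */
static void model_collect(struct list_node *movable,
			  struct model_folio *folios, size_t n)
{
	assert(list_is_empty(movable));

	for (size_t i = 0; i < n; i++) {
		if (folios[i].longterm_pinnable)
			continue;
		if (!folios[i].isolatable)
			continue;
		list_append(&folios[i].lru, movable);
	}
}

/* The caller prepares the empty list and checks it afterwards. */
static bool model_needs_migration(struct model_folio *folios, size_t n)
{
	struct list_node movable;

	list_init(&movable);
	model_collect(&movable, folios, n);
	return !list_is_empty(&movable);
}
```

Because the check looks at the list itself, any isolation path that adds
a folio to it is accounted for without separate bookkeeping.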
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 6+ messages in thread