* [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
@ 2023-05-11 13:47 Tetsuo Handa
2023-05-12 3:45 ` Andrew Morton
2023-05-13 10:23 ` Mel Gorman
0 siblings, 2 replies; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-11 13:47 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Mel Gorman; +Cc: linux-mm
Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
observe this flag being set immediately after another thread doing
__GFP_KSWAPD_RECLAIM allocation request set this flag.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..4283b5916f36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
out:
/* Separate test+clear to avoid unnecessary atomics */
- if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
+ if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
+ && (alloc_flags & ALLOC_KSWAPD)) {
clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
wakeup_kswapd(zone, 0, 0, zone_idx(zone));
}
--
2.18.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa
@ 2023-05-12 3:45 ` Andrew Morton
2023-05-13 9:38 ` Tetsuo Handa
2023-05-13 10:23 ` Mel Gorman
1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2023-05-12 3:45 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Vlastimil Babka, Mel Gorman, linux-mm
On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
What are the user-visible runtime effects of this flaw?
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j
> +++ b/mm/page_alloc.c
> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>
> out:
> /* Separate test+clear to avoid unnecessary atomics */
> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
> + if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
> + && (alloc_flags & ALLOC_KSWAPD)) {
> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
> wakeup_kswapd(zone, 0, 0, zone_idx(zone));
> }
Thanks, I'll queue this up for some testing while awaiting input from
Mel.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-12 3:45 ` Andrew Morton
@ 2023-05-13 9:38 ` Tetsuo Handa
0 siblings, 0 replies; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-13 9:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: Vlastimil Babka, Mel Gorman, linux-mm
On 2023/05/12 12:45, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>
>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
>> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
>> observe this flag being set immediately after another thread doing
>> __GFP_KSWAPD_RECLAIM allocation request set this flag.
>
> What are the user-visible runtime effects of this flaw?
Potential deadlock upon __GFP_HIGH (I mean, !__GFP_KSWAPD_RECLAIM)
allocation requests (like debugobject code is about to start doing).
>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j
>> +++ b/mm/page_alloc.c
>> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>>
>> out:
>> /* Separate test+clear to avoid unnecessary atomics */
>> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
>> + if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
>> + && (alloc_flags & ALLOC_KSWAPD)) {
>> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>> wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>> }
>
> Thanks, I'll queue this up for some testing while awaiting input from
> Mel.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa
2023-05-12 3:45 ` Andrew Morton
@ 2023-05-13 10:23 ` Mel Gorman
2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa
1 sibling, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2023-05-13 10:23 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Andrew Morton, Vlastimil Babka, linux-mm
On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
> observe this flag being set immediately after another thread doing
> __GFP_KSWAPD_RECLAIM allocation request set this flag.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
The issue is real but it needs to be explained why this is a problem.
Only allocation contexts that specify ALLOC_KSWAPD should wake kswapd
similar to this
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
The consequences are that kswapd could potentially be woken spuriously
for callsites that clear __GFP_KSWAPD_RECLAIM explicitly or implicitly
via combinations like GFP_TRANSHUGE_LIGHT. The other side is that kswapd
does not get woken to reclaim pages up to the boosted watermark
leading to a higher risk of fragmentation that may prevent future
hugepage allocations.
There is a slight risk this will increase reclaim because the zone flag
is not being cleared in as many contexts but the risk is low.
I also suggest as a micro-optimisation that ALLOC_KSWAPD is checked first
because it should be cache hot and cheaper than the shared cache line for
zone flags.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-13 10:23 ` Mel Gorman
@ 2023-05-14 0:28 ` Tetsuo Handa
2023-05-15 6:03 ` Huang, Ying
2023-05-15 7:38 ` Mel Gorman
0 siblings, 2 replies; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-14 0:28 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: Vlastimil Babka, linux-mm
Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag.
Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
__GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
risk of deadlock. But since zone->flags is a shared variable, a thread
doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag
being set immediately after another thread doing __GFP_KSWAPD_RECLAIM
allocation request set this flag, causing possibility of deadlock.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
---
Changes in v2:
Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update
description, suggested by Mel Gorman <mgorman@techsingularity.net>.
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..ecad680cec53 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
out:
/* Separate test+clear to avoid unnecessary atomics */
- if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
+ if ((alloc_flags & ALLOC_KSWAPD) &&
+ unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
wakeup_kswapd(zone, 0, 0, zone_idx(zone));
}
--
2.18.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa
@ 2023-05-15 6:03 ` Huang, Ying
2023-05-15 6:35 ` Huang, Ying
2023-05-15 7:38 ` Mel Gorman
1 sibling, 1 reply; 12+ messages in thread
From: Huang, Ying @ 2023-05-15 6:03 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Mel Gorman, Andrew Morton, Vlastimil Babka, linux-mm
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag.
>
> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
> risk of deadlock. But since zone->flags is a shared variable, a thread
> doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag
> being set immediately after another thread doing __GFP_KSWAPD_RECLAIM
> allocation request set this flag, causing possibility of deadlock.
Sorry, I don't understand what is the deadlock here.
I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with
zone lock held") and the corresponding mail thread. From the below
mail,
https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/
commit 73444bc4d8f9 fixed a circular locking ordering as follows,
pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock
(kswapd_wait, fixed) -> pi lock
But I don't know what is the deadlock that your patch fixed. Can you
teach me on that?
Best Regards,
Huang, Ying
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
> ---
> Changes in v2:
> Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update
> description, suggested by Mel Gorman <mgorman@techsingularity.net>.
>
> mm/page_alloc.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47421bedc12b..ecad680cec53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>
> out:
> /* Separate test+clear to avoid unnecessary atomics */
> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
> + if ((alloc_flags & ALLOC_KSWAPD) &&
> + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
> wakeup_kswapd(zone, 0, 0, zone_idx(zone));
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-15 6:03 ` Huang, Ying
@ 2023-05-15 6:35 ` Huang, Ying
0 siblings, 0 replies; 12+ messages in thread
From: Huang, Ying @ 2023-05-15 6:35 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Mel Gorman, Andrew Morton, Vlastimil Babka, linux-mm
Hi, Tetsuo,
"Huang, Ying" <ying.huang@intel.com> writes:
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
>
>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>> using ZONE_BOOSTED_WATERMARK flag.
>>
>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
>> risk of deadlock. But since zone->flags is a shared variable, a thread
>> doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag
>> being set immediately after another thread doing __GFP_KSWAPD_RECLAIM
>> allocation request set this flag, causing possibility of deadlock.
>
> Sorry, I don't understand what is the deadlock here.
>
> I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with
> zone lock held") and the corresponding mail thread. From the below
> mail,
>
> https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/
>
> commit 73444bc4d8f9 fixed a circular locking ordering as follows,
>
> pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock
> (kswapd_wait, fixed) -> pi lock
>
> But I don't know what is the deadlock that your patch fixed. Can you
> teach me on that?
Just read your email in another thread related to this patch as follow,
https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
Is that the deadlock that you tried to fix in this patch?
It appears that commit 73444bc4d8f9 didn't fix the deadlock above. It
just convert the circular locking ordering to,
pi lock -> rq lock -> timer base lock -> wakeup lock (kswapd_wait,
fixed) -> pi lock
If so, I think that it's better to add corresponding information in
patch description to avoid the possible confusion.
Best Regards,
Huang, Ying
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
>> ---
>> Changes in v2:
>> Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update
>> description, suggested by Mel Gorman <mgorman@techsingularity.net>.
>>
>> mm/page_alloc.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 47421bedc12b..ecad680cec53 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
>>
>> out:
>> /* Separate test+clear to avoid unnecessary atomics */
>> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
>> + if ((alloc_flags & ALLOC_KSWAPD) &&
>> + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
>> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
>> wakeup_kswapd(zone, 0, 0, zone_idx(zone));
>> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa
2023-05-15 6:03 ` Huang, Ying
@ 2023-05-15 7:38 ` Mel Gorman
2023-05-15 10:17 ` Tetsuo Handa
1 sibling, 1 reply; 12+ messages in thread
From: Mel Gorman @ 2023-05-15 7:38 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Andrew Morton, Vlastimil Babka, linux-mm
On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote:
> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
> using ZONE_BOOSTED_WATERMARK flag.
>
> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
> risk of deadlock.
kswapd_wait is a waitqueue so what is being held? It's safe for kswapd
to try wake itself as the waitqueue will be active when wakeup_kswapd()
is called so no wakeup occurs. If there is a deadlock, it needs a better
explanation. I believe I already stated why this patch is fixing a bug
but it wasn't deadlock related.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-15 7:38 ` Mel Gorman
@ 2023-05-15 10:17 ` Tetsuo Handa
2023-05-16 1:44 ` Huang, Ying
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-15 10:17 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm, Huang, Ying
On 2023/05/15 16:38, Mel Gorman wrote:
> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote:
>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>> using ZONE_BOOSTED_WATERMARK flag.
>>
>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
>> risk of deadlock.
>
> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd
> to try wake itself as the waitqueue will be active when wakeup_kswapd()
> is called so no wakeup occurs. If there is a deadlock, it needs a better
> explanation. I believe I already stated why this patch is fixing a bug
> but it wasn't deadlock related.
>
I noticed this problem ( pgdat->kswapd_wait might be held without
__GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code
is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at
https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 .
I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name.
The latter was explained at
https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at
https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ .
This patch is for making sure that debugobject code will not try to hold
pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM.
Thus, the problem this patch will fix is a deadlock related, isn't it?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-15 10:17 ` Tetsuo Handa
@ 2023-05-16 1:44 ` Huang, Ying
2023-05-22 13:57 ` Tetsuo Handa
0 siblings, 1 reply; 12+ messages in thread
From: Huang, Ying @ 2023-05-16 1:44 UTC (permalink / raw)
To: Tetsuo Handa, Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> On 2023/05/15 16:38, Mel Gorman wrote:
>> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote:
>>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>>> using ZONE_BOOSTED_WATERMARK flag.
>>>
>>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
>>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
>>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
>>> risk of deadlock.
>>
>> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd
>> to try wake itself as the waitqueue will be active when wakeup_kswapd()
>> is called so no wakeup occurs. If there is a deadlock, it needs a better
>> explanation. I believe I already stated why this patch is fixing a bug
>> but it wasn't deadlock related.
>>
>
> I noticed this problem ( pgdat->kswapd_wait might be held without
> __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code
> is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at
> https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 .
>
> I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name.
This has confused me much before. IIUC, the deadlock is unrelated with
kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock
in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is,
pgdat->kswapd_wait.lock holders take the pi lock
pi lock holders take the rq lock
rq lock holders take the timer base lock
timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check)
The above is based on analysis in
https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/
https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base
lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the
circular dependency chain.
> The latter was explained at
> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
> and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at
> https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ .
>
> This patch is for making sure that debugobject code will not try to hold
> pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM.
>
> Thus, the problem this patch will fix is a deadlock related, isn't it?
Best Regards,
Huang, Ying
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-16 1:44 ` Huang, Ying
@ 2023-05-22 13:57 ` Tetsuo Handa
2023-05-22 14:58 ` Mel Gorman
0 siblings, 1 reply; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-22 13:57 UTC (permalink / raw)
To: Huang, Ying, Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm
On 2023/05/16 10:44, Huang, Ying wrote:
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
>
>> On 2023/05/15 16:38, Mel Gorman wrote:
>>> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote:
>>>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
>>>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
>>>> using ZONE_BOOSTED_WATERMARK flag.
>>>>
>>>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to
>>>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to
>>>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a
>>>> risk of deadlock.
>>>
>>> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd
>>> to try wake itself as the waitqueue will be active when wakeup_kswapd()
>>> is called so no wakeup occurs. If there is a deadlock, it needs a better
>>> explanation. I believe I already stated why this patch is fixing a bug
>>> but it wasn't deadlock related.
>>>
>>
>> I noticed this problem ( pgdat->kswapd_wait might be held without
>> __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code
>> is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at
>> https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 .
>>
>> I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name.
>
> This has confused me much before. IIUC, the deadlock is unrelated with
> kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock
> in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is,
>
> pgdat->kswapd_wait.lock holders take the pi lock
> pi lock holders take the rq lock
> rq lock holders take the timer base lock
> timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check)
Yes, thank you for explanation.
>
> The above is based on analysis in
>
> https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/
> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
>
> Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base
> lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the
> circular dependency chain.
Yes. Mel, any questions on this patch?
Thomas Gleixner described this lock as kswapd_wait::lock at
https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 .
Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or
s/pgdat->kswapd_wait/kswapd_wait::lock/ ?
>
>> The latter was explained at
>> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
>> and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at
>> https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ .
>>
>> This patch is for making sure that debugobject code will not try to hold
>> pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM.
>>
>> Thus, the problem this patch will fix is a deadlock related, isn't it?
>
> Best Regards,
> Huang, Ying
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
2023-05-22 13:57 ` Tetsuo Handa
@ 2023-05-22 14:58 ` Mel Gorman
0 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2023-05-22 14:58 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: Huang, Ying, Andrew Morton, Vlastimil Babka, linux-mm
On Mon, May 22, 2023 at 10:57:03PM +0900, Tetsuo Handa wrote:
> On 2023/05/16 10:44, Huang, Ying wrote:
> > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> >
>
> >
> > The above is based on analysis in
> >
> > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/
> > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/
> >
> > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base
> > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the
> > circular dependency chain.
>
> Yes. Mel, any questions on this patch?
>
No, I do not, the deadlock issue is more clear now.
> Thomas Gleixner described this lock as kswapd_wait::lock at
> https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 .
> Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or
> s/pgdat->kswapd_wait/kswapd_wait::lock/ ?
>
I don't think a revision of what's in Andrew's tree is necessary. It could
be improved by outlining the exact deadlock similar based on this thread
but it's somewhat overkill as the fix has other obvious justifications on
its own.
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Thanks.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-05-22 14:58 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa
2023-05-12 3:45 ` Andrew Morton
2023-05-13 9:38 ` Tetsuo Handa
2023-05-13 10:23 ` Mel Gorman
2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa
2023-05-15 6:03 ` Huang, Ying
2023-05-15 6:35 ` Huang, Ying
2023-05-15 7:38 ` Mel Gorman
2023-05-15 10:17 ` Tetsuo Handa
2023-05-16 1:44 ` Huang, Ying
2023-05-22 13:57 ` Tetsuo Handa
2023-05-22 14:58 ` Mel Gorman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox