* [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified
@ 2023-05-11 13:47 Tetsuo Handa
2023-05-12 3:45 ` Andrew Morton
2023-05-13 10:23 ` Mel Gorman
0 siblings, 2 replies; 12+ messages in thread
From: Tetsuo Handa @ 2023-05-11 13:47 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka, Mel Gorman; +Cc: linux-mm
Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock
held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue()
using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared
variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might
observe this flag being set immediately after another thread doing
__GFP_KSWAPD_RECLAIM allocation request set this flag.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held")
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..4283b5916f36 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone,
out:
/* Separate test+clear to avoid unnecessary atomics */
- if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) {
+ if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))
+ && (alloc_flags & ALLOC_KSWAPD)) {
clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
wakeup_kswapd(zone, 0, 0, zone_idx(zone));
}
--
2.18.4
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa @ 2023-05-12 3:45 ` Andrew Morton 2023-05-13 9:38 ` Tetsuo Handa 2023-05-13 10:23 ` Mel Gorman 1 sibling, 1 reply; 12+ messages in thread From: Andrew Morton @ 2023-05-12 3:45 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Vlastimil Babka, Mel Gorman, linux-mm On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared > variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might > observe this flag being set immediately after another thread doing > __GFP_KSWAPD_RECLAIM allocation request set this flag. What are the user-visible runtime effects of this flaw? > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j > +++ b/mm/page_alloc.c > @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, > > out: > /* Separate test+clear to avoid unnecessary atomics */ > - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { > + if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) > + && (alloc_flags & ALLOC_KSWAPD)) { > clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); > wakeup_kswapd(zone, 0, 0, zone_idx(zone)); > } Thanks, I'll queue this up for some testing while awaiting input from Mel. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-12 3:45 ` Andrew Morton @ 2023-05-13 9:38 ` Tetsuo Handa 0 siblings, 0 replies; 12+ messages in thread From: Tetsuo Handa @ 2023-05-13 9:38 UTC (permalink / raw) To: Andrew Morton; +Cc: Vlastimil Babka, Mel Gorman, linux-mm On 2023/05/12 12:45, Andrew Morton wrote: > On Thu, 11 May 2023 22:47:36 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > >> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >> using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared >> variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might >> observe this flag being set immediately after another thread doing >> __GFP_KSWAPD_RECLAIM allocation request set this flag. > > What are the user-visible runtime effects of this flaw? Potential deadlock upon __GFP_HIGH (I mean, !__GFP_KSWAPD_RECLAIM) allocation requests (like debugobject code is about to start doing). > >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.j >> +++ b/mm/page_alloc.c >> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, >> >> out: >> /* Separate test+clear to avoid unnecessary atomics */ >> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { >> + if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) >> + && (alloc_flags & ALLOC_KSWAPD)) { >> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); >> wakeup_kswapd(zone, 0, 0, zone_idx(zone)); >> } > > Thanks, I'll queue this up for some testing while awaiting input from > Mel. > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa 2023-05-12 3:45 ` Andrew Morton @ 2023-05-13 10:23 ` Mel Gorman 2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa 1 sibling, 1 reply; 12+ messages in thread From: Mel Gorman @ 2023-05-13 10:23 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Andrew Morton, Vlastimil Babka, linux-mm On Thu, May 11, 2023 at 10:47:36PM +0900, Tetsuo Handa wrote: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. But since zone->flags is a shared > variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might > observe this flag being set immediately after another thread doing > __GFP_KSWAPD_RECLAIM allocation request set this flag. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") The issue is real but it needs to be explained why this is a problem. Only allocation contexts that specify ALLOC_KSWAPD should wake kswapd similar to this if (alloc_flags & ALLOC_KSWAPD) wake_all_kswapds(order, gfp_mask, ac); The consequences are that kswapd could potentially be woken spuriously for callsites that clear __GFP_KSWAPD_RECLAIM explicitly or implicitly via combinations like GFP_TRANSHUGE_LIGHT. The other side is that kswapd does not get woken to reclaim pages up to the boosted watermark leading to a higher risk of fragmentation that may prevent future hugepage allocations. There is a slight risk this will increase reclaim because the zone flag is not being cleared in as many contexts but the risk is low. I also suggest as a micro-optimisation that ALLOC_KSWAPD is checked first because it should be cache hot and cheaper than the shared cache line for zone flags. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-13 10:23 ` Mel Gorman @ 2023-05-14 0:28 ` Tetsuo Handa 2023-05-15 6:03 ` Huang, Ying 2023-05-15 7:38 ` Mel Gorman 0 siblings, 2 replies; 12+ messages in thread From: Tetsuo Handa @ 2023-05-14 0:28 UTC (permalink / raw) To: Mel Gorman, Andrew Morton; +Cc: Vlastimil Babka, linux-mm Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() using ZONE_BOOSTED_WATERMARK flag. Only allocation contexts that include ALLOC_KSWAPD (which corresponds to __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a risk of deadlock. But since zone->flags is a shared variable, a thread doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag being set immediately after another thread doing __GFP_KSWAPD_RECLAIM allocation request set this flag, causing possibility of deadlock. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") --- Changes in v2: Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update description, suggested by Mel Gorman <mgorman@techsingularity.net>. mm/page_alloc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 47421bedc12b..ecad680cec53 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, out: /* Separate test+clear to avoid unnecessary atomics */ - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { + if ((alloc_flags & ALLOC_KSWAPD) && + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); wakeup_kswapd(zone, 0, 0, zone_idx(zone)); } -- 2.18.4 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa @ 2023-05-15 6:03 ` Huang, Ying 2023-05-15 6:35 ` Huang, Ying 2023-05-15 7:38 ` Mel Gorman 1 sibling, 1 reply; 12+ messages in thread From: Huang, Ying @ 2023-05-15 6:03 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Mel Gorman, Andrew Morton, Vlastimil Babka, linux-mm Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. > > Only allocation contexts that include ALLOC_KSWAPD (which corresponds to > __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to > remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a > risk of deadlock. But since zone->flags is a shared variable, a thread > doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag > being set immediately after another thread doing __GFP_KSWAPD_RECLAIM > allocation request set this flag, causing possibility of deadlock. Sorry, I don't understand what is the deadlock here. I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") and the corresponding mail thread. From the below mail, https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ commit 73444bc4d8f9 fixed a circular locking ordering as follows, pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock (kswapd_wait, fixed) -> pi lock But I don't know what is the deadlock that your patch fixed. Can you teach me on that? Best Regards, Huang, Ying > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") > --- > Changes in v2: > Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update > description, suggested by Mel Gorman <mgorman@techsingularity.net>. > > mm/page_alloc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 47421bedc12b..ecad680cec53 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, > > out: > /* Separate test+clear to avoid unnecessary atomics */ > - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { > + if ((alloc_flags & ALLOC_KSWAPD) && > + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { > clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); > wakeup_kswapd(zone, 0, 0, zone_idx(zone)); > } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-15 6:03 ` Huang, Ying @ 2023-05-15 6:35 ` Huang, Ying 0 siblings, 0 replies; 12+ messages in thread From: Huang, Ying @ 2023-05-15 6:35 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Mel Gorman, Andrew Morton, Vlastimil Babka, linux-mm Hi, Tetsuo, "Huang, Ying" <ying.huang@intel.com> writes: > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > >> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >> using ZONE_BOOSTED_WATERMARK flag. >> >> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >> risk of deadlock. But since zone->flags is a shared variable, a thread >> doing !__GFP_KSWAPD_RECLAIM allocation request might observe this flag >> being set immediately after another thread doing __GFP_KSWAPD_RECLAIM >> allocation request set this flag, causing possibility of deadlock. > > Sorry, I don't understand what is the deadlock here. > > I checked commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with > zone lock held") and the corresponding mail thread. From the below > mail, > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > > commit 73444bc4d8f9 fixed a circular locking ordering as follows, > > pi lock -> rq lock -> timer base lock -> zone lock -> wakeup lock > (kswapd_wait, fixed) -> pi lock > > But I don't know what is the deadlock that your patch fixed. Can you > teach me on that? Just read your email in another thread related to this patch as follow, https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ Is that the deadlock that you tried to fix in this patch? It appears that commit 73444bc4d8f9 didn't fix the deadlock above. It just convert the circular locking ordering to, pi lock -> rq lock -> timer base lock -> wakeup lock (kswapd_wait, fixed) -> pi lock If so, I think that it's better to add corresponding information in patch description to avoid the possible confusion. Best Regards, Huang, Ying >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> >> Fixes: 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock held") >> --- >> Changes in v2: >> Check ALLOC_KSWAPD before checking ZONE_BOOSTED_WATERMARK and update >> description, suggested by Mel Gorman <mgorman@techsingularity.net>. >> >> mm/page_alloc.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 47421bedc12b..ecad680cec53 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -3052,7 +3052,8 @@ struct page *rmqueue(struct zone *preferred_zone, >> >> out: >> /* Separate test+clear to avoid unnecessary atomics */ >> - if (unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { >> + if ((alloc_flags & ALLOC_KSWAPD) && >> + unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { >> clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); >> wakeup_kswapd(zone, 0, 0, zone_idx(zone)); >> } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa 2023-05-15 6:03 ` Huang, Ying @ 2023-05-15 7:38 ` Mel Gorman 2023-05-15 10:17 ` Tetsuo Handa 1 sibling, 1 reply; 12+ messages in thread From: Mel Gorman @ 2023-05-15 7:38 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Andrew Morton, Vlastimil Babka, linux-mm On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: > Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock > held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() > using ZONE_BOOSTED_WATERMARK flag. > > Only allocation contexts that include ALLOC_KSWAPD (which corresponds to > __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to > remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a > risk of deadlock. kswapd_wait is a waitqueue so what is being held? It's safe for kswapd to try wake itself as the waitqueue will be active when wakeup_kswapd() is called so no wakeup occurs. If there is a deadlock, it needs a better explanation. I believe I already stated why this patch is fixing a bug but it wasn't deadlock related. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-15 7:38 ` Mel Gorman @ 2023-05-15 10:17 ` Tetsuo Handa 2023-05-16 1:44 ` Huang, Ying 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2023-05-15 10:17 UTC (permalink / raw) To: Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm, Huang, Ying On 2023/05/15 16:38, Mel Gorman wrote: > On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >> using ZONE_BOOSTED_WATERMARK flag. >> >> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >> risk of deadlock. > > kswapd_wait is a waitqueue so what is being held? It's safe for kswapd > to try wake itself as the waitqueue will be active when wakeup_kswapd() > is called so no wakeup occurs. If there is a deadlock, it needs a better > explanation. I believe I already stated why this patch is fixing a bug > but it wasn't deadlock related. > I noticed this problem ( pgdat->kswapd_wait might be held without __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. The latter was explained at https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . This patch is for making sure that debugobject code will not try to hold pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. Thus, the problem this patch will fix is a deadlock related, isn't it? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-15 10:17 ` Tetsuo Handa @ 2023-05-16 1:44 ` Huang, Ying 2023-05-22 13:57 ` Tetsuo Handa 0 siblings, 1 reply; 12+ messages in thread From: Huang, Ying @ 2023-05-16 1:44 UTC (permalink / raw) To: Tetsuo Handa, Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > On 2023/05/15 16:38, Mel Gorman wrote: >> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>> using ZONE_BOOSTED_WATERMARK flag. >>> >>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>> risk of deadlock. >> >> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >> to try wake itself as the waitqueue will be active when wakeup_kswapd() >> is called so no wakeup occurs. If there is a deadlock, it needs a better >> explanation. I believe I already stated why this patch is fixing a bug >> but it wasn't deadlock related. >> > > I noticed this problem ( pgdat->kswapd_wait might be held without > __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code > is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at > https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . > > I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. This has confused me much before. IIUC, the deadlock is unrelated with kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, pgdat->kswapd_wait.lock holders take the pi lock pi lock holders take the rq lock rq lock holders take the timer base lock timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) The above is based on analysis in https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the circular dependency chain. > The latter was explained at > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at > https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . > > This patch is for making sure that debugobject code will not try to hold > pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. > > Thus, the problem this patch will fix is a deadlock related, isn't it? Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-16 1:44 ` Huang, Ying @ 2023-05-22 13:57 ` Tetsuo Handa 2023-05-22 14:58 ` Mel Gorman 0 siblings, 1 reply; 12+ messages in thread From: Tetsuo Handa @ 2023-05-22 13:57 UTC (permalink / raw) To: Huang, Ying, Mel Gorman; +Cc: Andrew Morton, Vlastimil Babka, linux-mm On 2023/05/16 10:44, Huang, Ying wrote: > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > >> On 2023/05/15 16:38, Mel Gorman wrote: >>> On Sun, May 14, 2023 at 09:28:56AM +0900, Tetsuo Handa wrote: >>>> Commit 73444bc4d8f9 ("mm, page_alloc: do not wake kswapd with zone lock >>>> held") moved wakeup_kswapd() from steal_suitable_fallback() to rmqueue() >>>> using ZONE_BOOSTED_WATERMARK flag. >>>> >>>> Only allocation contexts that include ALLOC_KSWAPD (which corresponds to >>>> __GFP_KSWAPD_RECLAIM) should wake kswapd, for callers are supposed to >>>> remove __GFP_KSWAPD_RECLAIM if trying to hold pgdat->kswapd_wait has a >>>> risk of deadlock. >>> >>> kswapd_wait is a waitqueue so what is being held? It's safe for kswapd >>> to try wake itself as the waitqueue will be active when wakeup_kswapd() >>> is called so no wakeup occurs. If there is a deadlock, it needs a better >>> explanation. I believe I already stated why this patch is fixing a bug >>> but it wasn't deadlock related. >>> >> >> I noticed this problem ( pgdat->kswapd_wait might be held without >> __GFP_KSWAPD_RECLAIM ) when analyzing a different problem ( debugobject code >> is holding pgdat->kswapd_wait due to __GFP_KSWAPD_RECLAIM ) reported at >> https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380 . >> >> I'm calling pgdat->kswapd_wait a lock, as lockdep uses it as a lock name. > > This has confused me much before. IIUC, the deadlock is unrelated with > kswapd wakeup itself. pgdat->kswapd_wait is the lock name and the lock > in fact is the spinlock: pgdat->kswapd_wait.lock. So the deadlock is, > > pgdat->kswapd_wait.lock holders take the pi lock > pi lock holders take the rq lock > rq lock holders take the timer base lock > timer base lock holders take the pgdat->kswapd_wait.lock (missing __GFP_KSWAPD_RECLAIM check) Yes, thank you for explanation. > > The above is based on analysis in > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the > circular dependency chain. Yes. Mel, any questions on this patch? Thomas Gleixner described this lock as kswapd_wait::lock at https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 . Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or s/pgdat->kswapd_wait/kswapd_wait::lock/ ? > >> The latter was explained at >> https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ >> and we agreed that debugobject code needs to drop __GFP_KSWAPD_RECLAIM at >> https://lore.kernel.org/linux-mm/87fs809fwk.ffs@tglx/ . >> >> This patch is for making sure that debugobject code will not try to hold >> pgdat->kswapd_wait after debugobject code dropped __GFP_KSWAPD_RECLAIM. >> >> Thus, the problem this patch will fix is a deadlock related, isn't it? > > Best Regards, > Huang, Ying ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] mm/page_alloc: don't wake kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified 2023-05-22 13:57 ` Tetsuo Handa @ 2023-05-22 14:58 ` Mel Gorman 0 siblings, 0 replies; 12+ messages in thread From: Mel Gorman @ 2023-05-22 14:58 UTC (permalink / raw) To: Tetsuo Handa; +Cc: Huang, Ying, Andrew Morton, Vlastimil Babka, linux-mm On Mon, May 22, 2023 at 10:57:03PM +0900, Tetsuo Handa wrote: > On 2023/05/16 10:44, Huang, Ying wrote: > > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes: > > > > > > > The above is based on analysis in > > > > https://lore.kernel.org/all/20190107204627.GA25526@cmpxchg.org/ > > https://lore.kernel.org/linux-mm/d642e597-cf7d-b410-16ce-22dff483fd8e@I-love.SAKURA.ne.jp/ > > > > Tetsuo's patch avoids to take pgdat->kswapd_wait.lock when timer base > > lock is held via adding check for __GFP_KSWAPD_RECLAIM, so breaks the > > circular dependency chain. > > Yes. Mel, any questions on this patch? > No, I do not, the deadlock issue is more clear now. > Thomas Gleixner described this lock as kswapd_wait::lock at > https://lkml.kernel.org/r/168476016890.404.6911447269153588182.tip-bot2@tip-bot2 . > Should I resubmit this patch with s/pgdat->kswapd_wait/pgdat->kswapd_wait.lock/ or > s/pgdat->kswapd_wait/kswapd_wait::lock/ ? > I don't think a revision of what's in Andrew's tree is necessary. It could be improved by outlining the exact deadlock similar based on this thread but it's somewhat overkill as the fix has other obvious justifications on its own. Acked-by: Mel Gorman <mgorman@techsingularity.net> Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-05-22 14:58 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-05-11 13:47 [PATCH] mm/page_alloc: don't wake up kswapd from rmqueue() unless __GFP_KSWAPD_RECLAIM is specified Tetsuo Handa 2023-05-12 3:45 ` Andrew Morton 2023-05-13 9:38 ` Tetsuo Handa 2023-05-13 10:23 ` Mel Gorman 2023-05-14 0:28 ` [PATCH v2] mm/page_alloc: don't wake " Tetsuo Handa 2023-05-15 6:03 ` Huang, Ying 2023-05-15 6:35 ` Huang, Ying 2023-05-15 7:38 ` Mel Gorman 2023-05-15 10:17 ` Tetsuo Handa 2023-05-16 1:44 ` Huang, Ying 2023-05-22 13:57 ` Tetsuo Handa 2023-05-22 14:58 ` Mel Gorman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox