linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Brendan Jackman <jackmanb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] mm: page_alloc: defrag_mode
Date: Sat, 22 Mar 2025 23:46:57 -0400	[thread overview]
Message-ID: <20250323034657.GD1894930@cmpxchg.org> (raw)
In-Reply-To: <20250323013405.GC1894930@cmpxchg.org>

On Sat, Mar 22, 2025 at 09:34:09PM -0400, Johannes Weiner wrote:
> On Sat, Mar 22, 2025 at 08:58:27PM -0400, Johannes Weiner wrote:
> > On Sat, Mar 22, 2025 at 04:05:52PM +0100, Brendan Jackman wrote:
> > > On Thu Mar 13, 2025 at 10:05 PM CET, Johannes Weiner wrote:
> > > > +	/* Reclaim/compaction failed to prevent the fallback */
> > > > +	if (defrag_mode) {
> > > > +		alloc_flags &= ALLOC_NOFRAGMENT;
> > > > +		goto retry;
> > > > +	}
> > > 
> > > I can't see where ALLOC_NOFRAGMENT gets cleared, is it supposed to be
> > > here (i.e. should this be ~ALLOC_NOFRAGMENT)?
> 
> Please ignore my previous email, this is actually a much more severe
> issue than I thought at first. The screwed up clearing is bad, but
> this will also not check the flag before retrying, which means the
> thread will retry reclaim/compaction and never reach OOM.
> 
> This code has weeks of load testing, with workloads fine-tuned to
> *avoid* OOM. A blatant OOM test shows this problem immediately.
> 
> A simple fix, but I'll put it through the wringer before sending it.

Ok, here is the patch. I verified this with intentional OOMing 100
times in a loop; this would previously lock up on first try in
defrag_mode, but kills and recovers reliably with this applied.

I also re-ran the full THP benchmarks, to verify that erroneous
looping here did not accidentally contribute to fragmentation
avoidance and thus THP success & latency rates. They were in fact not;
the improvements claimed for defrag_mode are unchanged with this fix:

                                                VANILLA    defrag_mode=1-OOMFIX
Hugealloc Time mean               52739.45 (    +0.00%)   27342.44 (   -48.15%)
Hugealloc Time stddev             56541.26 (    +0.00%)   33227.16 (   -41.23%)
Kbuild Real time                    197.47 (    +0.00%)     196.32 (    -0.58%)
Kbuild User time                   1240.49 (    +0.00%)    1231.89 (    -0.69%)
Kbuild System time                   70.08 (    +0.00%)      58.75 (   -15.95%)
THP fault alloc                   46727.07 (    +0.00%)   62669.93 (   +34.12%)
THP fault fallback                21910.60 (    +0.00%)    5966.40 (   -72.77%)
Direct compact fail                 195.80 (    +0.00%)      50.53 (   -73.81%)
Direct compact success                7.93 (    +0.00%)       4.07 (   -43.28%)
Compact daemon scanned migrate  3369601.27 (    +0.00%) 1588238.93 (   -52.87%)
Compact daemon scanned free     5075474.47 (    +0.00%) 1441944.27 (   -71.59%)
Compact direct scanned migrate   161787.27 (    +0.00%)   64838.53 (   -59.92%)
Compact direct scanned free      163467.53 (    +0.00%)   37243.00 (   -77.22%)
Compact total migrate scanned   3531388.53 (    +0.00%) 1653077.47 (   -53.19%)
Compact total free scanned      5238942.00 (    +0.00%) 1479187.27 (   -71.77%)
Alloc stall                        2371.07 (    +0.00%)     553.00 (   -76.64%)
Pages kswapd scanned            2160926.73 (    +0.00%) 4052539.93 (   +87.54%)
Pages kswapd reclaimed           533191.07 (    +0.00%)  765447.47 (   +43.56%)
Pages direct scanned             400450.33 (    +0.00%)  358933.93 (   -10.37%)
Pages direct reclaimed            94441.73 (    +0.00%)   26991.60 (   -71.42%)
Pages total scanned             2561377.07 (    +0.00%) 4411473.87 (   +72.23%)
Pages total reclaimed            627632.80 (    +0.00%)  792439.07 (   +26.26%)
Swap out                          47959.53 (    +0.00%)  128511.80 (  +167.96%)
Swap in                            7276.00 (    +0.00%)   27736.20 (  +281.16%)
File refaults                    138043.00 (    +0.00%)  206198.40 (   +49.37%)

Many thanks for your careful review, Brendan.

---

From c84651a46910448c6cfaf44885644fdb215f7f6a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Sat, 22 Mar 2025 19:21:45 -0400
Subject: [PATCH] mm: page_alloc: fix defrag_mode's retry & OOM path

Brendan points out that defrag_mode doesn't properly clear
ALLOC_NOFRAGMENT on its last-ditch attempt to allocate. But looking
closer, the problem is actually more severe: it doesn't actually
*check* whether it's already retried, and keeps looping. This means
the OOM path is never taken, and the thread can loop indefinitely.

This is verified with an intentional OOM test on defrag_mode=1, which
results in the machine hanging. After this patch, it triggers the OOM
kill reliably and recovers.

Clear ALLOC_NOFRAGMENT properly, and only retry once.

Fixes: e3aa7df331bc ("mm: page_alloc: defrag_mode")
Reported-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0c01998cb3a0..582364d42906 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4543,8 +4543,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto retry;
 
 	/* Reclaim/compaction failed to prevent the fallback */
-	if (defrag_mode) {
-		alloc_flags &= ALLOC_NOFRAGMENT;
+	if (defrag_mode && (alloc_flags & ALLOC_NOFRAGMENT)) {
+		alloc_flags &= ~ALLOC_NOFRAGMENT;
 		goto retry;
 	}
 
-- 
2.49.0



  reply	other threads:[~2025-03-23  3:47 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-13 21:05 [PATCH 0/5] mm: reliable huge page allocator Johannes Weiner
2025-03-13 21:05 ` [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers Johannes Weiner
2025-03-14 15:08   ` Zi Yan
2025-03-16  4:28   ` Hugh Dickins
2025-03-17 18:18     ` Johannes Weiner
2025-03-21  6:21   ` kernel test robot
2025-03-21 13:55     ` Johannes Weiner
2025-04-10 15:19   ` Vlastimil Babka
2025-04-10 20:17     ` Johannes Weiner
2025-04-11  7:32       ` Vlastimil Babka
2025-03-13 21:05 ` [PATCH 2/5] mm: page_alloc: trace type pollution from compaction capturing Johannes Weiner
2025-03-14 18:36   ` Zi Yan
2025-03-13 21:05 ` [PATCH 3/5] mm: page_alloc: defrag_mode Johannes Weiner
2025-03-14 18:54   ` Zi Yan
2025-03-14 20:50     ` Johannes Weiner
2025-03-14 22:54       ` Zi Yan
2025-03-22 15:05   ` Brendan Jackman
2025-03-23  0:58     ` Johannes Weiner
2025-03-23  1:34       ` Johannes Weiner
2025-03-23  3:46         ` Johannes Weiner [this message]
2025-03-23 18:04           ` Brendan Jackman
2025-03-31 15:55             ` Johannes Weiner
2025-03-13 21:05 ` [PATCH 4/5] mm: page_alloc: defrag_mode kswapd/kcompactd assistance Johannes Weiner
2025-03-13 21:05 ` [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks Johannes Weiner
2025-03-14 21:05   ` Johannes Weiner
2025-04-11  8:19   ` Vlastimil Babka
2025-04-11 15:39     ` Johannes Weiner
2025-04-11 16:51       ` Vlastimil Babka
2025-04-11 18:21         ` Johannes Weiner
2025-04-13  2:20           ` Johannes Weiner
2025-04-15  7:31             ` Vlastimil Babka
2025-04-15  7:44             ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250323034657.GD1894930@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=vbabka@suse.cz \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox