From: Vlastimil Babka <vbabka@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
Date: Tue, 15 Apr 2025 09:31:36 +0200 [thread overview]
Message-ID: <56245588-640d-471a-9e72-ce1288b7b81e@suse.cz> (raw)
In-Reply-To: <20250413022058.GF366747@cmpxchg.org>
On 4/13/25 04:20, Johannes Weiner wrote:
> On Fri, Apr 11, 2025 at 02:21:58PM -0400, Johannes Weiner wrote:
>> On Fri, Apr 11, 2025 at 06:51:51PM +0200, Vlastimil Babka wrote:
>> > [*] now when checking the code between kswapd and kcompactd handover, I
>> > think I found a another problem?
>> >
>> > we have:
>> > kswapd_try_to_sleep()
>> > prepare_kswapd_sleep() - needs to succeed for wakeup_kcompactd()
>> > pgdat_balanced() - needs to be true for prepare_kswapd_sleep() to be true
>> > - with defrag_mode we want high watermark of NR_FREE_PAGES_BLOCKS, but
>> > we were only reclaiming until now and didn't wake up kcompactd and
>> > this actually prevents the wake up?
>
> Coming back to this, there is indeed a defrag_mode issue. My
> apologies, I misunderstood what you were pointing at.
>
> Like I said, kswapd reverts to order-0 in some other place to go to
> sleep and trigger the handoff. At that point, defrag_mode also needs
> to revert to NR_FREE_PAGES.
I missed that revert to order-0 and that without it the current code also
wouldn't make sense. But I agree with the fix.
> It's curious that this didn't stand out in testing. On the contrary,
> kcompactd was still doing the vast majority of the compaction work. It
> looks like kswapd and direct workers on their own manage to balance
> NR_FREE_PAGES_BLOCK every so often, and then kswapd hands off. Given
> the low number of kcompactd wakeups, the consumers keep it going.
>
> So testing with this:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc422ad830d6..c2aa0a4b67de 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6747,8 +6747,11 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> /*
> * In defrag_mode, watermarks must be met in whole
> * blocks to avoid polluting allocator fallbacks.
> + *
> + * When kswapd has compact gap, check regular
> + * NR_FREE_PAGES and hand over to kcompactd.
> */
> - if (defrag_mode)
> + if (defrag_mode && order)
> item = NR_FREE_PAGES_BLOCKS;
> else
> item = NR_FREE_PAGES;
>
> I'm getting the following results:
>
> fallbackspeed/STUPID-DEFRAGMODE fallbackspeed/DEFRAGMODE
> Hugealloc Time mean 79381.34 ( +0.00%) 88126.12 ( +11.02%)
> Hugealloc Time stddev 85852.16 ( +0.00%) 135366.75 ( +57.67%)
> Kbuild Real time 249.35 ( +0.00%) 226.71 ( -9.04%)
> Kbuild User time 1249.16 ( +0.00%) 1249.37 ( +0.02%)
> Kbuild System time 171.76 ( +0.00%) 166.93 ( -2.79%)
> THP fault alloc 51666.87 ( +0.00%) 52685.60 ( +1.97%)
> THP fault fallback 16970.00 ( +0.00%) 15951.87 ( -6.00%)
> Direct compact fail 166.53 ( +0.00%) 178.93 ( +7.40%)
> Direct compact success 17.13 ( +0.00%) 4.13 ( -71.69%)
> Compact daemon scanned migrate 3095413.33 ( +0.00%) 9231239.53 ( +198.22%)
> Compact daemon scanned free 2155966.53 ( +0.00%) 7053692.87 ( +227.17%)
> Compact direct scanned migrate 265642.47 ( +0.00%) 68388.33 ( -74.26%)
> Compact direct scanned free 130252.60 ( +0.00%) 55634.87 ( -57.29%)
> Compact total migrate scanned 3361055.80 ( +0.00%) 9299627.87 ( +176.69%)
> Compact total free scanned 2286219.13 ( +0.00%) 7109327.73 ( +210.96%)
> Alloc stall 1890.80 ( +0.00%) 6297.60 ( +232.94%)
> Pages kswapd scanned 9043558.80 ( +0.00%) 5952576.73 ( -34.18%)
> Pages kswapd reclaimed 1891708.67 ( +0.00%) 1030645.00 ( -45.52%)
> Pages direct scanned 1017090.60 ( +0.00%) 2688047.60 ( +164.29%)
> Pages direct reclaimed 92682.60 ( +0.00%) 309770.53 ( +234.22%)
> Pages total scanned 10060649.40 ( +0.00%) 8640624.33 ( -14.11%)
> Pages total reclaimed 1984391.27 ( +0.00%) 1340415.53 ( -32.45%)
> Swap out 884585.73 ( +0.00%) 417781.93 ( -52.77%)
> Swap in 287106.27 ( +0.00%) 95589.73 ( -66.71%)
> File refaults 551697.60 ( +0.00%) 426474.80 ( -22.70%)
>
> Work has shifted from direct to kcompactd. In aggregate there is more
> compaction happening. Meanwhile aggregate reclaim and swapping drops
> quite substantially. %sys is down, so this is just more efficient.
>
> Reclaim and swapping are down substantially, which is great. But the
> reclaim work that remains has shifted somewhat to direct reclaim,
> which is unfortunate. THP delivery is also a tad worse, but still much
> better than !defrag_mode, so not too concerning. That part deserves a
> bit more thought.
>
> Overall, this looks good, though. I'll send a proper patch next week.
>
> Thanks for the review, Vlastimil.
NP!
next prev parent reply other threads:[~2025-04-15 7:31 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-13 21:05 [PATCH 0/5] mm: reliable huge page allocator Johannes Weiner
2025-03-13 21:05 ` [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers Johannes Weiner
2025-03-14 15:08 ` Zi Yan
2025-03-16 4:28 ` Hugh Dickins
2025-03-17 18:18 ` Johannes Weiner
2025-03-21 6:21 ` kernel test robot
2025-03-21 13:55 ` Johannes Weiner
2025-04-10 15:19 ` Vlastimil Babka
2025-04-10 20:17 ` Johannes Weiner
2025-04-11 7:32 ` Vlastimil Babka
2025-03-13 21:05 ` [PATCH 2/5] mm: page_alloc: trace type pollution from compaction capturing Johannes Weiner
2025-03-14 18:36 ` Zi Yan
2025-03-13 21:05 ` [PATCH 3/5] mm: page_alloc: defrag_mode Johannes Weiner
2025-03-14 18:54 ` Zi Yan
2025-03-14 20:50 ` Johannes Weiner
2025-03-14 22:54 ` Zi Yan
2025-03-22 15:05 ` Brendan Jackman
2025-03-23 0:58 ` Johannes Weiner
2025-03-23 1:34 ` Johannes Weiner
2025-03-23 3:46 ` Johannes Weiner
2025-03-23 18:04 ` Brendan Jackman
2025-03-31 15:55 ` Johannes Weiner
2025-03-13 21:05 ` [PATCH 4/5] mm: page_alloc: defrag_mode kswapd/kcompactd assistance Johannes Weiner
2025-03-13 21:05 ` [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks Johannes Weiner
2025-03-14 21:05 ` Johannes Weiner
2025-04-11 8:19 ` Vlastimil Babka
2025-04-11 15:39 ` Johannes Weiner
2025-04-11 16:51 ` Vlastimil Babka
2025-04-11 18:21 ` Johannes Weiner
2025-04-13 2:20 ` Johannes Weiner
2025-04-15 7:31 ` Vlastimil Babka [this message]
2025-04-15 7:44 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56245588-640d-471a-9e72-ce1288b7b81e@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox