linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
Date: Tue, 15 Apr 2025 09:44:38 +0200	[thread overview]
Message-ID: <c5db03d5-d934-4d7d-abbc-f989b1c58389@suse.cz> (raw)
In-Reply-To: <20250413022058.GF366747@cmpxchg.org>

On 4/13/25 04:20, Johannes Weiner wrote:
> On Fri, Apr 11, 2025 at 02:21:58PM -0400, Johannes Weiner wrote:
>> On Fri, Apr 11, 2025 at 06:51:51PM +0200, Vlastimil Babka wrote:
>> > [*] now when checking the code between kswapd and kcompactd handover, I
>> > think I found a another problem?
>> > 
>> > we have:
>> > kswapd_try_to_sleep()
>> >   prepare_kswapd_sleep() - needs to succeed for wakeup_kcompactd()
>> >    pgdat_balanced() - needs to be true for prepare_kswapd_sleep() to be true
>> >     - with defrag_mode we want high watermark of NR_FREE_PAGES_BLOCKS, but
>> >       we were only reclaiming until now and didn't wake up kcompactd and
>> >       this actually prevents the wake up?
> 
> Coming back to this, there is indeed a defrag_mode issue. My
> apologies, I misunderstood what you were pointing at.
> 
> Like I said, kswapd reverts to order-0 in some other place to go to
> sleep and trigger the handoff. At that point, defrag_mode also needs
> to revert to NR_FREE_PAGES.
> 
> It's curious that this didn't stand out in testing. On the contrary,
> kcompactd was still doing the vast majority of the compaction work. It
> looks like kswapd and direct workers on their own manage to balance
> NR_FREE_PAGES_BLOCK every so often, and then kswapd hands off. Given
> the low number of kcompactd wakeups, the consumers keep it going.
> 
> So testing with this:
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc422ad830d6..c2aa0a4b67de 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6747,8 +6747,11 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>  		/*
>  		 * In defrag_mode, watermarks must be met in whole
>  		 * blocks to avoid polluting allocator fallbacks.
> +		 *
> +		 * When kswapd has compact gap, check regular
> +		 * NR_FREE_PAGES and hand over to kcompactd.
>  		 */
> -		if (defrag_mode)
> +		if (defrag_mode && order)
>  			item = NR_FREE_PAGES_BLOCKS;
>  		else
>  			item = NR_FREE_PAGES;
> 
> I'm getting the following results:
> 
>                                 fallbackspeed/STUPID-DEFRAGMODE fallbackspeed/DEFRAGMODE
> Hugealloc Time mean                       79381.34 (    +0.00%)    88126.12 (   +11.02%)
> Hugealloc Time stddev                     85852.16 (    +0.00%)   135366.75 (   +57.67%)
> Kbuild Real time                            249.35 (    +0.00%)      226.71 (    -9.04%)
> Kbuild User time                           1249.16 (    +0.00%)     1249.37 (    +0.02%)
> Kbuild System time                          171.76 (    +0.00%)      166.93 (    -2.79%)
> THP fault alloc                           51666.87 (    +0.00%)    52685.60 (    +1.97%)
> THP fault fallback                        16970.00 (    +0.00%)    15951.87 (    -6.00%)
> Direct compact fail                         166.53 (    +0.00%)      178.93 (    +7.40%)
> Direct compact success                       17.13 (    +0.00%)        4.13 (   -71.69%)
> Compact daemon scanned migrate          3095413.33 (    +0.00%)  9231239.53 (  +198.22%)
> Compact daemon scanned free             2155966.53 (    +0.00%)  7053692.87 (  +227.17%)

However this brings me back to my concern with __compact_finished()
requiring high watermark of NR_FREE_PAGES_BLOCKS. IMHO it can really easily
lead to situations where all free memory is already contiguous, but because
of one or two THP concurrent allocations we're below the high watermark (but
not yet low watermark to wake kswapd again) so further compaction by
kcompactd is simply wasting cpu cycles at that point.
Again I think a comparison of NR_FREE_PAGES_BLOCKS to NR_FREE_PAGES would in
theory work better for determining if all free space is as defragmented as
possible?

> Compact direct scanned migrate           265642.47 (    +0.00%)    68388.33 (   -74.26%)
> Compact direct scanned free              130252.60 (    +0.00%)    55634.87 (   -57.29%)
> Compact total migrate scanned           3361055.80 (    +0.00%)  9299627.87 (  +176.69%)
> Compact total free scanned              2286219.13 (    +0.00%)  7109327.73 (  +210.96%)
> Alloc stall                                1890.80 (    +0.00%)     6297.60 (  +232.94%)
> Pages kswapd scanned                    9043558.80 (    +0.00%)  5952576.73 (   -34.18%)
> Pages kswapd reclaimed                  1891708.67 (    +0.00%)  1030645.00 (   -45.52%)
> Pages direct scanned                    1017090.60 (    +0.00%)  2688047.60 (  +164.29%)
> Pages direct reclaimed                    92682.60 (    +0.00%)   309770.53 (  +234.22%)
> Pages total scanned                    10060649.40 (    +0.00%)  8640624.33 (   -14.11%)
> Pages total reclaimed                   1984391.27 (    +0.00%)  1340415.53 (   -32.45%)
> Swap out                                 884585.73 (    +0.00%)   417781.93 (   -52.77%)
> Swap in                                  287106.27 (    +0.00%)    95589.73 (   -66.71%)
> File refaults                            551697.60 (    +0.00%)   426474.80 (   -22.70%)
> 
> Work has shifted from direct to kcompactd. In aggregate there is more
> compaction happening. Meanwhile aggregate reclaim and swapping drops
> quite substantially. %sys is down, so this is just more efficient.
> 
> Reclaim and swapping are down substantially, which is great. But the
> reclaim work that remains has shifted somewhat to direct reclaim,
> which is unfortunate. THP delivery is also a tad worse, but still much
> better than !defrag_mode, so not too concerning. That part deserves a
> bit more thought.
> 
> Overall, this looks good, though. I'll send a proper patch next week.
> 
> Thanks for the review, Vlastimil.



      parent reply	other threads:[~2025-04-15  7:44 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-13 21:05 [PATCH 0/5] mm: reliable huge page allocator Johannes Weiner
2025-03-13 21:05 ` [PATCH 1/5] mm: compaction: push watermark into compaction_suitable() callers Johannes Weiner
2025-03-14 15:08   ` Zi Yan
2025-03-16  4:28   ` Hugh Dickins
2025-03-17 18:18     ` Johannes Weiner
2025-03-21  6:21   ` kernel test robot
2025-03-21 13:55     ` Johannes Weiner
2025-04-10 15:19   ` Vlastimil Babka
2025-04-10 20:17     ` Johannes Weiner
2025-04-11  7:32       ` Vlastimil Babka
2025-03-13 21:05 ` [PATCH 2/5] mm: page_alloc: trace type pollution from compaction capturing Johannes Weiner
2025-03-14 18:36   ` Zi Yan
2025-03-13 21:05 ` [PATCH 3/5] mm: page_alloc: defrag_mode Johannes Weiner
2025-03-14 18:54   ` Zi Yan
2025-03-14 20:50     ` Johannes Weiner
2025-03-14 22:54       ` Zi Yan
2025-03-22 15:05   ` Brendan Jackman
2025-03-23  0:58     ` Johannes Weiner
2025-03-23  1:34       ` Johannes Weiner
2025-03-23  3:46         ` Johannes Weiner
2025-03-23 18:04           ` Brendan Jackman
2025-03-31 15:55             ` Johannes Weiner
2025-03-13 21:05 ` [PATCH 4/5] mm: page_alloc: defrag_mode kswapd/kcompactd assistance Johannes Weiner
2025-03-13 21:05 ` [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks Johannes Weiner
2025-03-14 21:05   ` Johannes Weiner
2025-04-11  8:19   ` Vlastimil Babka
2025-04-11 15:39     ` Johannes Weiner
2025-04-11 16:51       ` Vlastimil Babka
2025-04-11 18:21         ` Johannes Weiner
2025-04-13  2:20           ` Johannes Weiner
2025-04-15  7:31             ` Vlastimil Babka
2025-04-15  7:44             ` Vlastimil Babka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c5db03d5-d934-4d7d-abbc-f989b1c58389@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox