From: SeongJae Park <sj@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: SeongJae Park <sj@kernel.org>,
Qiliang Yuan <realwujing@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Lance Yang <lance.yang@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9] mm/page_alloc: boost watermarks on atomic allocation failure
Date: Fri, 13 Feb 2026 07:07:20 -0800
Message-ID: <20260213150721.72997-1-sj@kernel.org>
In-Reply-To: <c2388a14-9ef1-419b-b86a-56629708be15@suse.cz>
On Fri, 13 Feb 2026 09:46:14 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> On 2/13/26 04:17, Qiliang Yuan wrote:
> > Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> > pressure as they cannot enter direct reclaim. This patch introduces a
> > watermark boost mechanism to mitigate this issue.
> >
> > When a GFP_ATOMIC request enters the slowpath, the preferred zone's
> > watermark_boost is increased under zone->lock protection. This triggers
> > kswapd to proactively reclaim memory, creating a safety buffer for
> > future atomic allocations. A 1-second debounce timer prevents excessive
> > boosts during traffic bursts.
> >
> > This approach reuses existing watermark_boost infrastructure with
> > minimal overhead and proper locking to ensure thread safety.
[...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c380f063e8b7..8af88584a8bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -218,6 +218,13 @@ unsigned int pageblock_order __read_mostly;
> > static void __free_pages_ok(struct page *page, unsigned int order,
> > fpi_t fpi_flags);
> >
> > +/*
> > + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> > + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone size.
> > + * Larger zones under GFP_ATOMIC pressure need proportionally larger reserves.
> > + */
> > +#define ATOMIC_BOOST_FACTOR 1
>
> ... so now we #define 1 but it makes little sense without that hardcoded
> 1000 below.
I agree. I think it would be easier to understand if we used 10000 as the
denominator, consistent with other similar knobs such as watermark_scale_factor.
Alternatively, defining it as a constant local variable, or hard-coding the
value at its single real use, might be easier to read, for the reason
mentioned below.
>
> > +
> > /*
> > * results with 256, 32 in the lowmem_reserve sysctl:
> > * 1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
> > @@ -2161,6 +2168,9 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
> > static inline bool boost_watermark(struct zone *zone)
> > {
> > unsigned long max_boost;
> > + unsigned long boost_amount;
> > +
> > + lockdep_assert_held(&zone->lock);
> >
> > if (!watermark_boost_factor)
> > return false;
> > @@ -2189,12 +2199,43 @@ static inline bool boost_watermark(struct zone *zone)
> >
> > max_boost = max(pageblock_nr_pages, max_boost);
> >
> > - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> > - max_boost);
> > + boost_amount = max(pageblock_nr_pages,
> > + mult_frac(zone_managed_pages(zone), ATOMIC_BOOST_FACTOR, 1000));
>
> I don't think mult_frac() was a great suggestion. We're talking about right
> shifting by a constant 10. In the other cases of mult_frac() we use dynamic
> values for x and n so it's justified. But this IMHO is unnecessary complication.
This file uses mult_frac() in two places with a hard-coded denominator of 10000.
Hence I feel it is more consistent to use mult_frac() with the same denominator
(10000) and consistent naming. In terms of overhead, I think the added
overhead is negligible, since this is called at most once per second.

No strong opinion, though; this is just a trivial matter of personal taste.
Right shifting is also fine with me. :)
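
To put rough numbers on the options above, here is a small userspace sketch.
It is only an illustration under assumptions: mult_frac_ul() is a hypothetical
stand-in that mirrors the quotient/remainder arithmetic of the kernel's
mult_frac() macro, boost_by_shift() is a made-up name for the right-shift
variant, and the 10/10000 pair is just one way to keep the patch's ~0.1% ratio.

```c
/*
 * Hypothetical userspace stand-in for the kernel's mult_frac(x, n, d).
 * Computes x * n / d via quotient and remainder so that the intermediate
 * product stays small.
 */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long q = x / d;
	unsigned long r = x % d;

	return q * n + r * n / d;
}

/* Hypothetical right-shift variant: divides by 1024 instead of 1000. */
static unsigned long boost_by_shift(unsigned long managed_pages)
{
	return managed_pages >> 10;
}
```

For a 1 GiB zone with 4 KiB pages (262144 managed pages), both
mult_frac_ul(262144, 1, 1000) and mult_frac_ul(262144, 10, 10000) give 262
pages, while boost_by_shift(262144) gives 256 pages. So the choice is mostly
about readability, not about the resulting boost size.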

And now I realize I was thinking ATOMIC_BOOST_SHIFT would be better made
consistent with other similar code because it is defined as a macro. That is,
I was assuming it would be used in multiple places, and should therefore be
easy for readers to understand. Now I see it is actually used only here.
What about defining it as a constant local variable here, or just hard-coding
it?
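
Something like the following userspace model is what I have in mind for the
local-constant option. It is not a proposed hunk: atomic_boost_amount() and
boost_factor are hypothetical names, the value 10 assumes the 10000
denominator discussed above, and the open-coded arithmetic stands in for
mult_frac().

```c
/*
 * Hypothetical userspace model of the boost amount calculation with the
 * factor kept as a local constant next to its single use.
 */
static unsigned long atomic_boost_amount(unsigned long managed_pages,
					 unsigned long pageblock_pages)
{
	/* Assumption: 10/10000 keeps the patch's ~0.1% ratio. */
	const unsigned long boost_factor = 10;
	unsigned long amount;

	/* Same quotient/remainder split that mult_frac() performs. */
	amount = managed_pages / 10000 * boost_factor +
		 managed_pages % 10000 * boost_factor / 10000;

	/* Never boost by less than one pageblock, as in the patch. */
	return amount > pageblock_pages ? amount : pageblock_pages;
}
```

With 512-page pageblocks, a 1 GiB zone (262144 pages) stays at the 512-page
floor, and a 16 GiB zone (4194304 pages) gets 4194 pages.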
Thanks,
SJ
[...]