From: Vlastimil Babka <vbabka@suse.cz>
To: Qiliang Yuan <realwujing@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Lance Yang <lance.yang@linux.dev>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10] mm/page_alloc: boost watermarks on atomic allocation failure
Date: Wed, 18 Feb 2026 09:36:06 +0100	[thread overview]
Message-ID: <e011c6a8-cda5-42ce-9d42-b23d1c81b26b@suse.cz> (raw)
In-Reply-To: <20260214-wujing-mm-page_alloc-v8-v10-1-bdfea431fd97@gmail.com>

On 2/14/26 16:13, Qiliang Yuan wrote:
> Atomic allocations (GFP_ATOMIC) are prone to failure under heavy memory
> pressure as they cannot enter direct reclaim.
> 
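(For context, a typical atomic call site, illustrative only and not taken from
this patch, looks like:

	/* e.g. in an RX softirq path; sleeping and direct reclaim are forbidden */
	struct sk_buff *skb = alloc_skb(size, GFP_ATOMIC);

which matches the skbuff_head_cache failures in the logs below.)
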
> Handle these failures by introducing a watermark boost mechanism for
> atomic requests. Refactor boost_watermark() using an internal helper to
> support both the fragmentation and atomic paths. Apply zone-proportional
> boosts (~0.1% of managed pages) for atomic allocations, decoupled from
> watermark_boost_factor.
> 
> Implement boost_zones_for_atomic() to iterate through and boost all
> eligible zones in the zonelist, respecting nodemasks. Use a per-zone
> 1-second debounce timer via last_boost_jiffies to prevent excessive
> boosting. Protect modifications with zone->lock and verify with
> lockdep. Integrate the mechanism into the page allocation slowpath
> specifically for order-0 GFP_ATOMIC requests.
> 
> This approach reuses existing infrastructure and ensures emergency
> reserves even if fragmentation boosting is disabled.
> 
> Allocation failure logs:
> [38535644.718700] node 0: slabs: 1031, objs: 43328, free: 0
> [38535644.725059] node 1: slabs: 339, objs: 17616, free: 317
> [38535645.428345] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.436888] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535645.447664] node 0: slabs: 940, objs: 40864, free: 144
> [38535645.454026] node 1: slabs: 322, objs: 19168, free: 383
> [38535645.556122] SLUB: Unable to allocate memory on node -1, gfp=0x480020(GFP_ATOMIC)
> [38535645.564576] cache: skbuff_head_cache, object size: 232, buffer size: 256, default order: 2, min order: 0
> [38535649.655523] warn_alloc: 59 callbacks suppressed
> [38535649.655527] swapper/100: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null)
> [38535649.671692] swapper/100 cpuset=/ mems_allowed=0-1
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

That was for an older version, and the changes since were not trivial (and I
argued against some of them), so consider the Ack withdrawn now. (Please do
not send a v11 just to remove it.)

Moreover, I'm concerned that we are now at v10, after many significant
changes, and while there are failure logs in the changelog, there is no
indication of how the patch actually helps in practice (or whether it has
negative consequences as well). How has that been evaluated? Also, I think we
wanted to involve the networking people in earlier versions, but I don't see
them CC'd now?

> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> ---
> v10:
> - Refactor watermark boosting into a mechanism helper (__boost_watermark) and policy logic.
> - Decouple the atomic boost from watermark_boost_factor to ensure emergency reserves.
> - Simplify the atomic boost calculation to ~0.1% of managed pages with a 10% high-WM cap.
> - Boost all eligible zones in the zonelist while respecting nodemasks.
> 
> v9:
> - Use mult_frac() for boost calculation.
> - Add !can_direct_reclaim check.
> - Code cleanup: naming, scope, and line limits.
> - Update tags: Add Vlastimil's Acked-by.
> - Link: https://lore.kernel.org/r/20260213-wujing-mm-page_alloc-v8-v9-1-cd99f3a6cb70@gmail.com
> 
> v8:
> - Use spin_lock_irqsave() to prevent inconsistent lock state.
> 
> v7:
> - Use local variable for boost_amount.
> - Add zone->lock protection.
> - Add lockdep assertion.
> 
> v6:
> - Use ATOMIC_BOOST_SCALE_SHIFT define.
> - Add documentation for 0.1% rationale.
> 
> v5:
> - Use native boost_watermark().
> 
> v4:
> - Add watermark_scale_boost and gradual decay.
> 
> v3:
> - Per-zone debounce timer.
> 
> v2:
> - Debounce logic and zone-proportional boosting.
> 
> v1:
> - Initial version.
> ---
>  include/linux/mmzone.h |  1 +
>  mm/page_alloc.c        | 77 ++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 69 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 75ef7c9f9307..8e37e4e6765b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -882,6 +882,7 @@ struct zone {
>  	/* zone watermarks, access with *_wmark_pages(zone) macros */
>  	unsigned long _watermark[NR_WMARK];
>  	unsigned long watermark_boost;
> +	unsigned long last_boost_jiffies;
>  
>  	unsigned long nr_reserved_highatomic;
>  	unsigned long nr_free_highatomic;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c380f063e8b7..9219bfca806b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2158,12 +2158,15 @@ bool pageblock_unisolate_and_move_free_pages(struct zone *zone, struct page *pag
>  
>  #endif /* CONFIG_MEMORY_ISOLATION */
>  
> -static inline bool boost_watermark(struct zone *zone)
> +/*
> + * Helper for boosting watermarks. Called with zone->lock held.
> + * Use max_boost to limit the boost to a percentage of the high watermark.
> + */
> +static inline bool __boost_watermark(struct zone *zone, unsigned long amount,
> +				     unsigned long max_boost)
>  {
> -	unsigned long max_boost;
> +	lockdep_assert_held(&zone->lock);
>  
> -	if (!watermark_boost_factor)
> -		return false;
>  	/*
>  	 * Don't bother in zones that are unlikely to produce results.
>  	 * On small machines, including kdump capture kernels running
> @@ -2173,9 +2176,6 @@ static inline bool boost_watermark(struct zone *zone)
>  	if ((pageblock_nr_pages * 4) > zone_managed_pages(zone))
>  		return false;
>  
> -	max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
> -			watermark_boost_factor, 10000);
> -
>  	/*
>  	 * high watermark may be uninitialised if fragmentation occurs
>  	 * very early in boot so do not boost. We do not fall
> @@ -2189,12 +2189,67 @@ static inline bool boost_watermark(struct zone *zone)
>  
>  	max_boost = max(pageblock_nr_pages, max_boost);
>  
> -	zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages,
> -		max_boost);
> +	zone->watermark_boost = min(zone->watermark_boost + amount,
> +				    max_boost);
>  
>  	return true;
>  }
>  
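A note on the contract above: lockdep_assert_held() only checks anything on
CONFIG_LOCKDEP builds; elsewhere it compiles away. A minimal sketch of the
expected calling pattern (mirroring boost_zones_for_atomic() further down;
max_boost here stands for whatever cap the caller picked):

	unsigned long flags;
	bool boosted;

	spin_lock_irqsave(&zone->lock, flags);
	boosted = __boost_watermark(zone, pageblock_nr_pages, max_boost);
	spin_unlock_irqrestore(&zone->lock, flags);
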
> +/*
> + * Boost watermarks to increase reclaim pressure when fragmentation occurs
> + * and we fall back to other migratetypes.
> + */
> +static inline bool boost_watermark(struct zone *zone)
> +{
> +	if (!watermark_boost_factor)
> +		return false;
> +
> +	return __boost_watermark(zone, pageblock_nr_pages,
> +			mult_frac(zone->_watermark[WMARK_HIGH],
> +				  watermark_boost_factor, 10000));
> +}
> +
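(Worked example: with the current default watermark_boost_factor of 15000,
the cap here is

	mult_frac(zone->_watermark[WMARK_HIGH], 15000, 10000)	/* 150% of high */

so repeated fragmentation events can accumulate at most 1.5x the high
watermark, with pageblock_nr_pages added per event.)
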
> +/*
> + * Boost watermarks by ~0.1% of zone size on atomic allocation pressure.
> + * This provides zone-proportional safety buffers: ~1MB per 1GB of zone
> + * size. Max boost ceiling is fixed at ~10% of high watermark.
> + *
> + * This emergency reserve is independent of watermark_boost_factor.
> + */
> +static inline bool boost_watermark_atomic(struct zone *zone)
> +{
> +	return __boost_watermark(zone,
> +			max(pageblock_nr_pages, zone_managed_pages(zone) / 1000),
> +			zone->_watermark[WMARK_HIGH] / 10);
> +}
> +
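(To make the numbers concrete: a zone with 1 GiB managed is 262144 4 KiB
pages, so

	262144 / 1000 = 262 pages, i.e. roughly 1 MiB per 1 GiB of zone size

which is where the ~0.1% and "~1MB per 1GB" in the comment come from; on
small zones the max() keeps each boost step at least one pageblock.)
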
> +static void boost_zones_for_atomic(struct alloc_context *ac, gfp_t gfp_mask)
> +{
> +	struct zoneref *z;
> +	struct zone *zone;
> +	unsigned long now = jiffies;
> +
> +	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
> +					ac->highest_zoneidx, ac->nodemask) {
> +		/* Rate-limit boosts to once per second per zone */
> +		if (time_after(now, zone->last_boost_jiffies + HZ)) {
> +			unsigned long flags;
> +			bool should_wake;
> +
> +			zone->last_boost_jiffies = now;
> +
> +			/* Modify watermark under lock, wake kswapd outside */
> +			spin_lock_irqsave(&zone->lock, flags);
> +			should_wake = boost_watermark_atomic(zone);
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +
> +			if (should_wake)
> +				wakeup_kswapd(zone, gfp_mask, 0,
> +					      ac->highest_zoneidx);
> +		}
> +	}
> +}
> +
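(Side note on the debounce: time_after() is wraparound-safe; ignoring the
type checks, it boils down to

	#define time_after(a, b)	((long)((b) - (a)) < 0)

so the once-per-second limit keeps working across a jiffies wrap.)
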
>  /*
>   * When we are falling back to another migratetype during allocation, should we
>   * try to claim an entire block to satisfy further allocations, instead of
> @@ -4742,6 +4797,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto got_pg;
>  
> +	/* Boost watermarks for atomic requests entering slowpath */
> +	if (((gfp_mask & GFP_ATOMIC) == GFP_ATOMIC) && order == 0 && !can_direct_reclaim)
> +		boost_zones_for_atomic(ac, gfp_mask);
> +
>  	/*
>  	 * For costly allocations, try direct compaction first, as it's likely
>  	 * that we have enough base pages and don't need to reclaim. For non-
> 
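(For reference: GFP_ATOMIC is __GFP_HIGH | __GFP_KSWAPD_RECLAIM, so

	(gfp_mask & GFP_ATOMIC) == GFP_ATOMIC

requires both bits to be set and will not trigger for, say, a plain
GFP_NOWAIT request, while !can_direct_reclaim excludes callers that are
allowed to reclaim on their own.)
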
> ---
> base-commit: b54345928fa1dbde534e32ecaa138678fd5d2135
> change-id: 20260206-wujing-mm-page_alloc-v8-fb1979bac6fe
> 
> Best regards,



Thread overview: 3+ messages
2026-02-14 15:13 Qiliang Yuan
2026-02-18  0:59 ` SeongJae Park
2026-02-18  8:36 ` Vlastimil Babka [this message]
