* [PATCH v2 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk
@ 2025-09-24 20:44 Joshua Hahn
From: Joshua Hahn @ 2025-09-24 20:44 UTC
  To: Andrew Morton, Johannes Weiner
  Cc: Chris Mason, Kiryl Shutsemau, Liam R. Howlett, Brendan Jackman,
	David Hildenbrand, Lorenzo Stoakes, Michal Hocko, Mike Rapoport,
	Suren Baghdasaryan, Vlastimil Babka, Zi Yan, linux-kernel,
	linux-mm

Motivation & Approach
=====================

While testing workloads with high sustained memory pressure on large machines
in the Meta fleet (1TB memory, 316 CPUs), we saw an unexpectedly high number
of softlockups. Further investigation showed that the lock in
free_pcppages_bulk was being held for a long time, and that the function was
called to free 2k+ pages over 100 times during boot alone.

This starves other processes of both the pcp and zone locks, which can stall
the system because multiple threads cannot make progress without the locks.
These issues manifest as warnings:

[ 4512.591979] rcu: INFO: rcu_sched self-detected stall on CPU
[ 4512.604370] rcu: 	20-....: (9312 ticks this GP) idle=a654/1/0x4000000000000000 softirq=309340/309344 fqs=5426
[ 4512.626401] rcu: 	         hardirqs   softirqs   csw/system
[ 4512.638793] rcu: 	 number:        0        145            0
[ 4512.651177] rcu: 	cputime:       30      10410          174   ==> 10558(ms)
[ 4512.666657] rcu: 	(t=21077 jiffies g=783665 q=1242213 ncpus=316)

While these warnings are benign, they do point to the underlying issue of
lock contention. To prevent starvation in both locks, batch the freeing of
pages using pcp->batch.

Because free_pcppages_bulk is called with both the pcp and zone lock,
relinquishing and reacquiring the locks is only effective when both of them
are broken together (unless the system was built with queued spinlocks).
Thus, instead of modifying free_pcppages_bulk to break both locks, batch the
freeing from its callers.
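
For illustration, the caller-side batching described above can be sketched in
userspace C. This is a toy model, not the code in this series: the toy_* names,
the pthread mutexes standing in for the pcp and zone spinlocks, and the helper
that mimics free_pcppages_bulk are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a zone: one lock and a free-page counter. */
struct toy_zone {
	pthread_mutex_t lock;
	long free_pages;
};

/* Hypothetical stand-in for a per-cpu page list. */
struct toy_pcp {
	pthread_mutex_t lock;
	int count;	/* pages currently on the list */
	int batch;	/* chunk size, playing the role of pcp->batch */
};

/*
 * Mimics the bulk free: the caller already holds the pcp lock, and the
 * zone lock is held for the duration of one chunk only.
 */
static void toy_free_bulk(struct toy_zone *zone, struct toy_pcp *pcp, int nr)
{
	pthread_mutex_lock(&zone->lock);
	pcp->count -= nr;
	zone->free_pages += nr;
	pthread_mutex_unlock(&zone->lock);
}

/*
 * Free up to to_free pages in chunks of at most pcp->batch. Both locks are
 * dropped between chunks so that waiters can make progress. Returns the
 * number of lock round trips that were needed.
 */
static int toy_drain_batched(struct toy_zone *zone, struct toy_pcp *pcp,
			     int to_free)
{
	int rounds = 0;

	assert(pcp->batch > 0);

	while (to_free > 0) {
		int nr;

		pthread_mutex_lock(&pcp->lock);
		nr = to_free < pcp->batch ? to_free : pcp->batch;
		if (nr > pcp->count)
			nr = pcp->count;	/* list may hold fewer pages */
		if (nr > 0)
			toy_free_bulk(zone, pcp, nr);
		pthread_mutex_unlock(&pcp->lock);

		if (nr == 0)
			break;			/* nothing left to free */
		to_free -= nr;
		rounds++;
	}
	return rounds;
}
```

In this model, draining 2048 pages with a batch of 63 takes 33 short lock
round trips rather than one long hold, which gives waiters a chance to take
either lock between chunks.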

A similar fix has been implemented in the Meta fleet, and we have seen
significantly fewer softlockups.

Testing
=======
The following are a few synthetic benchmarks, run on a machine with
250G RAM, 179G swap, and 176 CPUs.

stress-ng --vm 50 --vm-bytes 5G -M -t 100
+----------------------+---------------+----------+
|        Metric        | Variation (%) | Delta(%) |
+----------------------+---------------+----------+
| bogo ops             |        0.0216 |  -0.0172 |
| bogo ops/s (real)    |        0.0223 |  -0.0163 |
| bogo ops/s (usr+sys) |        1.3433 |  +1.0769 |
+----------------------+---------------+----------+

stress-ng --vm 10 --vm-bytes 30G -M -t 100
+----------------------+---------------+----------+
|        Metric        | Variation (%) | Delta(%) |
+----------------------+---------------+----------+
| bogo ops             |        2.1736 |  +4.8535 |
| bogo ops/s (real)    |        2.2689 |  +5.1719 |
| bogo ops/s (usr+sys) |        2.1283 |  +0.6587 |
+----------------------+---------------+----------+

Depending on the workload, this patch appears to either improve performance
or stay neutral. I believe this has to do with how much lock contention there
is, and how many free_pcppages_bulk calls were previously being made with
high counts.

The difference between bogo ops/s (real) and (usr+sys) seems to indicate that
there is a meaningful difference in the amount of time threads spend blocked
on getting either the pcp or zone lock.

Changelog
=========
v1 --> v2:
- Reworded cover letter to be more explicit about what kinds of issues
  running processes might face as a result of the existing lock starvation
- Reworded cover letter to be in sections to make it easier to read
- Fixed patch 4/4 to properly store & restore UP flags.
- Re-ran tests, updated the testing results and interpretation

Joshua Hahn (4):
  mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection
  mm/page_alloc: Perform appropriate batching in drain_pages_zone
  mm/page_alloc: Batch page freeing in decay_pcp_high
  mm/page_alloc: Batch page freeing in free_frozen_page_commit

 include/linux/gfp.h |  2 +-
 mm/page_alloc.c     | 67 ++++++++++++++++++++++++++++++++-------------
 mm/vmstat.c         | 26 +++++++++---------
 3 files changed, 62 insertions(+), 33 deletions(-)


base-commit: 097a6c336d0080725c626fda118ecfec448acd0f
-- 
2.47.3


Thread overview: 33+ messages
2025-09-24 20:44 [PATCH v2 0/4] mm/page_alloc: Batch callers of free_pcppages_bulk Joshua Hahn
2025-09-24 20:44 ` [PATCH v2 1/4] mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection Joshua Hahn
2025-09-24 22:51   ` Christoph Lameter (Ampere)
2025-09-25 18:26     ` Joshua Hahn
2025-09-26 15:34   ` Dan Carpenter
2025-09-26 16:40     ` Joshua Hahn
2025-09-26 17:50   ` SeongJae Park
2025-09-26 18:24     ` Joshua Hahn
2025-09-26 18:33       ` SeongJae Park
2025-09-24 20:44 ` [PATCH v2 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone Joshua Hahn
2025-09-24 23:09   ` Christoph Lameter (Ampere)
2025-09-25 18:44     ` Joshua Hahn
2025-09-26 16:21       ` Christoph Lameter (Ampere)
2025-09-26 17:25         ` Joshua Hahn
2025-10-01 11:23         ` Vlastimil Babka
2025-09-26 14:01   ` Brendan Jackman
2025-09-26 15:48     ` Joshua Hahn
2025-09-26 16:57       ` Brendan Jackman
2025-09-26 17:33         ` Joshua Hahn
2025-09-27  0:46   ` Hillf Danton
2025-09-30 14:42     ` Joshua Hahn
2025-09-30 22:14       ` Hillf Danton
2025-10-01 15:37         ` Joshua Hahn
2025-10-01 23:48           ` Hillf Danton
2025-10-03  8:35             ` Vlastimil Babka
2025-10-03 10:02               ` Hillf Danton
2025-10-04  9:03                 ` Mike Rapoport
2025-09-24 20:44 ` [PATCH v2 3/4] mm/page_alloc: Batch page freeing in decay_pcp_high Joshua Hahn
2025-09-24 20:44 ` [PATCH v2 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit Joshua Hahn
2025-09-28  5:17   ` kernel test robot
2025-09-29 15:17     ` Joshua Hahn
2025-10-01 10:04       ` Vlastimil Babka
2025-10-01 15:55         ` Joshua Hahn
