* [PATCH 1/3] mm/page_alloc: add per-migratetype counts to buddy allocator
2025-11-28 3:10 [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Hongru Zhang
@ 2025-11-28 3:11 ` Hongru Zhang
2025-11-29 0:34 ` Barry Song
2025-11-28 3:12 ` [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migratetype count Hongru Zhang
` (3 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-11-28 3:11 UTC (permalink / raw)
To: akpm, vbabka, david
Cc: linux-mm, linux-kernel, surenb, mhocko, jackmanb, hannes, ziy,
lorenzo.stoakes, Liam.Howlett, rppt, axelrasmussen, yuanchu,
weixugc, Hongru Zhang
From: Hongru Zhang <zhanghongru@xiaomi.com>
On mobile devices, some user-space memory management components check
memory pressure and fragmentation status periodically or via PSI, and
take actions such as killing processes or performing memory compaction
based on this information.
Under high load scenarios, reading /proc/pagetypeinfo causes memory
management components or memory allocation/free paths to be blocked
for extended periods waiting for the zone lock, leading to the
following issues:
1. Long interrupt-disabled spinlock sections - occasionally exceeding
10 ms on Qcom 8750 platforms - reducing system real-time performance
2. Memory management components being blocked for extended periods,
preventing rapid acquisition of memory fragmentation information for
critical memory management decisions and actions
3. Increased latency in memory allocation and free paths due to prolonged
zone lock contention
This patch adds per-migratetype counts to the buddy allocator in
preparation for optimizing /proc/pagetypeinfo access.
The optimized implementation:
- Makes per-migratetype count updates protected by the zone lock on the
write side while /proc/pagetypeinfo reads are lock-free, which reduces
interrupt-disabled spinlock duration and improves system real-time
performance (addressing issue #1)
- Reduces blocking time for memory management components when reading
/proc/pagetypeinfo, enabling more rapid acquisition of memory
fragmentation information (addressing issue #2)
- Minimizes the critical section held during /proc/pagetypeinfo reads to
reduce zone lock contention on memory allocation and free paths
(addressing issue #3)
The main overhead is a slight increase in latency on the memory
allocation and free paths due to additional per-migratetype counting,
with theoretically minimal impact on overall performance.
Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
---
include/linux/mmzone.h | 1 +
mm/mm_init.c | 1 +
mm/page_alloc.c | 7 ++++++-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c5725..6eeefe6a3727 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -138,6 +138,7 @@ extern int page_group_by_mobility_disabled;
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
unsigned long nr_free;
+ unsigned long mt_nr_free[MIGRATE_TYPES];
};
struct pglist_data;
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7712d887b696..dca2be8cc3b1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1439,6 +1439,7 @@ static void __meminit zone_init_free_lists(struct zone *zone)
for_each_migratetype_order(order, t) {
INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
zone->free_area[order].nr_free = 0;
+ zone->free_area[order].mt_nr_free[t] = 0;
}
#ifdef CONFIG_UNACCEPTED_MEMORY
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ed82ee55e66a..9431073e7255 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -818,6 +818,7 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
else
list_add(&page->buddy_list, &area->free_list[migratetype]);
area->nr_free++;
+ area->mt_nr_free[migratetype]++;
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
@@ -840,6 +841,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
get_pageblock_migratetype(page), old_mt, nr_pages);
list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
+ area->mt_nr_free[old_mt]--;
+ area->mt_nr_free[new_mt]++;
account_freepages(zone, -nr_pages, old_mt);
account_freepages(zone, nr_pages, new_mt);
@@ -855,6 +858,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
static inline void __del_page_from_free_list(struct page *page, struct zone *zone,
unsigned int order, int migratetype)
{
+ struct free_area *area = &zone->free_area[order];
int nr_pages = 1 << order;
VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
@@ -868,7 +872,8 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
list_del(&page->buddy_list);
__ClearPageBuddy(page);
set_page_private(page, 0);
- zone->free_area[order].nr_free--;
+ area->nr_free--;
+ area->mt_nr_free[migratetype]--;
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
--
2.43.0
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] mm/page_alloc: add per-migratetype counts to buddy allocator
2025-11-28 3:11 ` [PATCH 1/3] mm/page_alloc: add per-migratetype counts to buddy allocator Hongru Zhang
@ 2025-11-29 0:34 ` Barry Song
0 siblings, 0 replies; 20+ messages in thread
From: Barry Song @ 2025-11-29 0:34 UTC (permalink / raw)
To: Hongru Zhang
Cc: akpm, vbabka, david, linux-mm, linux-kernel, surenb, mhocko,
jackmanb, hannes, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
axelrasmussen, yuanchu, weixugc, Hongru Zhang
On Fri, Nov 28, 2025 at 11:12 AM Hongru Zhang <zhanghongru06@gmail.com> wrote:
>
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ed82ee55e66a..9431073e7255 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -818,6 +818,7 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
> else
> list_add(&page->buddy_list, &area->free_list[migratetype]);
> area->nr_free++;
> + area->mt_nr_free[migratetype]++;
>
> if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
> @@ -840,6 +841,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
> get_pageblock_migratetype(page), old_mt, nr_pages);
>
> list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
> + area->mt_nr_free[old_mt]--;
> + area->mt_nr_free[new_mt]++;
The overhead comes from effectively counting twice. Have we checked whether
the readers of area->nr_free are on a hot path? If not, we might just drop
nr_free and compute the sum each time.
Buddyinfo and compaction do not seem to be on a hot path?
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migratetype count
2025-11-28 3:10 [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Hongru Zhang
2025-11-28 3:11 ` [PATCH 1/3] mm/page_alloc: add per-migratetype counts to buddy allocator Hongru Zhang
@ 2025-11-28 3:12 ` Hongru Zhang
2025-11-28 12:03 ` zhongjinji
2025-11-28 3:12 ` [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts Hongru Zhang
` (2 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-11-28 3:12 UTC (permalink / raw)
To: akpm, vbabka, david
Cc: linux-mm, linux-kernel, surenb, mhocko, jackmanb, hannes, ziy,
lorenzo.stoakes, Liam.Howlett, rppt, axelrasmussen, yuanchu,
weixugc, Hongru Zhang
From: Hongru Zhang <zhanghongru@xiaomi.com>
This patch optimizes /proc/pagetypeinfo access by utilizing the
per-migratetype free page block counts already maintained by the buddy
allocator, instead of iterating through free lists under zone lock.
Accuracy. Both implementations have accuracy limitations. The previous
implementation required acquiring and releasing the zone lock for counting
each order and migratetype, making it potentially inaccurate. Under high
memory pressure, accuracy would further degrade due to zone lock
contention or fragmentation. The new implementation collects data within a
short time window, which helps maintain relatively small errors, and is
unaffected by memory pressure. Furthermore, user-space memory management
components inherently experience decision latency - by the time they
process the collected data and execute actions, the memory state has
already changed. This means that even perfectly accurate data at
collection time becomes stale by decision time. Considering these factors,
the accuracy trade-off introduced by the new implementation should be
acceptable for practical use cases, offering a balance between performance
and accuracy requirements.
Performance benefits:
System setup:
- 12th Gen Intel(R) Core(TM) i7-12700
- 1 NUMA node, 16G memory in total
- Turbo disabled
- cpufreq governor set to performance
1. Average latency over 10,000 /proc/pagetypeinfo accesses
+-----------------------+----------+------------+
| | no-patch | with-patch |
+-----------------------+----------+------------+
| Just after boot | 700.9 us | 268.6 us |
+-----------------------+----------+------------+
| After building kernel | 28.7 ms | 269.8 us |
+-----------------------+----------+------------+
2. Page alloc/free latency with concurrent /proc/pagetypeinfo access
Test setup:
- Using config-pagealloc-micro
- Monitor set to proc-pagetypeinfo, update frequency set to 10ms
- PAGEALLOC_ORDER_MIN=4, PAGEALLOC_ORDER_MAX=4
Without patch test results:
vanilla vanilla
no-monitor monitor
Min alloc-odr4-1 8539.00 ( 0.00%) 8762.00 ( -2.61%)
Min alloc-odr4-2 6501.00 ( 0.00%) 6683.00 ( -2.80%)
Min alloc-odr4-4 5537.00 ( 0.00%) 5873.00 ( -6.07%)
Min alloc-odr4-8 5030.00 ( 0.00%) 5361.00 ( -6.58%)
Min alloc-odr4-16 4782.00 ( 0.00%) 5162.00 ( -7.95%)
Min alloc-odr4-32 5838.00 ( 0.00%) 6499.00 ( -11.32%)
Min alloc-odr4-64 6565.00 ( 0.00%) 7413.00 ( -12.92%)
Min alloc-odr4-128 6896.00 ( 0.00%) 7898.00 ( -14.53%)
Min alloc-odr4-256 7303.00 ( 0.00%) 8163.00 ( -11.78%)
Min alloc-odr4-512 10179.00 ( 0.00%) 11985.00 ( -17.74%)
Min alloc-odr4-1024 11000.00 ( 0.00%) 12165.00 ( -10.59%)
Min free-odr4-1 820.00 ( 0.00%) 1230.00 ( -50.00%)
Min free-odr4-2 511.00 ( 0.00%) 952.00 ( -86.30%)
Min free-odr4-4 347.00 ( 0.00%) 434.00 ( -25.07%)
Min free-odr4-8 286.00 ( 0.00%) 399.00 ( -39.51%)
Min free-odr4-16 250.00 ( 0.00%) 405.00 ( -62.00%)
Min free-odr4-32 294.00 ( 0.00%) 405.00 ( -37.76%)
Min free-odr4-64 333.00 ( 0.00%) 363.00 ( -9.01%)
Min free-odr4-128 340.00 ( 0.00%) 412.00 ( -21.18%)
Min free-odr4-256 339.00 ( 0.00%) 329.00 ( 2.95%)
Min free-odr4-512 361.00 ( 0.00%) 409.00 ( -13.30%)
Min free-odr4-1024 300.00 ( 0.00%) 361.00 ( -20.33%)
Stddev alloc-odr4-1 7.29 ( 0.00%) 90.78 (-1146.00%)
Stddev alloc-odr4-2 3.87 ( 0.00%) 51.30 (-1225.75%)
Stddev alloc-odr4-4 3.20 ( 0.00%) 50.90 (-1491.24%)
Stddev alloc-odr4-8 4.67 ( 0.00%) 52.23 (-1019.35%)
Stddev alloc-odr4-16 5.72 ( 0.00%) 27.53 (-381.04%)
Stddev alloc-odr4-32 6.25 ( 0.00%) 641.23 (-10154.46%)
Stddev alloc-odr4-64 2.06 ( 0.00%) 386.99 (-18714.22%)
Stddev alloc-odr4-128 14.36 ( 0.00%) 52.39 (-264.77%)
Stddev alloc-odr4-256 32.42 ( 0.00%) 326.19 (-906.05%)
Stddev alloc-odr4-512 65.58 ( 0.00%) 184.49 (-181.31%)
Stddev alloc-odr4-1024 8.88 ( 0.00%) 153.01 (-1622.67%)
Stddev free-odr4-1 2.29 ( 0.00%) 152.27 (-6549.85%)
Stddev free-odr4-2 10.99 ( 0.00%) 73.10 (-564.89%)
Stddev free-odr4-4 1.99 ( 0.00%) 28.40 (-1324.45%)
Stddev free-odr4-8 2.51 ( 0.00%) 52.93 (-2007.64%)
Stddev free-odr4-16 2.85 ( 0.00%) 26.04 (-814.88%)
Stddev free-odr4-32 4.04 ( 0.00%) 27.05 (-569.79%)
Stddev free-odr4-64 2.10 ( 0.00%) 48.07 (-2185.66%)
Stddev free-odr4-128 2.63 ( 0.00%) 26.23 (-897.86%)
Stddev free-odr4-256 6.29 ( 0.00%) 37.04 (-488.71%)
Stddev free-odr4-512 2.56 ( 0.00%) 10.65 (-315.28%)
Stddev free-odr4-1024 0.95 ( 0.00%) 6.46 (-582.22%)
Max alloc-odr4-1 8564.00 ( 0.00%) 9099.00 ( -6.25%)
Max alloc-odr4-2 6511.00 ( 0.00%) 6844.00 ( -5.11%)
Max alloc-odr4-4 5549.00 ( 0.00%) 6038.00 ( -8.81%)
Max alloc-odr4-8 5045.00 ( 0.00%) 5551.00 ( -10.03%)
Max alloc-odr4-16 4800.00 ( 0.00%) 5257.00 ( -9.52%)
Max alloc-odr4-32 5861.00 ( 0.00%) 8115.00 ( -38.46%)
Max alloc-odr4-64 6571.00 ( 0.00%) 8292.00 ( -26.19%)
Max alloc-odr4-128 6930.00 ( 0.00%) 8081.00 ( -16.61%)
Max alloc-odr4-256 7372.00 ( 0.00%) 9150.00 ( -24.12%)
Max alloc-odr4-512 10333.00 ( 0.00%) 12636.00 ( -22.29%)
Max alloc-odr4-1024 11035.00 ( 0.00%) 12590.00 ( -14.09%)
Max free-odr4-1 828.00 ( 0.00%) 1724.00 (-108.21%)
Max free-odr4-2 543.00 ( 0.00%) 1192.00 (-119.52%)
Max free-odr4-4 354.00 ( 0.00%) 519.00 ( -46.61%)
Max free-odr4-8 293.00 ( 0.00%) 617.00 (-110.58%)
Max free-odr4-16 260.00 ( 0.00%) 483.00 ( -85.77%)
Max free-odr4-32 308.00 ( 0.00%) 488.00 ( -58.44%)
Max free-odr4-64 341.00 ( 0.00%) 505.00 ( -48.09%)
Max free-odr4-128 346.00 ( 0.00%) 497.00 ( -43.64%)
Max free-odr4-256 353.00 ( 0.00%) 463.00 ( -31.16%)
Max free-odr4-512 367.00 ( 0.00%) 442.00 ( -20.44%)
Max free-odr4-1024 303.00 ( 0.00%) 381.00 ( -25.74%)
With patch test results:
patched patched
no-monitor monitor
Min alloc-odr4-1 8488.00 ( 0.00%) 8514.00 ( -0.31%)
Min alloc-odr4-2 6551.00 ( 0.00%) 6527.00 ( 0.37%)
Min alloc-odr4-4 5536.00 ( 0.00%) 5591.00 ( -0.99%)
Min alloc-odr4-8 5008.00 ( 0.00%) 5098.00 ( -1.80%)
Min alloc-odr4-16 4760.00 ( 0.00%) 4857.00 ( -2.04%)
Min alloc-odr4-32 5827.00 ( 0.00%) 5919.00 ( -1.58%)
Min alloc-odr4-64 6561.00 ( 0.00%) 6680.00 ( -1.81%)
Min alloc-odr4-128 6898.00 ( 0.00%) 7014.00 ( -1.68%)
Min alloc-odr4-256 7311.00 ( 0.00%) 7464.00 ( -2.09%)
Min alloc-odr4-512 10181.00 ( 0.00%) 10286.00 ( -1.03%)
Min alloc-odr4-1024 11205.00 ( 0.00%) 11725.00 ( -4.64%)
Min free-odr4-1 789.00 ( 0.00%) 867.00 ( -9.89%)
Min free-odr4-2 490.00 ( 0.00%) 526.00 ( -7.35%)
Min free-odr4-4 350.00 ( 0.00%) 360.00 ( -2.86%)
Min free-odr4-8 272.00 ( 0.00%) 287.00 ( -5.51%)
Min free-odr4-16 247.00 ( 0.00%) 254.00 ( -2.83%)
Min free-odr4-32 298.00 ( 0.00%) 304.00 ( -2.01%)
Min free-odr4-64 334.00 ( 0.00%) 325.00 ( 2.69%)
Min free-odr4-128 334.00 ( 0.00%) 329.00 ( 1.50%)
Min free-odr4-256 336.00 ( 0.00%) 336.00 ( 0.00%)
Min free-odr4-512 360.00 ( 0.00%) 342.00 ( 5.00%)
Min free-odr4-1024 327.00 ( 0.00%) 355.00 ( -8.56%)
Stddev alloc-odr4-1 5.19 ( 0.00%) 45.38 (-775.09%)
Stddev alloc-odr4-2 6.99 ( 0.00%) 37.63 (-437.98%)
Stddev alloc-odr4-4 3.91 ( 0.00%) 17.85 (-356.28%)
Stddev alloc-odr4-8 5.15 ( 0.00%) 9.34 ( -81.47%)
Stddev alloc-odr4-16 3.83 ( 0.00%) 5.34 ( -39.34%)
Stddev alloc-odr4-32 1.96 ( 0.00%) 10.28 (-425.09%)
Stddev alloc-odr4-64 1.32 ( 0.00%) 333.30 (-25141.39%)
Stddev alloc-odr4-128 2.06 ( 0.00%) 7.37 (-258.28%)
Stddev alloc-odr4-256 15.56 ( 0.00%) 113.48 (-629.25%)
Stddev alloc-odr4-512 61.25 ( 0.00%) 165.09 (-169.53%)
Stddev alloc-odr4-1024 18.89 ( 0.00%) 2.93 ( 84.51%)
Stddev free-odr4-1 4.45 ( 0.00%) 40.12 (-800.98%)
Stddev free-odr4-2 1.50 ( 0.00%) 29.30 (-1850.31%)
Stddev free-odr4-4 1.27 ( 0.00%) 19.49 (-1439.40%)
Stddev free-odr4-8 0.97 ( 0.00%) 8.93 (-823.07%)
Stddev free-odr4-16 8.38 ( 0.00%) 4.51 ( 46.21%)
Stddev free-odr4-32 3.18 ( 0.00%) 6.59 (-107.42%)
Stddev free-odr4-64 2.40 ( 0.00%) 3.09 ( -28.50%)
Stddev free-odr4-128 1.55 ( 0.00%) 2.53 ( -62.92%)
Stddev free-odr4-256 0.41 ( 0.00%) 2.80 (-585.57%)
Stddev free-odr4-512 1.60 ( 0.00%) 4.84 (-202.08%)
Stddev free-odr4-1024 0.66 ( 0.00%) 1.19 ( -80.68%)
Max alloc-odr4-1 8505.00 ( 0.00%) 8676.00 ( -2.01%)
Max alloc-odr4-2 6572.00 ( 0.00%) 6651.00 ( -1.20%)
Max alloc-odr4-4 5552.00 ( 0.00%) 5646.00 ( -1.69%)
Max alloc-odr4-8 5024.00 ( 0.00%) 5131.00 ( -2.13%)
Max alloc-odr4-16 4774.00 ( 0.00%) 4875.00 ( -2.12%)
Max alloc-odr4-32 5834.00 ( 0.00%) 5950.00 ( -1.99%)
Max alloc-odr4-64 6565.00 ( 0.00%) 7434.00 ( -13.24%)
Max alloc-odr4-128 6907.00 ( 0.00%) 7034.00 ( -1.84%)
Max alloc-odr4-256 7347.00 ( 0.00%) 7843.00 ( -6.75%)
Max alloc-odr4-512 10315.00 ( 0.00%) 10866.00 ( -5.34%)
Max alloc-odr4-1024 11278.00 ( 0.00%) 11733.00 ( -4.03%)
Max free-odr4-1 803.00 ( 0.00%) 1009.00 ( -25.65%)
Max free-odr4-2 495.00 ( 0.00%) 607.00 ( -22.63%)
Max free-odr4-4 354.00 ( 0.00%) 417.00 ( -17.80%)
Max free-odr4-8 275.00 ( 0.00%) 313.00 ( -13.82%)
Max free-odr4-16 273.00 ( 0.00%) 272.00 ( 0.37%)
Max free-odr4-32 309.00 ( 0.00%) 324.00 ( -4.85%)
Max free-odr4-64 340.00 ( 0.00%) 335.00 ( 1.47%)
Max free-odr4-128 340.00 ( 0.00%) 338.00 ( 0.59%)
Max free-odr4-256 338.00 ( 0.00%) 346.00 ( -2.37%)
Max free-odr4-512 364.00 ( 0.00%) 359.00 ( 1.37%)
Max free-odr4-1024 329.00 ( 0.00%) 359.00 ( -9.12%)
Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
---
mm/page_alloc.c | 10 ++++++----
mm/vmstat.c | 30 +++++++-----------------------
2 files changed, 13 insertions(+), 27 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9431073e7255..a90f2bf735f6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -818,7 +818,8 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
else
list_add(&page->buddy_list, &area->free_list[migratetype]);
area->nr_free++;
- area->mt_nr_free[migratetype]++;
+ WRITE_ONCE(area->mt_nr_free[migratetype],
+ area->mt_nr_free[migratetype] + 1);
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
@@ -841,8 +842,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
get_pageblock_migratetype(page), old_mt, nr_pages);
list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
- area->mt_nr_free[old_mt]--;
- area->mt_nr_free[new_mt]++;
+ WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
+ WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
account_freepages(zone, -nr_pages, old_mt);
account_freepages(zone, nr_pages, new_mt);
@@ -873,7 +874,8 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
__ClearPageBuddy(page);
set_page_private(page, 0);
area->nr_free--;
- area->mt_nr_free[migratetype]--;
+ WRITE_ONCE(area->mt_nr_free[migratetype],
+ area->mt_nr_free[migratetype] - 1);
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index bb09c032eecf..9334bbbe1e16 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
zone->name,
migratetype_names[mtype]);
for (order = 0; order < NR_PAGE_ORDERS; ++order) {
- unsigned long freecount = 0;
- struct free_area *area;
- struct list_head *curr;
+ unsigned long freecount;
bool overflow = false;
- area = &(zone->free_area[order]);
-
- list_for_each(curr, &area->free_list[mtype]) {
- /*
- * Cap the free_list iteration because it might
- * be really large and we are under a spinlock
- * so a long time spent here could trigger a
- * hard lockup detector. Anyway this is a
- * debugging tool so knowing there is a handful
- * of pages of this order should be more than
- * sufficient.
- */
- if (++freecount >= 100000) {
- overflow = true;
- break;
- }
+ /* Keep the same output format for user-space tools compatibility */
+ freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+ if (freecount >= 100000) {
+ overflow = true;
+ freecount = 100000;
}
seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
- spin_unlock_irq(&zone->lock);
- cond_resched();
- spin_lock_irq(&zone->lock);
}
seq_putc(m, '\n');
}
@@ -1633,7 +1617,7 @@ static void pagetypeinfo_showfree(struct seq_file *m, void *arg)
seq_printf(m, "%6d ", order);
seq_putc(m, '\n');
- walk_zones_in_node(m, pgdat, true, false, pagetypeinfo_showfree_print);
+ walk_zones_in_node(m, pgdat, true, true, pagetypeinfo_showfree_print);
}
static void pagetypeinfo_showblockcount_print(struct seq_file *m,
--
2.43.0
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migratetype count
2025-11-28 3:12 ` [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migratetype count Hongru Zhang
@ 2025-11-28 12:03 ` zhongjinji
2025-11-29 0:00 ` Barry Song
0 siblings, 1 reply; 20+ messages in thread
From: zhongjinji @ 2025-11-28 12:03 UTC (permalink / raw)
To: zhanghongru06
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt, surenb,
vbabka, weixugc, yuanchu, zhanghongru, ziy
Hi, Hongru
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9431073e7255..a90f2bf735f6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -818,7 +818,8 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone,
> else
> list_add(&page->buddy_list, &area->free_list[migratetype]);
> area->nr_free++;
> - area->mt_nr_free[migratetype]++;
> + WRITE_ONCE(area->mt_nr_free[migratetype],
> + area->mt_nr_free[migratetype] + 1);
>
> if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, nr_pages);
> @@ -841,8 +842,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
> get_pageblock_migratetype(page), old_mt, nr_pages);
>
> list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
> - area->mt_nr_free[old_mt]--;
> - area->mt_nr_free[new_mt]++;
> + WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
> + WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
>
> account_freepages(zone, -nr_pages, old_mt);
> account_freepages(zone, nr_pages, new_mt);
> @@ -873,7 +874,8 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
> __ClearPageBuddy(page);
> set_page_private(page, 0);
> area->nr_free--;
> - area->mt_nr_free[migratetype]--;
> + WRITE_ONCE(area->mt_nr_free[migratetype],
> + area->mt_nr_free[migratetype] - 1);
It doesn't seem like a good idea to use WRITE_ONCE on the hot path.
>
> if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index bb09c032eecf..9334bbbe1e16 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> zone->name,
> migratetype_names[mtype]);
> for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> - unsigned long freecount = 0;
> - struct free_area *area;
> - struct list_head *curr;
> + unsigned long freecount;
> bool overflow = false;
>
> - area = &(zone->free_area[order]);
> -
> - list_for_each(curr, &area->free_list[mtype]) {
> - /*
> - * Cap the free_list iteration because it might
> - * be really large and we are under a spinlock
> - * so a long time spent here could trigger a
> - * hard lockup detector. Anyway this is a
> - * debugging tool so knowing there is a handful
> - * of pages of this order should be more than
> - * sufficient.
> - */
> - if (++freecount >= 100000) {
> - overflow = true;
> - break;
> - }
> + /* Keep the same output format for user-space tools compatibility */
> + freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
I think it might be better to use an array of size NR_PAGE_ORDERS to store
the free count for each order, like the code below.
unsigned long freecount[NR_PAGE_ORDERS]
spin_lock_irq(&zone->lock)
for_each_order
freecount[order] = zone->free_area[order].mt_nr_free[mtype]
spin_unlock_irq(&zone->lock)
for_each_order
print freecount[order]
> + if (freecount >= 100000) {
> + overflow = true;
> + freecount = 100000;
> }
> seq_printf(m, "%s%6lu ", overflow ? ">" : "", freecount);
> - spin_unlock_irq(&zone->lock);
> - cond_resched();
> - spin_lock_irq(&zone->lock);
> }
> seq_putc(m, '\n');
> }
> @@ -1633,7 +1617,7 @@ static void pagetypeinfo_showfree(struct seq_file *m, void *arg)
> seq_printf(m, "%6d ", order);
> seq_putc(m, '\n');
>
> - walk_zones_in_node(m, pgdat, true, false, pagetypeinfo_showfree_print);
> + walk_zones_in_node(m, pgdat, true, true, pagetypeinfo_showfree_print);
> }
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migratetype count
2025-11-28 12:03 ` zhongjinji
@ 2025-11-29 0:00 ` Barry Song
2025-11-29 7:55 ` Barry Song
2025-12-01 12:29 ` Hongru Zhang
0 siblings, 2 replies; 20+ messages in thread
From: Barry Song @ 2025-11-29 0:00 UTC (permalink / raw)
To: zhongjinji
Cc: zhanghongru06, Liam.Howlett, akpm, axelrasmussen, david, hannes,
jackmanb, linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt,
surenb, vbabka, weixugc, yuanchu, zhanghongru, ziy
> > if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> > __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index bb09c032eecf..9334bbbe1e16 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> > zone->name,
> > migratetype_names[mtype]);
> > for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> > - unsigned long freecount = 0;
> > - struct free_area *area;
> > - struct list_head *curr;
> > + unsigned long freecount;
> > bool overflow = false;
> >
> > - area = &(zone->free_area[order]);
> > -
> > - list_for_each(curr, &area->free_list[mtype]) {
> > - /*
> > - * Cap the free_list iteration because it might
> > - * be really large and we are under a spinlock
> > - * so a long time spent here could trigger a
> > - * hard lockup detector. Anyway this is a
> > - * debugging tool so knowing there is a handful
> > - * of pages of this order should be more than
> > - * sufficient.
> > - */
> > - if (++freecount >= 100000) {
> > - overflow = true;
> > - break;
> > - }
> > + /* Keep the same output format for user-space tools compatibility */
> > + freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
>
> I think it might be better to use an array of size NR_PAGE_ORDERS to store
> the free count for each order, like the code below.
Right. If we want the freecount to accurately reflect the current system
state, we still need to take the zone lock.
Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
correctness. They may ensure single-copy atomicity per access, but not for the
overall result.
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count
2025-11-29 0:00 ` Barry Song
@ 2025-11-29 7:55 ` Barry Song
2025-12-01 12:29 ` Hongru Zhang
1 sibling, 0 replies; 20+ messages in thread
From: Barry Song @ 2025-11-29 7:55 UTC (permalink / raw)
To: zhongjinji
Cc: zhanghongru06, Liam.Howlett, akpm, axelrasmussen, david, hannes,
jackmanb, linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt,
surenb, vbabka, weixugc, yuanchu, zhanghongru, ziy
On Sat, Nov 29, 2025 at 8:00 AM Barry Song <21cnbao@gmail.com> wrote:
>
> > > if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> > > __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index bb09c032eecf..9334bbbe1e16 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> > > zone->name,
> > > migratetype_names[mtype]);
> > > for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> > > - unsigned long freecount = 0;
> > > - struct free_area *area;
> > > - struct list_head *curr;
> > > + unsigned long freecount;
> > > bool overflow = false;
> > >
> > > - area = &(zone->free_area[order]);
> > > -
> > > - list_for_each(curr, &area->free_list[mtype]) {
> > > - /*
> > > - * Cap the free_list iteration because it might
> > > - * be really large and we are under a spinlock
> > > - * so a long time spent here could trigger a
> > > - * hard lockup detector. Anyway this is a
> > > - * debugging tool so knowing there is a handful
> > > - * of pages of this order should be more than
> > > - * sufficient.
> > > - */
> > > - if (++freecount >= 100000) {
> > > - overflow = true;
> > > - break;
> > > - }
> > > + /* Keep the same output format for user-space tools compatibility */
> > > + freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> >
> > I think it might be better to use an array of size NR_PAGE_ORDERS to store
> > the free count for each order, like the code below.
>
> Right. If we want the freecount to accurately reflect the current system
> state, we still need to take the zone lock.
>
> Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> correctness. They may ensure single-copy atomicity per access, but not for the
> overall result.
On second thought, the original code releases and re-acquires the spinlock
for each order, so cross-variable consistency may not be a real issue.
Adding data_race() to silence KCSAN warnings should be sufficient?
I mean something like the following.
@@ -843,8 +842,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
get_pageblock_migratetype(page), old_mt, nr_pages);
list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
- WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
- WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
+ area->mt_nr_free[old_mt]--;
+ area->mt_nr_free[new_mt]++;
account_freepages(zone, -nr_pages, old_mt);
account_freepages(zone, nr_pages, new_mt);
@@ -875,8 +874,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
__ClearPageBuddy(page);
set_page_private(page, 0);
area->nr_free--;
- WRITE_ONCE(area->mt_nr_free[migratetype],
- area->mt_nr_free[migratetype] - 1);
+ area->mt_nr_free[migratetype]--;
if (order >= pageblock_order && !is_migrate_isolate(migratetype))
__mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7e1e931eb209..d74004eb8c4d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1599,7 +1599,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
bool overflow = false;
/* Keep the same output format for user-space tools compatibility */
- freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+ freecount = data_race(zone->free_area[order].mt_nr_free[mtype]);
if (freecount >= 100000) {
overflow = true;
freecount = 100000;
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count
2025-11-29 0:00 ` Barry Song
2025-11-29 7:55 ` Barry Song
@ 2025-12-01 12:29 ` Hongru Zhang
2025-12-01 18:54 ` Barry Song
1 sibling, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-12-01 12:29 UTC (permalink / raw)
To: 21cnbao, zhongjinji
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt, surenb,
vbabka, weixugc, yuanchu, zhanghongru06, zhanghongru, ziy
> Right. If we want the freecount to accurately reflect the current system
> state, we still need to take the zone lock.
Yeah, as I mentioned in patch (2/3), this implementation has accuracy
limitations:
"Accuracy. Both implementations have accuracy limitations. The previous
implementation required acquiring and releasing the zone lock for counting
each order and migratetype, making it potentially inaccurate. Under high
memory pressure, accuracy would further degrade due to zone lock
contention or fragmentation. The new implementation collects data within a
short time window, which helps maintain relatively small errors, and is
unaffected by memory pressure. Furthermore, user-space memory management
components inherently experience decision latency - by the time they
process the collected data and execute actions, the memory state has
already changed. This means that even perfectly accurate data at
collection time becomes stale by decision time. Considering these factors,
the accuracy trade-off introduced by the new implementation should be
acceptable for practical use cases, offering a balance between performance
and accuracy requirements."
Additional data:
1. average latency of pagetypeinfo_showfree_print() over 1,000,000
times is 4.67 us
2. average latency is 125 ns, if seq_printf() is taken out of the loop
Example code:
+unsigned long total_lat = 0;
+unsigned long total_count = 0;
+
static void pagetypeinfo_showfree_print(struct seq_file *m,
pg_data_t *pgdat, struct zone *zone)
{
int order, mtype;
+ ktime_t start;
+ u64 lat;
+ unsigned long freecounts[NR_PAGE_ORDERS][MIGRATE_TYPES]; /* ignore potential stack overflow */
+
+ start = ktime_get();
+ for (order = 0; order < NR_PAGE_ORDERS; ++order)
+ for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
+ freecounts[order][mtype] = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+
+ lat = ktime_to_ns(ktime_sub(ktime_get(), start));
+ total_count++;
+ total_lat += lat;
for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
seq_printf(m, "Node %4d, zone %8s, type %12s ",
@@ -1594,7 +1609,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
bool overflow = false;
/* Keep the same output format for user-space tools compatibility */
- freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+ freecount = freecounts[order][mtype];
if (freecount >= 100000) {
overflow = true;
freecount = 100000;
@@ -1692,6 +1707,13 @@ static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
#endif /* CONFIG_PAGE_OWNER */
}
I think both collect within a small time window (if IRQs are disabled, the
latency is more deterministic).
> Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> correctness. They may ensure single-copy atomicity per access, but not for the
> overall result.
I know this does not guarantee correctness of the overall result.
READ_ONCE() and WRITE_ONCE() in this patch are used to avoid potential
store tearing and read tearing caused by compiler optimizations.
In fact, I have already noticed /proc/buddyinfo, which collects data under
zone lock and uses data_race to avoid KCSAN reports. But I'm wondering if
we could remove its zone lock as well, for the same reasons as
/proc/pagetypeinfo.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count
2025-12-01 12:29 ` Hongru Zhang
@ 2025-12-01 18:54 ` Barry Song
0 siblings, 0 replies; 20+ messages in thread
From: Barry Song @ 2025-12-01 18:54 UTC (permalink / raw)
To: Hongru Zhang
Cc: zhongjinji, Liam.Howlett, akpm, axelrasmussen, david, hannes,
jackmanb, linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt,
surenb, vbabka, weixugc, yuanchu, zhanghongru, ziy
On Mon, Dec 1, 2025 at 8:29 PM Hongru Zhang <zhanghongru06@gmail.com> wrote:
>
> > Right. If we want the freecount to accurately reflect the current system
> > state, we still need to take the zone lock.
>
> Yeah, as I mentioned in patch (2/3), this implementation has an accuracy
> limitation:
>
> "Accuracy. Both implementations have accuracy limitations. The previous
> implementation required acquiring and releasing the zone lock for counting
> each order and migratetype, making it potentially inaccurate. Under high
> memory pressure, accuracy would further degrade due to zone lock
> contention or fragmentation. The new implementation collects data within a
> short time window, which helps maintain relatively small errors, and is
> unaffected by memory pressure. Furthermore, user-space memory management
> components inherently experience decision latency - by the time they
> process the collected data and execute actions, the memory state has
> already changed. This means that even perfectly accurate data at
> collection time becomes stale by decision time. Considering these factors,
> the accuracy trade-off introduced by the new implementation should be
> acceptable for practical use cases, offering a balance between performance
> and accuracy requirements."
>
> Additional data:
> 1. average latency of pagetypeinfo_showfree_print() over 1,000,000
> times is 4.67 us
>
> 2. average latency is 125 ns, if seq_printf() is taken out of the loop
>
> Example code:
>
> +unsigned long total_lat = 0;
> +unsigned long total_count = 0;
> +
> static void pagetypeinfo_showfree_print(struct seq_file *m,
> pg_data_t *pgdat, struct zone *zone)
> {
> int order, mtype;
> + ktime_t start;
> + u64 lat;
> + unsigned long freecounts[NR_PAGE_ORDERS][MIGRATE_TYPES]; /* ignore potential stack overflow */
> +
> + start = ktime_get();
> + for (order = 0; order < NR_PAGE_ORDERS; ++order)
> + for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
> + freecounts[order][mtype] = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> +
> + lat = ktime_to_ns(ktime_sub(ktime_get(), start));
> + total_count++;
> + total_lat += lat;
>
> for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
> seq_printf(m, "Node %4d, zone %8s, type %12s ",
> @@ -1594,7 +1609,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> bool overflow = false;
>
> /* Keep the same output format for user-space tools compatibility */
> - freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> + freecount = freecounts[order][mtype];
> if (freecount >= 100000) {
> overflow = true;
> freecount = 100000;
> @@ -1692,6 +1707,13 @@ static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat)
> #endif /* CONFIG_PAGE_OWNER */
> }
>
> I think both collect within a small time window (if IRQs are disabled, the
> latency is more deterministic).
>
> > Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> > correctness. They may ensure single-copy atomicity per access, but not for the
> > overall result.
>
> I know this does not guarantee correctness of the overall result.
> READ_ONCE() and WRITE_ONCE() in this patch are used to avoid potential
> store tearing and read tearing caused by compiler optimizations.
Yes, I realized that correctness might not be a major concern, so I sent a
follow-up email [1] after replying to you.
>
> In fact, I have already noticed /proc/buddyinfo, which collects data under
> zone lock and uses data_race to avoid KCSAN reports. But I'm wondering if
> we could remove its zone lock as well, for the same reasons as
> /proc/pagetypeinfo.
That might be correct. However, if it doesn’t significantly affect performance
and buddyinfo is accessed much less frequently than the buddy list, we may
just leave it as is.
[1] https://lore.kernel.org/linux-mm/CAGsJ_4wUQdQyB_3y0Buf3uG34hvgpMAP3qHHwJM3=R01RJOuvw@mail.gmail.com/
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts
2025-11-28 3:10 [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Hongru Zhang
2025-11-28 3:11 ` [PATCH 1/3] mm/page_alloc: add per-migratetype counts to buddy allocator Hongru Zhang
2025-11-28 3:12 ` [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count Hongru Zhang
@ 2025-11-28 3:12 ` Hongru Zhang
2025-11-29 0:04 ` Barry Song
2025-11-28 7:49 ` [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Lorenzo Stoakes
2025-11-28 9:24 ` Vlastimil Babka
4 siblings, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-11-28 3:12 UTC (permalink / raw)
To: akpm, vbabka, david
Cc: linux-mm, linux-kernel, surenb, mhocko, jackmanb, hannes, ziy,
lorenzo.stoakes, Liam.Howlett, rppt, axelrasmussen, yuanchu,
weixugc, Hongru Zhang
From: Hongru Zhang <zhanghongru@xiaomi.com>
Using per-migratetype counts instead of list_empty() helps save a
few CPU instructions.
Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
---
mm/internal.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..7759f8fdf445 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -954,7 +954,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
static inline bool free_area_empty(struct free_area *area, int migratetype)
{
- return list_empty(&area->free_list[migratetype]);
+ return !READ_ONCE(area->mt_nr_free[migratetype]);
}
/* mm/util.c */
--
2.43.0
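As a userspace sketch of why the one-line change above is behavior-preserving: as long as the counter is kept in sync with the list under the zone lock, the counter-based and list-based emptiness checks agree. The mock types and the helpers area_init()/add_block() below are illustrative stand-ins, not kernel API.

```c
#include <stdbool.h>

#define MIGRATE_TYPES 6

struct list_head { struct list_head *next, *prev; };

/* minimal mock of free_area with the per-migratetype counter added
 * by patch 1; field names follow the patch, the rest is a userspace
 * stand-in */
struct free_area {
	struct list_head free_list[MIGRATE_TYPES];
	unsigned long    mt_nr_free[MIGRATE_TYPES];
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

void area_init(struct free_area *area)
{
	for (int mt = 0; mt < MIGRATE_TYPES; mt++) {
		list_init(&area->free_list[mt]);
		area->mt_nr_free[mt] = 0;
	}
}

/* old check: pointer chase into the list head */
bool free_area_empty_list(struct free_area *area, int mt)
{
	return list_empty(&area->free_list[mt]);
}

/* new check: single counter load (READ_ONCE dropped in userspace) */
bool free_area_empty_count(struct free_area *area, int mt)
{
	return area->mt_nr_free[mt] == 0;
}

/* add one free page block, keeping list and counter in sync, as the
 * kernel would do under the zone lock */
void add_block(struct free_area *area, struct list_head *page, int mt)
{
	list_add(page, &area->free_list[mt]);
	area->mt_nr_free[mt]++;
}
```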
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts
2025-11-28 3:12 ` [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts Hongru Zhang
@ 2025-11-29 0:04 ` Barry Song
2025-11-29 9:24 ` Barry Song
0 siblings, 1 reply; 20+ messages in thread
From: Barry Song @ 2025-11-29 0:04 UTC (permalink / raw)
To: Hongru Zhang
Cc: akpm, vbabka, david, linux-mm, linux-kernel, surenb, mhocko,
jackmanb, hannes, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
axelrasmussen, yuanchu, weixugc, Hongru Zhang
On Fri, Nov 28, 2025 at 11:13 AM Hongru Zhang <zhanghongru06@gmail.com> wrote:
>
> From: Hongru Zhang <zhanghongru@xiaomi.com>
>
> Using per-migratetype counts instead of list_empty() helps save a
> few CPU instructions.
>
> Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
> ---
> mm/internal.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1561fc2ff5b8..7759f8fdf445 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -954,7 +954,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
>
> static inline bool free_area_empty(struct free_area *area, int migratetype)
> {
> - return list_empty(&area->free_list[migratetype]);
> + return !READ_ONCE(area->mt_nr_free[migratetype]);
I'm not quite sure about this. Since the counter is written and read more
frequently, cache coherence traffic may actually be higher than for the list
head.
I'd prefer to drop this unless there is real data showing it performs better.
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts
2025-11-29 0:04 ` Barry Song
@ 2025-11-29 9:24 ` Barry Song
0 siblings, 0 replies; 20+ messages in thread
From: Barry Song @ 2025-11-29 9:24 UTC (permalink / raw)
To: Hongru Zhang
Cc: akpm, vbabka, david, linux-mm, linux-kernel, surenb, mhocko,
jackmanb, hannes, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
axelrasmussen, yuanchu, weixugc, Hongru Zhang
On Sat, Nov 29, 2025 at 8:04 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Nov 28, 2025 at 11:13 AM Hongru Zhang <zhanghongru06@gmail.com> wrote:
> >
> > From: Hongru Zhang <zhanghongru@xiaomi.com>
> >
> > Using per-migratetype counts instead of list_empty() helps save a
> > few CPU instructions.
> >
> > Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com>
> > ---
> > mm/internal.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 1561fc2ff5b8..7759f8fdf445 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -954,7 +954,7 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
> >
> > static inline bool free_area_empty(struct free_area *area, int migratetype)
> > {
> > - return list_empty(&area->free_list[migratetype]);
> > + return !READ_ONCE(area->mt_nr_free[migratetype]);
>
> I'm not quite sure about this. Since the counter is written and read more
> frequently, cache coherence traffic may actually be higher than for the list
> head.
>
> I'd prefer to drop this unless there is real data showing it performs better.
If the goal is to optimize free_area list checks and list_add,
a reasonable approach is to organize the data structure
to reduce false sharing between different mt and order entries.
struct mt_free_area {
struct list_head free_list;
unsigned long nr_free;
} ____cacheline_aligned;
struct free_area {
struct mt_free_area mt_free_area[MIGRATE_TYPES];
};
However, without supporting data, it’s unclear if the space increase
is justified :-)
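The proposed layout trades space for reduced false sharing. A userspace sketch of that space cost, under assumed values MIGRATE_TYPES=6, NR_PAGE_ORDERS=11 and a 64-byte cacheline (typical but configuration-dependent; `aligned(64)` stands in for the kernel's ____cacheline_aligned):

```c
#include <stddef.h>

#define MIGRATE_TYPES  6
#define NR_PAGE_ORDERS 11
#define CACHELINE      64

struct list_head { struct list_head *next, *prev; };

/* layout after patch 1: list head array plus counter array, packed */
struct free_area_now {
	struct list_head free_list[MIGRATE_TYPES];
	unsigned long    mt_nr_free[MIGRATE_TYPES];
	unsigned long    nr_free;
};

/* proposed layout: each (list, counter) pair padded to its own line */
struct mt_free_area {
	struct list_head free_list;
	unsigned long    nr_free;
} __attribute__((aligned(CACHELINE)));

struct free_area_proposed {
	struct mt_free_area mt_free_area[MIGRATE_TYPES];
};

/* per-zone footprint of the free_area array in each layout */
size_t per_zone_now(void)
{
	return sizeof(struct free_area_now) * NR_PAGE_ORDERS;
}

size_t per_zone_proposed(void)
{
	return sizeof(struct free_area_proposed) * NR_PAGE_ORDERS;
}
```

On a 64-bit build this roughly doubles the per-zone free_area footprint, which is why the space increase needs justifying data.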
Thanks
Barry
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 3:10 [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Hongru Zhang
` (2 preceding siblings ...)
2025-11-28 3:12 ` [PATCH 3/3] mm: optimize free_area_empty() check using per-migratetype counts Hongru Zhang
@ 2025-11-28 7:49 ` Lorenzo Stoakes
2025-11-28 8:34 ` Hongru Zhang
2025-11-28 9:24 ` Vlastimil Babka
4 siblings, 1 reply; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-11-28 7:49 UTC (permalink / raw)
To: Hongru Zhang
Cc: akpm, vbabka, david, linux-mm, linux-kernel, surenb, mhocko,
jackmanb, hannes, ziy, Liam.Howlett, rppt, axelrasmussen,
yuanchu, weixugc, Hongru Zhang
Just a general plea :) could we please try not to send larger series like
this so late.
We're at the last day before the merge window, this is better sent during
6.19-rc1 or if now as an RFC.
Thanks, Lorenzo
On Fri, Nov 28, 2025 at 11:10:11AM +0800, Hongru Zhang wrote:
> On mobile devices, some user-space memory management components check
> memory pressure and fragmentation status periodically or via PSI, and
> take actions such as killing processes or performing memory compaction
> based on this information.
>
> Under high load scenarios, reading /proc/pagetypeinfo causes memory
> management components or memory allocation/free paths to be blocked
> for extended periods waiting for the zone lock, leading to the following
> issues:
> 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
> 8750 platforms, reducing system real-time performance
> 2. Memory management components being blocked for extended periods,
> preventing rapid acquisition of memory fragmentation information for
> critical memory management decisions and actions
> 3. Increased latency in memory allocation and free paths due to prolonged
> zone lock contention
>
> Changes:
> 1. Add per-migratetype counts to the buddy allocator to track free page
> block counts for each migratetype and order
> 2. Optimize /proc/pagetypeinfo access by utilizing these per-migratetype
> counts instead of iterating through free lists under zone lock
>
> Performance testing shows following improvements:
> 1. /proc/pagetypeinfo access latency reduced
>
> +-----------------------+----------+------------+
> | | no-patch | with-patch |
> +-----------------------+----------+------------+
> | Just after boot | 700.9 us | 268.6 us |
> +-----------------------+----------+------------+
> | After building kernel | 28.7 ms | 269.8 us |
> +-----------------------+----------+------------+
>
> 2. When /proc/pagetypeinfo is accessed concurrently, memory allocation and
> free performance degradation is reduced compared to the previous
> implementation
>
> Test setup:
> - Using config-pagealloc-micro
> - Monitor set to proc-pagetypeinfo, update frequency set to 10ms
> - PAGEALLOC_ORDER_MIN=4, PAGEALLOC_ORDER_MAX=4
>
> Without patch test results:
> vanilla vanilla
> no-monitor monitor
> Min alloc-odr4-1 8539.00 ( 0.00%) 8762.00 ( -2.61%)
> Min alloc-odr4-2 6501.00 ( 0.00%) 6683.00 ( -2.80%)
> Min alloc-odr4-4 5537.00 ( 0.00%) 5873.00 ( -6.07%)
> Min alloc-odr4-8 5030.00 ( 0.00%) 5361.00 ( -6.58%)
> Min alloc-odr4-16 4782.00 ( 0.00%) 5162.00 ( -7.95%)
> Min alloc-odr4-32 5838.00 ( 0.00%) 6499.00 ( -11.32%)
> Min alloc-odr4-64 6565.00 ( 0.00%) 7413.00 ( -12.92%)
> Min alloc-odr4-128 6896.00 ( 0.00%) 7898.00 ( -14.53%)
> Min alloc-odr4-256 7303.00 ( 0.00%) 8163.00 ( -11.78%)
> Min alloc-odr4-512 10179.00 ( 0.00%) 11985.00 ( -17.74%)
> Min alloc-odr4-1024 11000.00 ( 0.00%) 12165.00 ( -10.59%)
> Min free-odr4-1 820.00 ( 0.00%) 1230.00 ( -50.00%)
> Min free-odr4-2 511.00 ( 0.00%) 952.00 ( -86.30%)
> Min free-odr4-4 347.00 ( 0.00%) 434.00 ( -25.07%)
> Min free-odr4-8 286.00 ( 0.00%) 399.00 ( -39.51%)
> Min free-odr4-16 250.00 ( 0.00%) 405.00 ( -62.00%)
> Min free-odr4-32 294.00 ( 0.00%) 405.00 ( -37.76%)
> Min free-odr4-64 333.00 ( 0.00%) 363.00 ( -9.01%)
> Min free-odr4-128 340.00 ( 0.00%) 412.00 ( -21.18%)
> Min free-odr4-256 339.00 ( 0.00%) 329.00 ( 2.95%)
> Min free-odr4-512 361.00 ( 0.00%) 409.00 ( -13.30%)
> Min free-odr4-1024 300.00 ( 0.00%) 361.00 ( -20.33%)
> Stddev alloc-odr4-1 7.29 ( 0.00%) 90.78 (-1146.00%)
> Stddev alloc-odr4-2 3.87 ( 0.00%) 51.30 (-1225.75%)
> Stddev alloc-odr4-4 3.20 ( 0.00%) 50.90 (-1491.24%)
> Stddev alloc-odr4-8 4.67 ( 0.00%) 52.23 (-1019.35%)
> Stddev alloc-odr4-16 5.72 ( 0.00%) 27.53 (-381.04%)
> Stddev alloc-odr4-32 6.25 ( 0.00%) 641.23 (-10154.46%)
> Stddev alloc-odr4-64 2.06 ( 0.00%) 386.99 (-18714.22%)
> Stddev alloc-odr4-128 14.36 ( 0.00%) 52.39 (-264.77%)
> Stddev alloc-odr4-256 32.42 ( 0.00%) 326.19 (-906.05%)
> Stddev alloc-odr4-512 65.58 ( 0.00%) 184.49 (-181.31%)
> Stddev alloc-odr4-1024 8.88 ( 0.00%) 153.01 (-1622.67%)
> Stddev free-odr4-1 2.29 ( 0.00%) 152.27 (-6549.85%)
> Stddev free-odr4-2 10.99 ( 0.00%) 73.10 (-564.89%)
> Stddev free-odr4-4 1.99 ( 0.00%) 28.40 (-1324.45%)
> Stddev free-odr4-8 2.51 ( 0.00%) 52.93 (-2007.64%)
> Stddev free-odr4-16 2.85 ( 0.00%) 26.04 (-814.88%)
> Stddev free-odr4-32 4.04 ( 0.00%) 27.05 (-569.79%)
> Stddev free-odr4-64 2.10 ( 0.00%) 48.07 (-2185.66%)
> Stddev free-odr4-128 2.63 ( 0.00%) 26.23 (-897.86%)
> Stddev free-odr4-256 6.29 ( 0.00%) 37.04 (-488.71%)
> Stddev free-odr4-512 2.56 ( 0.00%) 10.65 (-315.28%)
> Stddev free-odr4-1024 0.95 ( 0.00%) 6.46 (-582.22%)
> Max alloc-odr4-1 8564.00 ( 0.00%) 9099.00 ( -6.25%)
> Max alloc-odr4-2 6511.00 ( 0.00%) 6844.00 ( -5.11%)
> Max alloc-odr4-4 5549.00 ( 0.00%) 6038.00 ( -8.81%)
> Max alloc-odr4-8 5045.00 ( 0.00%) 5551.00 ( -10.03%)
> Max alloc-odr4-16 4800.00 ( 0.00%) 5257.00 ( -9.52%)
> Max alloc-odr4-32 5861.00 ( 0.00%) 8115.00 ( -38.46%)
> Max alloc-odr4-64 6571.00 ( 0.00%) 8292.00 ( -26.19%)
> Max alloc-odr4-128 6930.00 ( 0.00%) 8081.00 ( -16.61%)
> Max alloc-odr4-256 7372.00 ( 0.00%) 9150.00 ( -24.12%)
> Max alloc-odr4-512 10333.00 ( 0.00%) 12636.00 ( -22.29%)
> Max alloc-odr4-1024 11035.00 ( 0.00%) 12590.00 ( -14.09%)
> Max free-odr4-1 828.00 ( 0.00%) 1724.00 (-108.21%)
> Max free-odr4-2 543.00 ( 0.00%) 1192.00 (-119.52%)
> Max free-odr4-4 354.00 ( 0.00%) 519.00 ( -46.61%)
> Max free-odr4-8 293.00 ( 0.00%) 617.00 (-110.58%)
> Max free-odr4-16 260.00 ( 0.00%) 483.00 ( -85.77%)
> Max free-odr4-32 308.00 ( 0.00%) 488.00 ( -58.44%)
> Max free-odr4-64 341.00 ( 0.00%) 505.00 ( -48.09%)
> Max free-odr4-128 346.00 ( 0.00%) 497.00 ( -43.64%)
> Max free-odr4-256 353.00 ( 0.00%) 463.00 ( -31.16%)
> Max free-odr4-512 367.00 ( 0.00%) 442.00 ( -20.44%)
> Max free-odr4-1024 303.00 ( 0.00%) 381.00 ( -25.74%)
>
> With patch test results:
> patched patched
> no-monitor monitor
> Min alloc-odr4-1 8488.00 ( 0.00%) 8514.00 ( -0.31%)
> Min alloc-odr4-2 6551.00 ( 0.00%) 6527.00 ( 0.37%)
> Min alloc-odr4-4 5536.00 ( 0.00%) 5591.00 ( -0.99%)
> Min alloc-odr4-8 5008.00 ( 0.00%) 5098.00 ( -1.80%)
> Min alloc-odr4-16 4760.00 ( 0.00%) 4857.00 ( -2.04%)
> Min alloc-odr4-32 5827.00 ( 0.00%) 5919.00 ( -1.58%)
> Min alloc-odr4-64 6561.00 ( 0.00%) 6680.00 ( -1.81%)
> Min alloc-odr4-128 6898.00 ( 0.00%) 7014.00 ( -1.68%)
> Min alloc-odr4-256 7311.00 ( 0.00%) 7464.00 ( -2.09%)
> Min alloc-odr4-512 10181.00 ( 0.00%) 10286.00 ( -1.03%)
> Min alloc-odr4-1024 11205.00 ( 0.00%) 11725.00 ( -4.64%)
> Min free-odr4-1 789.00 ( 0.00%) 867.00 ( -9.89%)
> Min free-odr4-2 490.00 ( 0.00%) 526.00 ( -7.35%)
> Min free-odr4-4 350.00 ( 0.00%) 360.00 ( -2.86%)
> Min free-odr4-8 272.00 ( 0.00%) 287.00 ( -5.51%)
> Min free-odr4-16 247.00 ( 0.00%) 254.00 ( -2.83%)
> Min free-odr4-32 298.00 ( 0.00%) 304.00 ( -2.01%)
> Min free-odr4-64 334.00 ( 0.00%) 325.00 ( 2.69%)
> Min free-odr4-128 334.00 ( 0.00%) 329.00 ( 1.50%)
> Min free-odr4-256 336.00 ( 0.00%) 336.00 ( 0.00%)
> Min free-odr4-512 360.00 ( 0.00%) 342.00 ( 5.00%)
> Min free-odr4-1024 327.00 ( 0.00%) 355.00 ( -8.56%)
> Stddev alloc-odr4-1 5.19 ( 0.00%) 45.38 (-775.09%)
> Stddev alloc-odr4-2 6.99 ( 0.00%) 37.63 (-437.98%)
> Stddev alloc-odr4-4 3.91 ( 0.00%) 17.85 (-356.28%)
> Stddev alloc-odr4-8 5.15 ( 0.00%) 9.34 ( -81.47%)
> Stddev alloc-odr4-16 3.83 ( 0.00%) 5.34 ( -39.34%)
> Stddev alloc-odr4-32 1.96 ( 0.00%) 10.28 (-425.09%)
> Stddev alloc-odr4-64 1.32 ( 0.00%) 333.30 (-25141.39%)
> Stddev alloc-odr4-128 2.06 ( 0.00%) 7.37 (-258.28%)
> Stddev alloc-odr4-256 15.56 ( 0.00%) 113.48 (-629.25%)
> Stddev alloc-odr4-512 61.25 ( 0.00%) 165.09 (-169.53%)
> Stddev alloc-odr4-1024 18.89 ( 0.00%) 2.93 ( 84.51%)
> Stddev free-odr4-1 4.45 ( 0.00%) 40.12 (-800.98%)
> Stddev free-odr4-2 1.50 ( 0.00%) 29.30 (-1850.31%)
> Stddev free-odr4-4 1.27 ( 0.00%) 19.49 (-1439.40%)
> Stddev free-odr4-8 0.97 ( 0.00%) 8.93 (-823.07%)
> Stddev free-odr4-16 8.38 ( 0.00%) 4.51 ( 46.21%)
> Stddev free-odr4-32 3.18 ( 0.00%) 6.59 (-107.42%)
> Stddev free-odr4-64 2.40 ( 0.00%) 3.09 ( -28.50%)
> Stddev free-odr4-128 1.55 ( 0.00%) 2.53 ( -62.92%)
> Stddev free-odr4-256 0.41 ( 0.00%) 2.80 (-585.57%)
> Stddev free-odr4-512 1.60 ( 0.00%) 4.84 (-202.08%)
> Stddev free-odr4-1024 0.66 ( 0.00%) 1.19 ( -80.68%)
> Max alloc-odr4-1 8505.00 ( 0.00%) 8676.00 ( -2.01%)
> Max alloc-odr4-2 6572.00 ( 0.00%) 6651.00 ( -1.20%)
> Max alloc-odr4-4 5552.00 ( 0.00%) 5646.00 ( -1.69%)
> Max alloc-odr4-8 5024.00 ( 0.00%) 5131.00 ( -2.13%)
> Max alloc-odr4-16 4774.00 ( 0.00%) 4875.00 ( -2.12%)
> Max alloc-odr4-32 5834.00 ( 0.00%) 5950.00 ( -1.99%)
> Max alloc-odr4-64 6565.00 ( 0.00%) 7434.00 ( -13.24%)
> Max alloc-odr4-128 6907.00 ( 0.00%) 7034.00 ( -1.84%)
> Max alloc-odr4-256 7347.00 ( 0.00%) 7843.00 ( -6.75%)
> Max alloc-odr4-512 10315.00 ( 0.00%) 10866.00 ( -5.34%)
> Max alloc-odr4-1024 11278.00 ( 0.00%) 11733.00 ( -4.03%)
> Max free-odr4-1 803.00 ( 0.00%) 1009.00 ( -25.65%)
> Max free-odr4-2 495.00 ( 0.00%) 607.00 ( -22.63%)
> Max free-odr4-4 354.00 ( 0.00%) 417.00 ( -17.80%)
> Max free-odr4-8 275.00 ( 0.00%) 313.00 ( -13.82%)
> Max free-odr4-16 273.00 ( 0.00%) 272.00 ( 0.37%)
> Max free-odr4-32 309.00 ( 0.00%) 324.00 ( -4.85%)
> Max free-odr4-64 340.00 ( 0.00%) 335.00 ( 1.47%)
> Max free-odr4-128 340.00 ( 0.00%) 338.00 ( 0.59%)
> Max free-odr4-256 338.00 ( 0.00%) 346.00 ( -2.37%)
> Max free-odr4-512 364.00 ( 0.00%) 359.00 ( 1.37%)
> Max free-odr4-1024 329.00 ( 0.00%) 359.00 ( -9.12%)
>
> The main overhead is a slight increase in latency on the memory allocation
> and free paths due to additional per-migratetype counting, with
> theoretically minimal impact on overall performance.
>
> This patch series is based on v6.18-rc7
>
> Hongru Zhang (3):
> mm/page_alloc: add per-migratetype counts to buddy allocator
> mm/vmstat: get fragmentation statistics from per-migragetype count
> mm: optimize free_area_empty() check using per-migratetype counts
>
> include/linux/mmzone.h | 1 +
> mm/internal.h | 2 +-
> mm/mm_init.c | 1 +
> mm/page_alloc.c | 9 ++++++++-
> mm/vmstat.c | 30 +++++++-----------------------
> 5 files changed, 18 insertions(+), 25 deletions(-)
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 7:49 ` [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Lorenzo Stoakes
@ 2025-11-28 8:34 ` Hongru Zhang
2025-11-28 8:40 ` Lorenzo Stoakes
0 siblings, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-11-28 8:34 UTC (permalink / raw)
To: lorenzo.stoakes
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, mhocko, rppt, surenb, vbabka, weixugc,
yuanchu, zhanghongru06, zhanghongru, ziy
> Just a general plea :) could we please try not to send larger series like
> this so late.
>
> We're at the last day before the merge window, this is better sent during
> 6.19-rc1 or if now as an RFC.
>
> Thanks, Lorenzo
Hi Lorenzo,
Thank you for your feedback and sorry for the late submission. You're
right - this series should have been sent earlier. Apologize for not
following the proper submission timing guidelines. I'll make sure to
follow the community norms and submit similar work well in advance in
the future.
Thanks again for your patience and guidance.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 8:34 ` Hongru Zhang
@ 2025-11-28 8:40 ` Lorenzo Stoakes
0 siblings, 0 replies; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-11-28 8:40 UTC (permalink / raw)
To: Hongru Zhang
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, mhocko, rppt, surenb, vbabka, weixugc,
yuanchu, zhanghongru, ziy
On Fri, Nov 28, 2025 at 04:34:37PM +0800, Hongru Zhang wrote:
> > Just a general plea :) could we please try not to send larger series like
> > this so late.
> >
> > We're at the last day before the merge window, this is better sent during
> > 6.19-rc1 or if now as an RFC.
> >
> > Thanks, Lorenzo
>
> Hi Lorenzo,
>
> Thank you for your feedback and sorry for the late submission. You're
> right - this series should have been sent earlier. Apologize for not
> following the proper submission timing guidelines. I'll make sure to
> follow the community norms and submit similar work well in advance in
> the future.
>
> Thanks again for your patience and guidance.
Hi Hongru,
Sorry I don't mean to be critical here and you weren't to know :) rather just in
general - a plea for how we do things in mm.
Your series is very much appreciated, you didn't do anything wrong at all - this
is just essentially an admin thing :P
We will absolutely review your series it's just about timing. And of course I'm
just sort of making a point here, reviewers can choose to review as and when
they want! :)
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 3:10 [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Hongru Zhang
` (3 preceding siblings ...)
2025-11-28 7:49 ` [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access Lorenzo Stoakes
@ 2025-11-28 9:24 ` Vlastimil Babka
2025-11-28 13:08 ` Johannes Weiner
2025-12-01 2:36 ` Hongru Zhang
4 siblings, 2 replies; 20+ messages in thread
From: Vlastimil Babka @ 2025-11-28 9:24 UTC (permalink / raw)
To: Hongru Zhang, akpm, david
Cc: linux-mm, linux-kernel, surenb, mhocko, jackmanb, hannes, ziy,
lorenzo.stoakes, Liam.Howlett, rppt, axelrasmussen, yuanchu,
weixugc, Hongru Zhang
On 11/28/25 04:10, Hongru Zhang wrote:
> On mobile devices, some user-space memory management components check
> memory pressure and fragmentation status periodically or via PSI, and
> take actions such as killing processes or performing memory compaction
> based on this information.
Hm /proc/buddyinfo could be enough to determine fragmentation? Also we have
in-kernel proactive compaction these days.
> Under high load scenarios, reading /proc/pagetypeinfo causes memory
> management components or memory allocation/free paths to be blocked
> for extended periods waiting for the zone lock, leading to the following
> issues:
> 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
> 8750 platforms, reducing system real-time performance
> 2. Memory management components being blocked for extended periods,
> preventing rapid acquisition of memory fragmentation information for
> critical memory management decisions and actions
> 3. Increased latency in memory allocation and free paths due to prolonged
> zone lock contention
It could be argued that not capturing /proc/pagetypeinfo (often) would help.
I wonder if we can also find other benefits from the counters in the kernel
itself.
Adding these migratetype counters is something that wouldn't even have been
possible in the past, until the freelist migratetype hygiene series was merged.
So now it should be AFAIK possible, but it's still some overhead in
relatively hot paths. I wonder if we even considered this before in the
context of migratetype hygiene? Couldn't find anything quickly.
> Changes:
> 1. Add per-migratetype counts to the buddy allocator to track free page
> block counts for each migratetype and order
> 2. Optimize /proc/pagetypeinfo access by utilizing these per-migratetype
> counts instead of iterating through free lists under zone lock
>
> Performance testing shows following improvements:
> 1. /proc/pagetypeinfo access latency reduced
>
> +-----------------------+----------+------------+
> | | no-patch | with-patch |
> +-----------------------+----------+------------+
> | Just after boot | 700.9 us | 268.6 us |
> +-----------------------+----------+------------+
> | After building kernel | 28.7 ms | 269.8 us |
> +-----------------------+----------+------------+
>
> 2. When /proc/pagetypeinfo is accessed concurrently, memory allocation and
> free performance degradation is reduced compared to the previous
> implementation
>
> Test setup:
> - Using config-pagealloc-micro
> - Monitor set to proc-pagetypeinfo, update frequency set to 10ms
> - PAGEALLOC_ORDER_MIN=4, PAGEALLOC_ORDER_MAX=4
>
> Without patch test results:
> vanilla vanilla
> no-monitor monitor
> Min alloc-odr4-1 8539.00 ( 0.00%) 8762.00 ( -2.61%)
> Min alloc-odr4-2 6501.00 ( 0.00%) 6683.00 ( -2.80%)
> Min alloc-odr4-4 5537.00 ( 0.00%) 5873.00 ( -6.07%)
> Min alloc-odr4-8 5030.00 ( 0.00%) 5361.00 ( -6.58%)
> Min alloc-odr4-16 4782.00 ( 0.00%) 5162.00 ( -7.95%)
> Min alloc-odr4-32 5838.00 ( 0.00%) 6499.00 ( -11.32%)
> Min alloc-odr4-64 6565.00 ( 0.00%) 7413.00 ( -12.92%)
> Min alloc-odr4-128 6896.00 ( 0.00%) 7898.00 ( -14.53%)
> Min alloc-odr4-256 7303.00 ( 0.00%) 8163.00 ( -11.78%)
> Min alloc-odr4-512 10179.00 ( 0.00%) 11985.00 ( -17.74%)
> Min alloc-odr4-1024 11000.00 ( 0.00%) 12165.00 ( -10.59%)
> Min free-odr4-1 820.00 ( 0.00%) 1230.00 ( -50.00%)
> Min free-odr4-2 511.00 ( 0.00%) 952.00 ( -86.30%)
> Min free-odr4-4 347.00 ( 0.00%) 434.00 ( -25.07%)
> Min free-odr4-8 286.00 ( 0.00%) 399.00 ( -39.51%)
> Min free-odr4-16 250.00 ( 0.00%) 405.00 ( -62.00%)
> Min free-odr4-32 294.00 ( 0.00%) 405.00 ( -37.76%)
> Min free-odr4-64 333.00 ( 0.00%) 363.00 ( -9.01%)
> Min free-odr4-128 340.00 ( 0.00%) 412.00 ( -21.18%)
> Min free-odr4-256 339.00 ( 0.00%) 329.00 ( 2.95%)
> Min free-odr4-512 361.00 ( 0.00%) 409.00 ( -13.30%)
> Min free-odr4-1024 300.00 ( 0.00%) 361.00 ( -20.33%)
> Stddev alloc-odr4-1 7.29 ( 0.00%) 90.78 (-1146.00%)
> Stddev alloc-odr4-2 3.87 ( 0.00%) 51.30 (-1225.75%)
> Stddev alloc-odr4-4 3.20 ( 0.00%) 50.90 (-1491.24%)
> Stddev alloc-odr4-8 4.67 ( 0.00%) 52.23 (-1019.35%)
> Stddev alloc-odr4-16 5.72 ( 0.00%) 27.53 (-381.04%)
> Stddev alloc-odr4-32 6.25 ( 0.00%) 641.23 (-10154.46%)
> Stddev alloc-odr4-64 2.06 ( 0.00%) 386.99 (-18714.22%)
> Stddev alloc-odr4-128 14.36 ( 0.00%) 52.39 (-264.77%)
> Stddev alloc-odr4-256 32.42 ( 0.00%) 326.19 (-906.05%)
> Stddev alloc-odr4-512 65.58 ( 0.00%) 184.49 (-181.31%)
> Stddev alloc-odr4-1024 8.88 ( 0.00%) 153.01 (-1622.67%)
> Stddev free-odr4-1 2.29 ( 0.00%) 152.27 (-6549.85%)
> Stddev free-odr4-2 10.99 ( 0.00%) 73.10 (-564.89%)
> Stddev free-odr4-4 1.99 ( 0.00%) 28.40 (-1324.45%)
> Stddev free-odr4-8 2.51 ( 0.00%) 52.93 (-2007.64%)
> Stddev free-odr4-16 2.85 ( 0.00%) 26.04 (-814.88%)
> Stddev free-odr4-32 4.04 ( 0.00%) 27.05 (-569.79%)
> Stddev free-odr4-64 2.10 ( 0.00%) 48.07 (-2185.66%)
> Stddev free-odr4-128 2.63 ( 0.00%) 26.23 (-897.86%)
> Stddev free-odr4-256 6.29 ( 0.00%) 37.04 (-488.71%)
> Stddev free-odr4-512 2.56 ( 0.00%) 10.65 (-315.28%)
> Stddev free-odr4-1024 0.95 ( 0.00%) 6.46 (-582.22%)
> Max alloc-odr4-1 8564.00 ( 0.00%) 9099.00 ( -6.25%)
> Max alloc-odr4-2 6511.00 ( 0.00%) 6844.00 ( -5.11%)
> Max alloc-odr4-4 5549.00 ( 0.00%) 6038.00 ( -8.81%)
> Max alloc-odr4-8 5045.00 ( 0.00%) 5551.00 ( -10.03%)
> Max alloc-odr4-16 4800.00 ( 0.00%) 5257.00 ( -9.52%)
> Max alloc-odr4-32 5861.00 ( 0.00%) 8115.00 ( -38.46%)
> Max alloc-odr4-64 6571.00 ( 0.00%) 8292.00 ( -26.19%)
> Max alloc-odr4-128 6930.00 ( 0.00%) 8081.00 ( -16.61%)
> Max alloc-odr4-256 7372.00 ( 0.00%) 9150.00 ( -24.12%)
> Max alloc-odr4-512 10333.00 ( 0.00%) 12636.00 ( -22.29%)
> Max alloc-odr4-1024 11035.00 ( 0.00%) 12590.00 ( -14.09%)
> Max free-odr4-1 828.00 ( 0.00%) 1724.00 (-108.21%)
> Max free-odr4-2 543.00 ( 0.00%) 1192.00 (-119.52%)
> Max free-odr4-4 354.00 ( 0.00%) 519.00 ( -46.61%)
> Max free-odr4-8 293.00 ( 0.00%) 617.00 (-110.58%)
> Max free-odr4-16 260.00 ( 0.00%) 483.00 ( -85.77%)
> Max free-odr4-32 308.00 ( 0.00%) 488.00 ( -58.44%)
> Max free-odr4-64 341.00 ( 0.00%) 505.00 ( -48.09%)
> Max free-odr4-128 346.00 ( 0.00%) 497.00 ( -43.64%)
> Max free-odr4-256 353.00 ( 0.00%) 463.00 ( -31.16%)
> Max free-odr4-512 367.00 ( 0.00%) 442.00 ( -20.44%)
> Max free-odr4-1024 303.00 ( 0.00%) 381.00 ( -25.74%)
>
> With patch test results:
> patched patched
> no-monitor monitor
> Min alloc-odr4-1 8488.00 ( 0.00%) 8514.00 ( -0.31%)
> Min alloc-odr4-2 6551.00 ( 0.00%) 6527.00 ( 0.37%)
> Min alloc-odr4-4 5536.00 ( 0.00%) 5591.00 ( -0.99%)
> Min alloc-odr4-8 5008.00 ( 0.00%) 5098.00 ( -1.80%)
> Min alloc-odr4-16 4760.00 ( 0.00%) 4857.00 ( -2.04%)
> Min alloc-odr4-32 5827.00 ( 0.00%) 5919.00 ( -1.58%)
> Min alloc-odr4-64 6561.00 ( 0.00%) 6680.00 ( -1.81%)
> Min alloc-odr4-128 6898.00 ( 0.00%) 7014.00 ( -1.68%)
> Min alloc-odr4-256 7311.00 ( 0.00%) 7464.00 ( -2.09%)
> Min alloc-odr4-512 10181.00 ( 0.00%) 10286.00 ( -1.03%)
> Min alloc-odr4-1024 11205.00 ( 0.00%) 11725.00 ( -4.64%)
> Min free-odr4-1 789.00 ( 0.00%) 867.00 ( -9.89%)
> Min free-odr4-2 490.00 ( 0.00%) 526.00 ( -7.35%)
> Min free-odr4-4 350.00 ( 0.00%) 360.00 ( -2.86%)
> Min free-odr4-8 272.00 ( 0.00%) 287.00 ( -5.51%)
> Min free-odr4-16 247.00 ( 0.00%) 254.00 ( -2.83%)
> Min free-odr4-32 298.00 ( 0.00%) 304.00 ( -2.01%)
> Min free-odr4-64 334.00 ( 0.00%) 325.00 ( 2.69%)
> Min free-odr4-128 334.00 ( 0.00%) 329.00 ( 1.50%)
> Min free-odr4-256 336.00 ( 0.00%) 336.00 ( 0.00%)
> Min free-odr4-512 360.00 ( 0.00%) 342.00 ( 5.00%)
> Min free-odr4-1024 327.00 ( 0.00%) 355.00 ( -8.56%)
> Stddev alloc-odr4-1 5.19 ( 0.00%) 45.38 (-775.09%)
> Stddev alloc-odr4-2 6.99 ( 0.00%) 37.63 (-437.98%)
> Stddev alloc-odr4-4 3.91 ( 0.00%) 17.85 (-356.28%)
> Stddev alloc-odr4-8 5.15 ( 0.00%) 9.34 ( -81.47%)
> Stddev alloc-odr4-16 3.83 ( 0.00%) 5.34 ( -39.34%)
> Stddev alloc-odr4-32 1.96 ( 0.00%) 10.28 (-425.09%)
> Stddev alloc-odr4-64 1.32 ( 0.00%) 333.30 (-25141.39%)
> Stddev alloc-odr4-128 2.06 ( 0.00%) 7.37 (-258.28%)
> Stddev alloc-odr4-256 15.56 ( 0.00%) 113.48 (-629.25%)
> Stddev alloc-odr4-512 61.25 ( 0.00%) 165.09 (-169.53%)
> Stddev alloc-odr4-1024 18.89 ( 0.00%) 2.93 ( 84.51%)
> Stddev free-odr4-1 4.45 ( 0.00%) 40.12 (-800.98%)
> Stddev free-odr4-2 1.50 ( 0.00%) 29.30 (-1850.31%)
> Stddev free-odr4-4 1.27 ( 0.00%) 19.49 (-1439.40%)
> Stddev free-odr4-8 0.97 ( 0.00%) 8.93 (-823.07%)
> Stddev free-odr4-16 8.38 ( 0.00%) 4.51 ( 46.21%)
> Stddev free-odr4-32 3.18 ( 0.00%) 6.59 (-107.42%)
> Stddev free-odr4-64 2.40 ( 0.00%) 3.09 ( -28.50%)
> Stddev free-odr4-128 1.55 ( 0.00%) 2.53 ( -62.92%)
> Stddev free-odr4-256 0.41 ( 0.00%) 2.80 (-585.57%)
> Stddev free-odr4-512 1.60 ( 0.00%) 4.84 (-202.08%)
> Stddev free-odr4-1024 0.66 ( 0.00%) 1.19 ( -80.68%)
> Max alloc-odr4-1 8505.00 ( 0.00%) 8676.00 ( -2.01%)
> Max alloc-odr4-2 6572.00 ( 0.00%) 6651.00 ( -1.20%)
> Max alloc-odr4-4 5552.00 ( 0.00%) 5646.00 ( -1.69%)
> Max alloc-odr4-8 5024.00 ( 0.00%) 5131.00 ( -2.13%)
> Max alloc-odr4-16 4774.00 ( 0.00%) 4875.00 ( -2.12%)
> Max alloc-odr4-32 5834.00 ( 0.00%) 5950.00 ( -1.99%)
> Max alloc-odr4-64 6565.00 ( 0.00%) 7434.00 ( -13.24%)
> Max alloc-odr4-128 6907.00 ( 0.00%) 7034.00 ( -1.84%)
> Max alloc-odr4-256 7347.00 ( 0.00%) 7843.00 ( -6.75%)
> Max alloc-odr4-512 10315.00 ( 0.00%) 10866.00 ( -5.34%)
> Max alloc-odr4-1024 11278.00 ( 0.00%) 11733.00 ( -4.03%)
> Max free-odr4-1 803.00 ( 0.00%) 1009.00 ( -25.65%)
> Max free-odr4-2 495.00 ( 0.00%) 607.00 ( -22.63%)
> Max free-odr4-4 354.00 ( 0.00%) 417.00 ( -17.80%)
> Max free-odr4-8 275.00 ( 0.00%) 313.00 ( -13.82%)
> Max free-odr4-16 273.00 ( 0.00%) 272.00 ( 0.37%)
> Max free-odr4-32 309.00 ( 0.00%) 324.00 ( -4.85%)
> Max free-odr4-64 340.00 ( 0.00%) 335.00 ( 1.47%)
> Max free-odr4-128 340.00 ( 0.00%) 338.00 ( 0.59%)
> Max free-odr4-256 338.00 ( 0.00%) 346.00 ( -2.37%)
> Max free-odr4-512 364.00 ( 0.00%) 359.00 ( 1.37%)
> Max free-odr4-1024 329.00 ( 0.00%) 359.00 ( -9.12%)
>
> The main overhead is a slight increase in latency on the memory allocation
> and free paths due to additional per-migratetype counting, with
> theoretically minimal impact on overall performance.
>
> This patch series is based on v6.18-rc7
>
> Hongru Zhang (3):
> mm/page_alloc: add per-migratetype counts to buddy allocator
> mm/vmstat: get fragmentation statistics from per-migratetype count
> mm: optimize free_area_empty() check using per-migratetype counts
>
> include/linux/mmzone.h | 1 +
> mm/internal.h | 2 +-
> mm/mm_init.c | 1 +
> mm/page_alloc.c | 9 ++++++++-
> mm/vmstat.c | 30 +++++++-----------------------
> 5 files changed, 18 insertions(+), 25 deletions(-)
>
^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 9:24 ` Vlastimil Babka
@ 2025-11-28 13:08 ` Johannes Weiner
2025-12-01 2:36 ` Hongru Zhang
1 sibling, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2025-11-28 13:08 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Hongru Zhang, akpm, david, linux-mm, linux-kernel, surenb,
mhocko, jackmanb, ziy, lorenzo.stoakes, Liam.Howlett, rppt,
axelrasmussen, yuanchu, weixugc, Hongru Zhang
On Fri, Nov 28, 2025 at 10:24:16AM +0100, Vlastimil Babka wrote:
> On 11/28/25 04:10, Hongru Zhang wrote:
> > On mobile devices, some user-space memory management components check
> > memory pressure and fragmentation status periodically or via PSI, and
> > take actions such as killing processes or performing memory compaction
> > based on this information.
>
> Hm /proc/buddyinfo could be enough to determine fragmentation? Also we have
> in-kernel proactive compaction these days.
>
> > Under high load scenarios, reading /proc/pagetypeinfo causes memory
> > management components or memory allocation/free paths to be blocked
> > for extended periods waiting for the zone lock, leading to the following
> > issues:
> > 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
> > 8750 platforms, reducing system real-time performance
> > 2. Memory management components being blocked for extended periods,
> > preventing rapid acquisition of memory fragmentation information for
> > critical memory management decisions and actions
> > 3. Increased latency in memory allocation and free paths due to prolonged
> > zone lock contention
>
> It could be argued that not capturing /proc/pagetypeinfo (often) would help.
> I wonder if we can find also other benefits from the counters in the kernel
> itself.
In earlier iterations of the huge allocator patches, I played around
with using these for compaction_suitable():
https://lore.kernel.org/linux-mm/20230418191313.268131-17-hannes@cmpxchg.org/
ISTR it cut down compaction numbers, because it would avoid runs where
free pages are mostly in unsuitable targets (free_unmovable). But this
was also in a series that used compaction_suitable() to stop kswapd,
which in hindsight was a mistake; it would need re-evaluating by itself.
I also found these counters useful to have in OOM/allocfail dumps to
see if allocator packing or compaction could have done better.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-11-28 9:24 ` Vlastimil Babka
2025-11-28 13:08 ` Johannes Weiner
@ 2025-12-01 2:36 ` Hongru Zhang
2025-12-01 17:01 ` Zi Yan
1 sibling, 1 reply; 20+ messages in thread
From: Hongru Zhang @ 2025-12-01 2:36 UTC (permalink / raw)
To: vbabka
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt, surenb,
weixugc, yuanchu, zhanghongru06, zhanghongru, ziy
> > On mobile devices, some user-space memory management components check
> > memory pressure and fragmentation status periodically or via PSI, and
> > take actions such as killing processes or performing memory compaction
> > based on this information.
>
> Hm /proc/buddyinfo could be enough to determine fragmentation? Also we have
> in-kernel proactive compaction these days.
In fact, besides /proc/pagetypeinfo, other system resource information is
also collected at appropriate times, and resource usage is tracked
throughout the process lifecycle as well. User-space management
components combine this information to make decisions and take
appropriate actions.
> > Under high load scenarios, reading /proc/pagetypeinfo causes memory
> > management components or memory allocation/free paths to be blocked
> > for extended periods waiting for the zone lock, leading to the following
> > issues:
> > 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
> > 8750 platforms, reducing system real-time performance
> > 2. Memory management components being blocked for extended periods,
> > preventing rapid acquisition of memory fragmentation information for
> > critical memory management decisions and actions
> > 3. Increased latency in memory allocation and free paths due to prolonged
> > zone lock contention
>
> It could be argued that not capturing /proc/pagetypeinfo (often) would help.
> I wonder if we can find also other benefits from the counters in the kernel
> itself.
Collecting system and app resource statistics and making decisions based
on this information is a common practice among Android device manufacturers.
There are likely over a billion Android phones in daily use worldwide.
The diversity of hardware configurations across Android devices makes it
difficult for kernel mechanisms alone to maintain good performance across
all usage scenarios.
First, hardware capabilities vary greatly - flagship phones may have up to
24GB of memory, while low-end devices may have as little as 4GB. CPU,
storage, battery, and passive cooling capabilities vary significantly due
to market positioning and cost factors. Hardware resources always seem
inadequate.
Second, usage scenarios also differ - some people use devices in hot
environments while others use them in cold ones; some enjoy
high-definition gaming while others simply browse the web.
Third, user habits vary as well. Some people rarely restart their phones
except when the battery dies or the system crashes; others restart daily,
like me. Some users never actively close apps, only switching them to
the background, resulting in dozens of apps running in the background and
keeping system resources (especially memory) tied up. Yet others just use
a few apps, closing unused apps rather than leaving them in the
background.
Despite the above challenges, Android device manufacturers hope to ensure
a good user experience (no UI jank) across all situations.
Even at 60 Hz frame refresh rate (90 Hz, 120 Hz also supported now), all
work from user input to render and display should be done within 16.7 ms.
To achieve this goal, the management components perform tasks such as:
- Track system resource status: what system has
(system resource awareness)
- Learn and predict app resource demands: what app needs
(resource demand awareness)
- Monitor app launch, exit, and foreground-background switches: least
important app gives back resource to system to serve most important
one, usually the foreground app
(user intent awareness)
Tracking system resources is a necessity for Android devices, not an
option, so the related paths are not that cold on these devices.
All of the above is from the workload perspective. From the kernel
perspective, regardless of when or how frequently user-space tools read
statistical information, they should not significantly affect the
kernel's own efficiency. That is why I submitted this patch series to
make the read side of /proc/pagetypeinfo lock-free. It does introduce
overhead on the hot path, though; I would greatly appreciate it if we
could discuss how to improve that here.
> Adding these migratetype counters is something that wouldn't be even
> possible in the past, until the freelist migratetype hygiene was merged.
> So now it should be AFAIK possible, but it's still some overhead in
> relatively hot paths. I wonder if we even considered this before in the
> context of migratetype hygiene? Couldn't find anything quickly.
Yes, I initially wrote the code on an old kernel; at that time, I reused
set_pcppage_migratetype (also renamed) to cache the exact migratetype
list that the page block was on. After the freelist migratetype hygiene
patches were merged, I removed that logic.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-12-01 2:36 ` Hongru Zhang
@ 2025-12-01 17:01 ` Zi Yan
2025-12-02 2:42 ` Hongru Zhang
0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2025-12-01 17:01 UTC (permalink / raw)
To: Hongru Zhang
Cc: vbabka, Liam.Howlett, akpm, axelrasmussen, david, hannes,
jackmanb, linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt,
surenb, weixugc, yuanchu, zhanghongru
On 30 Nov 2025, at 21:36, Hongru Zhang wrote:
>>> On mobile devices, some user-space memory management components check
>>> memory pressure and fragmentation status periodically or via PSI, and
>>> take actions such as killing processes or performing memory compaction
>>> based on this information.
>>
>> Hm /proc/buddyinfo could be enough to determine fragmentation? Also we have
>> in-kernel proactive compaction these days.
>
> In fact, besides /proc/pagetypeinfo, other system resource information is
> also collected at appropriate times, and resource usage throughout the
> process lifecycle is appropriately tracked as well. User-space management
> components integrate this information together to make decisions and
> perform proper actions.
>
>>> Under high load scenarios, reading /proc/pagetypeinfo causes memory
>>> management components or memory allocation/free paths to be blocked
>>> for extended periods waiting for the zone lock, leading to the following
>>> issues:
>>> 1. Long interrupt-disabled spinlocks - occasionally exceeding 10ms on Qcom
>>> 8750 platforms, reducing system real-time performance
>>> 2. Memory management components being blocked for extended periods,
>>> preventing rapid acquisition of memory fragmentation information for
>>> critical memory management decisions and actions
>>> 3. Increased latency in memory allocation and free paths due to prolonged
>>> zone lock contention
>>
>> It could be argued that not capturing /proc/pagetypeinfo (often) would help.
>> I wonder if we can find also other benefits from the counters in the kernel
>> itself.
>
> Collecting system and app resource statistics and making decisions based
> on this information is a common practice among Android device manufacturers.
>
> Currently, there should be over a billion Android phones being used daily
> worldwide. The diversity of hardware configurations across Android devices
> makes it difficult for kernel mechanisms alone to maintain good
> performance across all usage scenarios.
>
> First, hardware capabilities vary greatly - flagship phones may have up to
> 24GB of memory, while low-end devices may have as little as 4GB. CPU,
> storage, battery, and passive cooling capabilities vary significantly due
> to market positioning and cost factors. Hardware resources seem always
> inadequate.
>
> Second, usage scenarios also differ - some people use devices in hot
> environments while others in cold environments; some enjoy high-definition
> gaming while others simply browse the web.
>
> Third, user habits vary as well. Some people rarely restart their phones
> except when the battery dies or the system crashes; others restart daily,
> like me. Some users never actively close apps, only switching them to
> the background, resulting in dozens of apps running in the background and
> keeping system resources consumed (especially memory). Yet others just use
> a few apps, closing unused apps rather than leaving them in the
> background.
>
> Despite the above challenges, Android device manufacturers hope to ensure
> a good user experience (no UI jank) across all situations.
>
> Even at 60 Hz frame refresh rate (90 Hz, 120 Hz also supported now), all
> work from user input to render and display should be done within 16.7 ms.
> To achieve this goal, the management components perform tasks such as:
> - Track system resource status: what system has
> (system resource awareness)
> - Learn and predict app resource demands: what app needs
> (resource demand awareness)
> - Monitor app launch, exit, and foreground-background switches: least
> important app gives back resource to system to serve most important
> one, usually the foreground app
> (user intent awareness)
>
> Tracking system resources seems necessary for Android devices, not
> optional. So the related paths are not that cold on Android devices.
This is all good background information. But how does the userspace
monitor utilize pageblock migratetype information? Can you give a
concrete example?
Something like: when free_movable is low, background apps are killed to
provide more free pages? Or is the userspace monitor even trying to
attribute pageblock usage to individual apps by sampling
/proc/pagetypeinfo before and after an app launch?
Thanks.
>
> All the above are from workload perspective. From the kernel perspective,
> regardless of when or how frequently user-space tools read statistical
> information, they should not affect the kernel's own efficiency
> significantly. That's why I submit this patch series to make the read side
> of /proc/pagetypeinfo lock-free. But this does introduce overhead in hot
> path, I would greatly appreciate if we can discuss how to improve it here.
>
>> Adding these migratetype counters is something that wouldn't be even
>> possible in the past, until the freelist migratetype hygiene was merged.
>> So now it should be AFAIK possible, but it's still some overhead in
>> relatively hot paths. I wonder if we even considered this before in the
>> context of migratetype hygiene? Couldn't find anything quickly.
>
> Yes, I wrote the code on old kernel initially, at that time, I reused
> set_pcppage_migratetype (also renamed) to cache the exact migratetype
> list that the page block is on. After the freelist migratetype hygiene
> patches were merged, I removed that logic.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/3] mm: add per-migratetype counts to buddy allocator and optimize pagetypeinfo access
2025-12-01 17:01 ` Zi Yan
@ 2025-12-02 2:42 ` Hongru Zhang
0 siblings, 0 replies; 20+ messages in thread
From: Hongru Zhang @ 2025-12-02 2:42 UTC (permalink / raw)
To: ziy
Cc: Liam.Howlett, akpm, axelrasmussen, david, hannes, jackmanb,
linux-kernel, linux-mm, lorenzo.stoakes, mhocko, rppt, surenb,
vbabka, weixugc, yuanchu, zhanghongru06, zhanghongru
> > Despite the above challenges, Android device manufacturers hope to ensure
> > a good user experience (no UI jank) across all situations.
> >
> > Even at 60 Hz frame refresh rate (90 Hz, 120 Hz also supported now), all
> > work from user input to render and display should be done within 16.7 ms.
> > To achieve this goal, the management components perform tasks such as:
> > - Track system resource status: what system has
> > (system resource awareness)
> > - Learn and predict app resource demands: what app needs
> > (resource demand awareness)
> > - Monitor app launch, exit, and foreground-background switches: least
> > important app gives back resource to system to serve most important
> > one, usually the foreground app
> > (user intent awareness)
> >
> > Tracking system resources seems necessary for Android devices, not
> > optional. So the related paths are not that cold on Android devices.
>
> These are all good background information. But how does userspace monitor
> utilize pageblock migratetype information? Can you give a concrete example?
>
> Something like when free_movable is low, background apps is killed to
> provide more free pages? Or is userspace monitor even trying to attribute
> different pageblock usage to each app by monitoring /proc/pagetypeinfo
> before and after an app launch?
>
> Thanks.
AOSP:
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/main/core/java/com/android/internal/app/procstats/ProcessStats.java#:~:text=public%20void%20updateFragmentation()
We have proprietary algorithms, but they are confidential. I cannot
describe them publicly, sorry about that.
^ permalink raw reply [flat|nested] 20+ messages in thread