From: Muhammad Usama Anjum <usama.anjum@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Uladzislau Rezki <urezki@gmail.com>,
Nick Terrell <terrelln@fb.com>, David Sterba <dsterba@suse.com>,
Vishal Moola <vishal.moola@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
bpf@vger.kernel.org, Ryan.Roberts@arm.com,
david.hildenbrand@arm.com
Cc: Muhammad Usama Anjum <usama.anjum@arm.com>
Subject: [PATCH v3 0/3] mm: Free contiguous order-0 pages efficiently
Date: Tue, 24 Mar 2026 13:35:31 +0000
Message-ID: <20260324133538.497616-1-usama.anjum@arm.com>

Hi All,

A recent change to vmalloc caused performance regressions in some benchmarks
(see [1]). This series attempts to fix that, and at the same time improve
significantly beyond the baseline, by freeing a contiguous set of order-0
pages as a batch.

While working on this I observed that free_contig_range() was essentially
doing the same thing as vfree(), so I've fixed it there too. While at it, the
series also optimizes __free_contig_frozen_range().

[1] https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com
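
To illustrate the general idea of batching, here is a small userspace sketch.
It is not the code from these patches. It only assumes that pages are
identified by page frame numbers (PFNs), that a run of consecutive PFNs can be
handed back in naturally aligned power-of-two blocks, and that there is some
upper bound on the block order. The names free_stats, free_block(),
free_contig_run(), free_pfns_batched() and MAX_ORDER_SKETCH are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on the block order; not taken from the series. */
#define MAX_ORDER_SKETCH 10

/* Hypothetical bookkeeping: how many free calls were made, covering how many pages. */
struct free_stats {
	size_t calls;
	size_t pages;
};

/* Stand-in for handing one naturally aligned block of 2^order pages to an allocator. */
static void free_block(struct free_stats *st, unsigned long pfn, unsigned int order)
{
	assert((pfn & ((1UL << order) - 1)) == 0);	/* block must be aligned to its size */
	st->calls++;
	st->pages += 1UL << order;
}

/* Free the run [pfn, pfn + nr) using the largest aligned blocks that fit. */
static void free_contig_run(struct free_stats *st, unsigned long pfn, size_t nr)
{
	while (nr) {
		unsigned int order = 0;

		/* Grow the block while it stays aligned, fits in the run, and is under the cap. */
		while (order < MAX_ORDER_SKETCH &&
		       (pfn & ((1UL << (order + 1)) - 1)) == 0 &&
		       (1UL << (order + 1)) <= nr)
			order++;

		free_block(st, pfn, order);
		pfn += 1UL << order;
		nr -= 1UL << order;
	}
}

/* Walk an array of PFNs in order and free each run of consecutive PFNs as a batch. */
static struct free_stats free_pfns_batched(const unsigned long *pfns, size_t n)
{
	struct free_stats st = { 0, 0 };
	size_t i = 0;

	while (i < n) {
		size_t len = 1;

		/* Extend the run while the next PFN directly follows the previous one. */
		while (i + len < n && pfns[i + len] == pfns[i] + len)
			len++;

		free_contig_run(&st, pfns[i], len);
		i += len;
	}
	return st;
}
```

In this sketch, the PFNs 8..15, 32, 33 and 100 are released with three calls
(one order-3 block, one order-1 block and one order-0 page) instead of eleven
per-page calls. Whether a real allocator can take the same shortcut depends on
details this sketch ignores, such as zone and memory section boundaries.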

v6.18       - before the patch causing the regression was added
mm-new      - current latest code
this series - v2 series of these patches

(>0 is faster, <0 is slower; (R)/(I) = statistically significant
Regression/Improvement)

v6.18 vs mm-new
+-----------------+----------------------------------------------------------+-------------------+-------------+
| Benchmark       | Result Class                                             | v6.18 (base)      | mm-new      |
+=================+==========================================================+===================+=============+
| micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 | (R) -50.92% |
|                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 | (R) -11.96% |
|                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 | (R) -35.21% |
|                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 | (R) -36.45% |
|                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 | (R) -31.83% |
|                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 | (R) -38.62% |
|                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 | (R) -24.84% |
|                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 | (R) -37.83% |
|                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 | (R) -26.32% |
|                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 | (R) -37.76% |
|                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 | (R) -31.15% |
|                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |      -8.97% |
|                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |      -5.88% |
|                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |      -6.95% |
|                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 | (R) -40.19% |
|                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |      -2.10% |
|                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 | (R) -48.03% |
|                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 | (R) -40.48% |
|                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |       3.52% |
+-----------------+----------------------------------------------------------+-------------------+-------------+
v6.18 vs mm-new with patches
+-----------------+----------------------------------------------------------+-------------------+--------------+
| Benchmark       | Result Class                                             | v6.18 (base)      | this series  |
+=================+==========================================================+===================+==============+
| micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |         653643.33 |      -14.02% |
|                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |         366167.33 |       -7.23% |
|                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |         489484.00 |       -1.57% |
|                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |        1011250.33 |        1.57% |
|                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |        1086812.33 |   (I) 15.75% |
|                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |         657940.00 |    (I) 9.05% |
|                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |         765422.00 |   (I) 38.45% |
|                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |        2468585.00 |   (I) 12.56% |
|                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |        2815758.33 |   (I) 38.61% |
|                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |        4851969.00 |   (I) 13.43% |
|                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |        4496257.33 |   (I) 49.21% |
|                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |         570605.00 |       -8.47% |
|                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         500866.00 |       -8.17% |
|                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |         499733.00 |       -5.54% |
|                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |        5266237.67 |    (I) 4.63% |
|                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |         490284.00 |        1.53% |
|                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |         850986.33 |       -0.00% |
|                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |        2712106.00 |        1.22% |
|                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |         111151.33 |    (I) 4.98% |
+-----------------+----------------------------------------------------------+-------------------+--------------+

The mm-new vs vmalloc_2 results are in patch 2/3.

So on average this series mitigates the regression: against v6.18 the results
range from -14% to +49%.

Thanks,
Muhammad Usama Anjum
---
Changes since v2: (summary)
- Patch 1 and 3: Rework the loop to check for memory sections
- Patch 2: Rework by removing the BUG_ON and adding helper free_pages_bulk()
Changes since v1:
- Update description
- Rebase on mm-new and rerun benchmarks/tests
- Patch 1: move FPI_PREPARED check and add todo
- Patch 2: Rework to cater for newer changes in vfree()
- New Patch 3: optimizes __free_contig_frozen_range()
Muhammad Usama Anjum (1):
mm/page_alloc: Optimize __free_contig_frozen_range()
Ryan Roberts (2):
mm/page_alloc: Optimize free_contig_range()
vmalloc: Optimize vfree
include/linux/gfp.h | 4 ++
mm/page_alloc.c | 146 ++++++++++++++++++++++++++++++++++++++++++--
mm/vmalloc.c | 16 ++---
3 files changed, 151 insertions(+), 15 deletions(-)
--
2.47.3