* [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
@ 2025-11-22 9:03 Barry Song
2025-11-27 16:53 ` Uladzislau Rezki
2025-12-01 10:36 ` David Hildenbrand (Red Hat)
0 siblings, 2 replies; 8+ messages in thread
From: Barry Song @ 2025-11-22 9:03 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, Barry Song,
Uladzislau Rezki, Sumit Semwal, John Stultz, Maxime Ripard
From: Barry Song <v-songbaohua@oppo.com>
In many cases, the pages passed to vmap() may include
high-order pages. For example, the system heap often allocates
pages in descending order: order 8, then 4, then 0. Currently,
vmap() iterates over every page individually, so even the pages
inside a high-order block are handled one by one. This patch
detects high-order pages and maps them as a single contiguous
block whenever possible.
Another possibility is to implement a new API, vmap_sg().
However, that change seems to be quite large in scope.
When vmapping a 128MB dma-buf allocated from the system heap,
this RFC appears to make system_heap_do_vmap() about 16× faster:
W/ patch:
[ 51.363682] system_heap_do_vmap took 2474000 ns
[ 53.307044] system_heap_do_vmap took 2469008 ns
[ 55.061985] system_heap_do_vmap took 2519008 ns
[ 56.653810] system_heap_do_vmap took 2674000 ns
W/o patch:
[ 8.260880] system_heap_do_vmap took 39490000 ns
[ 32.513292] system_heap_do_vmap took 38784000 ns
[ 82.673374] system_heap_do_vmap took 40711008 ns
[ 84.579062] system_heap_do_vmap took 40236000 ns
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: John Stultz <jstultz@google.com>
Cc: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 6 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0832f944544c..af2e3e8c052a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
return err;
}
+static inline int get_vmap_batch_order(struct page **pages,
+ unsigned int stride,
+ int max_steps,
+ unsigned int idx)
+{
+ /*
+ * Currently, batching is only supported in vmap_pages_range
+ * when page_shift == PAGE_SHIFT.
+ */
+ if (stride != 1)
+ return 0;
+
+ struct page *base = pages[idx];
+ if (!PageHead(base))
+ return 0;
+
+ int order = compound_order(base);
+ int nr_pages = 1 << order;
+
+ if (max_steps < nr_pages)
+ return 0;
+
+ for (int i = 0; i < nr_pages; i++)
+ if (pages[idx + i] != base + i)
+ return 0;
+ return order;
+}
+
/*
* vmap_pages_range_noflush is similar to vmap_pages_range, but does not
* flush caches.
@@ -655,23 +683,32 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
+ unsigned int stride;
WARN_ON(page_shift < PAGE_SHIFT);
+ /*
+ * Some users may allocate pages from high-order down to order 0.
+ * We roughly check if the first page is a compound page. If so,
+ * there is a chance to batch multiple pages together.
+ */
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
- page_shift == PAGE_SHIFT)
+ (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
return vmap_small_pages_range_noflush(addr, end, prot, pages);
- for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
- int err;
+ stride = 1U << (page_shift - PAGE_SHIFT);
+ for (i = 0; i < nr; ) {
+ int err, order;
- err = vmap_range_noflush(addr, addr + (1UL << page_shift),
+ order = get_vmap_batch_order(pages, stride, nr - i, i);
+ err = vmap_range_noflush(addr, addr + (1UL << (page_shift + order)),
page_to_phys(pages[i]), prot,
- page_shift);
+ page_shift + order);
if (err)
return err;
- addr += 1UL << page_shift;
+ addr += 1UL << (page_shift + order);
+ i += 1U << (order + page_shift - PAGE_SHIFT);
}
return 0;
--
2.39.3 (Apple Git-146)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-11-22 9:03 [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible Barry Song
@ 2025-11-27 16:53 ` Uladzislau Rezki
2025-11-27 20:43 ` Barry Song
2025-12-01 10:36 ` David Hildenbrand (Red Hat)
1 sibling, 1 reply; 8+ messages in thread
From: Uladzislau Rezki @ 2025-11-27 16:53 UTC (permalink / raw)
To: Barry Song
Cc: akpm, linux-mm, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, Barry Song, Uladzislau Rezki, Sumit Semwal,
John Stultz, Maxime Ripard
On Sat, Nov 22, 2025 at 05:03:43PM +0800, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> In many cases, the pages passed to vmap() may include
> high-order pages—for example, the systemheap often allocates
> pages in descending order: order 8, then 4, then 0. Currently,
> vmap() iterates over every page individually—even the pages
> inside a high-order block are handled one by one. This patch
> detects high-order pages and maps them as a single contiguous
> block whenever possible.
>
> Another possibility is to implement a new API, vmap_sg().
> However, that change seems to be quite large in scope.
>
> When vmapping a 128MB dma-buf using the systemheap,
> this RFC appears to make system_heap_do_vmap() 16× faster:
>
> W/ patch:
> [ 51.363682] system_heap_do_vmap took 2474000 ns
> [ 53.307044] system_heap_do_vmap took 2469008 ns
> [ 55.061985] system_heap_do_vmap took 2519008 ns
> [ 56.653810] system_heap_do_vmap took 2674000 ns
>
> W/o patch:
> [ 8.260880] system_heap_do_vmap took 39490000 ns
> [ 32.513292] system_heap_do_vmap took 38784000 ns
> [ 82.673374] system_heap_do_vmap took 40711008 ns
> [ 84.579062] system_heap_do_vmap took 40236000 ns
>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
> mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0832f944544c..af2e3e8c052a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> return err;
> }
>
> +static inline int get_vmap_batch_order(struct page **pages,
> + unsigned int stride,
> + int max_steps,
> + unsigned int idx)
> +{
> + /*
> + * Currently, batching is only supported in vmap_pages_range
> + * when page_shift == PAGE_SHIFT.
> + */
> + if (stride != 1)
> + return 0;
> +
> + struct page *base = pages[idx];
> + if (!PageHead(base))
> + return 0;
> +
> + int order = compound_order(base);
> + int nr_pages = 1 << order;
> +
> + if (max_steps < nr_pages)
> + return 0;
> +
> + for (int i = 0; i < nr_pages; i++)
> + if (pages[idx + i] != base + i)
> + return 0;
> + return order;
> +}
> +
> /*
> * vmap_pages_range_noflush is similar to vmap_pages_range, but does not
> * flush caches.
> @@ -655,23 +683,32 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> pgprot_t prot, struct page **pages, unsigned int page_shift)
> {
> unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> + unsigned int stride;
>
> WARN_ON(page_shift < PAGE_SHIFT);
>
> + /*
> + * Some users may allocate pages from high-order down to order 0.
> + * We roughly check if the first page is a compound page. If so,
> + * there is a chance to batch multiple pages together.
> + */
> if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> - page_shift == PAGE_SHIFT)
> + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
>
Do we support __GFP_COMP as a vmalloc/vmap flag? As I see from the latest:
/*
* See __vmalloc_node_range() for a clear list of supported vmalloc flags.
* This gfp lists all flags currently passed through vmalloc. Currently,
* __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
* and BPF also use GFP_USER. Additionally, various users pass
* GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
*/
#define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
__GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\
GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
GFP_USER | __GFP_NOLOCKDEP)
Could you please clarify when PageCompound(pages[0]) returns true?
> return vmap_small_pages_range_noflush(addr, end, prot, pages);
>
> - for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
> - int err;
> + stride = 1U << (page_shift - PAGE_SHIFT);
> + for (i = 0; i < nr; ) {
> + int err, order;
>
> - err = vmap_range_noflush(addr, addr + (1UL << page_shift),
> + order = get_vmap_batch_order(pages, stride, nr - i, i);
> + err = vmap_range_noflush(addr, addr + (1UL << (page_shift + order)),
> page_to_phys(pages[i]), prot,
> - page_shift);
> + page_shift + order);
> if (err)
> return err;
>
> - addr += 1UL << page_shift;
> + addr += 1UL << (page_shift + order);
> + i += 1U << (order + page_shift - PAGE_SHIFT);
> }
>
> return 0;
> --
> 2.39.3 (Apple Git-146)
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-11-27 16:53 ` Uladzislau Rezki
@ 2025-11-27 20:43 ` Barry Song
2025-12-01 11:08 ` Uladzislau Rezki
0 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2025-11-27 20:43 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: akpm, linux-mm, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, Barry Song, Sumit Semwal, John Stultz,
Maxime Ripard
> >
> > + /*
> > + * Some users may allocate pages from high-order down to order 0.
> > + * We roughly check if the first page is a compound page. If so,
> > + * there is a chance to batch multiple pages together.
> > + */
> > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > - page_shift == PAGE_SHIFT)
> > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> >
> Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
This is not the case for vmalloc(), but it applies to dma-bufs whose pages
are allocated via alloc_pages() with __GFP_COMP.
#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
#define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
| __GFP_NORETRY) & ~__GFP_RECLAIM) \
| __GFP_COMP)
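For reference, the allocation loop behind those flags looks roughly like
this (condensed sketch of the system heap's descending-order allocator in
drivers/dma-buf/heaps/system_heap.c; the upstream code has more
bookkeeping, so details may differ):

static const unsigned int orders[] = {8, 4, 0};

static struct page *alloc_largest_available(unsigned long size,
					    unsigned int max_order)
{
	struct page *page;
	int i;

	for (i = 0; i < ARRAY_SIZE(orders); i++) {
		if (size < (PAGE_SIZE << orders[i]) || max_order < orders[i])
			continue;
		/* order > 0 uses HIGH_ORDER_GFP, which includes __GFP_COMP */
		page = alloc_pages(orders[i] ? HIGH_ORDER_GFP : LOW_ORDER_GFP,
				   orders[i]);
		if (page)
			return page;
	}
	return NULL;
}

So a large buffer typically ends up as a run of order-8/order-4 compound
pages plus a tail of order-0 pages.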
>
> /*
> * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> * This gfp lists all flags currently passed through vmalloc. Currently,
> * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> * and BPF also use GFP_USER. Additionally, various users pass
> * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> */
> #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> __GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\
> GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> GFP_USER | __GFP_NOLOCKDEP)
>
> Could you please clarify when PageCompound(pages[0]) returns true?
>
In this case, the dma-buf system heap attempts to allocate as many compound
high-order pages as possible, falling back to order-0 allocations if necessary.
Then, dma_buf_vmap() is called by the GPU drivers:
1 404 drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
dma_buf_vmap(abo->dma_buf, map);
2 1568 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
ret = dma_buf_vmap(dmabuf, map);
3 354 drivers/gpu/drm/drm_gem_shmem_helper.c
<<drm_gem_shmem_vmap_locked>>
ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
4 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
<<etnaviv_gem_prime_vmap_impl>>
ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
5 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
6 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
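For a system-heap buffer, those calls end up in the heap's vmap op, which
flattens the sg_table into a page array, so subpages of each compound
allocation land as consecutive entries. Roughly (simplified sketch of
system_heap_do_vmap(); error handling trimmed, details may differ from the
upstream driver):

static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
{
	struct sg_table *table = &buffer->sg_table;
	int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE;
	struct page **pages = vmalloc(sizeof(struct page *) * npages);
	struct page **tmp = pages;
	struct sg_page_iter piter;
	void *vaddr;

	if (!pages)
		return ERR_PTR(-ENOMEM);

	/* one entry per PAGE_SIZE subpage, walked across the sg_table */
	for_each_sgtable_page(table, &piter, 0)
		*tmp++ = sg_page_iter_page(&piter);

	vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
	vfree(pages);

	return vaddr ? vaddr : ERR_PTR(-ENOMEM);
}

Those consecutive runs are exactly what get_vmap_batch_order() detects.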
Thanks
Barry
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-11-22 9:03 [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible Barry Song
2025-11-27 16:53 ` Uladzislau Rezki
@ 2025-12-01 10:36 ` David Hildenbrand (Red Hat)
2025-12-01 21:39 ` Barry Song
1 sibling, 1 reply; 8+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-12-01 10:36 UTC (permalink / raw)
To: Barry Song, akpm, linux-mm
Cc: linux-media, dri-devel, linaro-mm-sig, linux-kernel, Barry Song,
Uladzislau Rezki, Sumit Semwal, John Stultz, Maxime Ripard
On 11/22/25 10:03, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> In many cases, the pages passed to vmap() may include
> high-order pages—for example, the systemheap often allocates
> pages in descending order: order 8, then 4, then 0. Currently,
> vmap() iterates over every page individually—even the pages
> inside a high-order block are handled one by one. This patch
> detects high-order pages and maps them as a single contiguous
> block whenever possible.
>
> Another possibility is to implement a new API, vmap_sg().
> However, that change seems to be quite large in scope.
>
> When vmapping a 128MB dma-buf using the systemheap,
> this RFC appears to make system_heap_do_vmap() 16× faster:
>
> W/ patch:
> [ 51.363682] system_heap_do_vmap took 2474000 ns
> [ 53.307044] system_heap_do_vmap took 2469008 ns
> [ 55.061985] system_heap_do_vmap took 2519008 ns
> [ 56.653810] system_heap_do_vmap took 2674000 ns
>
> W/o patch:
> [ 8.260880] system_heap_do_vmap took 39490000 ns
> [ 32.513292] system_heap_do_vmap took 38784000 ns
> [ 82.673374] system_heap_do_vmap took 40711008 ns
> [ 84.579062] system_heap_do_vmap took 40236000 ns
>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
> mm/vmalloc.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0832f944544c..af2e3e8c052a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> return err;
> }
>
> +static inline int get_vmap_batch_order(struct page **pages,
> + unsigned int stride,
> + int max_steps,
> + unsigned int idx)
These fit into fewer lines.
ideally
\t\tunsigned int stride, int max_steps, unsigned int idx)
> +{
int order, nr_pages, i;
struct page *base;
But I think you can just drop "base". And order.
> + /*
> + * Currently, batching is only supported in vmap_pages_range
> + * when page_shift == PAGE_SHIFT.
> + */
> + if (stride != 1)
> + return 0;
> +
> + struct page *base = pages[idx];
> + if (!PageHead(base))
> + return 0;
> +
> + int order = compound_order(base);
> + int nr_pages = 1 << order;
You can drop the head check etc and simply do
nr_pages = compound_nr(pages[idx]);
if (nr_pages == 1)
return 0;
Which raises the question: are these things folios? I assume not.
> +
> + if (max_steps < nr_pages)
> + return 0;
> +
> + for (int i = 0; i < nr_pages; i++)
> + if (pages[idx + i] != base + i)
> + return 0;
if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
return compound_order(pages[idx]);
return 0;
--
Cheers
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-11-27 20:43 ` Barry Song
@ 2025-12-01 11:08 ` Uladzislau Rezki
2025-12-01 22:05 ` Barry Song
0 siblings, 1 reply; 8+ messages in thread
From: Uladzislau Rezki @ 2025-12-01 11:08 UTC (permalink / raw)
To: Barry Song
Cc: Uladzislau Rezki, akpm, linux-mm, linux-media, dri-devel,
linaro-mm-sig, linux-kernel, Barry Song, Sumit Semwal,
John Stultz, Maxime Ripard
On Fri, Nov 28, 2025 at 04:43:54AM +0800, Barry Song wrote:
> > >
> > > + /*
> > > + * Some users may allocate pages from high-order down to order 0.
> > > + * We roughly check if the first page is a compound page. If so,
> > > + * there is a chance to batch multiple pages together.
> > > + */
> > > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > - page_shift == PAGE_SHIFT)
> > > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > >
> > Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
>
> This is not the case for vmalloc, but applies to dma-bufs that are allocated
> using alloc_pages() with GFP_COMP.
>
> #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
> #define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> | __GFP_COMP)
>
> >
> > /*
> > * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> > * This gfp lists all flags currently passed through vmalloc. Currently,
> > * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> > * and BPF also use GFP_USER. Additionally, various users pass
> > * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> > */
> > #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> > __GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\
> > GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> > GFP_USER | __GFP_NOLOCKDEP)
> >
> > Could you please clarify when PageCompound(pages[0]) returns true?
> >
>
> In this case, dma-buf attempts to allocate as many compound high-order pages
> as possible, falling back to 0-order allocations if necessary.
>
OK, so it is the compound-page (folio) path that uses it.
> Then, dma_buf_vmap() is called by the GPU drivers:
>
> 1 404 drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
> dma_buf_vmap(abo->dma_buf, map);
> 2 1568 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> ret = dma_buf_vmap(dmabuf, map);
> 3 354 drivers/gpu/drm/drm_gem_shmem_helper.c
> <<drm_gem_shmem_vmap_locked>>
> ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> 4 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> <<etnaviv_gem_prime_vmap_impl>>
> ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> 5 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> 6 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
>
Thank you for the clarification. It would be good to reflect that in the
commit message. Also, please note that:
> if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> - page_shift == PAGE_SHIFT)
> + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
>
we rely on the page_shift == PAGE_SHIFT condition for the non-sleeping
vmalloc() allocations (GFP_ATOMIC, GFP_NOWAIT), so those go via the
vmap_small_pages_range_noflush() path. Your patch adds the
!PageCompound(pages[0]) check on top of that. It is not a problem, since
compound pages only show up on the vmap() path, but we need to comment that.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-12-01 10:36 ` David Hildenbrand (Red Hat)
@ 2025-12-01 21:39 ` Barry Song
0 siblings, 0 replies; 8+ messages in thread
From: Barry Song @ 2025-12-01 21:39 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: akpm, linux-mm, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, Barry Song, Uladzislau Rezki, Sumit Semwal,
John Stultz, Maxime Ripard
On Mon, Dec 1, 2025 at 6:36 PM David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
[...]
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 0832f944544c..af2e3e8c052a 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -642,6 +642,34 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
> > return err;
> > }
> >
> > +static inline int get_vmap_batch_order(struct page **pages,
> > + unsigned int stride,
> > + int max_steps,
> > + unsigned int idx)
>
> These fit into less lines.
>
> ideally
>
> \t\tunsigned int stride, int max_steps, unsigned int idx)
Right, thanks!
>
> > +{
>
> int order, nr_pages, i;
> struct page *base;
>
> But I think you can just drop "base". And order.
Right, thanks!
>
> > + /*
> > + * Currently, batching is only supported in vmap_pages_range
> > + * when page_shift == PAGE_SHIFT.
> > + */
> > + if (stride != 1)
> > + return 0;
> > +
> > + struct page *base = pages[idx];
> > + if (!PageHead(base))
> > + return 0;
> > +
> > + int order = compound_order(base);
> > + int nr_pages = 1 << order;
>
>
> You can drop the head check etc and simply do
>
> nr_pages = compound_nr(pages[idx]);
> if (nr_pages == 1)
> return 0;
>
Nice. Since compound_nr() returns 1 for tail (and order-0) pages, the
explicit head check isn't needed.
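Putting this together with the num_pages_contiguous() suggestion from
earlier in your mail, the helper could shrink to something like this
(untested sketch; it assumes num_pages_contiguous() returns how many of
the nr_pages entries starting at pages[idx] are physically contiguous):

static inline int get_vmap_batch_order(struct page **pages,
		unsigned int stride, int max_steps, unsigned int idx)
{
	unsigned long nr_pages;

	/* Batching is only attempted when page_shift == PAGE_SHIFT. */
	if (stride != 1)
		return 0;

	nr_pages = compound_nr(pages[idx]);
	if (nr_pages == 1 || max_steps < nr_pages)
		return 0;

	if (num_pages_contiguous(&pages[idx], nr_pages) == nr_pages)
		return compound_order(pages[idx]);

	return 0;
}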
> Which raises the question: are these things folios? I assume not.
In my case, it's simply alloc_pages() with __GFP_COMP. I assume that folios
allocated via folio_alloc() would also automatically benefit from this patch?
Currently, vmap() takes a pages array as an argument, so even for a folio
we need to expand it into individual pages. Simply passing a folio array to
vmap() likely won't work, since a mapping could start and end at subpages
in the middle of a folio.
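For illustration, expanding a folio into the page array that vmap() expects
could look like this (hypothetical helper, not part of this patch; with the
batching above the whole folio would then be mapped in one go):

static void *vmap_folio(struct folio *folio)
{
	unsigned long i, nr = folio_nr_pages(folio);
	struct page **pages;
	void *vaddr;

	pages = kvmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	/* expand the folio into individual subpages for vmap() */
	for (i = 0; i < nr; i++)
		pages[i] = folio_page(folio, i);

	vaddr = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	kvfree(pages);
	return vaddr;
}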
Thanks
Barry
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-12-01 11:08 ` Uladzislau Rezki
@ 2025-12-01 22:05 ` Barry Song
2025-12-02 14:03 ` Uladzislau Rezki
0 siblings, 1 reply; 8+ messages in thread
From: Barry Song @ 2025-12-01 22:05 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: akpm, linux-mm, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, Barry Song, Sumit Semwal, John Stultz,
Maxime Ripard
On Mon, Dec 1, 2025 at 7:08 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Fri, Nov 28, 2025 at 04:43:54AM +0800, Barry Song wrote:
> > > >
> > > > + /*
> > > > + * Some users may allocate pages from high-order down to order 0.
> > > > + * We roughly check if the first page is a compound page. If so,
> > > > + * there is a chance to batch multiple pages together.
> > > > + */
> > > > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > > - page_shift == PAGE_SHIFT)
> > > > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > > >
> > > Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
> >
> > This is not the case for vmalloc, but applies to dma-bufs that are allocated
> > using alloc_pages() with GFP_COMP.
> >
> > #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
> > #define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> > | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> > | __GFP_COMP)
> >
> > >
> > > /*
> > > * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> > > * This gfp lists all flags currently passed through vmalloc. Currently,
> > > * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> > > * and BPF also use GFP_USER. Additionally, various users pass
> > > * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> > > */
> > > #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> > > __GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\
> > > GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> > > GFP_USER | __GFP_NOLOCKDEP)
> > >
> > > Could you please clarify when PageCompound(pages[0]) returns true?
> > >
> >
> > In this case, dma-buf attempts to allocate as many compound high-order pages
> > as possible, falling back to 0-order allocations if necessary.
> >
> OK, it is folio who uses it.
>
> > Then, dma_buf_vmap() is called by the GPU drivers:
> >
> > 1 404 drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
> > dma_buf_vmap(abo->dma_buf, map);
> > 2 1568 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> > ret = dma_buf_vmap(dmabuf, map);
> > 3 354 drivers/gpu/drm/drm_gem_shmem_helper.c
> > <<drm_gem_shmem_vmap_locked>>
> > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > 4 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > <<etnaviv_gem_prime_vmap_impl>>
> > ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> > 5 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> > ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> > 6 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> >
> Thank you for clarification. That would be good to reflect it in the
> commit message. Also, please note that:
Sure.
>
> > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > - page_shift == PAGE_SHIFT)
> > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> >
> we rely on page_shift == PAGE_SHIFT condition for the non-sleep vmalloc()
> allocations(GFP_ATOMIC, GFP_NOWAIT), so we go via vmap_small_pages_range_noflush()
> path. Your patch adds !PageCompound(pages[0]) also. It is not a problem
> since it is vmap() path but we need to comment that.
Sure. Would the following work?
/*
 * For vmap(), users may allocate pages from high orders down to order 0,
 * while always using PAGE_SHIFT as the page_shift.
 * We first check whether the initial page is a compound page. If so,
 * there may be an opportunity to batch multiple pages together.
 */
if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
(page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
return vmap_small_pages_range_noflush(addr, end, prot, pages);
Thanks
Barry
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible
2025-12-01 22:05 ` Barry Song
@ 2025-12-02 14:03 ` Uladzislau Rezki
0 siblings, 0 replies; 8+ messages in thread
From: Uladzislau Rezki @ 2025-12-02 14:03 UTC (permalink / raw)
To: Barry Song
Cc: Uladzislau Rezki, akpm, linux-mm, linux-media, dri-devel,
linaro-mm-sig, linux-kernel, Barry Song, Sumit Semwal,
John Stultz, Maxime Ripard
On Tue, Dec 02, 2025 at 06:05:56AM +0800, Barry Song wrote:
> On Mon, Dec 1, 2025 at 7:08 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Fri, Nov 28, 2025 at 04:43:54AM +0800, Barry Song wrote:
> > > > >
> > > > > + /*
> > > > > + * Some users may allocate pages from high-order down to order 0.
> > > > > + * We roughly check if the first page is a compound page. If so,
> > > > > + * there is a chance to batch multiple pages together.
> > > > > + */
> > > > > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > > > - page_shift == PAGE_SHIFT)
> > > > > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > > > >
> > > > Do we support __GFP_COMP as vmalloc/vmap flag? As i see from latest:
> > >
> > > This is not the case for vmalloc, but applies to dma-bufs that are allocated
> > > using alloc_pages() with GFP_COMP.
> > >
> > > #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO)
> > > #define HIGH_ORDER_GFP (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
> > > | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> > > | __GFP_COMP)
> > >
> > > >
> > > > /*
> > > > * See __vmalloc_node_range() for a clear list of supported vmalloc flags.
> > > > * This gfp lists all flags currently passed through vmalloc. Currently,
> > > > * __GFP_ZERO is used by BPF and __GFP_NORETRY is used by percpu. Both drm
> > > > * and BPF also use GFP_USER. Additionally, various users pass
> > > > * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP.
> > > > */
> > > > #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\
> > > > __GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\
> > > > GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\
> > > > GFP_USER | __GFP_NOLOCKDEP)
> > > >
> > > > Could you please clarify when PageCompound(pages[0]) returns true?
> > > >
> > >
> > > In this case, dma-buf attempts to allocate as many compound high-order pages
> > > as possible, falling back to 0-order allocations if necessary.
> > >
> > OK, it is folio who uses it.
> >
> > > Then, dma_buf_vmap() is called by the GPU drivers:
> > >
> > > 1 404 drivers/accel/amdxdna/amdxdna_gem.c <<amdxdna_gem_obj_vmap>>
> > > dma_buf_vmap(abo->dma_buf, map);
> > > 2 1568 drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
> > > ret = dma_buf_vmap(dmabuf, map);
> > > 3 354 drivers/gpu/drm/drm_gem_shmem_helper.c
> > > <<drm_gem_shmem_vmap_locked>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > > 4 85 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> > > <<etnaviv_gem_prime_vmap_impl>>
> > > ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> > > 5 433 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
> > > ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
> > > 6 88 drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
> > > ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> > >
> > Thank you for clarification. That would be good to reflect it in the
> > commit message. Also, please note that:
>
> Sure.
>
> >
> > > if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> > > - page_shift == PAGE_SHIFT)
> > > + (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> > >
> > we rely on page_shift == PAGE_SHIFT condition for the non-sleep vmalloc()
> > allocations(GFP_ATOMIC, GFP_NOWAIT), so we go via vmap_small_pages_range_noflush()
> > path. Your patch adds !PageCompound(pages[0]) also. It is not a problem
> > since it is vmap() path but we need to comment that.
>
> Sure. Would the following work?
>
> /*
> * For vmap(), users may allocate pages from high orders down
> to order 0,
> * while always using PAGE_SHIFT as the page_shift.
> * We first check whether the initial page is a compound page. If so,
> * there may be an opportunity to batch multiple pages together.
> */
> if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
> (page_shift == PAGE_SHIFT && !PageCompound(pages[0])))
> return vmap_small_pages_range_noflush(addr, end, prot, pages);
>
Sounds good!
Thank you.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-12-02 14:03 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-22 9:03 [PATCH RFC] mm/vmap: map contiguous pages in batches whenever possible Barry Song
2025-11-27 16:53 ` Uladzislau Rezki
2025-11-27 20:43 ` Barry Song
2025-12-01 11:08 ` Uladzislau Rezki
2025-12-01 22:05 ` Barry Song
2025-12-02 14:03 ` Uladzislau Rezki
2025-12-01 10:36 ` David Hildenbrand (Red Hat)
2025-12-01 21:39 ` Barry Song