From: Song Liu <song@kernel.org>
To: <bpf@vger.kernel.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Cc: <ast@kernel.org>, <daniel@iogearbox.net>, <kernel-team@fb.com>,
<akpm@linux-foundation.org>, <rick.p.edgecombe@intel.com>,
<hch@infradead.org>, <imbrenda@linux.ibm.com>,
<mcgrof@kernel.org>, Song Liu <song@kernel.org>,
Christoph Hellwig <hch@lst.de>
Subject: [PATCH v4 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
Date: Fri, 15 Apr 2022 09:44:10 -0700
Message-ID: <20220415164413.2727220-2-song@kernel.org>
In-Reply-To: <20220415164413.2727220-1-song@kernel.org>

Huge page backed vmalloc memory can benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, the potential for page splits, etc.
VM_NO_HUGE_VMAP was introduced to let such vmalloc users opt out of huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocations are passed down different call stacks and
may cause issues in different layers.

To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages can ask
for them explicitly.

Also, remove vmalloc_no_huge() and add the opt-in helper vmalloc_huge().
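
For illustration, with this change a caller that wants huge page backing
has to ask for it explicitly. A minimal sketch (the alloc_big_table()
helper and its name are hypothetical, not part of this patch):

	#include <linux/vmalloc.h>

	/*
	 * Hypothetical caller opting in to huge page backing for a large,
	 * long-lived allocation that never needs to be split or have its
	 * page permissions changed.
	 */
	static void *alloc_big_table(size_t size)
	{
		/* Huge mappings are only attempted when size >= PMD_SIZE. */
		return vmalloc_huge(size, GFP_KERNEL | __GFP_ZERO);
	}

Callers that do not opt in keep PAGE_SIZE ptes, as before.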
Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
---
arch/Kconfig | 6 ++----
arch/powerpc/kernel/module.c | 2 +-
arch/s390/kvm/pv.c | 7 +------
include/linux/vmalloc.h | 4 ++--
mm/vmalloc.c | 17 ++++++++++-------
5 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
#
# Archs that select this would be capable of PMD-sized vmaps (i.e.,
-# arch_vmap_pmd_supported() returns true), and they must make no assumptions
-# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-# can be used to prohibit arch-specific allocations from using hugepages to
-# help with this (e.g., modules may require it).
+# arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+# must be used to enable allocations to use hugepages.
#
config HAVE_ARCH_HUGE_VMALLOC
depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
* too.
*/
return __vmalloc_node_range(size, 1, start, end, gfp, prot,
- VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+ VM_FLUSH_RESET_PERMS,
NUMA_NO_NODE, __builtin_return_address(0));
}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..cc7c9599f43e 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -137,12 +137,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
/* Allocate variable storage */
vlen = ALIGN(virt * ((npages * PAGE_SIZE) / HPAGE_SIZE), PAGE_SIZE);
vlen += uv_info.guest_virt_base_stor_len;
- /*
- * The Create Secure Configuration Ultravisor Call does not support
- * using large pages for the virtual memory area.
- * This is a hardware limitation.
- */
- kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+ kvm->arch.pv.stor_var = vzalloc(vlen);
if (!kvm->arch.pv.stor_var)
goto out_err;
return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..b159c2789961 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block; /* in notifier.h */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP 0x00000400 /* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
const void *caller) __alloc_size(1);
void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
+void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..be7c4f22d5dc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
return NULL;
}
- if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+ if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
unsigned long size_per_node;
/*
@@ -3273,21 +3273,24 @@ void *vmalloc(unsigned long size)
EXPORT_SYMBOL(vmalloc);
/**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
- * @size: allocation size
+ * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
+ * @size: allocation size
+ * @gfp_mask: flags for the page level allocator
*
- * Allocate enough non-huge pages to cover @size from the page level
+ * Allocate enough pages to cover @size from the page level
* allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_no_huge(unsigned long size)
+void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
{
return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
- GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
+ gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_no_huge);
+EXPORT_SYMBOL_GPL(vmalloc_huge);
/**
* vzalloc - allocate virtually contiguous memory with zero fill
--
2.30.2