linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] Refine kmalloc caches randomization in kvmalloc
@ 2025-02-08  1:47 GONG Ruiqi
  2025-02-08  1:47 ` [PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof GONG Ruiqi
  2025-02-08  1:47 ` [PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc GONG Ruiqi
  0 siblings, 2 replies; 5+ messages in thread
From: GONG Ruiqi @ 2025-02-08  1:47 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Kees Cook
  Cc: Tamas Koczka, Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng,
	linux-mm, linux-hardening, linux-kernel, gongruiqi1

Hi,

v2: change the implementation as Vlastimil suggested
v1: https://lore.kernel.org/all/20250122074817.991060-1-gongruiqi1@huawei.com/

Tamás reported [1] that kmalloc cache randomization doesn't actually
work for kmalloc allocations made via kvmalloc. For more details, see
the commit log of patch 2.

The current solution requires a direct call from __kvmalloc_node_noprof
to __do_kmalloc_node, a static function in a different .c file.
Compared to v1, this version achieves this by simply moving
__kvmalloc_node_noprof to mm/slub.c, as suggested by Vlastimil [2].

Link: https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200 [1]
Link: https://lore.kernel.org/all/62044279-0c56-4185-97f7-7afac65ff449@suse.cz/ [2]

GONG Ruiqi (2):
  slab: Adjust placement of __kvmalloc_node_noprof
  slab: Achieve better kmalloc caches randomization in kvmalloc

 include/linux/slab.h |  22 +++++++++
 mm/slub.c            |  90 ++++++++++++++++++++++++++++++++++
 mm/util.c            | 112 -------------------------------------------
 3 files changed, 112 insertions(+), 112 deletions(-)

-- 
2.25.1




* [PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof
  2025-02-08  1:47 [PATCH v2 0/2] Refine kmalloc caches randomization in kvmalloc GONG Ruiqi
@ 2025-02-08  1:47 ` GONG Ruiqi
  2025-02-10  9:59   ` Vlastimil Babka
  2025-02-08  1:47 ` [PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc GONG Ruiqi
  1 sibling, 1 reply; 5+ messages in thread
From: GONG Ruiqi @ 2025-02-08  1:47 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Kees Cook
  Cc: Tamas Koczka, Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng,
	linux-mm, linux-hardening, linux-kernel, gongruiqi1

Move __kvmalloc_node_noprof (and also kvfree* for consistency) into
mm/slub.c so that it can directly invoke __do_kmalloc_node, which is
needed for the next patch. Move kmalloc_gfp_adjust to slab.h since
its two callers are now in different .c files.

No functional changes intended.

Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
---
 include/linux/slab.h |  22 +++++++++
 mm/slub.c            |  90 ++++++++++++++++++++++++++++++++++
 mm/util.c            | 112 -------------------------------------------
 3 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 09eedaecf120..0bf4cbf306fe 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1101,4 +1101,26 @@ size_t kmalloc_size_roundup(size_t size);
 void __init kmem_cache_init_late(void);
 void __init kvfree_rcu_init(void);
 
+static inline gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
+{
+	/*
+	 * We want to attempt a large physically contiguous block first because
+	 * it is less likely to fragment multiple larger blocks and therefore
+	 * contribute to a long term fragmentation less than vmalloc fallback.
+	 * However make sure that larger requests are not too disruptive - no
+	 * OOM killer and no allocation failure warnings as we have a fallback.
+	 */
+	if (size > PAGE_SIZE) {
+		flags |= __GFP_NOWARN;
+
+		if (!(flags & __GFP_RETRY_MAYFAIL))
+			flags |= __GFP_NORETRY;
+
+		/* nofail semantic is implemented by the vmalloc fallback */
+		flags &= ~__GFP_NOFAIL;
+	}
+
+	return flags;
+}
+
 #endif	/* _LINUX_SLAB_H */
diff --git a/mm/slub.c b/mm/slub.c
index 1f50129dcfb3..0830894bb92c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4878,6 +4878,96 @@ void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
 }
 EXPORT_SYMBOL(krealloc_noprof);
 
+/**
+ * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * failure, fall back to non-contiguous (vmalloc) allocation.
+ * @size: size of the request.
+ * @b: which set of kmalloc buckets to allocate from.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * GFP_NOWAIT and GFP_ATOMIC are not supported, neither is the __GFP_NORETRY modifier.
+ * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
+ * preferable to the vmalloc fallback, due to visible performance drawbacks.
+ *
+ * Return: pointer to the allocated memory or %NULL in case of failure
+ */
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+{
+	void *ret;
+
+	/*
+	 * It doesn't really make sense to fallback to vmalloc for sub page
+	 * requests
+	 */
+	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
+				    kmalloc_gfp_adjust(flags, size),
+				    node);
+	if (ret || size <= PAGE_SIZE)
+		return ret;
+
+	/* non-sleeping allocations are not supported by vmalloc */
+	if (!gfpflags_allow_blocking(flags))
+		return NULL;
+
+	/* Don't even allow crazy sizes */
+	if (unlikely(size > INT_MAX)) {
+		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
+		return NULL;
+	}
+
+	/*
+	 * kvmalloc() can always use VM_ALLOW_HUGE_VMAP,
+	 * since the callers already cannot assume anything
+	 * about the resulting pointer, and cannot play
+	 * protection games.
+	 */
+	return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
+			flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+			node, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__kvmalloc_node_noprof);
+
+/**
+ * kvfree() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree frees memory allocated by any of vmalloc(), kmalloc() or kvmalloc().
+ * It is slightly more efficient to use kfree() or vfree() if you are certain
+ * that you know which one to use.
+ *
+ * Context: Either preemptible task context or not-NMI interrupt.
+ */
+void kvfree(const void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+EXPORT_SYMBOL(kvfree);
+
+/**
+ * kvfree_sensitive - Free a data object containing sensitive information.
+ * @addr: address of the data object to be freed.
+ * @len: length of the data object.
+ *
+ * Use the special memzero_explicit() function to clear the content of a
+ * kvmalloc'ed object containing sensitive data to make sure that the
+ * compiler won't optimize out the data clearing.
+ */
+void kvfree_sensitive(const void *addr, size_t len)
+{
+	if (likely(!ZERO_OR_NULL_PTR(addr))) {
+		memzero_explicit((void *)addr, len);
+		kvfree(addr);
+	}
+}
+EXPORT_SYMBOL(kvfree_sensitive);
+
 struct detached_freelist {
 	struct slab *slab;
 	void *tail;
diff --git a/mm/util.c b/mm/util.c
index b6b9684a1438..5a755d2a7347 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -612,118 +612,6 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_mmap);
 
-static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
-{
-	/*
-	 * We want to attempt a large physically contiguous block first because
-	 * it is less likely to fragment multiple larger blocks and therefore
-	 * contribute to a long term fragmentation less than vmalloc fallback.
-	 * However make sure that larger requests are not too disruptive - no
-	 * OOM killer and no allocation failure warnings as we have a fallback.
-	 */
-	if (size > PAGE_SIZE) {
-		flags |= __GFP_NOWARN;
-
-		if (!(flags & __GFP_RETRY_MAYFAIL))
-			flags |= __GFP_NORETRY;
-
-		/* nofail semantic is implemented by the vmalloc fallback */
-		flags &= ~__GFP_NOFAIL;
-	}
-
-	return flags;
-}
-
-/**
- * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
- * failure, fall back to non-contiguous (vmalloc) allocation.
- * @size: size of the request.
- * @b: which set of kmalloc buckets to allocate from.
- * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
- * @node: numa node to allocate from
- *
- * Uses kmalloc to get the memory but if the allocation fails then falls back
- * to the vmalloc allocator. Use kvfree for freeing the memory.
- *
- * GFP_NOWAIT and GFP_ATOMIC are not supported, neither is the __GFP_NORETRY modifier.
- * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
- * preferable to the vmalloc fallback, due to visible performance drawbacks.
- *
- * Return: pointer to the allocated memory of %NULL in case of failure
- */
-void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
-{
-	void *ret;
-
-	/*
-	 * It doesn't really make sense to fallback to vmalloc for sub page
-	 * requests
-	 */
-	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
-				    kmalloc_gfp_adjust(flags, size),
-				    node);
-	if (ret || size <= PAGE_SIZE)
-		return ret;
-
-	/* non-sleeping allocations are not supported by vmalloc */
-	if (!gfpflags_allow_blocking(flags))
-		return NULL;
-
-	/* Don't even allow crazy sizes */
-	if (unlikely(size > INT_MAX)) {
-		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
-		return NULL;
-	}
-
-	/*
-	 * kvmalloc() can always use VM_ALLOW_HUGE_VMAP,
-	 * since the callers already cannot assume anything
-	 * about the resulting pointer, and cannot play
-	 * protection games.
-	 */
-	return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
-			flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
-			node, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(__kvmalloc_node_noprof);
-
-/**
- * kvfree() - Free memory.
- * @addr: Pointer to allocated memory.
- *
- * kvfree frees memory allocated by any of vmalloc(), kmalloc() or kvmalloc().
- * It is slightly more efficient to use kfree() or vfree() if you are certain
- * that you know which one to use.
- *
- * Context: Either preemptible task context or not-NMI interrupt.
- */
-void kvfree(const void *addr)
-{
-	if (is_vmalloc_addr(addr))
-		vfree(addr);
-	else
-		kfree(addr);
-}
-EXPORT_SYMBOL(kvfree);
-
-/**
- * kvfree_sensitive - Free a data object containing sensitive information.
- * @addr: address of the data object to be freed.
- * @len: length of the data object.
- *
- * Use the special memzero_explicit() function to clear the content of a
- * kvmalloc'ed object containing sensitive data to make sure that the
- * compiler won't optimize out the data clearing.
- */
-void kvfree_sensitive(const void *addr, size_t len)
-{
-	if (likely(!ZERO_OR_NULL_PTR(addr))) {
-		memzero_explicit((void *)addr, len);
-		kvfree(addr);
-	}
-}
-EXPORT_SYMBOL(kvfree_sensitive);
-
 /**
  * kvrealloc - reallocate memory; contents remain unchanged
  * @p: object to reallocate memory for
-- 
2.25.1




* [PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc
  2025-02-08  1:47 [PATCH v2 0/2] Refine kmalloc caches randomization in kvmalloc GONG Ruiqi
  2025-02-08  1:47 ` [PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof GONG Ruiqi
@ 2025-02-08  1:47 ` GONG Ruiqi
  2025-02-10 10:05   ` Vlastimil Babka
  1 sibling, 1 reply; 5+ messages in thread
From: GONG Ruiqi @ 2025-02-08  1:47 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, Kees Cook
  Cc: Tamas Koczka, Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng,
	linux-mm, linux-hardening, linux-kernel, gongruiqi1

As revealed by this writeup[1], due to the fact that __kmalloc_node (now
renamed to __kmalloc_node_noprof) is an exported symbol and will never
get inlined, using it in kvmalloc_node (now __kvmalloc_node_noprof)
would make the RET_IP inside always point to the same address:

    upper_caller
        kvmalloc
        kvmalloc_node
        kvmalloc_node_noprof
        __kvmalloc_node_noprof	<-- all macros all the way down here
            __kmalloc_node_noprof
                __do_kmalloc_node(.., _RET_IP_)
            ...			<-- _RET_IP_ points to

That literally means all kmalloc invoked via kvmalloc would use the same
seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
this hardening unfunctional.

The root cause of this problem, IMHO, is that using RET_IP only cannot
identify the actual allocation site in case of kmalloc being called
inside wrappers or helper functions. And I believe there could be
similar cases in other functions. Nevertheless, I haven't thought of
any good solution for this. So for now let's solve this specific case
first.

For __kvmalloc_node_noprof, replace __kmalloc_node_noprof and call
__do_kmalloc_node directly instead, so that RET_IP can take the return
address of kvmalloc and differentiate each kvmalloc invocation:

    upper_caller
        kvmalloc
        kvmalloc_node
        kvmalloc_node_noprof
        __kvmalloc_node_noprof	<-- all macros all the way down here
            __do_kmalloc_node(.., _RET_IP_)
        ...			<-- _RET_IP_ points to

Thanks to Tamás Koczka for the report and discussion!

Link: https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200 [1]
Reported-by: Tamás Koczka <poprdi@google.com>
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
---
 mm/slub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0830894bb92c..46e884b77dca 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4903,9 +4903,9 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 	 * It doesn't really make sense to fallback to vmalloc for sub page
 	 * requests
 	 */
-	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
-				    kmalloc_gfp_adjust(flags, size),
-				    node);
+	ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
+				kmalloc_gfp_adjust(flags, size),
+				node, _RET_IP_);
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
-- 
2.25.1




* Re: [PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof
  2025-02-08  1:47 ` [PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof GONG Ruiqi
@ 2025-02-10  9:59   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-02-10  9:59 UTC (permalink / raw)
  To: GONG Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Kees Cook
  Cc: Tamas Koczka, Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng,
	linux-mm, linux-hardening, linux-kernel

On 2/8/25 02:47, GONG Ruiqi wrote:
> Move __kvmalloc_node_noprof (and also kvfree* for consistency) into
> mm/slub.c so that it can directly invoke __do_kmalloc_node, which is
> needed for the next patch. Move kmalloc_gfp_adjust to slab.h since now
> its two callers are in different .c files.
> 
> No functional changes intended.
> 
> Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
> ---
>  include/linux/slab.h |  22 +++++++++
>  mm/slub.c            |  90 ++++++++++++++++++++++++++++++++++
>  mm/util.c            | 112 -------------------------------------------
>  3 files changed, 112 insertions(+), 112 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 09eedaecf120..0bf4cbf306fe 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h

This could be mm/slab.h instead.
But I'd just go all the way and move kvrealloc_noprof() too; it would be
more consistent anyway.




* Re: [PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc
  2025-02-08  1:47 ` [PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc GONG Ruiqi
@ 2025-02-10 10:05   ` Vlastimil Babka
  0 siblings, 0 replies; 5+ messages in thread
From: Vlastimil Babka @ 2025-02-10 10:05 UTC (permalink / raw)
  To: GONG Ruiqi, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Kees Cook
  Cc: Tamas Koczka, Roman Gushchin, Hyeonggon Yoo, Xiu Jianfeng,
	linux-mm, linux-hardening, linux-kernel

On 2/8/25 02:47, GONG Ruiqi wrote:
> As revealed by this writeup[1], due to the fact that __kmalloc_node (now
> renamed to __kmalloc_node_noprof) is an exported symbol and will never
> get inlined, using it in kvmalloc_node (now is __kvmalloc_node_noprof)
> would make the RET_IP inside always point to the same address:
> 
>     upper_caller
>         kvmalloc
>         kvmalloc_node
>         kvmalloc_node_noprof
>         __kvmalloc_node_noprof	<-- all macros all the way down here
>             __kmalloc_node_noprof
>                 __do_kmalloc_node(.., _RET_IP_)
>             ...			<-- _RET_IP_ points to
> 
> That literally means all kmalloc invoked via kvmalloc would use the same
> seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
> this hardening unfunctional.

                 non-functional?

> The root cause of this problem, IMHO, is that using RET_IP only cannot
> identify the actual allocation site in case of kmalloc being called
> inside wrappers or helper functions.

inside non-inlined wrappers... ?

>  And I believe there could be
> similar cases in other functions. Nevertheless, I haven't thought of
> any good solution for this. So for now let's solve this specific case
> first.

Yeah, it's the same problem with shared allocation wrappers that
allocation tagging has.

> For __kvmalloc_node_noprof, replace __kmalloc_node_noprof and call
> __do_kmalloc_node directly instead, so that RET_IP can take the return
> address of kvmalloc and differentiate each kvmalloc invocation:
> 
>     upper_caller
>         kvmalloc
>         kvmalloc_node
>         kvmalloc_node_noprof
>         __kvmalloc_node_noprof	<-- all macros all the way down here
>             __do_kmalloc_node(.., _RET_IP_)
>         ...			<-- _RET_IP_ points to
> 
> Thanks to Tamás Koczka for the report and discussion!
> 
> Link: https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200 [1]

This should be slightly better? A permalink for the file itself:
https://github.com/google/security-research/blob/908d59b573960dc0b90adda6f16f7017aca08609/pocs/linux/kernelctf/CVE-2024-27397_mitigation/docs/exploit.md

Thanks.

> Reported-by: Tamás Koczka <poprdi@google.com>
> Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
> ---
>  mm/slub.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 0830894bb92c..46e884b77dca 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4903,9 +4903,9 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
>  	 * It doesn't really make sense to fallback to vmalloc for sub page
>  	 * requests
>  	 */
> -	ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
> -				    kmalloc_gfp_adjust(flags, size),
> -				    node);
> +	ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
> +				kmalloc_gfp_adjust(flags, size),
> +				node, _RET_IP_);
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  




