* [PATCH 0/4] slab: Introduce dedicated bucket allocator
@ 2024-03-04 18:49 Kees Cook
  2024-03-04 18:49 ` [PATCH 1/4] " Kees Cook
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	Christian Brauner, Al Viro, Jan Kara, linux-kernel, linux-mm,
	linux-fsdevel, linux-hardening

Hi,

Repeating the commit log for patch 1 here:

    Dedicated caches are available for fixed size allocations via
    kmem_cache_alloc(), but for dynamically sized allocations there is only
    the global kmalloc API's set of buckets available. This means it isn't
    possible to separate specific sets of dynamically sized allocations into
    a separate collection of caches.

    This leads to a use-after-free exploitation weakness in the Linux
    kernel since many heap memory spraying/grooming attacks depend on using
    userspace-controllable dynamically sized allocations to collide with
    fixed size allocations that end up in the same cache.

    While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
    against these kinds of "type confusion" attacks, including for fixed
    same-size heap objects, we can create a complementary deterministic
    defense for dynamically sized allocations.

    In order to isolate user-controllable sized allocations from system
    allocations, introduce kmem_buckets_create() and kmem_buckets_alloc(),
    which behave like kmem_cache_create() and like kmem_cache_alloc() for
    confining allocations to a dedicated set of sized caches (which have
    the same layout as the kmalloc caches).

    This can also be used in the future once codetag allocation annotations
    exist to implement per-caller allocation cache isolation[0] even for
    dynamic allocations.

    Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [0]
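
To make the usage concrete, here is a minimal sketch of how a subsystem
would adopt the API (illustrative only: "foo" is a stand-in subsystem
name, and the flag and usercopy arguments depend on the caller):

 static struct kmem_buckets *foo_buckets __ro_after_init;

 static int __init init_foo_buckets(void)
 {
 	foo_buckets = kmem_buckets_create("foo", 0, SLAB_ACCOUNT,
 					  0, 0, NULL);
 	return 0;
 }
 subsys_initcall(init_foo_buckets);

 /* Later, for a user-controlled "len", instead of kmalloc(): */
 p = kmem_buckets_alloc(foo_buckets, len, GFP_KERNEL);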

Following the implementation are three example patches showing how this
can be used for some repeat "offenders" that get used in exploits. There
are more to be isolated beyond just these. Repeating the commit log for
patch 2 here:

    The msg subsystem is a common target for exploiting[1][2][3][4][5][6]
    use-after-free type confusion flaws in the kernel for both read and
    write primitives. Avoid having a user-controlled size cache share the
    global kmalloc allocator by using a separate set of kmalloc buckets.

    After a fresh boot under Ubuntu 23.10, we can see the caches are already
    in use:

     # grep ^msg_msg /proc/slabinfo
     msg_msg-8k             0      0   8192    4    8 : ...
     msg_msg-4k            96    128   4096    8    8 : ...
     msg_msg-2k            64     64   2048   16    8 : ...
     msg_msg-1k            64     64   1024   16    4 : ...
     msg_msg-16          1024   1024     16  256    1 : ...
     msg_msg-8              0      0      8  512    1 : ...

    Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
    Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
    Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
    Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
    Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
    Link: https://zplin.me/papers/ELOISE.pdf [6]

-Kees

Kees Cook (4):
  slab: Introduce dedicated bucket allocator
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  xattr: Use dedicated slab buckets for setxattr()
  mm/util: Use dedicated slab buckets for memdup_user()

 fs/xattr.c           | 12 ++++++++-
 include/linux/slab.h | 26 ++++++++++++++++++
 ipc/msgutil.c        | 11 +++++++-
 mm/slab_common.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++
 mm/util.c            | 12 ++++++++-
 5 files changed, 122 insertions(+), 3 deletions(-)

-- 
2.34.1

* [PATCH 1/4] slab: Introduce dedicated bucket allocator
  2024-03-04 18:49 [PATCH 0/4] slab: Introduce dedicated bucket allocator Kees Cook
@ 2024-03-04 18:49 ` Kees Cook
  2024-03-04 18:49 ` [PATCH 2/4] ipc, msg: Use dedicated slab buckets for alloc_msg() Kees Cook
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
	linux-mm, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Christian Brauner, Al Viro, Jan Kara,
	linux-kernel, linux-fsdevel, linux-hardening

Dedicated caches are available for fixed size allocations via
kmem_cache_alloc(), but for dynamically sized allocations there is only
the global kmalloc API's set of buckets available. This means it isn't
possible to separate specific sets of dynamically sized allocations into
a separate collection of caches.

This leads to a use-after-free exploitation weakness in the Linux
kernel since many heap memory spraying/grooming attacks depend on using
userspace-controllable dynamically sized allocations to collide with
fixed size allocations that end up in the same cache.

While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
against these kinds of "type confusion" attacks, including for fixed
same-size heap objects, we can create a complementary deterministic
defense for dynamically sized allocations.

In order to isolate user-controllable sized allocations from system
allocations, introduce kmem_buckets_create() and kmem_buckets_alloc(),
which behave like kmem_cache_create() and like kmem_cache_alloc() for
confining allocations to a dedicated set of sized caches (which have
the same layout as the kmalloc caches).

This can also be used in the future once codetag allocation annotations
exist to implement per-caller allocation cache isolation[1] even for
dynamic allocations.

Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
---
 include/linux/slab.h | 26 ++++++++++++++++++
 mm/slab_common.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index b5f5ee8308d0..4a4ff84534be 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -492,6 +492,16 @@ void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags) __assume_slab_alignment __malloc;
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
+struct kmem_buckets {
+	struct kmem_cache *caches[ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL])];
+};
+
+struct kmem_buckets *
+kmem_buckets_create(const char *name, unsigned int align, slab_flags_t flags,
+		    unsigned int useroffset, unsigned int usersize,
+		    void (*ctor)(void *));
+
+
 /*
  * Bulk allocation and freeing operations. These are accelerated in an
  * allocator specific way to avoid taking locks repeatedly or building
@@ -594,6 +604,22 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
 	return __kmalloc(size, flags);
 }
 
+static __always_inline __alloc_size(2)
+void *kmem_buckets_alloc(struct kmem_buckets *b, size_t size, gfp_t flags)
+{
+	unsigned int index;
+
+	if (size > KMALLOC_MAX_CACHE_SIZE)
+		return kmalloc_large(size, flags);
+	if (WARN_ON_ONCE(!b))
+		return NULL;
+	index = kmalloc_index(size);
+	if (WARN_ONCE(!b->caches[index],
+		      "missing cache for size %zu (index %d)\n", size, index))
+		return kmalloc(size, flags);
+	return kmalloc_trace(b->caches[index], flags, size);
+}
+
 static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
 	if (__builtin_constant_p(size) && size) {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 238293b1dbe1..6002a182d014 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -392,6 +392,66 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
 }
 EXPORT_SYMBOL(kmem_cache_create);
 
+static struct kmem_cache *kmem_buckets_cache __ro_after_init;
+
+struct kmem_buckets *
+kmem_buckets_create(const char *name, unsigned int align,
+		  slab_flags_t flags,
+		  unsigned int useroffset, unsigned int usersize,
+		  void (*ctor)(void *))
+{
+	struct kmem_buckets *b;
+	int idx;
+
+	if (WARN_ON(!kmem_buckets_cache))
+		return NULL;
+
+	b = kmem_cache_alloc(kmem_buckets_cache, GFP_KERNEL|__GFP_ZERO);
+	if (WARN_ON(!b))
+		return NULL;
+
+	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) {
+		char *short_size, *cache_name;
+		unsigned int size;
+
+		if (!kmalloc_caches[KMALLOC_NORMAL][idx])
+			continue;
+
+		size = kmalloc_caches[KMALLOC_NORMAL][idx]->object_size;
+		if (!size)
+			continue;
+
+		short_size = strchr(kmalloc_caches[KMALLOC_NORMAL][idx]->name, '-');
+		if (WARN_ON(!short_size))
+			goto fail;
+
+		cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1);
+		if (WARN_ON(!cache_name))
+			goto fail;
+
+		b->caches[idx] = kmem_cache_create_usercopy(cache_name, size,
+					align, flags, useroffset,
+					min(size - useroffset, usersize), ctor);
+		kfree(cache_name);
+		if (WARN_ON(!b->caches[idx]))
+			goto fail;
+	}
+
+	return b;
+
+fail:
+	for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) {
+		if (b->caches[idx]) {
+			kfree(b->caches[idx]->name);
+			kmem_cache_destroy(b->caches[idx]);
+		}
+	}
+	kfree(b);
+
+	return NULL;
+}
+EXPORT_SYMBOL(kmem_buckets_create);
+
 #ifdef SLAB_SUPPORTS_SYSFS
 /*
  * For a given kmem_cache, kmem_cache_destroy() should only be called
@@ -934,6 +994,10 @@ void __init create_kmalloc_caches(slab_flags_t flags)
 
 	/* Kmalloc array is now usable */
 	slab_state = UP;
+
+	kmem_buckets_cache = kmem_cache_create("kmalloc_buckets",
+				sizeof(struct kmem_buckets) * ARRAY_SIZE(kmalloc_info),
+				0, 0, NULL);
 }
 
 /**
-- 
2.34.1

* [PATCH 2/4] ipc, msg: Use dedicated slab buckets for alloc_msg()
  2024-03-04 18:49 [PATCH 0/4] slab: Introduce dedicated bucket allocator Kees Cook
  2024-03-04 18:49 ` [PATCH 1/4] " Kees Cook
@ 2024-03-04 18:49 ` Kees Cook
  2024-03-04 18:49 ` [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr() Kees Cook
  2024-03-04 18:49 ` [PATCH 4/4] mm/util: Use dedicated slab buckets for memdup_user() Kees Cook
  3 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	Christian Brauner, Al Viro, Jan Kara, linux-kernel, linux-mm,
	linux-fsdevel, linux-hardening

The msg subsystem is a common target for exploiting[1][2][3][4][5][6]
use-after-free type confusion flaws in the kernel for both read and
write primitives. Avoid having a user-controlled size cache share the
global kmalloc allocator by using a separate set of kmalloc buckets.

After a fresh boot under Ubuntu 23.10, we can see the caches are already
in use:

 # grep ^msg_msg /proc/slabinfo
 msg_msg-8k             0      0   8192    4    8 : ...
 msg_msg-4k            96    128   4096    8    8 : ...
 msg_msg-2k            64     64   2048   16    8 : ...
 msg_msg-1k            64     64   1024   16    4 : ...
 msg_msg-16          1024   1024     16  256    1 : ...
 msg_msg-8              0      0      8  512    1 : ...

Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
Link: https://zplin.me/papers/ELOISE.pdf [6]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 ipc/msgutil.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index d0a0e877cadd..36f1aa9ea1cf 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -42,6 +42,15 @@ struct msg_msgseg {
 #define DATALEN_MSG	((size_t)PAGE_SIZE-sizeof(struct msg_msg))
 #define DATALEN_SEG	((size_t)PAGE_SIZE-sizeof(struct msg_msgseg))
 
+static struct kmem_buckets *msg_buckets __ro_after_init;
+
+static int __init init_msg_buckets(void)
+{
+	msg_buckets = kmem_buckets_create("msg_msg", 0, SLAB_ACCOUNT, 0, 0, NULL);
+
+	return 0;
+}
+subsys_initcall(init_msg_buckets);
 
 static struct msg_msg *alloc_msg(size_t len)
 {
@@ -50,7 +59,7 @@ static struct msg_msg *alloc_msg(size_t len)
 	size_t alen;
 
 	alen = min(len, DATALEN_MSG);
-	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
+	msg = kmem_buckets_alloc(msg_buckets, sizeof(*msg) + alen, GFP_KERNEL);
 	if (msg == NULL)
 		return NULL;
 
-- 
2.34.1

* [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr()
  2024-03-04 18:49 [PATCH 0/4] slab: Introduce dedicated bucket allocator Kees Cook
  2024-03-04 18:49 ` [PATCH 1/4] " Kees Cook
  2024-03-04 18:49 ` [PATCH 2/4] ipc, msg: Use dedicated slab buckets for alloc_msg() Kees Cook
@ 2024-03-04 18:49 ` Kees Cook
  2024-03-04 21:16   ` Dave Chinner
  2024-03-04 22:16   ` Eric Biggers
  2024-03-04 18:49 ` [PATCH 4/4] mm/util: Use dedicated slab buckets for memdup_user() Kees Cook
  3 siblings, 2 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Christian Brauner, Alexander Viro, Jan Kara,
	linux-fsdevel, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-kernel, linux-mm, linux-hardening

The setxattr() API can be used for exploiting[1][2][3] use-after-free
type confusion flaws in the kernel. Avoid having a user-controlled size
cache share the global kmalloc allocator by using a separate set of
kmalloc buckets.

Link: https://duasynt.com/blog/linux-kernel-heap-spray [1]
Link: https://etenal.me/archives/1336 [2]
Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [3]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/xattr.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 09d927603433..2b06316f1d1f 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -821,6 +821,16 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
 	return error;
 }
 
+static struct kmem_buckets *xattr_buckets;
+static int __init init_xattr_buckets(void)
+{
+	xattr_buckets = kmem_buckets_create("xattr", 0, 0, 0,
+					    XATTR_LIST_MAX, NULL);
+
+	return 0;
+}
+subsys_initcall(init_xattr_buckets);
+
 /*
  * Extended attribute LIST operations
  */
@@ -833,7 +843,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
 	if (size) {
 		if (size > XATTR_LIST_MAX)
 			size = XATTR_LIST_MAX;
-		klist = kvmalloc(size, GFP_KERNEL);
+		klist = kmem_buckets_alloc(xattr_buckets, size, GFP_KERNEL);
 		if (!klist)
 			return -ENOMEM;
 	}
-- 
2.34.1

* [PATCH 4/4] mm/util: Use dedicated slab buckets for memdup_user()
  2024-03-04 18:49 [PATCH 0/4] slab: Introduce dedicated bucket allocator Kees Cook
                   ` (2 preceding siblings ...)
  2024-03-04 18:49 ` [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr() Kees Cook
@ 2024-03-04 18:49 ` Kees Cook
  3 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kees Cook, Andrew Morton, linux-mm, GONG, Ruiqi, Xiu Jianfeng,
	Suren Baghdasaryan, Kent Overstreet, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Christian Brauner, Al Viro, Jan Kara,
	linux-kernel, linux-fsdevel, linux-hardening

The prctl() PR_SET_VMA_ANON_NAME command can be used for exploiting[1]
use-after-free type confusion flaws in the kernel. This is just one
path to memdup_user() which is designed for contents coming from
userspace. Avoid having a user-controlled size cache share the global
kmalloc allocator by using a separate set of kmalloc buckets.
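
For context, memdup_user() is the kernel's generic "copy a buffer of
user-chosen length from userspace into a fresh kernel allocation"
helper, so every caller inherits the user-controlled allocation size.
A typical call pattern looks like this (illustrative, not code from
this series):

 /* Both "ubuf" and "len" come directly from userspace. */
 void *buf = memdup_user(ubuf, len);

 if (IS_ERR(buf))
 	return PTR_ERR(buf);
 /* ... use buf ... */
 kfree(buf);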

After a fresh boot under Ubuntu 23.10, we can see the caches are already
in use:

 # grep ^memdup /proc/slabinfo
 memdup_user-8k         4      4   8192    4    8 : ...
 memdup_user-4k         0      0   4096    8    8 : ...
 memdup_user-2k        16     16   2048   16    8 : ...
 memdup_user-1k         0      0   1024   16    4 : ...
 memdup_user-512        0      0    512   16    2 : ...
 memdup_user-256        0      0    256   16    1 : ...
 memdup_user-128        0      0    128   32    1 : ...
 memdup_user-64       256    256     64   64    1 : ...
 memdup_user-32       512    512     32  128    1 : ...
 memdup_user-16      1024   1024     16  256    1 : ...
 memdup_user-8       2048   2048      8  512    1 : ...
 memdup_user-192        0      0    192   21    1 : ...
 memdup_user-96       168    168     96   42    1 : ...

Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
---
 mm/util.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 5a6a9802583b..818e74d11fb6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -181,6 +181,16 @@ char *kmemdup_nul(const char *s, size_t len, gfp_t gfp)
 }
 EXPORT_SYMBOL(kmemdup_nul);
 
+static struct kmem_buckets *user_buckets __ro_after_init;
+
+static int __init init_user_buckets(void)
+{
+	user_buckets = kmem_buckets_create("memdup_user", 0, 0, 0, UINT_MAX, NULL);
+
+	return 0;
+}
+subsys_initcall(init_user_buckets);
+
 /**
  * memdup_user - duplicate memory region from user space
  *
@@ -194,7 +204,7 @@ void *memdup_user(const void __user *src, size_t len)
 {
 	void *p;
 
-	p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);
+	p = kmem_buckets_alloc(user_buckets, len, GFP_USER | __GFP_NOWARN);
 	if (!p)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.34.1

* Re: [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr()
  2024-03-04 18:49 ` [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr() Kees Cook
@ 2024-03-04 21:16   ` Dave Chinner
  2024-03-04 21:32     ` Kees Cook
  2024-03-04 22:16   ` Eric Biggers
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2024-03-04 21:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Christian Brauner, Alexander Viro, Jan Kara,
	linux-fsdevel, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-kernel, linux-mm, linux-hardening

On Mon, Mar 04, 2024 at 10:49:31AM -0800, Kees Cook wrote:
> The setxattr() API can be used for exploiting[1][2][3] use-after-free
> type confusion flaws in the kernel. Avoid having a user-controlled size
> cache share the global kmalloc allocator by using a separate set of
> kmalloc buckets.
> 
> Link: https://duasynt.com/blog/linux-kernel-heap-spray [1]
> Link: https://etenal.me/archives/1336 [2]
> Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [3]
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Jan Kara <jack@suse.cz>
> Cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/xattr.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 09d927603433..2b06316f1d1f 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -821,6 +821,16 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
>  	return error;
>  }
>  
> +static struct kmem_buckets *xattr_buckets;
> +static int __init init_xattr_buckets(void)
> +{
> +	xattr_buckets = kmem_buckets_create("xattr", 0, 0, 0,
> +					    XATTR_LIST_MAX, NULL);
> +
> +	return 0;
> +}
> +subsys_initcall(init_xattr_buckets);
> +
>  /*
>   * Extended attribute LIST operations
>   */
> @@ -833,7 +843,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
>  	if (size) {
>  		if (size > XATTR_LIST_MAX)
>  			size = XATTR_LIST_MAX;
> -		klist = kvmalloc(size, GFP_KERNEL);
> +		klist = kmem_buckets_alloc(xattr_buckets, size, GFP_KERNEL);

There's a reason this uses kvmalloc() - allocations can be up to
64kB in size, and it's not uncommon for large slab allocations to
fail on long-running machines. Hence this needs to fall back to
vmalloc() to ensure that large xattrs can always be read.

Essentially, you're trading a heap spraying vector that almost
no-one will ever see for a far more frequent -ENOMEM denial of
service that will be seen on production systems where large xattrs
are used.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr()
  2024-03-04 21:16   ` Dave Chinner
@ 2024-03-04 21:32     ` Kees Cook
  0 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 21:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Vlastimil Babka, Christian Brauner, Alexander Viro, Jan Kara,
	linux-fsdevel, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-kernel, linux-mm, linux-hardening

On Tue, Mar 05, 2024 at 08:16:30AM +1100, Dave Chinner wrote:
> On Mon, Mar 04, 2024 at 10:49:31AM -0800, Kees Cook wrote:
> > The setxattr() API can be used for exploiting[1][2][3] use-after-free
> > type confusion flaws in the kernel. Avoid having a user-controlled size
> > cache share the global kmalloc allocator by using a separate set of
> > kmalloc buckets.
> > 
> > Link: https://duasynt.com/blog/linux-kernel-heap-spray [1]
> > Link: https://etenal.me/archives/1336 [2]
> > Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [3]
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: linux-fsdevel@vger.kernel.org
> > ---
> >  fs/xattr.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xattr.c b/fs/xattr.c
> > index 09d927603433..2b06316f1d1f 100644
> > --- a/fs/xattr.c
> > +++ b/fs/xattr.c
> > @@ -821,6 +821,16 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
> >  	return error;
> >  }
> >  
> > +static struct kmem_buckets *xattr_buckets;
> > +static int __init init_xattr_buckets(void)
> > +{
> > +	xattr_buckets = kmem_buckets_create("xattr", 0, 0, 0,
> > +					    XATTR_LIST_MAX, NULL);
> > +
> > +	return 0;
> > +}
> > +subsys_initcall(init_xattr_buckets);
> > +
> >  /*
> >   * Extended attribute LIST operations
> >   */
> > @@ -833,7 +843,7 @@ listxattr(struct dentry *d, char __user *list, size_t size)
> >  	if (size) {
> >  		if (size > XATTR_LIST_MAX)
> >  			size = XATTR_LIST_MAX;
> > -		klist = kvmalloc(size, GFP_KERNEL);
> > +		klist = kmem_buckets_alloc(xattr_buckets, size, GFP_KERNEL);
> 
> There's a reason this uses kvmalloc() - allocations can be up to
> 64kB in size and it's not uncommon for large slab allocation to
> fail on long running machines. hence this needs to fall back to
> vmalloc() to ensure that large xattrs can always be read.

I can add a vmalloc fallback interface too. It looked like the larger
xattr usage (8k-64k) was less common, but yeah, let's not remove the
correct allocation fallback here. I'll fix this for v2.
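
Roughly I'm imagining something like the below -- a sketch only, and
the kmem_buckets_valloc() name is made up here, not part of this
series:

 static inline void *
 kmem_buckets_valloc(struct kmem_buckets *b, size_t size, gfp_t flags)
 {
 	void *p;

 	/* Try the dedicated slab buckets first. */
 	p = kmem_buckets_alloc(b, size, flags | __GFP_NOWARN);
 	if (p)
 		return p;

 	/* Fall back to vmalloc(), as kvmalloc() does for large sizes. */
 	return __vmalloc(size, flags);
 }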

Thanks!

-Kees

-- 
Kees Cook

* Re: [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr()
  2024-03-04 18:49 ` [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr() Kees Cook
  2024-03-04 21:16   ` Dave Chinner
@ 2024-03-04 22:16   ` Eric Biggers
  2024-03-04 23:03     ` Kees Cook
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2024-03-04 22:16 UTC (permalink / raw)
  To: Kees Cook
  Cc: Vlastimil Babka, Christian Brauner, Alexander Viro, Jan Kara,
	linux-fsdevel, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-kernel, linux-mm, linux-hardening

On Mon, Mar 04, 2024 at 10:49:31AM -0800, Kees Cook wrote:
> xattr: Use dedicated slab buckets for setxattr()

This patch actually changes listxattr(), not setxattr().

getxattr(), setxattr(), and listxattr() all allocate buffers of a
user-controlled size.  Perhaps you meant to change all three?  What is
special about listxattr() (or setxattr(), if you actually meant to change
that one)?

- Eric

* Re: [PATCH 3/4] xattr: Use dedicated slab buckets for setxattr()
  2024-03-04 22:16   ` Eric Biggers
@ 2024-03-04 23:03     ` Kees Cook
  0 siblings, 0 replies; 9+ messages in thread
From: Kees Cook @ 2024-03-04 23:03 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Vlastimil Babka, Christian Brauner, Alexander Viro, Jan Kara,
	linux-fsdevel, GONG, Ruiqi, Xiu Jianfeng, Suren Baghdasaryan,
	Kent Overstreet, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-kernel, linux-mm, linux-hardening

On Mon, Mar 04, 2024 at 02:16:48PM -0800, Eric Biggers wrote:
> On Mon, Mar 04, 2024 at 10:49:31AM -0800, Kees Cook wrote:
> > xattr: Use dedicated slab buckets for setxattr()
> 
> This patch actually changes listxattr(), not setxattr().
> 
> getxattr(), setxattr(), and listxattr() all allocate a user controlled size.
> Perhaps you meant to change all three?  What is special about listxattr() (or
> setxattr() if you actually meant to change that one)?

Whoops. Yes, I did one and stopped. :P I'll fix it up in v2.

-- 
Kees Cook
