From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>
Cc: linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/slab: fix a memory leak on kobject_init_and_add() failure
Date: Mon, 21 Oct 2024 12:13:55 +0200 [thread overview]
Message-ID: <77d5c04b-9b97-4aee-85a9-c5efb2fa21fe@suse.cz> (raw)
In-Reply-To: <20241021091413.154775-1-42.hyeyoo@gmail.com>
Hi,
On 10/21/24 11:14, Hyeonggon Yoo wrote:
> When kobject_init_and_add() fails during cache creation,
> kobj->name can be leaked because SLUB does not call kobject_put(),
> which should have been invoked per the kobject API documentation.
> It has a bit of historical context, though; SLUB does not call
> kobject_put() to avoid double-free for struct kmem_cache because
> 1) simply calling it would free all resources related to the cache, and
> 2) struct kmem_cache descriptor is always freed by cache_cache()'s
> error handling path, causing struct kmem_cache to be freed twice.
>
> This issue can be reproduced by creating new slab caches while applying
> failslab for kernfs_node_cache. This makes kobject_add_varg() succeed,
> but causes kobject_add_internal() to fail in kobject_init_and_add()
> during cache creation.
>
> Historically, this issue has attracted developers' attention several times.
> Each time a fix addressed either the leak or the double-free,
> it caused the other issue. Let's summarize a bit of history here:
>
> The leak has existed since the early days of SLUB.
>
> Commit 54b6a731025f ("slub: fix leak of 'name' in sysfs_slab_add")
> introduced a double-free bug while fixing the leak.
>
> Commit 80da026a8e5d ("mm/slub: fix slab double-free in case of duplicate
> sysfs filename") re-introduced the leak while fixing the double-free
> error.
>
> Commit dde3c6b72a16 ("mm/slub: fix a memory leak in sysfs_slab_add()")
> fixed the memory leak, but it was later reverted by commit 757fed1d0898
> ("Revert "mm/slub: fix a memory leak in sysfs_slab_add()"") to avoid
> the double-free error.
Thanks a lot for digging that history out.
> This is where we are now: we've chosen a memory leak over a double-free.
> So, how can this be fixed? Let's change the logic this way:
>
> 1. Move cache name allocation from __kmem_cache_create_args() to
> do_kmem_cache_create() and free all resources associated with the cache
> on error in do_kmem_cache_create() to simplify error handling.
>
> 2. In do_kmem_cache_create(), assume all resources are already freed upon
> sysfs_slab_add() failure and do not take the error handling path.
Sounds good.
> One more thing to address when calling kobject_put() in sysfs_slab_add()
> on kobject_init_and_add() failure is that it shouldn't destroy caches
> that were already created during the boot process just because it failed to
> create sysfs files.
Yeah that would likely lead to numerous errors.
> To avoid that, use a no-op function as the release function if SLUB is
> trying to create sysfs files for those early caches.
> This ensures that resources allocated by the kobject infrastructure are
> properly freed on error while keeping the early caches active.
>
> Note that using a no-op function as the release function is not recommended
> by the kobject API document, but I can't come up with a better approach
> here.
So what if we just don't call kobject_put() in that case? We should not be
leaking anything in that case, we just accept that the cache wasn't added to
sysfs (should we pr_warn() btw?) but it will work otherwise? Since we want
the boot caches remain around anyway, does it matter if they are also an
initialized but not linked kobject?
I think the comment "If this function returns an error, kobject_put() must
be called" means that *if* you want to destroy it due to the failure, you
must use kobject_put() and not e.g. kfree(). But IMHO it doesn't mean you
must destroy it because of the kobject_add() failure.
Seems it would be less hacky to me than the dummy release function approach.
Thanks,
Vlastimil
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
> mm/slab_common.c | 17 +++-------------
> mm/slub.c | 50 ++++++++++++++++++++++++++++++++++++++----------
> 2 files changed, 43 insertions(+), 24 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 62878edb0a81..d4ccd10f3098 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -236,14 +236,12 @@ static struct kmem_cache *create_cache(const char *name,
> goto out;
> err = do_kmem_cache_create(s, name, object_size, args, flags);
> if (err)
> - goto out_free_cache;
> + goto out;
>
> s->refcount = 1;
> list_add(&s->list, &slab_caches);
> return s;
>
> -out_free_cache:
> - kmem_cache_free(kmem_cache, s);
> out:
> return ERR_PTR(err);
> }
> @@ -281,7 +279,6 @@ struct kmem_cache *__kmem_cache_create_args(const char *name,
> slab_flags_t flags)
> {
> struct kmem_cache *s = NULL;
> - const char *cache_name;
> int err;
>
> #ifdef CONFIG_SLUB_DEBUG
> @@ -332,18 +329,10 @@ struct kmem_cache *__kmem_cache_create_args(const char *name,
> if (s)
> goto out_unlock;
>
> - cache_name = kstrdup_const(name, GFP_KERNEL);
> - if (!cache_name) {
> - err = -ENOMEM;
> - goto out_unlock;
> - }
> -
> args->align = calculate_alignment(flags, args->align, object_size);
> - s = create_cache(cache_name, object_size, args, flags);
> - if (IS_ERR(s)) {
> + s = create_cache(name, object_size, args, flags);
> + if (IS_ERR(s))
> err = PTR_ERR(s);
> - kfree_const(cache_name);
> - }
>
> out_unlock:
> mutex_unlock(&slab_mutex);
> diff --git a/mm/slub.c b/mm/slub.c
> index c95ac26dad07..1daf4de8dc33 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -330,10 +330,11 @@ struct track {
> enum track_item { TRACK_ALLOC, TRACK_FREE };
>
> #ifdef SLAB_SUPPORTS_SYSFS
> -static int sysfs_slab_add(struct kmem_cache *);
> +static int sysfs_slab_add(struct kmem_cache *, bool);
> static int sysfs_slab_alias(struct kmem_cache *, const char *);
> #else
> -static inline int sysfs_slab_add(struct kmem_cache *s) { return 0; }
> +static inline int sysfs_slab_add(struct kmem_cache *s, bool sysfs_lazy)
> + { return 0; }
> static inline int sysfs_slab_alias(struct kmem_cache *s, const char *p)
> { return 0; }
> #endif
> @@ -6137,7 +6138,12 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
> {
> int err = -EINVAL;
>
> - s->name = name;
> + s->name = kstrdup_const(name, GFP_KERNEL);
> + if (!s->name) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> s->size = s->object_size = size;
>
> s->flags = kmem_cache_flags(flags, s->name);
> @@ -6204,16 +6210,20 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
> goto out;
> }
>
> - err = sysfs_slab_add(s);
> + err = sysfs_slab_add(s, false);
> if (err)
> - goto out;
> + /* Resources are already freed on sysfs_slab_add() failure */
> + return err;
>
> if (s->flags & SLAB_STORE_USER)
> debugfs_slab_add(s);
>
> out:
> - if (err)
> + if (err) {
> __kmem_cache_release(s);
> + kfree_const(s->name);
> + kmem_cache_free(kmem_cache, s);
> + }
> return err;
> }
>
> @@ -7162,11 +7172,22 @@ static void kmem_cache_release(struct kobject *k)
> slab_kmem_cache_release(to_slab(k));
> }
>
> +static void kmem_cache_sysfs_lazy_release(struct kobject *k) { }
> +
> static const struct sysfs_ops slab_sysfs_ops = {
> .show = slab_attr_show,
> .store = slab_attr_store,
> };
>
> +/*
> + * Early slab caches shouldn't be destroyed just because SLUB failed to create
> + * sysfs files. use a no-op function as .release function.
> + */
> +static const struct kobj_type slab_sysfs_lazy_ktype = {
> + .sysfs_ops = &slab_sysfs_ops,
> + .release = kmem_cache_sysfs_lazy_release,
> +};
> +
> static const struct kobj_type slab_ktype = {
> .sysfs_ops = &slab_sysfs_ops,
> .release = kmem_cache_release,
> @@ -7223,12 +7244,13 @@ static char *create_unique_id(struct kmem_cache *s)
> return name;
> }
>
> -static int sysfs_slab_add(struct kmem_cache *s)
> +static int sysfs_slab_add(struct kmem_cache *s, bool sysfs_lazy)
> {
> int err;
> const char *name;
> struct kset *kset = cache_kset(s);
> int unmergeable = slab_unmergeable(s);
> + const struct kobj_type *ktype;
>
> if (!unmergeable && disable_higher_order_debug &&
> (slub_debug & DEBUG_METADATA_FLAGS))
> @@ -7253,9 +7275,14 @@ static int sysfs_slab_add(struct kmem_cache *s)
> }
>
> s->kobj.kset = kset;
> - err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
> + if (sysfs_lazy)
> + ktype = &slab_sysfs_lazy_ktype;
> + else
> + ktype = &slab_ktype;
> + err = kobject_init_and_add(&s->kobj, ktype, NULL, "%s", name);
> +
> if (err)
> - goto out;
> + goto out_put_kobj;
>
> err = sysfs_create_group(&s->kobj, &slab_attr_group);
> if (err)
> @@ -7269,8 +7296,11 @@ static int sysfs_slab_add(struct kmem_cache *s)
> if (!unmergeable)
> kfree(name);
> return err;
> +
> out_del_kobj:
> kobject_del(&s->kobj);
> +out_put_kobj:
> + kobject_put(&s->kobj);
> goto out;
> }
>
> @@ -7337,7 +7367,7 @@ static int __init slab_sysfs_init(void)
> slab_state = FULL;
>
> list_for_each_entry(s, &slab_caches, list) {
> - err = sysfs_slab_add(s);
> + err = sysfs_slab_add(s, true);
> if (err)
> pr_err("SLUB: Unable to add boot slab %s to sysfs\n",
> s->name);
next prev parent reply other threads:[~2024-10-21 10:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-21 9:14 Hyeonggon Yoo
2024-10-21 10:13 ` Vlastimil Babka [this message]
2024-10-21 16:27 ` Christoph Lameter (Ampere)
2024-10-22 15:12 ` Vlastimil Babka
2024-10-22 16:08 ` Christoph Lameter (Ampere)
2024-10-23 9:23 ` Hyeonggon Yoo
2024-10-23 9:28 ` Vlastimil Babka
2024-10-23 17:12 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=77d5c04b-9b97-4aee-85a9-c5efb2fa21fe@suse.cz \
--to=vbabka@suse.cz \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-mm@kvack.org \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox