From: Alexei Starovoitov
Date: Wed, 16 Jul 2025 19:50:37 -0700
Subject: Re: [PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
References: <20250716022950.69330-1-alexei.starovoitov@gmail.com>
 <20250716022950.69330-6-alexei.starovoitov@gmail.com>
To: Vlastimil Babka
Cc: bpf, linux-mm, Harry Yoo, Shakeel Butt, Michal Hocko, Sebastian Sewior,
    Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra,
    Steven Rostedt, Johannes Weiner

On Wed, Jul 16, 2025 at 3:58 AM Vlastimil Babka wrote:
>
> On 7/16/25 04:29, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > kmalloc_nolock() relies on the ability of local_lock to detect the situation
>
>                                             ^ local_trylock_t perhaps?
>
> > when it's locked.
> > In !PREEMPT_RT local_lock_is_locked() is true only when an NMI happened in
> > the irq-saved region that protects _that specific_ per-cpu kmem_cache_cpu.
>
> It can also be true when you call it from the bpf hook, no?

Correct. Technically one can disasm ___slab_alloc(), find some
instruction after pushfq;cli, add a kprobe there, attach a bpf prog
to the kprobe, and call kmalloc_nolock() (eventually, when bpf infra
switches to kmalloc_nolock()).
I wouldn't call it malicious, but if somebody does that they are
looking for trouble. Even syzbot doesn't do such things.

> > In that case retry the operation in a different kmalloc bucket.
> > The second attempt will likely succeed, since this cpu locked
> > a different kmem_cache_cpu.
> >
> > Similarly, in PREEMPT_RT local_lock_is_locked() returns true when the
> > per-cpu rt_spin_lock is locked by the current task. In this case re-entrance
> > into the same kmalloc bucket is unsafe, and kmalloc_nolock() tries
> > a different bucket that is most likely not locked by the current
> > task. Though it may be locked by a different task it's safe to
> > rt_spin_lock() on it.
> >
> > Similar to alloc_pages_nolock() the kmalloc_nolock() returns NULL
> > immediately if called from hard irq or NMI in PREEMPT_RT.
> >
> > kfree_nolock() defers freeing to irq_work when local_lock_is_locked()
> > and in_nmi() or in PREEMPT_RT.
> >
> > SLUB_TINY config doesn't use local_lock_is_locked() and relies on
> > spin_trylock_irqsave(&n->list_lock) to allocate while kfree_nolock()
> > always defers to irq_work.
> >
> > Signed-off-by: Alexei Starovoitov
>
> Haven't seen an obvious bug now but will ponder it some more. Meanwhile some
> nits and maybe one bit more serious concern.
>
> > +static inline void local_lock_cpu_slab(struct kmem_cache *s, unsigned long *flags)
> > +{
> > +        /*
> > +         * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
> > +         * can be acquired without a deadlock before invoking the function.
> > +         *
> > +         * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
> > +         * disabled context. The lock will always be acquired and if needed it
> > +         * will block and sleep until the lock is available.
> > +         *
> > +         * On !PREEMPT_RT allocations from any context but NMI are safe. The lock
> > +         * is always acquired with disabled interrupts meaning it is always
> > +         * possible to acquire it.
> > +         * In NMI context it is needed to check if the lock is acquired. If it is not,
>
> This also could mention the bpf instrumentation context?

Ok.

> > +         * it is safe to acquire it. The trylock semantic is used to tell lockdep
> > +         * that we don't spin. The BUG_ON() will not trigger if it is safe to acquire
> > +         * the lock.
> > +         *
> > +         */
> > +        if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> > +                BUG_ON(!local_trylock_irqsave(&s->cpu_slab->lock, *flags));
>
> Linus might still spot the BUG_ON() and complain, lockdep_assert() would be
> safer maybe :)
> Or just use local_lock_irqsave() with !CONFIG_LOCKDEP as well.

Fair enough. Let's save one branch in the critical path.

> Nit: maybe could be a #define to avoid the unusual need for "&flags" instead
> of "flags" when calling.

When "bool allow_spin" was there in Sebastian's version
it definitely looked cleaner as a proper function,
but now
  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
can be
  #ifdef CONFIG_PREEMPT_RT
and the comment will look normal (without ugly backslashes).
So yeah, I'll convert it to a macro.

> > +/**
> > + * kmalloc_nolock - Allocate an object of given size from any context.
> > + * @size: size to allocate
> > + * @gfp_flags: GFP flags. Only __GFP_ACCOUNT, __GFP_ZERO allowed.
> > + * @node: node number of the target node.
> > + *
> > + * Return: pointer to the new object or NULL in case of error.
> > + * NULL does not mean EBUSY or EAGAIN. It means ENOMEM.
> > + * There is no reason to call it again and expect !NULL.
> > + */
> > +void *kmalloc_nolock_noprof(size_t size, gfp_t gfp_flags, int node)
> > +{
> > +        gfp_t alloc_gfp = __GFP_NOWARN | __GFP_NOMEMALLOC | gfp_flags;
> > +        struct kmem_cache *s;
> > +        bool can_retry = true;
> > +        void *ret = ERR_PTR(-EBUSY);
> > +
> > +        VM_WARN_ON_ONCE(gfp_flags & ~(__GFP_ACCOUNT | __GFP_ZERO));
> > +
> > +        if (unlikely(!size))
> > +                return ZERO_SIZE_PTR;
> > +
> > +        if (!USE_LOCKLESS_FAST_PATH() && (in_nmi() || in_hardirq()))
>
> Nit: maybe just do explicit PREEMPT_RT checks when the code isn't about
> lockless fastpaths,

True. I wasn't sure what's better.
do_slab_free() does
  if (USE_LOCKLESS_FAST_PATH())
but it's really meant to be PREEMPT_RT,
since the 'else' part doesn't make sense otherwise.
It's doing local_lock() without _irqsave() which is inconsistent
with everything else and looks broken when one doesn't have
knowledge of local_lock_internal.h.
This patch fixes this part:

-                local_lock(&s->cpu_slab->lock);
+                local_lock_cpu_slab(s, &flags);

Here, in this hunk, if (IS_ENABLED(CONFIG_PREEMPT_RT)) might look
better indeed.

> > +                /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
> > +                return NULL;
> > +retry:
> > +        if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
> > +                return NULL;
> > +        s = kmalloc_slab(size, NULL, alloc_gfp, _RET_IP_);
> > +
> > +        if (!(s->flags & __CMPXCHG_DOUBLE) && !kmem_cache_debug(s))
> > +                /*
> > +                 * kmalloc_nolock() is not supported on architectures that
> > +                 * don't implement cmpxchg16b, but debug caches don't use
> > +                 * per-cpu slab and per-cpu partial slabs. They rely on
> > +                 * kmem_cache_node->list_lock, so kmalloc_nolock() can
> > +                 * attempt to allocate from debug caches by
> > +                 * spin_trylock_irqsave(&n->list_lock, ...)
> > +                 */
> > +                return NULL;
> > +
> > +        /*
> > +         * Do not call slab_alloc_node(), since trylock mode isn't
> > +         * compatible with slab_pre_alloc_hook/should_failslab and
> > +         * kfence_alloc. Hence call __slab_alloc_node() (at most twice)
> > +         * and slab_post_alloc_hook() directly.
> > +         *
> > +         * In !PREEMPT_RT ___slab_alloc() manipulates (freelist,tid) pair
> > +         * in irq saved region. It assumes that the same cpu will not
> > +         * __update_cpu_freelist_fast() into the same (freelist,tid) pair.
> > +         * Therefore use in_nmi() to check whether particular bucket is in
> > +         * irq protected section.
> > +         *
> > +         * If in_nmi() && local_lock_is_locked(s->cpu_slab) then it means that
> > +         * this cpu was interrupted somewhere inside ___slab_alloc() after
> > +         * it did local_lock_irqsave(&s->cpu_slab->lock, flags).
> > +         * In this case fast path with __update_cpu_freelist_fast() is not safe.
> > +         */
> > +#ifndef CONFIG_SLUB_TINY
> > +        if (!in_nmi() || !local_lock_is_locked(&s->cpu_slab->lock))
> > +#endif
> > +                ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, size);
>
> Nit: use IS_DEFINED(CONFIG_SLUB_TINY) to make this look better?

ok.

> > +static void defer_deactivate_slab(struct slab *slab)
> > +{
>
> Nit: for more consistency this could take the freelist argument and assign
> it here, and not in the caller.

ok.
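i.e. roughly something like the sketch below (just to illustrate the
shape; the field that carries the freelist to the irq_work handler is
a placeholder name here, not what the patch necessarily ends up using):

static void defer_deactivate_slab(struct slab *slab, void *flush_freelist)
{
        struct defer_free *df = this_cpu_ptr(&defer_free_objects);

        /* stash the freelist for deactivate_slab() in the irq_work handler */
        slab->flush_freelist = flush_freelist;  /* placeholder field name */
        if (llist_add(&slab->llnode, &df->slabs))
                irq_work_queue(&df->work);
}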
> > +        struct defer_free *df = this_cpu_ptr(&defer_free_objects);
> > +
> > +        if (llist_add(&slab->llnode, &df->slabs))
> > +                irq_work_queue(&df->work);
> > +}
> > +
> > +void defer_free_barrier(void)
> > +{
> > +        int cpu;
> > +
> > +        for_each_possible_cpu(cpu)
> > +                irq_work_sync(&per_cpu_ptr(&defer_free_objects, cpu)->work);
> > +}
> > +
> >  #ifndef CONFIG_SLUB_TINY
> >  /*
> >   * Fastpath with forced inlining to produce a kfree and kmem_cache_free that
> > @@ -4575,6 +4857,8 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >                                          struct slab *slab, void *head, void *tail,
> >                                          int cnt, unsigned long addr)
> >  {
> > +        /* cnt == 0 signals that it's called from kfree_nolock() */
> > +        bool allow_spin = cnt;
> >          struct kmem_cache_cpu *c;
> >          unsigned long tid;
> >          void **freelist;
> > @@ -4593,10 +4877,30 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >          barrier();
> >
> >          if (unlikely(slab != c->slab)) {
> > -                __slab_free(s, slab, head, tail, cnt, addr);
> > +                if (unlikely(!allow_spin)) {
> > +                        /*
> > +                         * __slab_free() can locklessly cmpxchg16 into a slab,
> > +                         * but then it might need to take spin_lock or local_lock
> > +                         * in put_cpu_partial() for further processing.
> > +                         * Avoid the complexity and simply add to a deferred list.
> > +                         */
> > +                        defer_free(s, head);
> > +                } else {
> > +                        __slab_free(s, slab, head, tail, cnt, addr);
> > +                }
> >                  return;
> >          }
> >
> > +        if (unlikely(!allow_spin)) {
> > +                if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
>
> Same nit about USE_LOCKLESS_FAST_PATH

Here, I have to disagree unless we fix the couple lines below as well.

> > +                    local_lock_is_locked(&s->cpu_slab->lock)) {
> > +                        defer_free(s, head);
> > +                        return;
> > +                }
> > +                cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */
> > +                kasan_slab_free(s, head, false, false, /* skip quarantine */true);
> > +        }
> > +
> >          if (USE_LOCKLESS_FAST_PATH()) {
> >                  freelist = READ_ONCE(c->freelist);
> >
> > @@ -4607,8 +4911,10 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >                          goto redo;
> >                  }
> >          } else {
> > +                __maybe_unused long flags = 0;
> > +
> >                  /* Update the free list under the local lock */
> > -                local_lock(&s->cpu_slab->lock);
> > +                local_lock_cpu_slab(s, &flags);
> >                  c = this_cpu_ptr(s->cpu_slab);
> >                  if (unlikely(slab != c->slab)) {
> >                          local_unlock(&s->cpu_slab->lock);
> > @@ -4621,7 +4927,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >                  c->freelist = head;
> >                  c->tid = next_tid(tid);
> >
> > -                local_unlock(&s->cpu_slab->lock);
> > +                local_unlock_cpu_slab(s, &flags);
> >          }
> >          stat_add(s, FREE_FASTPATH, cnt);
> >  }
> > @@ -4844,6 +5150,62 @@ void kfree(const void *object)
> >  }
> >  EXPORT_SYMBOL(kfree);
> >
> > +/*
> > + * Can be called while holding raw_spinlock_t or from IRQ and NMI,
> > + * but only for objects allocated by kmalloc_nolock(),
> > + * since some debug checks (like kmemleak and kfence) were
> > + * skipped on allocation. large_kmalloc is not supported either.
> > + */
> > +void kfree_nolock(const void *object)
> > +{
> > +        struct folio *folio;
> > +        struct slab *slab;
> > +        struct kmem_cache *s;
> > +        void *x = (void *)object;
> > +
> > +        if (unlikely(ZERO_OR_NULL_PTR(object)))
> > +                return;
> > +
> > +        folio = virt_to_folio(object);
> > +        if (unlikely(!folio_test_slab(folio))) {
> > +                WARN(1, "Buggy usage of kfree_nolock");
> > +                return;
> > +        }
> > +
> > +        slab = folio_slab(folio);
> > +        s = slab->slab_cache;
> > +
> > +        memcg_slab_free_hook(s, slab, &x, 1);
> > +        alloc_tagging_slab_free_hook(s, slab, &x, 1);
> > +        /*
> > +         * Unlike slab_free() do NOT call the following:
> > +         * kmemleak_free_recursive(x, s->flags);
> > +         * debug_check_no_locks_freed(x, s->object_size);
> > +         * debug_check_no_obj_freed(x, s->object_size);
> > +         * __kcsan_check_access(x, s->object_size, ..);
> > +         * kfence_free(x);
> > +         * since they take spinlocks.
> > +         */
>
> So here's the bigger concern. What if someone allocates with regular
> kmalloc() so that the debugging stuff is performed as usual, and then tries
> to use kfree_nolock() where we skip it? You might not be planning such usage,
> but later someone can realize that only their freeing context is limited,
> finds out kfree_nolock() exists and tries to use it?
>
> Can we document this strongly enough? Or even enforce it somehow? Or when
> any of these kinds of debugs above are enabled, we play it safe and use
> defer_free()?

Let's break it down one by one.

1. kmemleak_free_recursive() will miss an object that was recorded
during a normal kmalloc() and that's indeed problematic.

2. debug_check_no_locks_freed() and debug_check_no_obj_freed()
are somewhat harmless. We miss the checks, but it's not breaking
the corresponding features.

3. __kcsan_check_access() doesn't take locks, but its stack is so deep
and looks to be recursive that I doubt it's safe from any context.

4. kfence_free() looks like an existing quirk.
I'm not sure why it's there in the slab free path :)
The kfence comment says:

 * KFENCE objects live in a separate page range and are not to be intermixed
 * with regular heap objects (e.g. KFENCE objects must never be added to the
 * allocator freelists). Failing to do so may and will result in heap
 * corruptions, therefore is_kfence_address() must be used to check whether
 * an object requires specific handling.

so it should always be a nop for slab.
I removed the call for peace of mind.

So imo only 1 is dodgy. We can add:

  if (!(flags & SLAB_NOLEAKTRACE) && kmemleak_free_enabled)
          defer_free(..);

but it's ugly too.
My preference is to add a comment saying that only objects
allocated by kmalloc_nolock() should be freed by kfree_nolock().
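Roughly along these lines, e.g. (just a sketch of possible wording plus a
usage illustration, not proposed text; kmalloc_nolock() here is assumed to
be the wrapper around kmalloc_nolock_noprof() quoted above):

/**
 * kfree_nolock - Free an object from any context.
 * @object: pointer returned by kmalloc_nolock()
 *
 * Objects allocated by regular kmalloc() must NOT be freed here,
 * since the debug state armed at their allocation (kmemleak, etc.)
 * is not torn down on this path.
 */

and on the caller side:

        p = kmalloc_nolock(size, __GFP_ZERO, NUMA_NO_NODE);
        if (p)
                kfree_nolock(p);        /* ok: came from kmalloc_nolock() */

        q = kmalloc(size, GFP_KERNEL);
        kfree_nolock(q);                /* wrong: kmemleak keeps tracking q */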