From: Uladzislau Rezki <urezki@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
rcu@vger.kernel.org
Subject: Re: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
Date: Fri, 24 Jan 2025 15:19:32 +0100 [thread overview]
Message-ID: <Z5OhdK0OqIgT7qXU@pc636> (raw)
In-Reply-To: <e8bcc4bd-3e3b-4a96-8183-4af1fbb14211@suse.cz>
On Fri, Jan 24, 2025 at 03:16:05PM +0100, Vlastimil Babka wrote:
> On 1/24/25 13:47, Uladzislau Rezki wrote:
> > On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote:
> >> RCU has been special-casing callback function pointers that are integers
> >> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
> >> RCU implementation no longer does that as the batched kvfree_rcu() is
> >> not a simple call_rcu(). The tiny RCU still does, and the plan is also
> >> to make tree RCU use call_rcu() for SLUB_TINY configurations.
> >>
> >> Instead of teaching tree RCU again to special case the offsets, let's
> >> remove the special casing completely. Since there's no SLOB anymore, it
> >> is possible to create a callback function that can take a pointer to a
> >> middle of slab object with unknown offset and determine the object's
> >> pointer before freeing it, so implement that as kvfree_rcu_cb().
> >>
> >> Large kmalloc and vmalloc allocations are handled simply by aligning
> >> down to page size. For that we retain the requirement that the offset is
> >> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
> >> and instead just opencode the condition in the BUILD_BUG_ON() check.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >> include/linux/rcupdate.h | 24 +++++++++---------------
> >> kernel/rcu/tiny.c | 13 -------------
> >> mm/slab.h | 2 ++
> >> mm/slab_common.c | 5 +----
> >> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >> 5 files changed, 54 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> #define RCU_POINTER_INITIALIZER(p, v) \
> >> .p = RCU_INITIALIZER(v)
> >>
> >> -/*
> >> - * Does the specified offset indicate that the corresponding rcu_head
> >> - * structure can be handled by kvfree_rcu()?
> >> - */
> >> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
> >> -
> >> /**
> >> * kfree_rcu() - kfree an object after a grace period.
> >> * @ptr: pointer to kfree for double-argument invocations.
> >> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> * when they are used in a kernel module, that module must invoke the
> >> * high-latency rcu_barrier() function at module-unload time.
> >> *
> >> - * The kfree_rcu() function handles this issue. Rather than encoding a
> >> - * function address in the embedded rcu_head structure, kfree_rcu() instead
> >> - * encodes the offset of the rcu_head structure within the base structure.
> >> - * Because the functions are not allowed in the low-order 4096 bytes of
> >> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
> >> + * The kfree_rcu() function handles this issue. In order to have a universal
> >> + * callback function handling different offsets of rcu_head, the callback needs
> >> + * to determine the starting address of the freed object, which can be a large
> >> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
> >> + * page boundary for those, only offsets up to 4095 bytes can be accommodated.
> >> * If the offset is larger than 4095 bytes, a compile-time error will
> >> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
> >> * either fall back to use of call_rcu() or rearrange the structure to
> >> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> >> do { \
> >> typeof (ptr) ___p = (ptr); \
> >> \
> >> - if (___p) { \
> >> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
> >> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> - } \
> >> + if (___p) { \
> >> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
> >> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> + } \
> >>
> > Why removing the macro? At least __is_kvfree_rcu_offset() makes it
> > clear what and why + it has a nice comment what it is used for. 4096
> > looks like hard-coded value.
>
> Hmm but it's ultimately shorter. We could add a short comment to
> kvfree_rcu_arg_2() referring to the longer explanation in the kfree_rcu()
> comment?
>
Sounds good to place or keep the comment.
> > Or you do not want that someone else started to use that macro in
> > another places?
>
> And also that, since this has to be exposed in a "public" header.
>
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >> void **p, int objects, struct slabobj_ext *obj_exts);
> >> #endif
> >>
> >> +void kvfree_rcu_cb(struct rcu_head *head);
> >> +
> >> size_t __ksize(const void *objp);
> >>
> >> static inline size_t slab_ksize(const struct kmem_cache *s)
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
> >> rcu_lock_acquire(&rcu_callback_map);
> >> trace_rcu_invoke_kvfree_callback("slab", head, offset);
> >>
> >> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> >> - kvfree(ptr);
> >> -
> > This is not correct unless i miss something. Why do you remove this?
>
> Oops, meant to remove just the warn check, and unconditionally call
> kvfree(), thanks for noticing :)
>
> The warning could really only trigger due to a use-after-free corruption of
> the ptr pointer, because the BUILD_BUG_ON() would catch misuse with a too
> large offset, so we don't need to keep it.
>
> >> +void kvfree_rcu_cb(struct rcu_head *head)
> >> +{
> >> + void *obj = head;
> >> + struct folio *folio;
> >> + struct slab *slab;
> >> + struct kmem_cache *s;
> >> + void *slab_addr;
> >> +
> >> + if (unlikely(is_vmalloc_addr(obj))) {
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + vfree(obj);
> >> + return;
> >> + }
> >> +
> >> + folio = virt_to_folio(obj);
> >> + if (unlikely(!folio_test_slab(folio))) {
> >> + /*
> >> + * rcu_head offset can be only less than page size so no need to
> >> + * consider folio order
> >> + */
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + free_large_kmalloc(folio, obj);
> >> + return;
> >> + }
> >> +
> >> + slab = folio_slab(folio);
> >> + s = slab->slab_cache;
> >> + slab_addr = folio_address(folio);
> >> +
> >> + if (is_kfence_address(obj)) {
> >> + obj = kfence_object_start(obj);
> >> + } else {
> >> + unsigned int idx = __obj_to_index(s, slab_addr, obj);
> >> +
> >> + obj = slab_addr + s->size * idx;
> >> + obj = fixup_red_left(s, obj);
> >> + }
> >> +
> >> + slab_free(s, slab, obj, _RET_IP_);
> >> +}
> >> +
> > Tiny computer case. Just small nit, maybe remove unlikely() but you decide :)
>
> Force of habbit :)
>
:)
--
Uladzislau Rezki
next prev parent reply other threads:[~2025-01-24 14:19 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
2025-01-24 12:56 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
2025-01-24 12:47 ` Uladzislau Rezki
2025-01-24 14:16 ` Vlastimil Babka
2025-01-24 14:19 ` Uladzislau Rezki [this message]
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
2025-01-24 12:11 ` Uladzislau Rezki
2025-01-23 19:37 ` [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Joel Fernandes
2025-01-23 20:26 ` Paul E. McKenney
2025-01-24 14:25 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5OhdK0OqIgT7qXU@pc636 \
--to=urezki@gmail.com \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cl@linux.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-mm@kvack.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rcu@vger.kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=rostedt@goodmis.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox