From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
Christoph Lameter <cl@linux.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Thomas Gleixner <tglx@linutronix.de>,
Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe
Date: Tue, 23 Aug 2022 18:39:19 +0200 [thread overview]
Message-ID: <9674adec-7973-398b-6367-8f9100782eb0@suse.cz> (raw)
In-Reply-To: <YvkJKwZ8c/h6OcuL@hyeyoo>
On 8/14/22 16:39, Hyeonggon Yoo wrote:
> On Fri, Aug 12, 2022 at 11:14:23AM +0200, Vlastimil Babka wrote:
>> Rongwei Wang reports [1] that cache validation triggered by writing to
>> /sys/kernel/slab/<cache>/validate is racy against normal cache
>> operations (e.g. freeing) in a way that can cause false positive
>> inconsistency reports for caches with debugging enabled. The problem is
>> that debugging actions that mark object free or active and actual
>> freelist operations are not atomic, and the validation can see an
>> inconsistent state.
>>
>> For caches that do or don't have debugging enabled, additional races
>> involving n->nr_slabs are possible that result in false reports of wrong
>> slab counts.
>>
>> This patch attempts to solve these issues while not adding overhead to
>> normal (especially fastpath) operations for caches that do not have
>> debugging enabled. Such overhead would not be justified to make possible
>> userspace-triggered validation safe. Instead, disable the validation for
>> caches that don't have debugging enabled and make their sysfs validate
>> handler return -EINVAL.
>>
>> For caches that do have debugging enabled, we can instead extend the
>> existing approach of not using percpu freelists to force all alloc/free
>> perations to the slow paths where debugging flags is checked and acted
>> upon. There can adjust the debug-specific paths to increase n->list_lock
>> coverage against concurrent validation as necessary.
>
> s/perations/operations
OK
>> @@ -1604,9 +1601,9 @@ static inline
>> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>>
>> static inline int alloc_debug_processing(struct kmem_cache *s,
>> - struct slab *slab, void *object, unsigned long addr) { return 0; }
>> + struct slab *slab, void *object) { return 0; }
>>
>> -static inline int free_debug_processing(
>> +static inline void free_debug_processing(
>> struct kmem_cache *s, struct slab *slab,
>> void *head, void *tail, int bulk_cnt,
>> unsigned long addr) { return 0; }
>
> IIRC As reported by bot on earlier patch, void function
> should not return 0;
OK
>> +/*
>> + * Called only for kmem_cache_debug() caches to allocate from a freshly
>> + * allocated slab. Allocate a single object instead of whole freelist
>> + * and put the slab to the partial (or full) list.
>> + */
>> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
>> + struct slab *slab)
>> +{
>> + int nid = slab_nid(slab);
>> + struct kmem_cache_node *n = get_node(s, nid);
>> + unsigned long flags;
>> + void *object;
>> +
>> + spin_lock_irqsave(&n->list_lock, flags);
>> +
>> + object = slab->freelist;
>> + slab->freelist = get_freepointer(s, object);
>> + slab->inuse = 1;
>> +
>> + if (!alloc_debug_processing(s, slab, object)) {
>> + /*
>> + * It's not really expected that this would fail on a
>> + * freshly allocated slab, but a concurrent memory
>> + * corruption in theory could cause that.
>> + */
>> + spin_unlock_irqrestore(&n->list_lock, flags);
>> + return NULL;
>> + }
>> +
>
> Nit: spin_lock_irqsave() can be here as freshly allocated
> slab has no other reference.
Right.
>> +out:
>> + if (checks_ok) {
>> + void *prior = slab->freelist;
>> +
>> + /* Perform the actual freeing while we still hold the locks */
>> + slab->inuse -= cnt;
>> + set_freepointer(s, tail, prior);
>> + slab->freelist = head;
>> +
>> + /* Do we need to remove the slab from full or partial list? */
>> + if (!prior) {
>> + remove_full(s, n, slab);
>> + } else if (slab->inuse == 0) {
>> + remove_partial(n, slab);
>> + stat(s, FREE_REMOVE_PARTIAL);
>> + }
>> +
>> + /* Do we need to discard the slab or add to partial list? */
>> + if (slab->inuse == 0) {
>> + slab_free = slab;
>> + } else if (!prior) {
>> + add_partial(n, slab, DEACTIVATE_TO_TAIL);
>> + stat(s, FREE_ADD_PARTIAL);
>> + }
>> + }
>> +
>> + if (slab_free) {
>> + /*
>> + * Update the counters while still holding n->list_lock to
>> + * prevent spurious validation warnings
>> + */
>> + dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
>> + }
>
> This looks good but maybe kmem_cache_shrink() can lead to
> spurious validation warnings?
Good catch, I'll fix that too.
>
> Otherwise looks good to me!
>
Thanks!
Thread overview: 12+ messages
2022-08-12 9:14 [PATCH 0/5] fix validation races and cleanup locking Vlastimil Babka
2022-08-12 9:14 ` [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe Vlastimil Babka
2022-08-14 14:39 ` Hyeonggon Yoo
2022-08-23 16:39 ` Vlastimil Babka [this message]
2022-08-12 9:16 ` [PATCH 0/5] fix validation races and cleanup locking Vlastimil Babka
[not found] ` <20220812091426.18418-4-vbabka@suse.cz>
2022-08-14 14:54 ` [PATCH 3/5] mm/slub: remove slab_lock() usage for debug operations Hyeonggon Yoo
2022-08-15 0:04 ` David Rientjes
[not found] ` <20220812091426.18418-5-vbabka@suse.cz>
2022-08-15 0:03 ` [PATCH 4/5] mm/slub: convert object_map_lock to non-raw spinlock David Rientjes
2022-08-15 12:53 ` Hyeonggon Yoo
[not found] ` <20220812091426.18418-2-vbabka@suse.cz>
2022-08-14 13:42 ` [PATCH 1/5] mm/slub: move free_debug_processing() further Hyeonggon Yoo
2022-08-15 0:03 ` David Rientjes
[not found] ` <20220812091426.18418-6-vbabka@suse.cz>
2022-08-15 0:04 ` [PATCH 5/5] mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock() David Rientjes