From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
Christoph Lameter <cl@linux.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Pekka Enberg <penberg@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Thomas Gleixner <tglx@linutronix.de>,
Mike Galbraith <efault@gmx.de>
Subject: Re: [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe
Date: Tue, 23 Aug 2022 18:39:19 +0200 [thread overview]
Message-ID: <9674adec-7973-398b-6367-8f9100782eb0@suse.cz> (raw)
In-Reply-To: <YvkJKwZ8c/h6OcuL@hyeyoo>
On 8/14/22 16:39, Hyeonggon Yoo wrote:
> On Fri, Aug 12, 2022 at 11:14:23AM +0200, Vlastimil Babka wrote:
>> Rongwei Wang reports [1] that cache validation triggered by writing to
>> /sys/kernel/slab/<cache>/validate is racy against normal cache
>> operations (e.g. freeing) in a way that can cause false positive
>> inconsistency reports for caches with debugging enabled. The problem is
>> that debugging actions that mark object free or active and actual
>> freelist operations are not atomic, and the validation can see an
>> inconsistent state.
>>
>> For caches that do or don't have debugging enabled, additional races
>> involving n->nr_slabs are possible that result in false reports of wrong
>> slab counts.
>>
>> This patch attempts to solve these issues while not adding overhead to
>> normal (especially fastpath) operations for caches that do not have
>> debugging enabled. Such overhead would not be justified to make possible
>> userspace-triggered validation safe. Instead, disable the validation for
>> caches that don't have debugging enabled and make their sysfs validate
>> handler return -EINVAL.
>>
>> For caches that do have debugging enabled, we can instead extend the
>> existing approach of not using percpu freelists to force all alloc/free
>> perations to the slow paths where debugging flags is checked and acted
>> upon. There can adjust the debug-specific paths to increase n->list_lock
>> coverage against concurrent validation as necessary.
>
> s/perations/operations
OK
>> @@ -1604,9 +1601,9 @@ static inline
>> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>>
>> static inline int alloc_debug_processing(struct kmem_cache *s,
>> - struct slab *slab, void *object, unsigned long addr) { return 0; }
>> + struct slab *slab, void *object) { return 0; }
>>
>> -static inline int free_debug_processing(
>> +static inline void free_debug_processing(
>> struct kmem_cache *s, struct slab *slab,
>> void *head, void *tail, int bulk_cnt,
>> unsigned long addr) { return 0; }
>
> IIRC As reported by bot on earlier patch, void function
> should not return 0;
OK
>> +/*
>> + * Called only for kmem_cache_debug() caches to allocate from a freshly
>> + * allocated slab. Allocate a single object instead of whole freelist
>> + * and put the slab to the partial (or full) list.
>> + */
>> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
>> + struct slab *slab)
>> +{
>> + int nid = slab_nid(slab);
>> + struct kmem_cache_node *n = get_node(s, nid);
>> + unsigned long flags;
>> + void *object;
>> +
>> + spin_lock_irqsave(&n->list_lock, flags);
>> +
>> + object = slab->freelist;
>> + slab->freelist = get_freepointer(s, object);
>> + slab->inuse = 1;
>> +
>> + if (!alloc_debug_processing(s, slab, object)) {
>> + /*
>> + * It's not really expected that this would fail on a
>> + * freshly allocated slab, but a concurrent memory
>> + * corruption in theory could cause that.
>> + */
>> + spin_unlock_irqrestore(&n->list_lock, flags);
>> + return NULL;
>> + }
>> +
>
> Nit: spin_lock_irqsave() can be here as freshly allocated
> slab has no other reference.
Right.
>> +out:
>> + if (checks_ok) {
>> + void *prior = slab->freelist;
>> +
>> + /* Perform the actual freeing while we still hold the locks */
>> + slab->inuse -= cnt;
>> + set_freepointer(s, tail, prior);
>> + slab->freelist = head;
>> +
>> + /* Do we need to remove the slab from full or partial list? */
>> + if (!prior) {
>> + remove_full(s, n, slab);
>> + } else if (slab->inuse == 0) {
>> + remove_partial(n, slab);
>> + stat(s, FREE_REMOVE_PARTIAL);
>> + }
>> +
>> + /* Do we need to discard the slab or add to partial list? */
>> + if (slab->inuse == 0) {
>> + slab_free = slab;
>> + } else if (!prior) {
>> + add_partial(n, slab, DEACTIVATE_TO_TAIL);
>> + stat(s, FREE_ADD_PARTIAL);
>> + }
>> + }
>> +
>> + if (slab_free) {
>> + /*
>> + * Update the counters while still holding n->list_lock to
>> + * prevent spurious validation warnings
>> + */
>> + dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
>> + }
>
> This looks good but maybe kmem_cache_shrink() can lead to
> spurious validation warnings?
Good catch, I'll fix that too.
>
> Otherwise looks good to me!
>
Thanks!
Thread overview: 12+ messages
2022-08-12 9:14 [PATCH 0/5] fix validation races and cleanup locking Vlastimil Babka
2022-08-12 9:14 ` [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe Vlastimil Babka
2022-08-14 14:39 ` Hyeonggon Yoo
2022-08-23 16:39 ` Vlastimil Babka [this message]
2022-08-12 9:16 ` [PATCH 0/5] fix validation races and cleanup locking Vlastimil Babka
[not found] ` <20220812091426.18418-4-vbabka@suse.cz>
2022-08-14 14:54 ` [PATCH 3/5] mm/slub: remove slab_lock() usage for debug operations Hyeonggon Yoo
2022-08-15 0:04 ` David Rientjes
[not found] ` <20220812091426.18418-5-vbabka@suse.cz>
2022-08-15 0:03 ` [PATCH 4/5] mm/slub: convert object_map_lock to non-raw spinlock David Rientjes
2022-08-15 12:53 ` Hyeonggon Yoo
[not found] ` <20220812091426.18418-2-vbabka@suse.cz>
2022-08-14 13:42 ` [PATCH 1/5] mm/slub: move free_debug_processing() further Hyeonggon Yoo
2022-08-15 0:03 ` David Rientjes
[not found] ` <20220812091426.18418-6-vbabka@suse.cz>
2022-08-15 0:04 ` [PATCH 5/5] mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock() David Rientjes