Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Vlastimil Babka <vbabka@suse.cz>
To: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: David Rientjes <rientjes@google.com>,
	songmuchun@bytedance.com, akpm@linux-foundation.org,
	roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com,
	penberg@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
Date: Fri, 15 Jul 2022 12:33:59 +0200	[thread overview]
Message-ID: <e109dab2-71e5-0c60-016e-f798da23884e@suse.cz> (raw)
In-Reply-To: <aceab1c8-0c10-fa5f-da39-6820294494c4@linux.alibaba.com>

On 7/15/22 10:05, Rongwei Wang wrote:
> 
> 
> On 6/17/22 5:40 PM, Vlastimil Babka wrote:
>> On 6/8/22 14:23, Christoph Lameter wrote:
>>> On Wed, 8 Jun 2022, Rongwei Wang wrote:
>>>
>>>> If available, I think document the issue and warn this incorrect
>>>> behavior is
>>>> OK. But it still prints a large amount of confusing messages, and
>>>> disturbs us?
>>>
>>> Correct it would be great if you could fix this in a way that does not
>>> impact performance.
>>>
>>>>> are current operations on the slab being validated.
>>>> And I am trying to fix it in following way. In a short, these changes only
>>>> works under the slub debug mode, and not affects the normal mode (I'm not
>>>> sure). It looks not elegant enough. And if all approve of this way, I can
>>>> submit the next version.
>>>
>>>
>>>>
>>>> Anyway, thanks for your time:).
>>>> -wrw
>>>>
>>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s,
>>> struct
>>>> slab *slab,
>>>>
>>>>   {
>>>>          void *prior;
>>>> -       int was_frozen;
>>>> +       int was_frozen, to_take_off = 0;
>>>>          struct slab new;
>>>
>>> to_take_off has the role of !n ? Why is that needed?
>>>
>>>> -       do {
>>>> -               if (unlikely(n)) {
>>>> +               spin_lock_irqsave(&n->list_lock, flags);
>>>> +               ret = free_debug_processing(s, slab, head, tail, cnt,
>>>> addr);
>>>
>>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
>>> ok. But it still adds a number of new branches etc to the free loop.
>>
> Hi, Vlastimil, sorry for missing your message long time.

Hi, no problem.

>> It also further complicates the already tricky code. I wonder if we should
>> make more benefit from the fact that for kmem_cache_debug() caches we don't
>> leave any slabs on percpu or percpu partial lists, and also in
>> free_debug_processing() we aready take both list_lock and slab_lock. If we
>> just did the freeing immediately there under those locks, we would be
>> protected against other freeing cpus by that list_lock and don't need the
>> double cmpxchg tricks.
> enen, I'm not sure get your "don't need the double cmpxchg tricks" means
> completely. What you want to say is that replace cmpxchg_double_slab() here
> with following code when kmem_cache_debug(s)?
> 
> __slab_lock(slab);
> if (slab->freelist == freelist_old && slab->counters == counters_old){
>     slab->freelist = freelist_new;
>     slab->counters = counters_new;
>     __slab_unlock(slab);
>     local_irq_restore(flags);
>     return true;
> }
> __slab_unlock(slab);

Pretty much, but it's more complicated.

> If I make mistakes for your words, please let me know.
> Thanks!
>>
>> What about against allocating cpus? More tricky as those will currently end
>> up privatizing the freelist via get_partial(), only to deactivate it again,
>> so our list_lock+slab_lock in freeing path would not protect in the
>> meanwhile. But the allocation is currently very inefficient for debug
>> caches, as in get_partial() it will take the list_lock to take the slab from
>> partial list and then in most cases again in deactivate_slab() to return it.
> It seems that I need speed some time to eat these words. Anyway, thanks.
>>
>> If instead the allocation path for kmem_cache_debug() cache would take a
>> single object from the partial list (not whole freelist) under list_lock, it
>> would be ultimately more efficient, and protect against freeing using
>> list_lock. Sounds like an idea worth trying to me?
> 
> Hyeonggon had a similar advice that split freeing and allocating slab from
> debugging, likes below:
> 
> 
> __slab_alloc() {
>     if (kmem_cache_debug(s))
>         slab_alloc_debug()
>     else
>         ___slab_alloc()
> }
> 
> I guess that above code aims to solve your mentioned problem (idea)?
> 
> slab_free() {
>     if (kmem_cache_debug(s))
>         slab_free_debug()
>     else
>         __do_slab_free()
> }
> 
> Currently, I only modify the code of freeing slab to fix the confusing
> messages of "slabinfo -v". If you agree, I can try to realize above
> mentioned slab_alloc_debug() code. Maybe it's also a challenge to me.

I already started working on this approach and hope to post a RFC soon.

> Thanks for your time.
> 
>> And of course we would stop creating the 'validate' sysfs files for
>> non-debug caches.

next prev parent reply	other threads:[~2022-07-15 10:34 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-29  8:15 Rongwei Wang
2022-05-29  8:15 ` [PATCH 2/3] mm/slub: improve consistency of nr_slabs count Rongwei Wang
2022-05-29 12:26   ` Hyeonggon Yoo
2022-05-29  8:15 ` [PATCH 3/3] mm/slub: add nr_full count for debugging slub Rongwei Wang
2022-05-29 11:37 ` [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free Hyeonggon Yoo
2022-05-30 21:14   ` David Rientjes
2022-06-02 15:14     ` Christoph Lameter
2022-06-03  3:35       ` Rongwei Wang
2022-06-07 12:14         ` Christoph Lameter
2022-06-08  3:04           ` Rongwei Wang
2022-06-08 12:23             ` Christoph Lameter
2022-06-11  4:04               ` Rongwei Wang
2022-06-13 13:50                 ` Christoph Lameter
2022-06-14  2:38                   ` Rongwei Wang
2022-06-17  7:55                   ` Rongwei Wang
2022-06-17 14:19                     ` Christoph Lameter
2022-06-18  2:33                       ` Rongwei Wang
2022-06-20 11:57                         ` Christoph Lameter
2022-06-26 16:48                           ` Rongwei Wang
2022-06-17  9:40               ` Vlastimil Babka
2022-07-15  8:05                 ` Rongwei Wang
2022-07-15 10:33                   ` Vlastimil Babka [this message]
2022-07-15 10:51                     ` Rongwei Wang
2022-05-31  3:47   ` Muchun Song
2022-06-04 11:05     ` Hyeonggon Yoo
2022-05-31  8:50   ` Rongwei Wang
2022-07-18 11:09 ` Vlastimil Babka
2022-07-19 14:15   ` Rongwei Wang
2022-07-19 14:21     ` Vlastimil Babka
2022-07-19 14:43       ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e109dab2-71e5-0c60-016e-f798da23884e@suse.cz \
    --to=vbabka@suse.cz \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rongwei.wang@linux.alibaba.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox