From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Vlastimil Babka <vbabka@suse.cz>, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: David Rientjes <rientjes@google.com>,
songmuchun@bytedance.com, akpm@linux-foundation.org,
roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com,
penberg@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
Date: Fri, 15 Jul 2022 18:51:54 +0800
Message-ID: <23a79337-a5eb-a959-a764-1296a3e8e7c1@linux.alibaba.com>
In-Reply-To: <e109dab2-71e5-0c60-016e-f798da23884e@suse.cz>
On 7/15/22 6:33 PM, Vlastimil Babka wrote:
> On 7/15/22 10:05, Rongwei Wang wrote:
>>
>>
>> On 6/17/22 5:40 PM, Vlastimil Babka wrote:
>>> On 6/8/22 14:23, Christoph Lameter wrote:
>>>> On Wed, 8 Jun 2022, Rongwei Wang wrote:
>>>>
>>>>> If that is acceptable, I think documenting the issue and warning about
>>>>> this incorrect behavior is OK. But it would still print a large number
>>>>> of confusing messages that disturb us?
>>>>
>>>> Correct, it would be great if you could fix this in a way that does not
>>>> impact performance.
>>>>
>>>>>> are current operations on the slab being validated.
>>>>> And I am trying to fix it in the following way. In short, these changes
>>>>> only take effect in slub debug mode and should not affect the normal
>>>>> mode (I'm not sure). It does not look elegant enough. If everyone
>>>>> approves of this approach, I can submit the next version.
>>>>
>>>>
>>>>>
>>>>> Anyway, thanks for your time:).
>>>>> -wrw
>>>>>
>>>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>>>>
>>>>> {
>>>>> void *prior;
>>>>> - int was_frozen;
>>>>> + int was_frozen, to_take_off = 0;
>>>>> struct slab new;
>>>>
>>>> to_take_off has the role of !n ? Why is that needed?
>>>>
>>>>> - do {
>>>>> - if (unlikely(n)) {
>>>>> + spin_lock_irqsave(&n->list_lock, flags);
>>>>> + ret = free_debug_processing(s, slab, head, tail, cnt, addr);
>>>>
>>>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
>>>> ok. But it still adds a number of new branches etc to the free loop.
>>>
>> Hi, Vlastimil, sorry for missing your message for so long.
>
> Hi, no problem.
>
>>> It also further complicates the already tricky code. I wonder if we should
>>> take more advantage of the fact that for kmem_cache_debug() caches we don't
>>> leave any slabs on percpu or percpu partial lists, and also in
>>> free_debug_processing() we already take both list_lock and slab_lock. If we
>>> just did the freeing immediately there under those locks, we would be
>>> protected against other freeing cpus by that list_lock and don't need the
>>> double cmpxchg tricks.
>> Hmm, I'm not sure I completely understand what "don't need the double
>> cmpxchg tricks" means. Do you mean replacing cmpxchg_double_slab() here
>> with the following code when kmem_cache_debug(s) is true?
>>
>>     local_irq_save(flags);
>>     __slab_lock(slab);
>>     if (slab->freelist == freelist_old && slab->counters == counters_old) {
>>         slab->freelist = freelist_new;
>>         slab->counters = counters_new;
>>         __slab_unlock(slab);
>>         local_irq_restore(flags);
>>         return true;
>>     }
>>     __slab_unlock(slab);
>>     local_irq_restore(flags);
>
> Pretty much, but it's more complicated.
Yes, actually, I think reusing cmpxchg_double_slab() here keeps the code
more concise. I have already finished this part of the code, but I am
hesitating over whether to replace cmpxchg_double_slab().
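
To make the lock-based variant concrete, here is a minimal userspace
sketch, not kernel code. It assumes a hypothetical struct slab_model
whose pthread mutex stands in for slab_lock(); the name
locked_cmpxchg_double() is illustrative only. The compare-then-update
logic is the part that mirrors the snippet quoted above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct slab; not the kernel type. */
struct slab_model {
	pthread_mutex_t lock;      /* plays the role of slab_lock() */
	void *freelist;
	unsigned long counters;
};

/*
 * Update freelist and counters together, but only if both still hold
 * the expected old values. Returns true on success and false if either
 * field was changed by someone else in the meantime.
 */
static bool locked_cmpxchg_double(struct slab_model *slab,
				  void *freelist_old, unsigned long counters_old,
				  void *freelist_new, unsigned long counters_new)
{
	bool ok = false;

	assert(slab != NULL);
	pthread_mutex_lock(&slab->lock);
	if (slab->freelist == freelist_old && slab->counters == counters_old) {
		slab->freelist = freelist_new;
		slab->counters = counters_new;
		ok = true;
	}
	/* Single unlock path, so the lock is released on failure too. */
	pthread_mutex_unlock(&slab->lock);
	return ok;
}
```

A caller would retry with freshly read old values when this returns
false, in the same way the existing loop around cmpxchg_double_slab()
does.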
>
>> If I have misunderstood your words, please let me know.
>> Thanks!
>>>
>>> What about against allocating cpus? More tricky as those will currently end
>>> up privatizing the freelist via get_partial(), only to deactivate it again,
>>> so our list_lock+slab_lock in freeing path would not protect in the
>>> meanwhile. But the allocation is currently very inefficient for debug
>>> caches, as in get_partial() it will take the list_lock to take the slab from
>>> partial list and then in most cases again in deactivate_slab() to return it.
>> It seems that I need to spend some time digesting these words. Anyway, thanks.
>>>
>>> If instead the allocation path for kmem_cache_debug() cache would take a
>>> single object from the partial list (not whole freelist) under list_lock, it
>>> would be ultimately more efficient, and protect against freeing using
>>> list_lock. Sounds like an idea worth trying to me?
>>
>> Hyeonggon gave similar advice: split the freeing and allocating paths for
>> debug caches from the normal ones, like below:
>>
>>
>> __slab_alloc() {
>>     if (kmem_cache_debug(s))
>>         slab_alloc_debug()
>>     else
>>         ___slab_alloc()
>> }
>>
>> I guess the above code aims to address the problem (idea) you mentioned?
>>
>> slab_free() {
>>     if (kmem_cache_debug(s))
>>         slab_free_debug()
>>     else
>>         __do_slab_free()
>> }
>>
>> Currently, I have only modified the slab freeing code to fix the confusing
>> messages of "slabinfo -v". If you agree, I can try to implement the
>> slab_alloc_debug() code mentioned above. It may also be a challenge for me.
>
> I already started working on this approach and hope to post an RFC soon.
OK, that's great.
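
For reference, the single-object idea discussed above could look
roughly like the following userspace model. This is only a sketch under
stated assumptions: struct node_model, struct slab_model and the three
functions are hypothetical names, a pthread mutex stands in for
n->list_lock, and the node holds just one partial slab for brevity. It
is not the RFC code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All types below are hypothetical userspace models, not kernel types. */
struct object { struct object *next; };

struct slab_model {
	struct object *freelist;    /* free objects in this slab */
	unsigned int inuse;         /* objects currently handed out */
};

struct node_model {
	pthread_mutex_t list_lock;  /* plays the role of n->list_lock */
	struct slab_model *partial; /* a single partial slab, for brevity */
};

/* Debug-cache allocation: pop one object, never the whole freelist. */
static void *alloc_single_debug(struct node_model *n)
{
	struct object *obj = NULL;

	pthread_mutex_lock(&n->list_lock);
	if (n->partial && n->partial->freelist) {
		obj = n->partial->freelist;
		n->partial->freelist = obj->next;
		n->partial->inuse++;
	}
	pthread_mutex_unlock(&n->list_lock);
	return obj;
}

/* Debug-cache free: push the object back under the same lock. */
static void free_single_debug(struct node_model *n, void *p)
{
	struct object *obj = p;

	assert(obj != NULL);
	pthread_mutex_lock(&n->list_lock);
	obj->next = n->partial->freelist;
	n->partial->freelist = obj;
	n->partial->inuse--;
	pthread_mutex_unlock(&n->list_lock);
}

/* A validator holding list_lock sees a stable freelist and inuse count. */
static int validate_model(struct node_model *n, unsigned int total)
{
	unsigned int nfree = 0;
	int ok;

	pthread_mutex_lock(&n->list_lock);
	for (struct object *o = n->partial->freelist; o; o = o->next)
		nfree++;
	ok = (nfree + n->partial->inuse == total);
	pthread_mutex_unlock(&n->list_lock);
	return ok;
}
```

Because allocation, freeing and validation all serialize on the same
lock in this model, the validator never observes a half-updated
freelist, which is the property the confusing "slabinfo -v" messages
suggest is missing today.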
>
>> Thanks for your time.
>>
>>> And of course we would stop creating the 'validate' sysfs files for
>>> non-debug caches.