Date: Fri, 15 Jul 2022 12:33:59 +0200
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
To: Rongwei Wang, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: David Rientjes, songmuchun@bytedance.com,
 akpm@linux-foundation.org, roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com,
 penberg@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>
 <9794df4f-3ffe-4e99-0810-a1346b139ce8@linux.alibaba.com>
 <29723aaa-5e28-51d3-7f87-9edf0f7b9c33@linux.alibaba.com>
From: Vlastimil Babka

On 7/15/22 10:05, Rongwei Wang wrote:
>
>
> On 6/17/22 5:40 PM, Vlastimil Babka wrote:
>> On 6/8/22 14:23, Christoph Lameter wrote:
>>> On Wed, 8 Jun 2022, Rongwei Wang wrote:
>>>
>>>> If acceptable, I think documenting the issue and warning about this
>>>> incorrect behavior is OK. But it still prints a large amount of
>>>> confusing messages and disturbs us?
>>>
>>> Correct, it would be great if you could fix this in a way that does not
>>> impact performance.
>>>
>>>>> are current operations on the slab being validated.
>>>> And I am trying to fix it in the following way. In short, these changes
>>>> only work under slub debug mode, and do not affect the normal mode (I'm
>>>> not sure). It does not look elegant enough. If all approve of this way,
>>>> I can submit the next version.
>>>
>>>
>>>>
>>>> Anyway, thanks for your time :).
>>>> -wrw
>>>>
>>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s,
>>>> struct slab *slab,
>>>>
>>>>   {
>>>>          void *prior;
>>>> -       int was_frozen;
>>>> +       int was_frozen, to_take_off = 0;
>>>>          struct slab new;
>>>
>>> to_take_off has the role of !n? Why is that needed?
>>>
>>>> -       do {
>>>> -               if (unlikely(n)) {
>>>> +               spin_lock_irqsave(&n->list_lock, flags);
>>>> +               ret = free_debug_processing(s, slab, head, tail, cnt,
>>>> addr);
>>>
>>> Ok, so the idea is to take the lock only if kmem_cache_debug. That looks
>>> ok. But it still adds a number of new branches etc. to the free loop.
>>
> Hi Vlastimil, sorry for missing your message for such a long time.

Hi, no problem.

>> It also further complicates the already tricky code. I wonder if we should
>> get more benefit from the fact that for kmem_cache_debug() caches we don't
>> leave any slabs on percpu or percpu partial lists, and also in
>> free_debug_processing() we already take both list_lock and slab_lock. If we
>> just did the freeing immediately there under those locks, we would be
>> protected against other freeing cpus by that list_lock and wouldn't need
>> the double cmpxchg tricks.

> Hmm, I'm not sure I completely get what your "don't need the double cmpxchg
> tricks" means. Do you mean replacing cmpxchg_double_slab() here with the
> following code when kmem_cache_debug(s)?
>
> __slab_lock(slab);
> if (slab->freelist == freelist_old && slab->counters == counters_old) {
>     slab->freelist = freelist_new;
>     slab->counters = counters_new;
>     __slab_unlock(slab);
>     local_irq_restore(flags);
>     return true;
> }
> __slab_unlock(slab);

Pretty much, but it's more complicated.

> If I have misunderstood your words, please let me know.
> Thanks!

>> What about against allocating cpus? More tricky, as those will currently
>> end up privatizing the freelist via get_partial(), only to deactivate it
>> again, so our list_lock+slab_lock in the freeing path would not protect in
>> the meanwhile. But the allocation is currently very inefficient for debug
>> caches, as in get_partial() it will take the list_lock to take the slab
>> from the partial list, and then in most cases again in deactivate_slab()
>> to return it.

> It seems that I need to spend some time to digest these words. Anyway,
> thanks.

>> If instead the allocation path for kmem_cache_debug() caches would take a
>> single object from the partial list (not the whole freelist) under
>> list_lock, it would be ultimately more efficient, and protect against
>> freeing using list_lock. Sounds like an idea worth trying to me?
>
> Hyeonggon had similar advice, to split the freeing and allocating paths for
> debugging, like below:
>
> __slab_alloc() {
>     if (kmem_cache_debug(s))
>         slab_alloc_debug()
>     else
>         ___slab_alloc()
> }
>
> I guess the above code aims to solve the problem (idea) you mentioned?
>
> slab_free() {
>     if (kmem_cache_debug(s))
>         slab_free_debug()
>     else
>         __do_slab_free()
> }
>
> Currently, I only modify the slab freeing code to fix the confusing
> messages from "slabinfo -v". If you agree, I can try to implement the above
> mentioned slab_alloc_debug() code. Maybe it's also a challenge for me.

I already started working on this approach and hope to post an RFC soon.

> Thanks for your time.
>
>> And of course we would stop creating the 'validate' sysfs files for
>> non-debug caches.
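
To make that a bit more concrete, a very rough sketch of the idea (not the
actual RFC: alloc_single_from_partial() and free_debug_single() are names
made up just for this example, the other slub internals such as
get_freepointer(), add_partial() and free_consistency_checks() are written
from memory so the exact signatures may differ, and cases like a slab
becoming completely empty are left out):

/*
 * Sketch only: allocation path for kmem_cache_debug() caches takes one
 * object from a slab on the node partial list, with n->list_lock already
 * held by the caller. No cpu freelist, no cmpxchg_double_slab().
 */
static void *alloc_single_from_partial(struct kmem_cache *s,
                                       struct kmem_cache_node *n,
                                       struct slab *slab)
{
        void *object;

        lockdep_assert_held(&n->list_lock);

        object = slab->freelist;
        slab->freelist = get_freepointer(s, object);
        slab->inuse++;

        if (slab->inuse == slab->objects) {
                /* No free objects left, move the slab off the partial list. */
                remove_partial(n, slab);
                add_full(s, n, slab);
        }

        return object;
}

/*
 * Sketch only: freeing path for kmem_cache_debug() caches does the
 * consistency checks and the freelist update under n->list_lock, so it
 * cannot race with validate_slab() or with the allocation above.
 */
static void free_debug_single(struct kmem_cache *s, struct slab *slab,
                              void *object, unsigned long addr)
{
        struct kmem_cache_node *n = get_node(s, slab_nid(slab));
        unsigned long flags;

        spin_lock_irqsave(&n->list_lock, flags);

        /* The check-only helper, without the locking of free_debug_processing(). */
        if (!free_consistency_checks(s, slab, object, addr))
                goto out;

        if (slab->inuse == slab->objects) {
                /* The slab was full, it becomes partial again. */
                remove_full(s, n, slab);
                add_partial(n, slab, DEACTIVATE_TO_TAIL);
        }

        set_freepointer(s, object, slab->freelist);
        slab->freelist = object;
        slab->inuse--;
out:
        spin_unlock_irqrestore(&n->list_lock, flags);
}

The point being that a debug cache then never has objects on the percpu
freelists at all, so n->list_lock alone is enough to serialize both paths
against validate_slab().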