Message-ID: <224265c3-f3fe-aecd-039a-0c9ba3817305@suse.cz>
Date: Mon, 1 Aug 2022 15:51:49 +0200
Subject: Re: [RFC PATCH] mm, slub: restrict sysfs validation to debug caches and make it safe
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Rongwei Wang, Christoph Lameter, Joonsoo Kim, David Rientjes,
    Pekka Enberg, linux-mm@kvack.org, Feng Tang
References: <20220727182909.11231-1-vbabka@suse.cz>
From: Vlastimil Babka <vbabka@suse.cz>

On 7/31/22 10:44, Hyeonggon Yoo wrote:
> On Wed, Jul 27, 2022 at 08:29:09PM +0200, Vlastimil Babka wrote:
> Also all issues (as far as I know) related to the validate attribute
> are gone after this patch.

As you (and Rongwei) were able to trigger/reproduce the issues, does your
testing also no longer reproduce them?

> Silly question:
> Do we want to apply this on stable trees?

I'd prefer not to, it's too intrusive for stable.

> I doubt someone would use the validate attribute when not debugging.

I doubt it as well. Also it requires root, and even if somebody hits the
issue, it's just spurious warnings, nothing fatal. So that doesn't warrant
the intrusive stable backport IMHO.
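For completeness, exercising the attribute is just a root-only write to
/sys/kernel/slab/<cache>/validate. A minimal, untested userspace sketch
("kmalloc-64" is only an example cache name):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Writing '1' to the validate attribute runs validate_slab_cache(). */
	int fd = open("/sys/kernel/slab/kmalloc-64/validate", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}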
>> [1] https://lore.kernel.org/all/69462916-2d1c-dd50-2e64-b31c2b61690e@suse.cz/
>>
>>  mm/slub.c | 322 +++++++++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 231 insertions(+), 91 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b1281b8654bd..01e5228809d7 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1324,17 +1324,14 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
>>  }
>
> [...]
>
>> +/*
>> + * Called only for kmem_cache_debug() caches instead of acquire_slab(), with a
>> + * slab from the n->partial list. Removes only a single object from the slab
>> + * under slab_lock(), does the alloc_debug_processing() checks and leaves the
>> + * slab on the list, or moves it to full list if it was the last object.
>> + */
>> +static void *alloc_single_from_partial(struct kmem_cache *s,
>> +		struct kmem_cache_node *n, struct slab *slab)
>> +{
>> +	void *object;
>> +	unsigned long flags;
>> +
>> +	lockdep_assert_held(&n->list_lock);
>> +
>> +	slab_lock(slab, &flags);
>> +
>> +	object = slab->freelist;
>> +	slab->freelist = get_freepointer(s, object);
>> +	slab->inuse++;
>> +
>> +	if (!alloc_debug_processing(s, slab, object)) {
>> +		remove_partial(n, slab);
>> +		slab_unlock(slab, &flags);
>> +		return NULL;
>> +	}
>> +
>> +	if (slab->inuse == slab->objects) {
>> +		remove_partial(n, slab);
>> +		add_full(s, n, slab);
>> +	}
>> +
>> +	slab_unlock(slab, &flags);
>
> AFAIK add_full/remove_full/add_partial/remove_partial
> can be called outside slab_lock but inside list_lock.

Right, I will adjust, thanks.
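Roughly like this, i.e. the list manipulation moves out from under
slab_lock() but stays under n->list_lock (untested sketch, just to
illustrate the adjustment):

static void *alloc_single_from_partial(struct kmem_cache *s,
		struct kmem_cache_node *n, struct slab *slab)
{
	void *object;
	unsigned long flags;

	lockdep_assert_held(&n->list_lock);

	slab_lock(slab, &flags);

	object = slab->freelist;
	slab->freelist = get_freepointer(s, object);
	slab->inuse++;

	if (!alloc_debug_processing(s, slab, object)) {
		slab_unlock(slab, &flags);
		/* list manipulation now outside slab_lock(), still under list_lock */
		remove_partial(n, slab);
		return NULL;
	}

	slab_unlock(slab, &flags);

	if (slab->inuse == slab->objects) {
		remove_partial(n, slab);
		add_full(s, n, slab);
	}

	return object;
}

(The failure case still drops the slab from the partial list, just no longer
under the bit spinlock.)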
>> +	return object;
>> +}
>> +
>> +/*
>> + * Called only for kmem_cache_debug() caches to allocate from a freshly
>> + * allocated slab. Allocates a single object instead of whole freelist
>> + * and puts the slab to the partial (or full) list.
>> + */
>> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
>> +		struct slab *slab)
>> +{
>> +	int nid = slab_nid(slab);
>> +	struct kmem_cache_node *n = get_node(s, nid);
>> +	unsigned long flags, flags2;
>> +	void *object;
>> +
>> +	spin_lock_irqsave(&n->list_lock, flags);
>> +	slab_lock(slab, &flags2);
>> +
>> +	object = slab->freelist;
>> +	slab->freelist = get_freepointer(s, object);
>> +	/* Undo what allocate_slab() did */
>> +	slab->frozen = 0;
>> +	slab->inuse = 1;
>
> Maybe do it in allocate_slab()?

Hmm yeah, I guess we could stop doing that pre-freezing and inuse = objects
in allocate_slab(), and do it in __slab_alloc(), so it won't add any
overhead. Then we won't have to unfreeze in early_kmem_cache_node_alloc()
as well.

>> +	if (!alloc_debug_processing(s, slab, object)) {
>> +		/*
>> +		 * It's not really expected that this would fail on a
>> +		 * freshly allocated slab, but a concurrent memory
>> +		 * corruption in theory could cause that.
>> +		 */
>> +		slab_unlock(slab, &flags2);
>> +		spin_unlock_irqrestore(&n->list_lock, flags);
>> +		return NULL;
>> +	}
>> +
>> +	if (slab->inuse == slab->objects)
>> +		add_full(s, n, slab);
>> +	else
>> +		add_partial(n, slab, DEACTIVATE_TO_HEAD);
>> +
>> +	slab_unlock(slab, &flags2);
>> +	inc_slabs_node(s, nid, slab->objects);
>> +	spin_unlock_irqrestore(&n->list_lock, flags);
>> +
>> +	return object;
>> +}
>
> [...]
>
>>  #endif /* CONFIG_SLUB_DEBUG */
>>
>>  #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS)
>> @@ -3036,6 +3165,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  		return NULL;
>>  	}
>>
>> +	stat(s, ALLOC_SLAB);
>> +
>> +	if (kmem_cache_debug(s)) {
>> +		freelist = alloc_single_from_new_slab(s, slab);
>> +
>> +		if (unlikely(!freelist))
>> +			goto new_objects;
>> +
>> +		if (s->flags & SLAB_STORE_USER)
>> +			set_track(s, freelist, TRACK_ALLOC, addr);
>> +
>> +		return freelist;
>> +	}
>> +
>>  	/*
>>  	 * No other reference to the slab yet so we can
>>  	 * muck around with it freely without cmpxchg
>> @@ -3043,29 +3186,29 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	freelist = slab->freelist;
>>  	slab->freelist = NULL;
>>
>> -	stat(s, ALLOC_SLAB);
>> +	inc_slabs_node(s, slab_nid(slab), slab->objects);
>>
>>  check_new_slab:
>>
>>  	if (kmem_cache_debug(s)) {
>> -		if (!alloc_debug_processing(s, slab, freelist, addr)) {
>> -			/* Slab failed checks. Next slab needed */
>> -			goto new_slab;
>> -		} else {
>> -			/*
>> -			 * For debug case, we don't load freelist so that all
>> -			 * allocations go through alloc_debug_processing()
>> -			 */
>> -			goto return_single;
>> -		}
>> +		/*
>> +		 * For debug caches here we had to go through
>> +		 * alloc_single_from_partial() so just store the tracking info
>> +		 * and return the object
>> +		 */
>> +		if (s->flags & SLAB_STORE_USER)
>> +			set_track(s, freelist, TRACK_ALLOC, addr);
>> +		return freelist;
>>  	}
>>
>> -	if (unlikely(!pfmemalloc_match(slab, gfpflags)))
>> +	if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
>>  		/*
>>  		 * For !pfmemalloc_match() case we don't load freelist so that
>>  		 * we don't make further mismatched allocations easier.
>>  		 */
>> -		goto return_single;
>> +		deactivate_slab(s, slab, get_freepointer(s, freelist));
>> +		return freelist;
>> +	}
>
>
>>
>>  retry_load_slab:
>>
>> @@ -3089,11 +3232,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	c->slab = slab;
>>
>>  	goto load_freelist;
>> -
>> -return_single:
>> -
>> -	deactivate_slab(s, slab, get_freepointer(s, freelist));
>> -	return freelist;
>>  }
>>
>>  /*
>> @@ -3341,9 +3479,10 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>  	if (kfence_free(head))
>>  		return;
>>
>> -	if (kmem_cache_debug(s) &&
>> -	    !free_debug_processing(s, slab, head, tail, cnt, addr))
>> +	if (kmem_cache_debug(s)) {
>> +		free_debug_processing(s, slab, head, tail, cnt, addr);
>>  		return;
>> +	}
>
> Oh, now debugging caches do not share the free path with non-debugging
> caches.
>
> Now free_debug_processing's return type can be void?

Right.

>>
>>  	do {
>>  		if (unlikely(n)) {
>> @@ -3958,6 +4097,7 @@ static void early_kmem_cache_node_alloc(int node)
>>  	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
>>
>>  	BUG_ON(!slab);
>> +	inc_slabs_node(kmem_cache_node, slab_nid(slab), slab->objects);
>>  	if (slab_nid(slab) != node) {
>>  		pr_err("SLUB: Unable to allocate memory from node %d\n", node);
>>  		pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");
>> @@ -5625,7 +5765,7 @@ static ssize_t validate_store(struct kmem_cache *s,
>>  {
>>  	int ret = -EINVAL;
>>
>> -	if (buf[0] == '1') {
>> +	if (buf[0] == '1' && kmem_cache_debug(s)) {
>>  		ret = validate_slab_cache(s);
>>  		if (ret >= 0)
>>  			ret = length;
>
> Yeah definitely this is what it should be,
> instead of serializing inc_slabs_node()/dec_slabs_node()
> for non-debugging caches.
>
>> --
>> 2.37.1
>>
>