Date: Tue, 2 Aug 2022 11:47:14 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka
Cc: Rongwei Wang, Christoph Lameter, Joonsoo Kim, David Rientjes,
 Pekka Enberg, linux-mm@kvack.org, Feng Tang
Subject: Re: [RFC PATCH] mm, slub: restrict sysfs validation to debug caches and make it safe
Message-ID:
References:
 <20220727182909.11231-1-vbabka@suse.cz>
 <224265c3-f3fe-aecd-039a-0c9ba3817305@suse.cz>
In-Reply-To: <224265c3-f3fe-aecd-039a-0c9ba3817305@suse.cz>

On Mon, Aug 01, 2022 at 03:51:49PM +0200, Vlastimil Babka wrote:
> On 7/31/22 10:44, Hyeonggon Yoo wrote:
> > On Wed, Jul 27, 2022 at 08:29:09PM +0200, Vlastimil Babka wrote:
> > Also all issues (as far as I know) related to the validate attribute
> > are gone after this patch.
> 
> As you (and Rongwei) were able to trigger/reproduce the issues, does your
> testing also no longer reproduce them?

Yes, I was no longer able to reproduce them.
And it would be better if we could get Rongwei's Tested-by: too.

> > Silly question:
> > Do we want to apply this on stable trees?
> 
> I'd prefer not to, it's too intrusive for stable.
> 
> > I doubt someone would use the validate attribute when not debugging.
> 
> I doubt as well. Also it requires root, and even if somebody hits the issue,
> it's just spurious warnings, nothing fatal. So that doesn't warrant the
> intrusive stable backport IMHO.

Agreed.

> 
> >> [1] https://lore.kernel.org/all/69462916-2d1c-dd50-2e64-b31c2b61690e@suse.cz/
> >> 
> >>  mm/slub.c | 322 +++++++++++++++++++++++++++++++++++++++---------------
> >>  1 file changed, 231 insertions(+), 91 deletions(-)
> >> 
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index b1281b8654bd..01e5228809d7 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -1324,17 +1324,14 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
> >>  }
> > 
> > [...]
> > 
> >> +/*
> >> + * Called only for kmem_cache_debug() caches instead of acquire_slab(), with a
> >> + * slab from the n->partial list.
> >> + * Removes only a single object from the slab
> >> + * under slab_lock(), does the alloc_debug_processing() checks and leaves the
> >> + * slab on the list, or moves it to full list if it was the last object.
> >> + */
> >> +static void *alloc_single_from_partial(struct kmem_cache *s,
> >> +		struct kmem_cache_node *n, struct slab *slab)
> >> +{
> >> +	void *object;
> >> +	unsigned long flags;
> >> +
> >> +	lockdep_assert_held(&n->list_lock);
> >> +
> >> +	slab_lock(slab, &flags);
> >> +
> >> +	object = slab->freelist;
> >> +	slab->freelist = get_freepointer(s, object);
> >> +	slab->inuse++;
> >> +
> >> +	if (!alloc_debug_processing(s, slab, object)) {
> >> +		remove_partial(n, slab);
> >> +		slab_unlock(slab, &flags);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	if (slab->inuse == slab->objects) {
> >> +		remove_partial(n, slab);
> >> +		add_full(s, n, slab);
> >> +	}
> >> +
> >> +	slab_unlock(slab, &flags);
> > 
> > AFAIK add_full/remove_full/add_partial/remove_partial
> > can be called outside slab_lock but inside list_lock.
> 
> Right, I will adjust, thanks.

(For reference, a rough sketch of that adjustment is below, after the
quoted hunks.)

> 
> >> +	return object;
> >> +}
> >> +
> >> +/*
> >> + * Called only for kmem_cache_debug() caches to allocate from a freshly
> >> + * allocated slab. Allocates a single object instead of whole freelist
> >> + * and puts the slab to the partial (or full) list.
> >> + */
> >> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
> >> +		struct slab *slab)
> >> +{
> >> +	int nid = slab_nid(slab);
> >> +	struct kmem_cache_node *n = get_node(s, nid);
> >> +	unsigned long flags, flags2;
> >> +	void *object;
> >> +
> >> +	spin_lock_irqsave(&n->list_lock, flags);
> >> +	slab_lock(slab, &flags2);
> >> +
> >> +	object = slab->freelist;
> >> +	slab->freelist = get_freepointer(s, object);
> >> +	/* Undo what allocate_slab() did */
> >> +	slab->frozen = 0;
> >> +	slab->inuse = 1;
> > 
> > Maybe do it in allocate_slab()?
> 
> Hmm yeah, I guess we could stop doing that pre-freezing and inuse = objects
> in allocate_slab(), and do it in __slab_alloc(), which thus won't add any
> overhead. Then we won't have to unfreeze in early_kmem_cache_node_alloc() as
> well.

Sounds good.

> 
> >> +	if (!alloc_debug_processing(s, slab, object)) {
> >> +		/*
> >> +		 * It's not really expected that this would fail on a
> >> +		 * freshly allocated slab, but a concurrent memory
> >> +		 * corruption in theory could cause that.
> >> +		 */
> >> +		slab_unlock(slab, &flags2);
> >> +		spin_unlock_irqrestore(&n->list_lock, flags);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	if (slab->inuse == slab->objects)
> >> +		add_full(s, n, slab);
> >> +	else
> >> +		add_partial(n, slab, DEACTIVATE_TO_HEAD);
> >> +
> >> +	slab_unlock(slab, &flags2);
> >> +	inc_slabs_node(s, nid, slab->objects);
> >> +	spin_unlock_irqrestore(&n->list_lock, flags);
> >> +
> >> +	return object;
> >> +}
> > 
> > [...]
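Coming back to the add_full()/remove_partial() ordering above: not sure if
this is exactly what you have in mind, but here is a rough, untested sketch
of the adjustment, keeping the freelist manipulation under slab_lock() and
doing the list updates with only n->list_lock held (names as in the quoted
hunk):

static void *alloc_single_from_partial(struct kmem_cache *s,
		struct kmem_cache_node *n, struct slab *slab)
{
	void *object;
	unsigned long flags;
	bool full;

	lockdep_assert_held(&n->list_lock);

	slab_lock(slab, &flags);

	object = slab->freelist;
	slab->freelist = get_freepointer(s, object);
	slab->inuse++;

	if (!alloc_debug_processing(s, slab, object)) {
		slab_unlock(slab, &flags);
		/* list manipulation needs only n->list_lock */
		remove_partial(n, slab);
		return NULL;
	}

	/* remember the state before dropping slab_lock() */
	full = (slab->inuse == slab->objects);
	slab_unlock(slab, &flags);

	if (full) {
		/* likewise: outside slab_lock(), under n->list_lock */
		remove_partial(n, slab);
		add_full(s, n, slab);
	}

	return object;
}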
> >> #endif /* CONFIG_SLUB_DEBUG */
> >> 
> >> #if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS)
> >> @@ -3036,6 +3165,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  		return NULL;
> >>  	}
> >> 
> >> +	stat(s, ALLOC_SLAB);
> >> +
> >> +	if (kmem_cache_debug(s)) {
> >> +		freelist = alloc_single_from_new_slab(s, slab);
> >> +
> >> +		if (unlikely(!freelist))
> >> +			goto new_objects;
> >> +
> >> +		if (s->flags & SLAB_STORE_USER)
> >> +			set_track(s, freelist, TRACK_ALLOC, addr);
> >> +
> >> +		return freelist;
> >> +	}
> >> +
> >>  	/*
> >>  	 * No other reference to the slab yet so we can
> >>  	 * muck around with it freely without cmpxchg
> >> @@ -3043,29 +3186,29 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	freelist = slab->freelist;
> >>  	slab->freelist = NULL;
> >> 
> >> -	stat(s, ALLOC_SLAB);
> >> +	inc_slabs_node(s, slab_nid(slab), slab->objects);
> >> 
> >>  check_new_slab:
> >> 
> >>  	if (kmem_cache_debug(s)) {
> >> -		if (!alloc_debug_processing(s, slab, freelist, addr)) {
> >> -			/* Slab failed checks. Next slab needed */
> >> -			goto new_slab;
> >> -		} else {
> >> -			/*
> >> -			 * For debug case, we don't load freelist so that all
> >> -			 * allocations go through alloc_debug_processing()
> >> -			 */
> >> -			goto return_single;
> >> -		}
> >> +		/*
> >> +		 * For debug caches here we had to go through
> >> +		 * alloc_single_from_partial() so just store the tracking info
> >> +		 * and return the object
> >> +		 */
> >> +		if (s->flags & SLAB_STORE_USER)
> >> +			set_track(s, freelist, TRACK_ALLOC, addr);
> >> +		return freelist;
> >>  	}
> >> 
> >> -	if (unlikely(!pfmemalloc_match(slab, gfpflags)))
> >> +	if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
> >>  		/*
> >>  		 * For !pfmemalloc_match() case we don't load freelist so that
> >>  		 * we don't make further mismatched allocations easier.
> >>  		 */
> >> -		goto return_single;
> >> +		deactivate_slab(s, slab, get_freepointer(s, freelist));
> >> +		return freelist;
> >> +	}
> > 
> > 
> >> 
> >>  retry_load_slab:
> >> 
> >> @@ -3089,11 +3232,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	c->slab = slab;
> >> 
> >>  	goto load_freelist;
> >> -
> >> -return_single:
> >> -
> >> -	deactivate_slab(s, slab, get_freepointer(s, freelist));
> >> -	return freelist;
> >>  }
> >> 
> >>  /*
> >> @@ -3341,9 +3479,10 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> >>  	if (kfence_free(head))
> >>  		return;
> >> 
> >> -	if (kmem_cache_debug(s) &&
> >> -	    !free_debug_processing(s, slab, head, tail, cnt, addr))
> >> +	if (kmem_cache_debug(s)) {
> >> +		free_debug_processing(s, slab, head, tail, cnt, addr);
> >>  		return;
> >> +	}
> > 
> > Oh, now debug caches no longer share the free path with non-debug
> > caches.
> > 
> > Now free_debug_processing's return type can be void?
> 
> Right. Thanks!
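Just to spell out what I mean, roughly this (untested sketch; parameter
names taken from the call site in the quoted hunk, keeping whatever
modifiers the current declaration has):

static void free_debug_processing(struct kmem_cache *s,
		struct slab *slab, void *head, void *tail,
		int cnt, unsigned long addr);

with the internal return-value bookkeeping dropped, since no caller
consumes the result anymore.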
> >> 
> >>  	do {
> >>  		if (unlikely(n)) {
> >> @@ -3958,6 +4097,7 @@ static void early_kmem_cache_node_alloc(int node)
> >>  	slab = new_slab(kmem_cache_node, GFP_NOWAIT, node);
> >> 
> >>  	BUG_ON(!slab);
> >> +	inc_slabs_node(kmem_cache_node, slab_nid(slab), slab->objects);
> >>  	if (slab_nid(slab) != node) {
> >>  		pr_err("SLUB: Unable to allocate memory from node %d\n", node);
> >>  		pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");
> >> @@ -5625,7 +5765,7 @@ static ssize_t validate_store(struct kmem_cache *s,
> >>  {
> >>  	int ret = -EINVAL;
> >> 
> >> -	if (buf[0] == '1') {
> >> +	if (buf[0] == '1' && kmem_cache_debug(s)) {
> >>  		ret = validate_slab_cache(s);
> >>  		if (ret >= 0)
> >>  			ret = length;
> > 
> > Yeah definitely this is what it should be,
> > instead of serializing inc_slabs_node()/dec_slabs_node()
> > for non-debugging caches.
> > 
> >> -- 
> >> 2.37.1
> >> 
> 

-- 
Thanks,
Hyeonggon