Date: Sat, 4 Jun 2022 11:05:02 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Muchun Song
Cc: Rongwei Wang, akpm@linux-foundation.org, vbabka@suse.cz,
	roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com, rientjes@google.com,
	penberg@kernel.org, cl@linux.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>

On Tue, May 31, 2022 at 11:47:22AM +0800, Muchun Song wrote:
> On Sun, May 29, 2022 at 11:37:06AM +0000, Hyeonggon Yoo wrote:
> > On Sun, May 29, 2022 at 04:15:33PM +0800, Rongwei Wang wrote:
> > > In use cases where slabs are allocated and freed frequently, some
> > > error messages, such as "Left Redzone overwritten" or "First byte
> > > 0xbb instead of 0xcc", are printed when validating slabs. That is
> > > because an object has already been poisoned with SLAB_RED_INACTIVE
> > > but has not yet been added to the slab's freelist, and validation
> > > can run between these two states.
> > >
> > > Actually, this does not mean the slab cannot work stably. But these
> > > confusing messages disturb slab debugging more or less.
> > >
> > > Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
> >
> > Have you observed it, or is it from code inspection?
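
To make the reported race concrete before the patch body below, here is a
minimal single-threaded C sketch of the window the commit message describes.
poison_object(), add_to_freelist(), redzone, and on_freelist are hypothetical
stand-ins for what SLUB's free_debug_processing() and validate_slab() actually
do; only the poison bytes (0xcc active, 0xbb inactive) come from the report
quoted above.

    /*
     * Sketch only: freeing is a two-step operation (poison, then link
     * into the freelist). A validator running between the two steps sees
     * an object that is not on the freelist but already carries inactive
     * poison, and reports a false positive.
     */
    #include <stdio.h>

    #define RED_ACTIVE   0xcc  /* poison while the object is allocated */
    #define RED_INACTIVE 0xbb  /* poison after the object is freed     */

    static unsigned char redzone = RED_ACTIVE;
    static int on_freelist = 0;

    static void poison_object(void)   { redzone = RED_INACTIVE; }
    static void add_to_freelist(void) { on_freelist = 1; }

    static void validate(void)
    {
        /* an object NOT on the freelist must still carry active poison */
        if (!on_freelist && redzone != RED_ACTIVE)
            printf("First byte 0x%x instead of 0x%x\n", redzone, RED_ACTIVE);
    }

    int main(void)
    {
        poison_object();    /* free, step 1: object is poisoned ...       */
        validate();         /* <-- validation in the window: false alarm  */
        add_to_freelist();  /* free, step 2: ... then linked into list    */
        return 0;
    }

Run as-is, this prints exactly the style of false positive the commit message
quotes, because validation fires between the two steps of the free.
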
> > > > > --- > > > mm/slub.c | 40 +++++++++++++++++----------------------- > > > 1 file changed, 17 insertions(+), 23 deletions(-) > > > > > > diff --git a/mm/slub.c b/mm/slub.c > > > index ed5c2c03a47a..310e56d99116 100644 > > > --- a/mm/slub.c > > > +++ b/mm/slub.c > > > @@ -1374,15 +1374,12 @@ static noinline int free_debug_processing( > > > void *head, void *tail, int bulk_cnt, > > > unsigned long addr) > > > { > > > - struct kmem_cache_node *n = get_node(s, slab_nid(slab)); > > > void *object = head; > > > int cnt = 0; > > > - unsigned long flags, flags2; > > > + unsigned long flags; > > > int ret = 0; > > > > > > - spin_lock_irqsave(&n->list_lock, flags); > > > - slab_lock(slab, &flags2); > > > - > > > + slab_lock(slab, &flags); > > > if (s->flags & SLAB_CONSISTENCY_CHECKS) { > > > if (!check_slab(s, slab)) > > > goto out; > > > @@ -1414,8 +1411,7 @@ static noinline int free_debug_processing( > > > slab_err(s, slab, "Bulk freelist count(%d) invalid(%d)\n", > > > bulk_cnt, cnt); > > > > > > - slab_unlock(slab, &flags2); > > > - spin_unlock_irqrestore(&n->list_lock, flags); > > > + slab_unlock(slab, &flags); > > > if (!ret) > > > slab_fix(s, "Object at 0x%p not freed", object); > > > return ret; > > > @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, > > > > > > { > > > void *prior; > > > - int was_frozen; > > > + int was_frozen, to_take_off = 0; > > > struct slab new; > > > unsigned long counters; > > > struct kmem_cache_node *n = NULL; > > > @@ -3315,15 +3311,19 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, > > > if (kfence_free(head)) > > > return; > > > > > > + n = get_node(s, slab_nid(slab)); > > > + spin_lock_irqsave(&n->list_lock, flags); > > > + > > > > Oh please don't do this. > > > > SLUB free slowpath can be hit a lot depending on workload. > > > > __slab_free() try its best not to take n->list_lock. currently takes n->list_lock > > only when the slab need to be taken from list. > > > > Unconditionally taking n->list_lock will degrade performance. > > > > I can confirm you are right. We have encountered this issue in practise. > We have deployed somen HDFS instance on a 256-CPU machine. When there > are lots of IO requests from users, we can see lots of threads are contended > on n->list_lock. Lots of call traces are like following: > > ffffffff810dfbb4 native_queued_spin_lock_slowpath+0x1a4 > ffffffff81780ffb _raw_spin_lock+0x1b > ffffffff8127327e get_partial_node.isra.81+0x5e > ffffffff812752d3 ___slab_alloc+0x2f3 > ffffffff8127559c __slab_alloc+0x1c > ffffffff81275828 kmem_cache_alloc+0x278 > ffffffff812e9e3d alloc_buffer_head+0x1d > ffffffff812e9f74 alloc_page_buffers+0xa4 > ffffffff812eb0e9 create_empty_buffers+0x19 > ffffffff812eb37d create_page_buffers+0x7d > ffffffff812ed78d __block_write_begin_int+0x9d > > I thought it was because there are lots of threads which consume local > CPU slab cache quickly and then both of them try to get a new slab > from node partial list. Since there are 256 CPUs, the contention > is more competitive and easy to be visible. > > Any thoughts on this issue (e.e. how to ease contention)? Comments > are welcome. How does increasing number of partial slabs affect your situation? (increasing /sys/slab//cpu_partial) > Thanks. > >