From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 27 Jul 2023 09:09:42 +1000
From: Dave Chinner <david@fromorbit.com>
To: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz,
 roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org,
 paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org,
 senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org,
 muchun.song@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 x86@kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net,
 cluster-devel@redhat.com, linux-nfs@vger.kernel.org,
 linux-mtd@lists.infradead.org, rcu@vger.kernel.org, netdev@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
 dm-devel@redhat.com, linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org,
 virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
 linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
 linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 44/47] mm: shrinker: make global slab shrink lockless
Message-ID: 
References: <20230724094354.90817-1-zhengqi.arch@bytedance.com>
 <20230724094354.90817-45-zhengqi.arch@bytedance.com>
 <19ad6d06-8a14-6102-5eae-2134dc2c5061@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <19ad6d06-8a14-6102-5eae-2134dc2c5061@bytedance.com>

On Wed, Jul 26, 2023 at 05:14:09PM +0800, Qi Zheng wrote:
> On 2023/7/26 16:08, Dave Chinner wrote:
> > On Mon, Jul 24, 2023 at 05:43:51PM +0800, Qi Zheng wrote:
> > > @@ -122,6 +126,13 @@ void shrinker_free_non_registered(struct shrinker *shrinker);
> > >   void shrinker_register(struct shrinker *shrinker);
> > >   void shrinker_unregister(struct shrinker *shrinker);
> > > +static inline bool shrinker_try_get(struct shrinker *shrinker)
> > > +{
> > > +	return READ_ONCE(shrinker->registered) &&
> > > +	       refcount_inc_not_zero(&shrinker->refcount);
> > > +}
> > 
> > Why do we care about shrinker->registered here? If we don't set
> > the refcount to 1 until we have fully initialised everything, then
> > the shrinker code can key entirely off the reference count and
> > none of the lookup code needs to care about whether the shrinker is
> > registered or not.
> 
> The purpose of checking shrinker->registered here is to stop the
> shrinker from running after shrinker_free() has been called, which
> prevents situations like the following:
> 
> CPU 0                      CPU 1
> 
> shrinker_try_get()
> 
>                            shrinker_try_get()
> 
> shrinker_put()
> shrinker_try_get()
>                            shrinker_put()

I don't see any race here? What is wrong with having multiple
active users at once?

> > This should use a completion, then it is always safe under
> > rcu_read_lock(). This also gets rid of the shrinker_lock spin lock,
> > which only exists because we can't take a blocking lock under
> > rcu_read_lock(). i.e.:
> > 
> > void shrinker_put(struct shrinker *shrinker)
> > {
> > 	if (refcount_dec_and_test(&shrinker->refcount))
> > 		complete(&shrinker->done);
> > }
> > 
> > void shrinker_free()
> > {
> > 	.....
> > 	refcount_dec(&shrinker->refcount);
> 
> I guess what you mean here is shrinker_put(), because this may drop
> the last refcount.

Yes, I did.

> > 	wait_for_completion(&shrinker->done);
> > 	/*
> > 	 * lookups on the shrinker will now all fail as the refcount has
> > 	 * fallen to zero. We can now remove it from the lists and
> > 	 * free it.
> > 	 */
> > 	down_write(&shrinker_rwsem);
> > 	list_del_rcu(&shrinker->list);
> > 	up_write(&shrinker_rwsem);
> > 	call_rcu(&shrinker->rcu_head, shrinker_free_rcu_cb);
> > }
> > 
> > ....
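
For reference, here is a minimal sketch of the per-shrinker state the
completion-based scheme above relies on; the layout and comments are
illustrative assumptions drawn from the discussion, not code from the
patch:

	struct shrinker {
		/* ... existing shrinker fields ... */
		refcount_t		refcount;	/* taken by lookups via shrinker_try_get() */
		struct completion	done;		/* completed by shrinker_put() on the final put */
		struct rcu_head		rcu_head;	/* defers the final free past an RCU grace period */
	};

Under this scheme the completion has to be initialised (via
init_completion(&shrinker->done)) before the shrinker is published on
shrinker_list, so that a concurrent shrinker_free() can safely wait on it.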
> > > @@ -686,11 +711,14 @@ EXPORT_SYMBOL(shrinker_free_non_registered);
> > >   void shrinker_register(struct shrinker *shrinker)
> > >   {
> > > -	down_write(&shrinker_rwsem);
> > > -	list_add_tail(&shrinker->list, &shrinker_list);
> > > -	shrinker->flags |= SHRINKER_REGISTERED;
> > > +	refcount_set(&shrinker->refcount, 1);
> > > +
> > > +	spin_lock(&shrinker_lock);
> > > +	list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > > +	spin_unlock(&shrinker_lock);
> > > +
> > >   	shrinker_debugfs_add(shrinker);
> > > -	up_write(&shrinker_rwsem);
> > > +	WRITE_ONCE(shrinker->registered, true);
> > >   }
> > >   EXPORT_SYMBOL(shrinker_register);
> > 
> > This just looks wrong - you are trying to use WRITE_ONCE() as a
> > release barrier to indicate that the shrinker is now set up fully.
> > That's not necessary - the refcount is an atomic and, along with the
> > rcu locks, it should provide all the barriers we need. i.e.
> 
> The reason I used WRITE_ONCE() here is that shrinker->registered
> will be read and written concurrently (read in shrinker_try_get() and
> written in shrinker_free()), which is why I added the
> shrinker::registered field instead of using the SHRINKER_REGISTERED
> flag (this reduces the WRITE_ONCE()/READ_ONCE() annotations needed).

Using WRITE_ONCE()/READ_ONCE() doesn't provide the memory barriers
needed to use the field like this. You need release/acquire memory
ordering here, i.e. smp_store_release()/smp_load_acquire().

As it is, refcount_inc_not_zero() provides a control dependency and,
as documented in include/linux/refcount.h, refcount_dec_and_test()
provides release memory ordering.

The only thing I think we may need is a write barrier before
refcount_set(), such that if refcount_inc_not_zero() sees a non-zero
value, it is guaranteed to see an initialised structure...

i.e. refcounts provide all the existence and initialisation
guarantees. Hence I don't see the need to use shrinker->registered
like this and it can remain a bit flag protected by the
shrinker_rwsem().
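
To make that concrete, a lookup that keys purely off the refcount would
look roughly like the sketch below; this is an illustration of the scheme
being argued for here, not code from the patch:

	static inline bool shrinker_try_get(struct shrinker *shrinker)
	{
		/*
		 * shrinker_register() publishes the shrinker by setting the
		 * refcount to 1 as its final step (with a write barrier
		 * before the refcount_set()), so a successful increment here
		 * implies a fully initialised shrinker.  No separate
		 * "registered" flag is needed on the lookup side.
		 */
		return refcount_inc_not_zero(&shrinker->refcount);
	}

If the separate flag were kept instead, the publish side would need
smp_store_release() paired with smp_load_acquire() in the lookup, rather
than plain WRITE_ONCE()/READ_ONCE().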
> > void shrinker_register(struct shrinker *shrinker)
> > {
> > 	down_write(&shrinker_rwsem);
> > 	list_add_tail_rcu(&shrinker->list, &shrinker_list);
> > 	shrinker->flags |= SHRINKER_REGISTERED;
> > 	shrinker_debugfs_add(shrinker);
> > 	up_write(&shrinker_rwsem);
> > 
> > 	/*
> > 	 * now the shrinker is fully set up, take the first
> > 	 * reference to it to indicate that lookup operations are
> > 	 * now allowed to use it via shrinker_try_get().
> > 	 */
> > 	refcount_set(&shrinker->refcount, 1);
> > }
> > 
> > > diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c
> > > index f1becfd45853..c5573066adbf 100644
> > > --- a/mm/shrinker_debug.c
> > > +++ b/mm/shrinker_debug.c
> > > @@ -5,6 +5,7 @@
> > >   #include
> > >   #include
> > >   #include
> > > +#include
> > >   /* defined in vmscan.c */
> > >   extern struct rw_semaphore shrinker_rwsem;
> > > @@ -161,17 +162,21 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   {
> > >   	struct dentry *entry;
> > >   	char buf[128];
> > > -	int id;
> > > -
> > > -	lockdep_assert_held(&shrinker_rwsem);
> > > +	int id, ret = 0;
> > >   	/* debugfs isn't initialized yet, add debugfs entries later. */
> > >   	if (!shrinker_debugfs_root)
> > >   		return 0;
> > > +	down_write(&shrinker_rwsem);
> > > +	if (shrinker->debugfs_entry)
> > > +		goto fail;
> > > +
> > >   	id = ida_alloc(&shrinker_debugfs_ida, GFP_KERNEL);
> > > -	if (id < 0)
> > > -		return id;
> > > +	if (id < 0) {
> > > +		ret = id;
> > > +		goto fail;
> > > +	}
> > >   	shrinker->debugfs_id = id;
> > >   	snprintf(buf, sizeof(buf), "%s-%d", shrinker->name, id);
> > > @@ -180,7 +185,8 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   	entry = debugfs_create_dir(buf, shrinker_debugfs_root);
> > >   	if (IS_ERR(entry)) {
> > >   		ida_free(&shrinker_debugfs_ida, id);
> > > -		return PTR_ERR(entry);
> > > +		ret = PTR_ERR(entry);
> > > +		goto fail;
> > >   	}
> > >   	shrinker->debugfs_entry = entry;
> > > @@ -188,7 +194,10 @@ int shrinker_debugfs_add(struct shrinker *shrinker)
> > >   			    &shrinker_debugfs_count_fops);
> > >   	debugfs_create_file("scan", 0220, entry, shrinker,
> > >   			    &shrinker_debugfs_scan_fops);
> > > -	return 0;
> > > +
> > > +fail:
> > > +	up_write(&shrinker_rwsem);
> > > +	return ret;
> > >   }
> > >   int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...)
> > > @@ -243,6 +252,11 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker,
> > >   	shrinker->name = NULL;
> > >   	*debugfs_id = entry ? shrinker->debugfs_id : -1;
> > > +	/*
> > > +	 * Ensure that shrinker->registered has been set to false before
> > > +	 * shrinker->debugfs_entry is set to NULL.
> > > +	 */
> > > +	smp_wmb();
> > >   	shrinker->debugfs_entry = NULL;
> > >   	return entry;
> > > @@ -266,14 +280,26 @@ static int __init shrinker_debugfs_init(void)
> > >   	shrinker_debugfs_root = dentry;
> > >   	/* Create debugfs entries for shrinkers registered at boot */
> > > -	down_write(&shrinker_rwsem);
> > > -	list_for_each_entry(shrinker, &shrinker_list, list)
> > > +	rcu_read_lock();
> > > +	list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> > > +		if (!shrinker_try_get(shrinker))
> > > +			continue;
> > > +		rcu_read_unlock();
> > > +
> > >   		if (!shrinker->debugfs_entry) {
> > > -			ret = shrinker_debugfs_add(shrinker);
> > > -			if (ret)
> > > -				break;
> > > +			/* Paired with smp_wmb() in shrinker_debugfs_detach() */
> > > +			smp_rmb();
> > > +			if (READ_ONCE(shrinker->registered))
> > > +				ret = shrinker_debugfs_add(shrinker);
> > >   		}
> > > -	up_write(&shrinker_rwsem);
> > > +
> > > +		rcu_read_lock();
> > > +		shrinker_put(shrinker);
> > > +
> > > +		if (ret)
> > > +			break;
> > > +	}
> > > +	rcu_read_unlock();
> > >   	return ret;
> > >   }
> > 
> > And all this churn and complexity can go away because the
> > shrinker_rwsem is still used to protect shrinker_register()
> > entirely....
> 
> My concern is that during this process there may be a driver probe
> failure that then calls shrinker_free() (shrinker_debugfs_init() is
> called in the late_initcall stage). In this case, we need to use
> RCU+refcount to ensure that the shrinker is not freed.

Yeah, you're trying to work around the lack of a wait_for_completion()
call in shrinker_free(). With that, this doesn't need RCU at all: the
iteration can be done safely entirely under shrinker_rwsem, and none of
this code needs to change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com