Date: Tue, 8 Aug 2023 09:28:44 +1000
From: Dave Chinner
To: Qi Zheng
Cc: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz,
    roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org,
    paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org,
    senozhatsky@chromium.org, yujie.liu@intel.com,
    gregkh@linuxfoundation.org, muchun.song@linux.dev,
    simon.horman@corigine.com, dlemoal@kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
    kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
    linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net,
    cluster-devel@redhat.com, linux-nfs@vger.kernel.org,
    linux-mtd@lists.infradead.org, rcu@vger.kernel.org,
    netdev@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linux-arm-msm@vger.kernel.org, dm-devel@redhat.com,
    linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless
References: <20230807110936.21819-1-zhengqi.arch@bytedance.com>
    <20230807110936.21819-46-zhengqi.arch@bytedance.com>
In-Reply-To: <20230807110936.21819-46-zhengqi.arch@bytedance.com>

On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
> which protects most operations such as slab shrink, registration and
> unregistration of shrinkers, etc. This can easily cause problems in the
> following cases.
....
> This commit uses the refcount+RCU method [5] proposed by Dave Chinner
> to re-implement the lockless global slab shrink. The memcg slab shrink is
> handled in the subsequent patch.
....
> ---
>  include/linux/shrinker.h | 17 ++++++++++
>  mm/shrinker.c            | 70 +++++++++++++++++++++++++++++----------
>  2 files changed, 68 insertions(+), 19 deletions(-)

There's no documentation in the code explaining how the lockless
shrinker algorithm works. It's left to the reader to work out how
this all goes together....

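As an illustration of the sort of thing the documentation could spell
out, here is a minimal user-space sketch of the object lifecycle as I
read it. It is a model, not the kernel code: model_shrinker,
model_init(), model_try_get() and model_put() are hypothetical names,
C11 atomics stand in for refcount_t, and a flag stands in for the
completion. It also assumes that registration takes the initial
reference and that freeing drops it and then waits for the last user,
which the hunks quoted in this mail do not show.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the new struct shrinker fields. */
struct model_shrinker {
	atomic_int refcount;	/* assumed: 1 while registered, +1 per active user */
	atomic_bool done;	/* stands in for struct completion */
};

/* Assumed registration step: the object starts with one reference. */
static void model_init(struct model_shrinker *s)
{
	atomic_init(&s->refcount, 1);
	atomic_init(&s->done, false);
}

/* Models refcount_inc_not_zero(): take a reference unless the count is 0. */
static bool model_try_get(struct model_shrinker *s)
{
	int old = atomic_load(&s->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Models shrinker_put(): whoever drops the last reference signals 'done'. */
static void model_put(struct model_shrinker *s)
{
	if (atomic_fetch_sub(&s->refcount, 1) == 1)
		atomic_store(&s->done, true);
}
```

In this model a walker that wins model_try_get() keeps the object
alive until its matching model_put(), and once the count has reached
zero every later model_try_get() fails, so a walker simply skips that
entry.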
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index eb342994675a..f06225f18531 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -4,6 +4,8 @@
>  
>  #include
>  #include
> +#include
> +#include
>  
>  #define SHRINKER_UNIT_BITS BITS_PER_LONG
>  
> @@ -87,6 +89,10 @@ struct shrinker {
>  	int seeks;	/* seeks to recreate an obj */
>  	unsigned flags;
>  
> +	refcount_t refcount;
> +	struct completion done;
> +	struct rcu_head rcu;

What does the refcount protect, why do we need the completion, etc?

> +
>  	void *private_data;
>  
>  	/* These are for internal use */
> @@ -120,6 +126,17 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...);
>  void shrinker_register(struct shrinker *shrinker);
>  void shrinker_free(struct shrinker *shrinker);
>  
> +static inline bool shrinker_try_get(struct shrinker *shrinker)
> +{
> +	return refcount_inc_not_zero(&shrinker->refcount);
> +}
> +
> +static inline void shrinker_put(struct shrinker *shrinker)
> +{
> +	if (refcount_dec_and_test(&shrinker->refcount))
> +		complete(&shrinker->done);
> +}
> +
>  #ifdef CONFIG_SHRINKER_DEBUG
>  extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker,
>  						  const char *fmt, ...);
> diff --git a/mm/shrinker.c b/mm/shrinker.c
> index 1911c06b8af5..d318f5621862 100644
> --- a/mm/shrinker.c
> +++ b/mm/shrinker.c
> @@ -2,6 +2,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  
>  #include "internal.h"
> @@ -577,33 +578,42 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
>  	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>  		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>  
> -	if (!down_read_trylock(&shrinker_rwsem))
> -		goto out;
> -
> -	list_for_each_entry(shrinker, &shrinker_list, list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>  		struct shrink_control sc = {
>  			.gfp_mask = gfp_mask,
>  			.nid = nid,
>  			.memcg = memcg,
>  		};
>  
> +		if (!shrinker_try_get(shrinker))
> +			continue;
> +
> +		/*
> +		 * We can safely unlock the RCU lock here since we already
> +		 * hold the refcount of the shrinker.
> +		 */
> +		rcu_read_unlock();
> +
>  		ret = do_shrink_slab(&sc, shrinker, priority);
>  		if (ret == SHRINK_EMPTY)
>  			ret = 0;
>  		freed += ret;
> +
>  		/*
> -		 * Bail out if someone want to register a new shrinker to
> -		 * prevent the registration from being stalled for long periods
> -		 * by parallel ongoing shrinking.
> +		 * This shrinker may be deleted from shrinker_list and freed
> +		 * after the shrinker_put() below, but this shrinker is still
> +		 * used for the next traversal. So it is necessary to hold the
> +		 * RCU lock first to prevent this shrinker from being freed,
> +		 * which also ensures that the next shrinker that is traversed
> +		 * will not be freed (even if it is deleted from shrinker_list
> +		 * at the same time).
>  		 */

This comment really should be at the head of the function,
describing the algorithm used within the function itself, i.e. how
reference counts are used w.r.t. the rcu_read_lock() usage to
guarantee existence of the shrinker and the validity of the list
walk.

I'm not going to remember all these little details when I look at
this code in another 6 months' time, and having to work it out from
first principles every time I look at the code will waste a lot of
time...

-Dave.
-- 
Dave Chinner
david@fromorbit.com