From: Kirill Tkhai
Date: Wed, 22 Feb 2023 00:28:49 +0300
Subject: Re: [PATCH 2/5] mm: vmscan: make memcg slab shrink lockless
To: Qi Zheng, akpm@linux-foundation.org, hannes@cmpxchg.org, shakeelb@google.com, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, david@redhat.com, shy828301@gmail.com
Cc: sultan@kerneltoast.com, dave@stgolabs.net, penguin-kernel@I-love.SAKURA.ne.jp, paulmck@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <8733cb3c-7a6a-33c4-b84e-4fb981dae765@ya.ru>
In-Reply-To: <20230220091637.64865-3-zhengqi.arch@bytedance.com>
On 20.02.2023 12:16, Qi Zheng wrote:
> Like global slab shrink, since commit 1cd0bd06093c
> ("rcu: Remove CONFIG_SRCU"), it's time to use SRCU
> to protect readers who previously held shrinker_rwsem.
>
> We can test with the following script:
>
> ```
> DIR="/root/shrinker/memcg/mnt"
>
> do_create()
> {
> 	mkdir /sys/fs/cgroup/memory/test
> 	echo 200M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
> 	for i in `seq 0 $1`;
> 	do
> 		mkdir /sys/fs/cgroup/memory/test/$i;
> 		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
> 		mkdir -p $DIR/$i;
> 	done
> }
>
> do_mount()
> {
> 	for i in `seq $1 $2`;
> 	do
> 		mount -t tmpfs $i $DIR/$i;
> 	done
> }
>
> do_touch()
> {
> 	for i in `seq $1 $2`;
> 	do
> 		echo $$ > /sys/fs/cgroup/memory/test/$i/cgroup.procs;
> 		dd if=/dev/zero of=$DIR/$i/file$i bs=1M count=1 &
> 	done
> }
>
> do_create 2000
> do_mount 0 2000
> do_touch 0 1000
> ```
>
> Before applying:
>
> 46.60%  [kernel]  [k] down_read_trylock
> 18.70%  [kernel]  [k] up_read
> 15.44%  [kernel]  [k] shrink_slab
>  4.37%  [kernel]  [k] _find_next_bit
>  2.75%  [kernel]  [k] xa_load
>  2.07%  [kernel]  [k] idr_find
>  1.73%  [kernel]  [k] do_shrink_slab
>  1.42%  [kernel]  [k] shrink_lruvec
>  0.74%  [kernel]  [k] shrink_node
>  0.60%  [kernel]  [k] list_lru_count_one
>
> After applying:
>
> 19.53%  [kernel]  [k] _find_next_bit
> 14.63%  [kernel]  [k] do_shrink_slab
> 14.58%  [kernel]  [k] shrink_slab
> 11.83%  [kernel]  [k] shrink_lruvec
>  9.33%  [kernel]  [k] __blk_flush_plug
>  6.67%  [kernel]  [k] mem_cgroup_iter
>  3.73%  [kernel]  [k] list_lru_count_one
>  2.43%  [kernel]  [k] shrink_node
>  1.96%  [kernel]  [k] super_cache_count
>  1.78%  [kernel]  [k] __rcu_read_unlock
>  1.38%  [kernel]  [k] __srcu_read_lock
>  1.30%  [kernel]  [k] xas_descend
>
> We can see that the readers are no longer blocked.
>
> Signed-off-by: Qi Zheng
> ---
>  mm/vmscan.c | 56 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 95a3d6ddc6c1..dc47396ecd0e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -57,6 +57,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -221,8 +222,21 @@ static inline int shrinker_defer_size(int nr_items)
>  static struct shrinker_info *shrinker_info_protected(struct mem_cgroup *memcg,
>  						     int nid)
>  {
> -	return rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_info,
> -					 lockdep_is_held(&shrinker_rwsem));
> +	return srcu_dereference_check(memcg->nodeinfo[nid]->shrinker_info,
> +				      &shrinker_srcu,
> +				      lockdep_is_held(&shrinker_rwsem));
> +}
> +
> +static struct shrinker_info *shrinker_info_srcu(struct mem_cgroup *memcg,
> +						int nid)
> +{
> +	return srcu_dereference(memcg->nodeinfo[nid]->shrinker_info,
> +				&shrinker_srcu);
> +}
> +
> +static void free_shrinker_info_rcu(struct rcu_head *head)
> +{
> +	kvfree(container_of(head, struct shrinker_info, rcu));
>  }
>
>  static int expand_one_shrinker_info(struct mem_cgroup *memcg,
> @@ -257,7 +271,7 @@ static int expand_one_shrinker_info(struct mem_cgroup *memcg,
>  		       defer_size - old_defer_size);
>
>  		rcu_assign_pointer(pn->shrinker_info, new);
> -		kvfree_rcu(old, rcu);
> +		call_srcu(&shrinker_srcu, &old->rcu, free_shrinker_info_rcu);
>  	}
>
>  	return 0;
> @@ -350,13 +364,14 @@ void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
>  {
>  	if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
>  		struct shrinker_info *info;
> +		int srcu_idx;
>
> -		rcu_read_lock();
> -		info = rcu_dereference(memcg->nodeinfo[nid]->shrinker_info);
> +		srcu_idx = srcu_read_lock(&shrinker_srcu);
> +		info = shrinker_info_srcu(memcg, nid);
>  		/* Pairs with smp mb in shrink_slab() */
>  		smp_mb__before_atomic();
>  		set_bit(shrinker_id, info->map);
> -		rcu_read_unlock();
> +		srcu_read_unlock(&shrinker_srcu, srcu_idx);
>  	}
>  }
>
> @@ -370,7 +385,6 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
>  		return -ENOSYS;
>
>  	down_write(&shrinker_rwsem);
> -	/* This may call shrinker, so it must use down_read_trylock() */
>  	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
>  	if (id < 0)
>  		goto unlock;
> @@ -404,7 +418,7 @@ static long xchg_nr_deferred_memcg(int nid, struct shrinker *shrinker,
>  {
>  	struct shrinker_info *info;
>
> -	info = shrinker_info_protected(memcg, nid);
> +	info = shrinker_info_srcu(memcg, nid);
>  	return atomic_long_xchg(&info->nr_deferred[shrinker->id], 0);
>  }
>
> @@ -413,13 +427,13 @@ static long add_nr_deferred_memcg(long nr, int nid, struct shrinker *shrinker,
>  {
>  	struct shrinker_info *info;
>
> -	info = shrinker_info_protected(memcg, nid);
> +	info = shrinker_info_srcu(memcg, nid);
>  	return atomic_long_add_return(nr, &info->nr_deferred[shrinker->id]);
>  }
>
>  void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>  {
> -	int i, nid;
> +	int i, nid, srcu_idx;
>  	long nr;
>  	struct mem_cgroup *parent;
>  	struct shrinker_info *child_info, *parent_info;
> @@ -429,16 +443,16 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
>  		parent = root_mem_cgroup;
>
>  	/* Prevent from concurrent shrinker_info expand */
> -	down_read(&shrinker_rwsem);
> +	srcu_idx = srcu_read_lock(&shrinker_srcu);

Don't we still have to be protected against a parallel expand_one_shrinker_info()?
It looks like it may replace the parent->nodeinfo[nid]->shrinker_info pointer
right after we've dereferenced it here.

>  	for_each_node(nid) {
> -		child_info = shrinker_info_protected(memcg, nid);
> -		parent_info = shrinker_info_protected(parent, nid);
> +		child_info = shrinker_info_srcu(memcg, nid);
> +		parent_info = shrinker_info_srcu(parent, nid);
>  		for (i = 0; i < shrinker_nr_max; i++) {
>  			nr = atomic_long_read(&child_info->nr_deferred[i]);
>  			atomic_long_add(nr, &parent_info->nr_deferred[i]);
>  		}
>  	}
> -	up_read(&shrinker_rwsem);
> +	srcu_read_unlock(&shrinker_srcu, srcu_idx);
>  }
>
>  static bool cgroup_reclaim(struct scan_control *sc)
> @@ -891,15 +905,14 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>  {
>  	struct shrinker_info *info;
>  	unsigned long ret, freed = 0;
> +	int srcu_idx;
>  	int i;
>
>  	if (!mem_cgroup_online(memcg))
>  		return 0;
>
> -	if (!down_read_trylock(&shrinker_rwsem))
> -		return 0;
> -
> -	info = shrinker_info_protected(memcg, nid);
> +	srcu_idx = srcu_read_lock(&shrinker_srcu);
> +	info = shrinker_info_srcu(memcg, nid);
>  	if (unlikely(!info))
>  		goto unlock;
>
> @@ -949,14 +962,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
>  				set_shrinker_bit(memcg, nid, i);
>  		}
>  		freed += ret;
> -
> -		if (rwsem_is_contended(&shrinker_rwsem)) {
> -			freed = freed ? : 1;
> -			break;
> -		}
>  	}
>  unlock:
> -	up_read(&shrinker_rwsem);
> +	srcu_read_unlock(&shrinker_srcu, srcu_idx);
>  	return freed;
>  }
>  #else /* CONFIG_MEMCG */