From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1724C001DB for ; Tue, 8 Aug 2023 07:23:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1ECFB6B0071; Tue, 8 Aug 2023 03:23:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 19CC66B0074; Tue, 8 Aug 2023 03:23:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 064328D0001; Tue, 8 Aug 2023 03:23:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id EB3D66B0071 for ; Tue, 8 Aug 2023 03:23:09 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B189712039F for ; Tue, 8 Aug 2023 07:23:09 +0000 (UTC) X-FDA: 81100096098.25.4F9849A Received: from mail-il1-f176.google.com (mail-il1-f176.google.com [209.85.166.176]) by imf09.hostedemail.com (Postfix) with ESMTP id 083F514001E for ; Tue, 8 Aug 2023 07:23:06 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=Mnl0X9gA; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf09.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.176 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691479387; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=if5VWB18nVWwRp6UCb6aPEiH47g9I/o9CGJnGCggs6I=; b=knQCLTXcZO0LkEqQzPayjPHQokIpe+fOLpJWfpnEc1TlCmJ04AjoiVGoOB6/O3OGJ1S/1g LXbEp3UWGsb3TEkvVqQbNiw8QpWX+V8Rn/8cDHTNhoznY2+IhgZYUbHuUB+a8yfg1bSsUd 1ik6cnsAYK07/b56lmYpYCTsMzzzn34= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=Mnl0X9gA; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf09.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.166.176 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691479387; a=rsa-sha256; cv=none; b=SQRsOI7dEAOWRAPUmi3QkwCp52r0H94YVN+/jzEjJH0Yo+x6//AyPQUDOK8rf0WMt0QWix S7J3QAvoKuKWq4xXRGRSLXbrLmVqQnhag+TSi8vbURbt8B1cDoOCCoPqWn071knkBwqJ6o dmjKNUZYenw4oqoxqof6lq/+/J2Vnuc= Received: by mail-il1-f176.google.com with SMTP id e9e14a558f8ab-34914064ea9so5305645ab.1 for ; Tue, 08 Aug 2023 00:23:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691479386; x=1692084186; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=if5VWB18nVWwRp6UCb6aPEiH47g9I/o9CGJnGCggs6I=; b=Mnl0X9gAUC+TX6mJR8mVBxy4/YY9ZnsIqqbI3Lfyi8wyHvx1vCgiMQzro/5+MvPtg7 rXoES6uNYyqB6hAadoK+JhiLrRJcV2a1SauK74Fo+QgskggrpKST7Q5+m6teYvR8e4tA dcZeOtaDf0Rd9+kc+LJz513/LAkTal/IYIX9DnJ4z84K6HpsLIJ90Jamw7qS1U2+fb2T E2X5y4E7mVpGf6lSS7nOAzl3gvR6+Jw0hw4oEJ+kBhol+PlGk5o70522UKHLTkm0KAg0 ZhNaOymrQBqWDvmGRTs6Q0O1Bt27v7iPkeiHC47LN5BEMRNJZINNo5Ho/St2M7DsfprN 6jkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691479386; x=1692084186; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=if5VWB18nVWwRp6UCb6aPEiH47g9I/o9CGJnGCggs6I=; b=EtgrJ2OqfLqGzzhgJNcLtNNGhsDZiBQBBchAjd7ZymSBkAf0w9tRPQZRBE9iaRPD5U CxNmjohXuwS9Bu+ByJ33QTeWbz37IrynvrXdEfWszGLPaXJWeCGG+POq6sCS1rJAB3Wb TNnkNKaHeYc/qYuJ3/gLXzn/g9o5bg9by0gc4gkcdPq4FarG579vhyhr1aF3msCP2lJ8 lb9joU1riRop0TwezUrKls5hO5C/Xajb1LnRk6PnRkgRDmvlSlXZG3HfLe5aaHP+a5Ip zo9duKKhokX2evil4OZ/+0N0pbMARMEOrr++6NVMKf85kjZKLKU8suaNHMItauJIt0d2 uoQQ== X-Gm-Message-State: AOJu0Yz+3rOGYmUu5AAjw023KdWc4NP4sb4k4DD59vLi6VIVrhE0xsJA 1sqDZoKy9A0Gmhy6C3yE0RrbeQ== X-Google-Smtp-Source: AGHT+IGrIAD/R1XG8iT7hZdid4m31qfruz9y309af9L0+sAELwy0z1r+YzLKvbiH8nyqlY1j6mrYWw== X-Received: by 2002:a92:2802:0:b0:349:7518:4877 with SMTP id l2-20020a922802000000b0034975184877mr3215795ilf.0.1691479385787; Tue, 08 Aug 2023 00:23:05 -0700 (PDT) Received: from [10.70.252.135] ([203.208.167.146]) by smtp.gmail.com with ESMTPSA id s15-20020a63af4f000000b00564ca424f79sm4948391pgo.48.2023.08.08.00.22.53 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 08 Aug 2023 00:23:05 -0700 (PDT) Message-ID: <0fdb926c-0d61-d81f-1a52-4ef634b51804@bytedance.com> Date: Tue, 8 Aug 2023 15:22:51 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.14.0 Subject: Re: [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless Content-Language: en-US To: Dave Chinner Cc: akpm@linux-foundation.org, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev, simon.horman@corigine.com, dlemoal@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com, linux-nfs@vger.kernel.org, linux-mtd@lists.infradead.org, rcu@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, dm-devel@redhat.com, linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org References: <20230807110936.21819-1-zhengqi.arch@bytedance.com> <20230807110936.21819-46-zhengqi.arch@bytedance.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Stat-Signature: rubtatga6cy81uceaxxjq14h13n465op X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 083F514001E X-HE-Tag: 1691479386-936430 X-HE-Meta: U2FsdGVkX18b0lLlsZGQ3nrLBWohU65GV2qfxvrqaU+eodSlHMXCYabVqEJVZLKSVlfFVIAQTQ3bFDNGIDUXCwmzSB7JZoBHsQhhvTwatrc207guDaiUWebZMASCy9K8NmB/zc/8EQlTUQ2fqHItNphOO17tH87p4c6WLDLHGwCTwrIvF6rTg+ZdaXmxYF4795GlqzR7k8GS80A/ccwb+CHCJGo4MEtXTnFv0vwrwLFyE/vXc9goJyMTmpgyKdN5IKc+ct9rk21OzJWjfi1m1kVAgTvTBGh4gDIa850HL2DRjmIdyPE8m1Wvl+LvZzvNeUQuCB6UQeihf/Jy1HF2phTHPyzzOhZjZ2uoebbM7wwhfWnTcZC6Ol5VjjY7mTZYSyBS4WyuhX+bMp3mNhVgeCliq69mCP6kFv1hquJkTCxye46/n7WOaka/fH/Tg7otGNGRh/5ad96gYfxi+n00txAbiiyoOAIm14YGY/6A7B8Cev6Qi0EtONTS1w4asDJVNCyq+BvGFfmW/uPJLI2Rvk4nkU6e+x82oK5rPLTIvEeRcyQfjZMYgX+ijrbop+QcNKtnrt33eLpfzqmofwD5m33dkjhkGkcmd/nPgOsyTKpbSREpYJQd+22XWj+CcACnAdoMHILk9Sz24QdZCWEnrCRAirEvCsPWNcqaZFj4JZaUSOrOu41Yrwwihkj7geFOJNKuD2bNzLe9Q/xKqd9irMfYT5cv+B4qrs8xdtJiDOLbbwEJUqEnxMHTKD5g3wbkKgdHKSXXJofq/OFeRfWU00am43oYBScvrdvCMWmy3Ca5TMajbNyKizslvcx6DaAzM5x9mocT4cJd2OdcmsINO0haMnvLkUbcJmBD0jj+yeG4IdvVCChV5BTrKB5ulYHYoq6JLQB4c3gMG+G6a8i5BHxooarYN5r51pqX3MjItXCWUUOdCBQVhiRFBf2CkAz/YaTMPvS3dDj6OxmQbD6 nqI5APga ep9agK1pRiCKxbor93pEUvaPmtvDNpjYh5ci/sf3IPq/3VgxVlXTeGkG0W0RglcGOY4RqD1941e0Zdb1Olh/cNqVYsJFbX1zPc1F8igWGsPR71rEFW4jPatNwX40KP8C5m/awV4U9tSQduLtBMOvVuwoXwL9KXiOvYS7UA7JAHVuRkj+smIBhaAk7X2FQiI6Y0MBYJFmffTxz+sguaPJuemRfKmk8Z6XLuC80sNKT9DC5EuKPyHFRvLwsBk0R2cpy/cdZbu44CpyrYx3G6f9rYEc3VaD+BOK1neDiRlbvpYA3ojMr5TIA4yq1g398pII9g86S2zKQclLaNfSeR63wqI8zOVskrT7e8nqhW1mj91j8vCV5jolc88pB4Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Dave, On 2023/8/8 10:24, Dave Chinner wrote: > On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote: >> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h >> index eb342994675a..f06225f18531 100644 >> --- a/include/linux/shrinker.h >> +++ b/include/linux/shrinker.h >> @@ -4,6 +4,8 @@ >> >> #include >> #include >> +#include >> +#include >> >> #define SHRINKER_UNIT_BITS BITS_PER_LONG >> >> @@ -87,6 +89,10 @@ struct shrinker { >> int seeks; /* seeks to recreate an obj */ >> unsigned flags; >> >> + refcount_t refcount; >> + struct completion done; >> + struct rcu_head rcu; > > Documentation, please. What does the refcount protect, what does the > completion provide, etc. How about the following: /* * reference count of this shrinker, holding this can guarantee * that the shrinker will not be released. */ refcount_t refcount; /* * Wait for shrinker::refcount to reach 0, that is, no shrinker * is running or will run again. */ struct completion done; > >> + >> void *private_data; >> >> /* These are for internal use */ >> @@ -120,6 +126,17 @@ struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...); >> void shrinker_register(struct shrinker *shrinker); >> void shrinker_free(struct shrinker *shrinker); >> >> +static inline bool shrinker_try_get(struct shrinker *shrinker) >> +{ >> + return refcount_inc_not_zero(&shrinker->refcount); >> +} >> + >> +static inline void shrinker_put(struct shrinker *shrinker) >> +{ >> + if (refcount_dec_and_test(&shrinker->refcount)) >> + complete(&shrinker->done); >> +} >> + >> #ifdef CONFIG_SHRINKER_DEBUG >> extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker, >> const char *fmt, ...); >> diff --git a/mm/shrinker.c b/mm/shrinker.c >> index 1911c06b8af5..d318f5621862 100644 >> --- a/mm/shrinker.c >> +++ b/mm/shrinker.c >> @@ -2,6 +2,7 @@ >> #include >> #include >> #include >> +#include >> #include >> >> #include "internal.h" >> @@ -577,33 +578,42 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg, >> if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) >> return shrink_slab_memcg(gfp_mask, nid, memcg, priority); >> >> - if (!down_read_trylock(&shrinker_rwsem)) >> - goto out; >> - >> - list_for_each_entry(shrinker, &shrinker_list, list) { >> + rcu_read_lock(); >> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) { >> struct shrink_control sc = { >> .gfp_mask = gfp_mask, >> .nid = nid, >> .memcg = memcg, >> }; >> >> + if (!shrinker_try_get(shrinker)) >> + continue; >> + >> + /* >> + * We can safely unlock the RCU lock here since we already >> + * hold the refcount of the shrinker. >> + */ >> + rcu_read_unlock(); >> + >> ret = do_shrink_slab(&sc, shrinker, priority); >> if (ret == SHRINK_EMPTY) >> ret = 0; >> freed += ret; >> + >> /* >> - * Bail out if someone want to register a new shrinker to >> - * prevent the registration from being stalled for long periods >> - * by parallel ongoing shrinking. >> + * This shrinker may be deleted from shrinker_list and freed >> + * after the shrinker_put() below, but this shrinker is still >> + * used for the next traversal. So it is necessary to hold the >> + * RCU lock first to prevent this shrinker from being freed, >> + * which also ensures that the next shrinker that is traversed >> + * will not be freed (even if it is deleted from shrinker_list >> + * at the same time). >> */ > > This needs to be moved to the head of the function, and document > the whole list walk, get, put and completion parts of the algorithm > that make it safe. There's more to this than "we hold a reference > count", especially the tricky "we might see the shrinker before it > is fully initialised" case.... How about moving these documents to before list_for_each_entry_rcu(), and then go to the head of shrink_slab_memcg() to explain the memcg slab shrink case. > > > ..... >> void shrinker_free(struct shrinker *shrinker) >> { >> struct dentry *debugfs_entry = NULL; >> @@ -686,9 +712,18 @@ void shrinker_free(struct shrinker *shrinker) >> if (!shrinker) >> return; >> >> + if (shrinker->flags & SHRINKER_REGISTERED) { >> + shrinker_put(shrinker); >> + wait_for_completion(&shrinker->done); >> + } > > Needs a comment explaining why we need to wait here... /* * Wait for all lookups of the shrinker to complete, after that, no * shrinker is running or will run again, then we can safely free * the structure where the shrinker is located, such as super_block * etc. */ >> + >> down_write(&shrinker_rwsem); >> if (shrinker->flags & SHRINKER_REGISTERED) { >> - list_del(&shrinker->list); >> + /* >> + * Lookups on the shrinker are over and will fail in the future, >> + * so we can now remove it from the lists and free it. >> + */ > > .... rather than here after the wait has been done and provided the > guarantee that no shrinker is running or will run again... With the above comment, how about simplifying the comment here to the following: /* * Now we can safely remove it from the shrinker_list and free it. */ Thanks, Qi > > -Dave.