From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91D63C4167B for ; Wed, 6 Dec 2023 08:37:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 17F1D6B0098; Wed, 6 Dec 2023 03:37:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 12E526B0099; Wed, 6 Dec 2023 03:37:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F10B46B009A; Wed, 6 Dec 2023 03:37:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id DD2616B0098 for ; Wed, 6 Dec 2023 03:37:51 -0500 (EST) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B74271A00EC for ; Wed, 6 Dec 2023 08:37:51 +0000 (UTC) X-FDA: 81535740342.09.E61A0FC Received: from mail-oi1-f180.google.com (mail-oi1-f180.google.com [209.85.167.180]) by imf19.hostedemail.com (Postfix) with ESMTP id E9E611A000E for ; Wed, 6 Dec 2023 08:37:48 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=DW7t26nM; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf19.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.167.180 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1701851869; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qaJoVl1xcnDv8bDUAtsLOITCy9bE8N9edGoKxnrkX8Y=; b=Gg9EPxOgr9Mo5SwdSm8SO2Np0a9Luk4q6xV8zsPrGiGCoqLTdHxNdtf6zkzcP1A8VDmYC9 ibGnxocNpGg0UUmaYR5rLi8Cnlq4wrwL1o9nyMMp6wcn3B20rEUZrbsNpLTgtGm9xQ50qQ qVgOJKeFsI8jMNGqunjiSVzQfuY9RsU= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=DW7t26nM; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf19.hostedemail.com: domain of zhengqi.arch@bytedance.com designates 209.85.167.180 as permitted sender) smtp.mailfrom=zhengqi.arch@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1701851869; a=rsa-sha256; cv=none; b=uRHumDHY8Lghv6L5NR8KjERpZzsjx6YCORij8Y9n45m40tcSg73L6OthH5wb6l6l/xi+so 5Ki1gabp9i3snZP7DBakTitUyYjg/U+lV6t6C9MeDoIMKu4r8g/Xfjg7O8qcz8c51I3Ew3 FpARVQ5MPK52fFiniyETiNmVAzpvxy8= Received: by mail-oi1-f180.google.com with SMTP id 5614622812f47-3b8bbc9904cso665863b6e.0 for ; Wed, 06 Dec 2023 00:37:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1701851866; x=1702456666; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=qaJoVl1xcnDv8bDUAtsLOITCy9bE8N9edGoKxnrkX8Y=; b=DW7t26nM6u6X/tOHh/abr71TFskBuTZOM0/EfBpMOC358N1b4A5S0goL7Y929xim4o 7l0SSR9vM/AK2gtIvs62y0n+bzouiXYqkhsgETqvT+htF1JtPTHHO5aHZgeWmIRzm6Yg SR+45vwRI4sMdktPyKibcHlgJwfkdKLA4+S4NoiBPlA+fWIX+eVcPtUvBYyD7tjNKCer G+u3jDG1G1pYqkNb+hMpUEyZAlr2xYvNZ5vzr1YLLlQff4TxzA/O+HDGSOY/o85VbWsn N+YCdsF42MrnUK9mYOArOnaz7H1TxTo5uWQIrPth1RGRQl8nqvBoIYTNULooynvjgbJZ cIow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701851866; x=1702456666; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qaJoVl1xcnDv8bDUAtsLOITCy9bE8N9edGoKxnrkX8Y=; b=OH2xDz4kzDFLl7LO+HLJXa0vHWgnYdAk5iFN3h0BCY4DAnHeGFj+zMw2S2sJPm/Jft bpk23w54m3rxOPmjXh5HTSFzyUIqGixEaDqmp4gwpk5qk/q0PZYa7EVrEBLqO6K6bjI2 R9pbVLqYfJVhEWgDp/7vM8CNNg3Ed+AjJHi3I0/ZYWF4PXnNwa9V0WVhCOinw5C08rlR IMpeNH6+cJJ3yx7zJhxykb1MKVvFXW4VHuWNoEZM+pywDdsASqYU2P5sboAYihfa09xl UcSBYXhr7TZCkyyHKyJs4RxYu0lCwSB+i9b0EewNVrJpf1OYUL5cGTzChaaHdLYeQEt/ 2Dbw== X-Gm-Message-State: AOJu0YyPuII+aHysq0WeG0vlWEjnJY9zgXdgfVtsB5KHdYLSqBEZv/Pn aEUc+7lIMIX+gLdZV/Q0L2/1ew== X-Google-Smtp-Source: AGHT+IFiQNAALnSXiskPUOUstWqNThnLWbU+mRiJzu4yAKenbwX/YTOEz22k0SFEWiI9vkQtni/C4Q== X-Received: by 2002:a05:6808:1897:b0:3b8:98f0:3c2 with SMTP id bi23-20020a056808189700b003b898f003c2mr1066637oib.4.1701851866156; Wed, 06 Dec 2023 00:37:46 -0800 (PST) Received: from [10.84.152.29] ([203.208.167.146]) by smtp.gmail.com with ESMTPSA id e14-20020a62aa0e000000b006ce95e37a40sm602816pff.111.2023.12.06.00.37.40 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 06 Dec 2023 00:37:45 -0800 (PST) Message-ID: <7c170029-e847-49cf-8332-dc1388fc5c86@bytedance.com> Date: Wed, 6 Dec 2023 16:37:38 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless Content-Language: en-US To: Lai Jiangshan Cc: akpm@linux-foundation.org, paulmck@kernel.org, david@fromorbit.com, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org References: <20230911094444.68966-1-zhengqi.arch@bytedance.com> <20230911094444.68966-43-zhengqi.arch@bytedance.com> <93c36097-5266-4fc5-84a8-d770ab344361@bytedance.com> From: Qi Zheng In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Stat-Signature: o3qrrusiykn47r3xj9rhz7gyjm1ew6ye X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: E9E611A000E X-HE-Tag: 1701851868-850429 X-HE-Meta: U2FsdGVkX1/69Bfc7JCsworIpWjt7oPqes/nRU2Sx/LiGfBnE7tpy0/9J2IVfEtK+qHJruGNnOWT7D01xFCusbHemvN4iwHJPl+g9MuzMhLQ9pcQdNJXjJFDNGDJsz6IdCmS51CMDAJ18xT4E5GhTp3M55jGG7Gc118RK62TZaAnAp1az0dItJSqW0NPT4mMX79aPkU+ywUdmZyTTsuL8O2sfO+E+dP+yIyz/ZkepPCfpGXeOc/m6RR15Ef/7rdIwAzye+WIEu7niJgIbPTMV0cn1JmUFiZR3bDmjOKupTr7tbIpDU6eec6P7E/RCwmfaVjCkmhQcN8eYU70RDInCK7SJTD/diNBJ6wNa9oJD+ZyvbRSmHOHMLfLQoUko+Nrt26txTIz97d0+GJ4jJ7/LQomrQlrHZq0HkGyf1opL9AZUfrIBc6iHoWJrfibwwUCF4U3/jVdSJlVtZ7g6OVDbuJ7gnwhyRSlqEeSg+Ryts7h/vjYglhvYM/UNAi0znzrJjhC+yLo8yLLrd8Bwvn4o7pGivzN0tiiwtg/WUmiYkk7qLdpglMQ97Nd6jg69p4AJ/x9psEagO55sgYe4cePWaP4dhO2s7a7rk7twyrPu5hWgYCA+MWbLhDVYhRsFQoKe5KyLp4qNc/myo4Z1wV4Q1HNcY0+HtwMzZxThYcHKGDmPugp9hxUf1hV7d+vAeLBbvHpVMn4GyBYcIXS99esNfzPtAF3YcBRlaAemJkqT8GaBBnNrh9O76lpeXIcqPQzt8l1Y0ZzR1rwpa3DsnaoRvluif5ZaTR7OEnF/+fQkYdatBw8mcHfC+GzbO7JRhJynwmW8ydz3hRSZEJT9eo1vyZN2a8XdRoAyXT/GXNFeANluSlBJKX4UMGDNbjhujTukkSCoB83kRbgNTlKwFyn6KAT6zH3lkjYMqeB/i0sa699IclqPcjS8CnQFh6xIbFXnLNnZBkUo9bREarYnx1 LaPMmMWS QWy5/DsLWIH+xVlLymxQR8fxiDI3izmvzdfcmbzeji7eYHE45F8e8Q047Kn7k568BI70tYsw8NhV0tB3RgU87sz4M2v+8scr1Q9ZD/elXu++k3O7aQdYhM6PkMHU8ykntBEKK37mMZDONNPJJrYnBzDoYo6foDYRJb/cHD7/wFOauf7XzHYNepDxLUs2Irf7i1nHrXmv3/eTAQck5oiD7YPSqTKcdrblOCgDRGsQHwK1HWZQc/OgmTo+BeoSltKFpH/AInhtylCTDyj2OsiGeFHswAcnojsQtN5i1YPG3CDgQz9IQnKnZTW4IAsqS//5+o5vVTw5tNiw0Y3JEEIGt67OLRRZDPvHOiE09uyf+FyY5dMWHNf0FX0hTpUT1tCvphVZQVV6YfJid/S/cRjJbAX/AfYz+RVIxVlshA1cKNm3ZsaGw+2f9vAHRBLtBx20J6FGZxqBZtF6Sv8DCDMpgu2gKnQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi, On 2023/12/6 16:23, Lai Jiangshan wrote: > On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng wrote: >> >> Hi, >> >> On 2023/12/6 15:47, Lai Jiangshan wrote: >>> On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng wrote: >>> >>>> - if (!down_read_trylock(&shrinker_rwsem)) >>>> - goto out; >>>> - >>>> - list_for_each_entry(shrinker, &shrinker_list, list) { >>>> + /* >>>> + * lockless algorithm of global shrink. >>>> + * >>>> + * In the unregistration setp, the shrinker will be freed asynchronously >>>> + * via RCU after its refcount reaches 0. So both rcu_read_lock() and >>>> + * shrinker_try_get() can be used to ensure the existence of the shrinker. >>>> + * >>>> + * So in the global shrink: >>>> + * step 1: use rcu_read_lock() to guarantee existence of the shrinker >>>> + * and the validity of the shrinker_list walk. >>>> + * step 2: use shrinker_try_get() to try get the refcount, if successful, >>>> + * then the existence of the shrinker can also be guaranteed, >>>> + * so we can release the RCU lock to do do_shrink_slab() that >>>> + * may sleep. >>>> + * step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(), >>>> + * which ensures that neither this shrinker nor the next shrinker >>>> + * will be freed in the next traversal operation. >>> >>> Hello, Qi, Andrew, Paul, >>> >>> I wonder know how RCU can ensure the lifespan of the next shrinker. >>> it seems it is diverged from the common pattern usage of RCU+reference. >>> >>> cpu1: >>> rcu_read_lock(); >>> shrinker_try_get(this_shrinker); >>> rcu_read_unlock(); >>> cpu2: shrinker_free(this_shrinker); >>> cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker >>> cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next >>> cpu2: since this_shrinker has been removed first. >> >> No, this_shrinker will not be removed from the shrinker_list until the >> last refcount is released. See below: >> >>> rcu_read_lock(); >>> shrinker_put(this_shrinker); >> >> CPU 1 CPU 2 >> >> --> if (refcount_dec_and_test(&shrinker->refcount)) >> complete(&shrinker->done); >> >> wait_for_completion(&shrinker->done); >> list_del_rcu(&shrinker->list); > > since shrinker will not be removed from the shrinker_list until the > last refcount is released. > > Is it possible that shrinker_free() can be starved by continuous > scanners getting and putting the refcount? I actually considered this case, but the probability of this happening was low, so I discarded the relevant code (v2 --> v3). If this problem really occurs in a production environment, we can fix it, like below: diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 1a00be90d93a..e5ebbbf1414f 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -88,6 +88,7 @@ struct shrinker { long batch; /* reclaim batch size, 0 = default */ int seeks; /* seeks to recreate an obj */ unsigned flags; + bool registered; /* * The reference count of this shrinker. Registered shrinker have an @@ -138,7 +139,8 @@ void shrinker_free(struct shrinker *shrinker); static inline bool shrinker_try_get(struct shrinker *shrinker) { - return refcount_inc_not_zero(&shrinker->refcount); + return READ_ONCE(shrinker->registered) && + refcount_inc_not_zero(&shrinker->refcount); } static inline void shrinker_put(struct shrinker *shrinker) diff --git a/mm/shrinker.c b/mm/shrinker.c index dd91eab43ed3..9b8881d178c6 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -753,6 +753,7 @@ void shrinker_register(struct shrinker *shrinker) * shrinker_try_get(). */ refcount_set(&shrinker->refcount, 1); + WRITE_ONCE(shrinker->registered, true); } EXPORT_SYMBOL_GPL(shrinker_register); @@ -773,6 +774,7 @@ void shrinker_free(struct shrinker *shrinker) return; if (shrinker->flags & SHRINKER_REGISTERED) { + WRITE_ONCE(shrinker->registered, false); /* drop the initial refcount */ shrinker_put(shrinker); /* Thanks, Qi > > Thanks > Lai > > >> >>> travel to the freed next_shrinker. >>> >>> a quick simple fix: >>> >>> // called with other references other than RCU (i.e. refcount) >>> static inline rcu_list_deleted(struct list_head *entry) >>> { >>> // something like this: >>> return entry->prev == LIST_POISON2; >>> } >>> >>> // in the loop >>> if (rcu_list_deleted(&shrinker->list)) { >>> shrinker_put(shrinker); >>> goto restart; >>> } >>> rcu_read_lock(); >>> shrinker_put(shrinker); >>> >>> Thanks >>> Lai >>> >>>> + * step 4: do shrinker_put() paired with step 2 to put the refcount, >>>> + * if the refcount reaches 0, then wake up the waiter in >>>> + * shrinker_free() by calling complete(). >>>> + */ >>>> + rcu_read_lock(); >>>> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) { >>>> struct shrink_control sc = { >>>> .gfp_mask = gfp_mask, >>>> .nid = nid, >>>> .memcg = memcg, >>>> }; >>>> >>>> + if (!shrinker_try_get(shrinker)) >>>> + continue; >>>> + >>>> + rcu_read_unlock(); >>>> + >>>> ret = do_shrink_slab(&sc, shrinker, priority); >>>> if (ret == SHRINK_EMPTY) >>>> ret = 0; >>>> freed += ret; >>>> - /* >>>> - * Bail out if someone want to register a new shrinker to >>>> - * prevent the registration from being stalled for long periods >>>> - * by parallel ongoing shrinking. >>>> - */ >>>> - if (rwsem_is_contended(&shrinker_rwsem)) { >>>> - freed = freed ? : 1; >>>> - break; >>>> - } >>>> + >>>> + rcu_read_lock(); >>>> + shrinker_put(shrinker); >>>> } >>>>