References: <20230911094444.68966-1-zhengqi.arch@bytedance.com>
 <20230911094444.68966-43-zhengqi.arch@bytedance.com>
 <93c36097-5266-4fc5-84a8-d770ab344361@bytedance.com>
In-Reply-To: <93c36097-5266-4fc5-84a8-d770ab344361@bytedance.com>
From: Lai Jiangshan
Date: Wed, 6 Dec 2023 16:23:24 +0800
Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless
To: Qi Zheng
Cc: akpm@linux-foundation.org, paulmck@kernel.org, david@fromorbit.com,
 tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org,
 brauner@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org,
 senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org,
 muchun.song@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org

On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng wrote:
>
> Hi,
>
> On 2023/12/6 15:47, Lai Jiangshan wrote:
> > On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng wrote:
> >
> >> -       if (!down_read_trylock(&shrinker_rwsem))
> >> -               goto out;
> >> -
> >> -       list_for_each_entry(shrinker, &shrinker_list, list) {
> >> +       /*
> >> +        * lockless algorithm of global shrink.
> >> +        *
> >> +        * In the unregistration setp, the shrinker will be freed asynchronously
> >> +        * via RCU after its refcount reaches 0. So both rcu_read_lock() and
> >> +        * shrinker_try_get() can be used to ensure the existence of the shrinker.
> >> +        *
> >> +        * So in the global shrink:
> >> +        *  step 1: use rcu_read_lock() to guarantee existence of the shrinker
> >> +        *          and the validity of the shrinker_list walk.
> >> +        *  step 2: use shrinker_try_get() to try get the refcount, if successful,
> >> +        *          then the existence of the shrinker can also be guaranteed,
> >> +        *          so we can release the RCU lock to do do_shrink_slab() that
> >> +        *          may sleep.
> >> +        *  step 3: *MUST* to reacquire the RCU lock before calling shrinker_put(),
> >> +        *          which ensures that neither this shrinker nor the next shrinker
> >> +        *          will be freed in the next traversal operation.
> >
> > Hello, Qi, Andrew, Paul,
> >
> > I wonder know how RCU can ensure the lifespan of the next shrinker.
> > it seems it is diverged from the common pattern usage of RCU+reference.
> >
> > cpu1:
> > rcu_read_lock();
> > shrinker_try_get(this_shrinker);
> > rcu_read_unlock();
> >     cpu2: shrinker_free(this_shrinker);
> >     cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
> >     cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
> >     cpu2: since this_shrinker has been removed first.
>
> No, this_shrinker will not be removed from the shrinker_list until the
> last refcount is released. See below:
>
> > rcu_read_lock();
> > shrinker_put(this_shrinker);
>
>         CPU 1                                      CPU 2
>
>    --> if (refcount_dec_and_test(&shrinker->refcount))
>                 complete(&shrinker->done);
>
>                                 wait_for_completion(&shrinker->done);
>                                 list_del_rcu(&shrinker->list);

Since a shrinker is not removed from the shrinker_list until its last
refcount is released, is it possible for shrinker_free() to be starved
by scanners that continuously get and put the refcount?
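
To make the starvation question concrete, here is a minimal
single-threaded userspace model, not kernel code. It assumes (these are
assumptions, not something the quoted diff shows) that the refcount
starts at 1 for the registration reference, that shrinker_try_get()
behaves like refcount_inc_not_zero(), and that shrinker_free() drops
that initial reference before waiting. All model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a shrinker's refcount; not kernel code. */
struct model_shrinker {
	int refcount;	/* assumed to start at 1: the registration reference */
	bool done;	/* models complete(&shrinker->done) */
};

/* Models an inc-not-zero style get: fails once the count has hit 0. */
static bool model_try_get(struct model_shrinker *s)
{
	if (s->refcount == 0)
		return false;
	s->refcount++;
	return true;
}

/* Models shrinker_put(): the final put signals the waiter. */
static void model_put(struct model_shrinker *s)
{
	if (--s->refcount == 0)
		s->done = true;
}

/*
 * One scanner takes a reference, then the registration reference is
 * dropped (as the assumed shrinker_free() would do). After that, `rounds`
 * hand-offs follow in which the next scanner gets its reference before
 * the previous scanner puts its own, so the count bounces between 1 and 2.
 * Returns true if the waiter was signalled during the hand-offs.
 */
static bool model_overlapping_scanners(struct model_shrinker *s, int rounds)
{
	if (!model_try_get(s))		/* first scanner */
		return s->done;
	model_put(s);			/* initial reference dropped */

	for (int i = 0; i < rounds; i++) {
		if (!model_try_get(s))	/* next scanner arrives first ... */
			break;
		model_put(s);		/* ... then the previous one leaves */
		if (s->done)
			return true;
	}
	return s->done;
}
```

In this model the waiter is never signalled while the scanners keep
overlapping; it is only signalled after the last scanner puts its
reference without a successor. Whether real scanners can overlap
indefinitely like this is exactly the open question.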

Thanks
Lai

>
> > travel to the freed next_shrinker.
> >
> > a quick simple fix:
> >
> > // called with other references other than RCU (i.e. refcount)
> > static inline rcu_list_deleted(struct list_head *entry)
> > {
> >    // something like this:
> >    return entry->prev == LIST_POISON2;
> > }
> >
> > // in the loop
> > if (rcu_list_deleted(&shrinker->list)) {
> >    shrinker_put(shrinker);
> >    goto restart;
> > }
> > rcu_read_lock();
> > shrinker_put(shrinker);
> >
> > Thanks
> > Lai
> >
> >> +        *  step 4: do shrinker_put() paired with step 2 to put the refcount,
> >> +        *          if the refcount reaches 0, then wake up the waiter in
> >> +        *          shrinker_free() by calling complete().
> >> +        */
> >> +       rcu_read_lock();
> >> +       list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
> >>                 struct shrink_control sc = {
> >>                         .gfp_mask = gfp_mask,
> >>                         .nid = nid,
> >>                         .memcg = memcg,
> >>                 };
> >>
> >> +               if (!shrinker_try_get(shrinker))
> >> +                       continue;
> >> +
> >> +               rcu_read_unlock();
> >> +
> >>                 ret = do_shrink_slab(&sc, shrinker, priority);
> >>                 if (ret == SHRINK_EMPTY)
> >>                         ret = 0;
> >>                 freed += ret;
> >> -               /*
> >> -                * Bail out if someone want to register a new shrinker to
> >> -                * prevent the registration from being stalled for long periods
> >> -                * by parallel ongoing shrinking.
> >> -                */
> >> -               if (rwsem_is_contended(&shrinker_rwsem)) {
> >> -                       freed = freed ? : 1;
> >> -                       break;
> >> -               }
> >> +
> >> +               rcu_read_lock();
> >> +               shrinker_put(shrinker);
> >>         }
> >>
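
For reference, the rcu_list_deleted() idea quoted above relies on
list_del_rcu() poisoning ->prev while leaving ->next intact. The sketch
below models that in userspace under stated assumptions: the model_*
names and the MODEL_POISON2 sentinel are hypothetical stand-ins, not the
kernel's list API, and there is no real concurrency or RCU here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical sentinel playing the role of LIST_POISON2. */
static struct model_list_head model_poison2;
#define MODEL_POISON2 (&model_poison2)

static void model_list_init(struct model_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void model_list_add_tail(struct model_list_head *entry,
				struct model_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Models list_del_rcu(): unlink the entry and poison ->prev, but leave
 * ->next untouched so that a reader already standing on the entry can
 * still step forward.
 */
static void model_list_del_rcu(struct model_list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->prev = MODEL_POISON2;
}

/* Models the proposed check: has this entry already been unlinked? */
static bool model_rcu_list_deleted(const struct model_list_head *entry)
{
	return entry->prev == MODEL_POISON2;
}
```

If entry a is unlinked first and its successor b afterwards, a.next
still points at b, because nobody updates an entry that is no longer on
the list. That stale pointer is the hazard described in the quoted text,
and model_rcu_list_deleted(&a) is what would tell the walker to restart
instead of following it.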