From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6276FC4167B for ; Wed, 6 Dec 2023 08:10:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DECC46B007E; Wed, 6 Dec 2023 03:10:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D9F846B0083; Wed, 6 Dec 2023 03:10:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C655C6B0085; Wed, 6 Dec 2023 03:10:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B670D6B007E for ; Wed, 6 Dec 2023 03:10:21 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 85A268013D for ; Wed, 6 Dec 2023 08:10:21 +0000 (UTC) X-FDA: 81535671042.17.CC220A3 Received: from mail-ot1-f52.google.com (mail-ot1-f52.google.com [209.85.210.52]) by imf15.hostedemail.com (Postfix) with ESMTP id BA788A001B for ; Wed, 6 Dec 2023 08:10:19 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WfBbl8io; spf=pass (imf15.hostedemail.com: domain of jiangshanlai@gmail.com designates 209.85.210.52 as permitted sender) smtp.mailfrom=jiangshanlai@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1701850219; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=w4zRPqTOkB7jDP1E8GI9v9TGvn/esl06SjlrpyyjjKU=; b=3Mjo+Je9uwlDzhecEafX8VCmNhkrzLi+FuiJW5XOL9v9EG0T3fgV1fbjFVynMTXfKukOFF tsBQ/kTt9H+iiOmg2hF4HbfDfWDxlDk1OFG8+Yt2+3EcK+SvBmbO8hAUuucXcgAV7eSTH/ CDptcZ718Fck9B0C2lZcvK2B1K8ZHBs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1701850219; a=rsa-sha256; cv=none; b=pLhPj7SMu4MfnVMe0hgO9+QKirJrB5SfL1zVX4ACz/IHZvbylJiTvC7owToi3hBh1NPTZ9 SkYPal5L1j0e3qooJcnrqAxC2y3OWMahBJtouXf/wu7/GIWxmP3RujTEA7jkvUHegOrDf7 aUczmDGL2hXRwvtnu09asgolcjlMeNI= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WfBbl8io; spf=pass (imf15.hostedemail.com: domain of jiangshanlai@gmail.com designates 209.85.210.52 as permitted sender) smtp.mailfrom=jiangshanlai@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ot1-f52.google.com with SMTP id 46e09a7af769-6d9a6f756c3so1858166a34.2 for ; Wed, 06 Dec 2023 00:10:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701850219; x=1702455019; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=w4zRPqTOkB7jDP1E8GI9v9TGvn/esl06SjlrpyyjjKU=; b=WfBbl8ioZM0PFNK2IFeHuiR70m3JNlLQvpkNUpxsgXaY1GJqk2m+p2y62NronhagBO 6oRjOZLdXf4cTLSYl2qCL8lX0TJ4ZM4gI6a06JJzeIALOOl2qpAVyYI8TO+hhlaN9wmf FZm5ckOL3JGR+vd3RfrFZI0pWn0SzSKf+ImsDOZXe20u2+MCfh1SI6/o6RnOfiBZEmiL lWqf90zU1QAx3c4B4mt7Eo2nz8Hjv0pW70I6b4IV7VumQ/T66vRD0Lma81KJ8l63Vn9U rADt91Jsgy5KeI2wVKkoI7zPsN1bW4UQQFSVjlnL67oOnjzx6BhF76CZ5ZUK6BrqTP3S NCCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701850219; x=1702455019; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=w4zRPqTOkB7jDP1E8GI9v9TGvn/esl06SjlrpyyjjKU=; b=PZMtuWGG4RMMU5wYhQiscd4ytFdtEsn4mnYt2gMwBwGV2o0v6O3DnqEXOu0FTxDOSQ D0iV1VU/4NmeYvLjm9BuG62J0sCA43Tt9I41CIU/GVViB11VZSrwIqtI6+JO1bAQdFmg IW0ZMGNdtgQhx0qcSQdzmyfOrMov7wTxbpeDVcWwP7/CoBdqKp4XfoqErGo4fMOp1X6W 0OzjYotcQq8pSZWakn76qqDXQSdtxbuGrowKmlqgsQUe9GQNdA9XsCEcWqTg1oNePCCB JSuqQedan4DnaVUHDFLcZHH2gcNjOCU6+JimMJoDBOESWPhsu923+zVhbwdiOXeKONBH 5GYg== X-Gm-Message-State: AOJu0YzM0XW+73eXihs8FfJj5M9AM3Jj/pEOuf4VvbdrRqoW4+CSadRj 8FYr8aXiXIfpp8v0IG5Z0Ih4zL9EGaKe/FljPNo= X-Google-Smtp-Source: AGHT+IHYzRi5P2jhRZ/cpQOAt6iuZH+UBx/TP+7cObEbZYVtiS4uyqmMjQWcpOlXLBmov5H62Xo8uvzbfv54aVTm00o= X-Received: by 2002:a05:6358:280e:b0:170:17eb:14c2 with SMTP id k14-20020a056358280e00b0017017eb14c2mr661954rwb.50.1701850218695; Wed, 06 Dec 2023 00:10:18 -0800 (PST) MIME-Version: 1.0 References: <20230911094444.68966-1-zhengqi.arch@bytedance.com> <20230911094444.68966-43-zhengqi.arch@bytedance.com> <93c36097-5266-4fc5-84a8-d770ab344361@bytedance.com> In-Reply-To: <93c36097-5266-4fc5-84a8-d770ab344361@bytedance.com> From: Lai Jiangshan Date: Wed, 6 Dec 2023 16:10:06 +0800 Message-ID: Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless To: Qi Zheng Cc: akpm@linux-foundation.org, paulmck@kernel.org, david@fromorbit.com, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: BA788A001B X-Rspam-User: X-Stat-Signature: 57oqgxdg3df8w7rf11qoxyf89gkw5hew X-Rspamd-Server: rspam03 X-HE-Tag: 1701850219-540013 X-HE-Meta: U2FsdGVkX1/Y9kkdWa113GB5fqSfQusJrx/Zkq+MhHG5NLAHp2Gl4JusDj+ORuhP+M45LNVNDMQtH/INFbNQOrF1XAwXgqJqQ8zUIEx7Xu08OBaCIktWcxnGMNavIIIyH7guzTn89LGrXBwRVp1FyUmb4NoH8hgEUzyXJn5pPiN8iw/AS0kGN1fuXd2FxZ405Kxbp6i3LgE03GuJhJffN0y8swslortCyFj0VaBe5kaDvbECvngPOTDWCmUdDCd+90SUGzzuI8OiXES3q5wMysBqvpcsGOm37bpY5jZoZ/38tQGLHwcgeCGcE0hDFiPoL/DilcFMKgCoUimEyU83Ew7ouL6KfZkeP48aMzsoXyXfSkSXviQxIRElyFZiadvnjtdAICSyuCExgoeKOUcldWo6mJAs80frtfSVBE0DJly6sRMKlrUFsQlRH5Pz9mCeur2Rvb5WkxU7sHz/QGGYvhdtLYaHesKFbRjtaNLys9OJohUHuCiQx36NSN34wyuIbziJP6jzGrnbIqbe8TRYYCiMKnv7ngQlzwqGEjwvG9y4MInzWiH01pLYg3RmI6FSlRSkik/6SXmjJkg0q8eRVLV5K5mtuITVU4kBsua4yFcSaXpB298uaOLmgyoDYzOApCKwUULC4792S4r0vycoXcEQpqQMJOeBGStUA4WpEMez2HHn1KXtF7EATVkrcVFl8latyXF6MSSEwsQULdeu9AZwc2qeGXsY1NUXMU4sGF8bYsglBeHmprmCiEZtYJ8hqmgVC3kzC5b5CaXK3uczv7mEDhl8y/EEtrn/TNGKvhuyiy9s7Ihg/z+bor3TRu9u8yCNiPQZ96dzxovB5WrpgX4m8T0vbWmbOAOYsqVdKebPccCJ47PjMpDBlS+BwM3aUbuaMfhyjGljmuSolY5/CaEoqmQBcmBrOcpfyEGtrrn+/eiEtRp/ijQj0YsFcecjeO9yaIsYEm2QrW6xaJW 7GcJzFai 0lC4v8/4l++EtLZDhPzxwfS49IRE3SXhWszeyT7JaqN5TMTInmN+ZjQ6hMP4tUD2NWtLoE2T4I4zmTY0QSyXcCmzrMLwi9S0bhvYibE8g3xmSyKxQrl5f5mujDmZqMkcDgcJGd+Qeum2oZU5+G2t2mYSQ7f6k6y/5ZljKqqj8AoYsUXiASusBBDCZzBusAfhNBzQrG3cxfjEVdYWR/X/tHbHxWymltDdW5GXTQ+U2umEpyTmP4MY3XQtCI97CG0qV3mRbFVkU6gKYwxeXnkiWHYgy4QN8D/2z9cYpCPNstjPxHWkg/V5X1Wdjij9dfG4NcrEV+UiTpoIsxtDE/JPdIcTP36fWF3pWyZIrshQrWTBJDc/rZ1+QnaYzSAEIPngs3NbuZrym7Z+7QBPmCr6ifxhMYaokxu+QPLXa X-Bogosity: Ham, tests=bogofilter, spamicity=0.000033, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Dec 6, 2023 at 3:55=E2=80=AFPM Qi Zheng wrote: > > Hi, > > On 2023/12/6 15:47, Lai Jiangshan wrote: > > On Tue, Sep 12, 2023 at 9:57=E2=80=AFPM Qi Zheng wrote: > > > >> - if (!down_read_trylock(&shrinker_rwsem)) > >> - goto out; > >> - > >> - list_for_each_entry(shrinker, &shrinker_list, list) { > >> + /* > >> + * lockless algorithm of global shrink. > >> + * > >> + * In the unregistration setp, the shrinker will be freed asyn= chronously > >> + * via RCU after its refcount reaches 0. So both rcu_read_lock= () and > >> + * shrinker_try_get() can be used to ensure the existence of t= he shrinker. > >> + * > >> + * So in the global shrink: > >> + * step 1: use rcu_read_lock() to guarantee existence of the = shrinker > >> + * and the validity of the shrinker_list walk. > >> + * step 2: use shrinker_try_get() to try get the refcount, if= successful, > >> + * then the existence of the shrinker can also be gua= ranteed, > >> + * so we can release the RCU lock to do do_shrink_sla= b() that > >> + * may sleep. > >> + * step 3: *MUST* to reacquire the RCU lock before calling sh= rinker_put(), > >> + * which ensures that neither this shrinker nor the n= ext shrinker > >> + * will be freed in the next traversal operation. > > > > Hello, Qi, Andrew, Paul, > > > > I wonder know how RCU can ensure the lifespan of the next shrinker. > > it seems it is diverged from the common pattern usage of RCU+reference. > > > > cpu1: > > rcu_read_lock(); > > shrinker_try_get(this_shrinker); > > rcu_read_unlock(); > > cpu2: shrinker_free(this_shrinker); > > cpu2: shrinker_free(next_shrinker); and free the memory of next_sh= rinker > > cpu2: when shrinker_free(next_shrinker), no one updates this_shrin= ker's next > > cpu2: since this_shrinker has been removed first. > > No, this_shrinker will not be removed from the shrinker_list until the > last refcount is released. See below: Oh, I miss the code here. Thanks Lai > > > rcu_read_lock(); > > shrinker_put(this_shrinker); > > CPU 1 CPU 2 > > --> if (refcount_dec_and_test(&shrinker->refcount)) > complete(&shrinker->done); > > wait_for_completion(&shrinker->done); > list_del_rcu(&shrinker->list); > > > travel to the freed next_shrinker. > > > > a quick simple fix: > > > > // called with other references other than RCU (i.e. refcount) > > static inline rcu_list_deleted(struct list_head *entry) > > { > > // something like this: > > return entry->prev =3D=3D LIST_POISON2; > > } > > > > // in the loop > > if (rcu_list_deleted(&shrinker->list)) { > > shrinker_put(shrinker); > > goto restart; > > } > > rcu_read_lock(); > > shrinker_put(shrinker); > > > > Thanks > > Lai > > > >> + * step 4: do shrinker_put() paired with step 2 to put the re= fcount, > >> + * if the refcount reaches 0, then wake up the waiter= in > >> + * shrinker_free() by calling complete(). > >> + */ > >> + rcu_read_lock(); > >> + list_for_each_entry_rcu(shrinker, &shrinker_list, list) { > >> struct shrink_control sc =3D { > >> .gfp_mask =3D gfp_mask, > >> .nid =3D nid, > >> .memcg =3D memcg, > >> }; > >> > >> + if (!shrinker_try_get(shrinker)) > >> + continue; > >> + > >> + rcu_read_unlock(); > >> + > >> ret =3D do_shrink_slab(&sc, shrinker, priority); > >> if (ret =3D=3D SHRINK_EMPTY) > >> ret =3D 0; > >> freed +=3D ret; > >> - /* > >> - * Bail out if someone want to register a new shrinker= to > >> - * prevent the registration from being stalled for lon= g periods > >> - * by parallel ongoing shrinking. > >> - */ > >> - if (rwsem_is_contended(&shrinker_rwsem)) { > >> - freed =3D freed ? : 1; > >> - break; > >> - } > >> + > >> + rcu_read_lock(); > >> + shrinker_put(shrinker); > >> } > >>