From: Muchun Song
Subject: Re: [PATCH v3] mm: kmem: add lockdep assertion to obj_cgroup_memcg
Date: Wed, 31 Jul 2024 15:02:40 +0800
To: Marek Szyprowski
Cc: Muchun Song, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, vbabka@kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <3c4b978b-b1fe-42d2-b1a7-a58609433f3c@samsung.com>
References: <20240725094330.72537-1-songmuchun@bytedance.com> <3c4b978b-b1fe-42d2-b1a7-a58609433f3c@samsung.com>

> On Jul 31, 2024, at 02:52, Marek Szyprowski wrote:
>
> On 25.07.2024 11:43, Muchun Song wrote:
>> The obj_cgroup_memcg() is supposed to be safe to prevent the returned
>> memory cgroup from being freed only when the caller is holding the
>> rcu read lock or objcg_lock or cgroup_mutex. It is very easy to
>> ignore those conditions when users call some upper APIs which call
>> obj_cgroup_memcg() internally like mem_cgroup_from_slab_obj() (See
>> the link below). So it is better to add lockdep assertion to
>> obj_cgroup_memcg() to find those issues ASAP.
>>
>> Because there is no user of obj_cgroup_memcg() holding objcg_lock
>> to make the returned memory cgroup safe, do not add objcg_lock
>> assertion (We should export objcg_lock if we really want to do so).
>> Additionally, this is some internal implementation detail of memcg
>> and should not be accessible outside memcg code.
>>
>> Some users like __mem_cgroup_uncharge() do not care about the lifetime
>> of the returned memory cgroup; they just want to know if the
>> folio is charged to a memory cgroup and therefore do not need
>> to hold the needed locks. In which case, introduce a new helper
>> folio_memcg_charged() to do this. Compared to folio_memcg(), it
>> could eliminate a memory access of objcg->memcg for kmem, actually,
>> a really small gain.
>>
>> Link: https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/
>> Signed-off-by: Muchun Song
>
> This patch landed in today's linux-next as commit 230b2f1f31b9 ("mm:
> kmem: add lockdep assertion to obj_cgroup_memcg"). In my tests I found
> that it triggers the following warning on Debian bookworm/sid system
> image running under QEMU RISCV64:

Thanks for your report. This is actually good news, since it shows that
the patch works as intended. The warning you hit exposes an existing bug,
which I have already fixed in [1]; it is not caused by this patch.

[1] https://lore.kernel.org/all/20240718083607.42068-1-songmuchun@bytedance.com/

>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at include/linux/memcontrol.h:373 mem_cgroup_from_slab_obj+0x13e/0x1ea
> Modules linked in:
> CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.10.0+ #15154
> Hardware name: riscv-virtio,qemu (DT)
> epc : mem_cgroup_from_slab_obj+0x13e/0x1ea
> ra : mem_cgroup_from_slab_obj+0x13c/0x1ea
> ...
> [] mem_cgroup_from_slab_obj+0x13e/0x1ea
> [] list_lru_del_obj+0xa6/0xc2
> [] d_lru_del+0x8c/0xa4
> [] __dentry_kill+0x15e/0x17a
> [] dput.part.0+0x242/0x3e6
> [] dput+0xe/0x18
> [] lookup_fast+0x80/0xce
> [] walk_component+0x20/0x13c
> [] path_lookupat+0x64/0x16c
> [] filename_lookup+0x76/0x122
> [] user_path_at+0x30/0x4a
> [] __riscv_sys_name_to_handle_at+0x52/0x1d8
> [] do_trap_ecall_u+0x14e/0x1da
> [] handle_exception+0xca/0xd6
> irq event stamp: 198187
> hardirqs last enabled at (198187): [] lookup_mnt+0x186/0x308
> hardirqs last disabled at (198186): [] lookup_mnt+0x15c/0x308
> softirqs last enabled at (198172): [] cgroup_apply_control_enable+0x1f6/0x2fc
> softirqs last disabled at (198170): [] cgroup_apply_control_enable+0x1d8/0x2fc
> ---[ end trace 0000000000000000 ]---
>
> A similar warning appears on ARM64 Debian bookworm system. Reverting it on
> top of linux-next hides the issue, but I assume this is not the best way
> to fix it.
>
> I'm testing a kernel built from riscv/defconfig with PROVE_LOCKING,
> DEBUG_ATOMIC_SLEEP, DEBUG_DRIVER and DEBUG_DEVRES options enabled.
>
>> ---
>> v3:
>> - Use lockdep_assert_once(Vlastimil).
>>
>> v2:
>> - Remove mention of objcg_lock in obj_cgroup_memcg()(Shakeel Butt).
>>
>>  include/linux/memcontrol.h | 20 +++++++++++++++++---
>>  mm/memcontrol.c            |  6 +++---
>>  2 files changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index fc94879db4dff..95f823deafeca 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -360,11 +360,11 @@ static inline bool folio_memcg_kmem(struct folio *folio);
>>   * After the initialization objcg->memcg is always pointing at
>>   * a valid memcg, but can be atomically swapped to the parent memcg.
>>   *
>> - * The caller must ensure that the returned memcg won't be released:
>> - * e.g. acquire the rcu_read_lock or css_set_lock.
>> + * The caller must ensure that the returned memcg won't be released.
>>   */
>>  static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
>>  {
>> +	lockdep_assert_once(rcu_read_lock_held() || lockdep_is_held(&cgroup_mutex));
>>  	return READ_ONCE(objcg->memcg);
>>  }
>>
>> @@ -438,6 +438,19 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
>>  	return __folio_memcg(folio);
>>  }
>>
>> +/*
>> + * folio_memcg_charged - If a folio is charged to a memory cgroup.
>> + * @folio: Pointer to the folio.
>> + *
>> + * Returns true if folio is charged to a memory cgroup, otherwise returns false.
>> + */
>> +static inline bool folio_memcg_charged(struct folio *folio)
>> +{
>> +	if (folio_memcg_kmem(folio))
>> +		return __folio_objcg(folio) != NULL;
>> +	return __folio_memcg(folio) != NULL;
>> +}
>> +
>>  /**
>>   * folio_memcg_rcu - Locklessly get the memory cgroup associated with a folio.
>>   * @folio: Pointer to the folio.
>> @@ -454,7 +467,6 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>>  	unsigned long memcg_data = READ_ONCE(folio->memcg_data);
>>
>>  	VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
>> -	WARN_ON_ONCE(!rcu_read_lock_held());
>>
>>  	if (memcg_data & MEMCG_DATA_KMEM) {
>>  		struct obj_cgroup *objcg;
>> @@ -463,6 +475,8 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>>  		return obj_cgroup_memcg(objcg);
>>  	}
>>
>> +	WARN_ON_ONCE(!rcu_read_lock_held());
>> +
>>  	return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
>>  }
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 622d4544edd24..3da0284573857 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2366,7 +2366,7 @@ void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
>>
>>  static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
>>  {
>> -	VM_BUG_ON_FOLIO(folio_memcg(folio), folio);
>> +	VM_BUG_ON_FOLIO(folio_memcg_charged(folio), folio);
>>  	/*
>>  	 * Any of the following ensures page's memcg stability:
>>  	 *
>> @@ -4617,7 +4617,7 @@ void __mem_cgroup_uncharge(struct folio *folio)
>>  	struct uncharge_gather ug;
>>
>>  	/* Don't touch folio->lru of any random page, pre-check: */
>> -	if (!folio_memcg(folio))
>> +	if (!folio_memcg_charged(folio))
>>  		return;
>>
>>  	uncharge_gather_clear(&ug);
>> @@ -4662,7 +4662,7 @@ void mem_cgroup_replace_folio(struct folio *old, struct folio *new)
>>  		return;
>>
>>  	/* Page cache replacement: new folio already charged? */
>> -	if (folio_memcg(new))
>> +	if (folio_memcg_charged(new))
>>  		return;
>>
>>  	memcg = folio_memcg(old);
>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
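
For anyone following the thread, here is a rough userspace model of the two
ideas in the quoted patch: a lookup that complains when it is called without
protection, and a "charged?" check that only inspects the tagged word and
never dereferences the object cgroup. This is only a sketch, not kernel code.
Every name prefixed with model_ or MODEL_ is hypothetical, the tag bit value
is made up, and a plain counter stands in for rcu_read_lock_held().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of this is the kernel implementation. */
struct model_memcg { int id; };
struct model_objcg { struct model_memcg *memcg; };
struct model_folio { uintptr_t memcg_data; };

/* Hypothetical tag bit marking "memcg_data points to an objcg". */
#define MODEL_DATA_KMEM ((uintptr_t)0x1)

static int model_rcu_depth;  /* stands in for rcu_read_lock_held() */
static int model_violations; /* counts unprotected lookups */

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

/* Analogue of obj_cgroup_memcg(): flag callers that hold no protection. */
static struct model_memcg *model_obj_cgroup_memcg(struct model_objcg *objcg)
{
	if (model_rcu_depth == 0)
		model_violations++;
	return objcg->memcg;
}

/* Record a kmem-style charge: store the objcg pointer plus the tag bit. */
static void model_charge_kmem(struct model_folio *folio,
			      struct model_objcg *objcg)
{
	folio->memcg_data = (uintptr_t)objcg | MODEL_DATA_KMEM;
}

/*
 * Analogue of folio_memcg_charged(): looks only at the tagged word, so it
 * needs no lock and never reads objcg->memcg.
 */
static bool model_folio_charged(const struct model_folio *folio)
{
	return (folio->memcg_data & ~MODEL_DATA_KMEM) != 0;
}
```

In this model, a caller such as the list_lru path in the trace above would
bump model_violations unless it wraps the lookup in model_rcu_read_lock() and
model_rcu_read_unlock(), whereas model_folio_charged() is safe to call from
anywhere.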