Subject: Re: [PATCH RFC 00/28] Eliminate Dying Memory Cgroup
From: Muchun Song
Date: Fri, 23 May 2025 10:39:58 +0800
To: Harry Yoo
Cc: Muchun Song, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com
References: <20250415024532.26632-1-songmuchun@bytedance.com>
> On May 23, 2025, at 09:23, Harry Yoo wrote:
> 
> On Tue, Apr 15, 2025 at 10:45:04AM +0800, Muchun Song wrote:
>> This patchset is based on v6.15-rc2. It functions correctly only when
>> CONFIG_LRU_GEN (Multi-Gen LRU) is disabled. Several issues were
>> encountered during rebasing onto the latest code. For more details and
>> assistance, refer to the "Challenges" section. This is the reason for
>> adding the RFC tag.
>> 
> 
> [...snip...]
> 
>> ## Fundamentals
>> 
>> A folio will no longer pin its corresponding memory cgroup. It is
>> necessary to ensure that the memory cgroup, or the lruvec associated
>> with the memory cgroup, is not released while a user holds a pointer
>> returned by folio_memcg() or folio_lruvec(). Users who are not
>> concerned about the stability of the binding between the folio and its
>> memory cgroup are required to hold the RCU read lock, or to acquire a
>> reference to the memory cgroup associated with the folio, to prevent
>> its release. However, some users of folio_lruvec() (i.e., the lruvec
>> lock) require a stable binding between the folio and its corresponding
>> memory cgroup. An approach is needed to ensure the stability of the
>> binding while the lruvec lock is held, and to detect the situation of
>> holding the incorrect lruvec lock when there is a race with memory
>> cgroup reparenting. The following four steps are taken to achieve
>> these goals.
>> 
>> 1.
The first step is to identify all users of both functions
>> (folio_memcg() and folio_lruvec()) who are not concerned about binding
>> stability, and to implement appropriate measures (such as holding the
>> RCU read lock, or temporarily obtaining a reference to the memory
>> cgroup for a brief period) to prevent the release of the memory cgroup.
>> 
>> 2. Secondly, the following refactoring of folio_lruvec_lock()
>> demonstrates how to ensure the binding stability from the perspective
>> of a user of folio_lruvec().
>> 
>>    struct lruvec *folio_lruvec_lock(struct folio *folio)
>>    {
>>            struct lruvec *lruvec;
>> 
>>            rcu_read_lock();
>>    retry:
>>            lruvec = folio_lruvec(folio);
>>            spin_lock(&lruvec->lru_lock);
>>            if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
>>                    spin_unlock(&lruvec->lru_lock);
>>                    goto retry;
>>            }
>> 
>>            return lruvec;
>>    }
> 
> Is it still required to hold the RCU read lock once the binding between
> the folio and the memcg has been stabilized?

No, the spin lock alone is enough for correctness. The RCU read lock is
kept because of the lockdep assertion introduced in commit 02f4bbefcada
("mm: kmem: add lockdep assertion to obj_cgroup_memcg"): a user may
unintentionally call obj_cgroup_memcg() while holding the lruvec lock,
and if we did not hold the RCU read lock, obj_cgroup_memcg() would
complain about it.

> 
> In the previous version of this series, folio_lruvec_lock() was
> implemented as:
> 
>    struct lruvec *folio_lruvec_lock(struct folio *folio)
>    {
>            struct lruvec *lruvec;
> 
>            rcu_read_lock();
>    retry:
>            lruvec = folio_lruvec(folio);
>            spin_lock(&lruvec->lru_lock);
> 
>            if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
>                    spin_unlock(&lruvec->lru_lock);
>                    goto retry;
>            }
>            rcu_read_unlock();
> 
>            return lruvec;
>    }
> 
> And then this version calls rcu_read_unlock() in lruvec_unlock(),
> instead of in folio_lruvec_lock().
> 
> I wonder if this is because the memcg or objcg can be released without
> rcu_read_lock(), or just to silence the warning in
> folio_memcg()->obj_cgroup_memcg()->lockdep_assert_once(rcu_read_lock_is_held())?

The latter is right.

Muchun,
Thanks.

> 
>> From the perspective of memory cgroup removal, the entire reparenting
>> process (altering the binding relationship between a folio and its
>> memory cgroup and moving the LRU lists to its parent memory cgroup)
>> should be carried out under both the lruvec lock of the memory cgroup
>> being removed and the lruvec lock of its parent.
>> 
>> 3. Thirdly, another lock that requires the same approach is the THP
>> split-queue lock.
>> 
>> 4. Finally, transfer the LRU pages to the object cgroup without
>> holding a reference to the original memory cgroup.
> 
> -- 
> Cheers,
> Harry / Hyeonggon