From: Qi Zheng <qi.zheng@linux.dev>
To: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
    roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
    david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com,
    harry.yoo@oracle.com, imran.f.khan@oracle.com, kamalesh.babulal@oracle.com,
    axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
    akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    Muchun Song, Qi Zheng
Subject: [PATCH v1 21/26] mm: memcontrol:
 prepare for reparenting LRU pages for lruvec lock
Date: Tue, 28 Oct 2025 21:58:34 +0800
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Muchun Song

The following diagram illustrates how the safety of the folio lruvec lock
is ensured when LRU folios undergo reparenting.

In the folio_lruvec_lock(folio) function:

```
rcu_read_lock();
retry:
lruvec = folio_lruvec(folio);
/* There is a possibility of folio reparenting at this point. */
spin_lock(&lruvec->lru_lock);
if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
        /*
         * The wrong lruvec lock was acquired, and a retry is required.
         * This is because the folio resides on the parent memcg lruvec
         * list.
         */
        spin_unlock(&lruvec->lru_lock);
        goto retry;
}
/* Reaching here indicates that folio_memcg() is stable. */
```

In the memcg_reparent_objcgs(memcg) function:

```
spin_lock(&lruvec->lru_lock);
spin_lock(&lruvec_parent->lru_lock);
/* Transfer folios from the lruvec list to the parent's. */
spin_unlock(&lruvec_parent->lru_lock);
spin_unlock(&lruvec->lru_lock);
```

After acquiring the lruvec lock, it is necessary to verify whether the
folio has been reparented. If reparenting has occurred, the new lruvec
lock must be reacquired. During the LRU folio reparenting process, the
lruvec lock will also be acquired (this will be implemented in a
subsequent patch). Therefore, folio_memcg() remains unchanged while the
lruvec lock is held.

Given that lruvec_memcg(lruvec) is always equal to folio_memcg(folio)
after the lruvec lock is acquired, the lruvec_memcg_debug() check is
redundant. Hence, it is removed.

This patch serves as a preparation for the reparenting of LRU folios.

Signed-off-by: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Qi Zheng <qi.zheng@linux.dev>
---
 include/linux/memcontrol.h | 23 ++++++-----------
 mm/compaction.c            | 29 ++++++++++++++++-----
 mm/memcontrol.c            | 53 +++++++++++++++++++-------------------
 3 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ca8d4e09cbe7d..6f6b28f8f0f63 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -740,7 +740,11 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
  * folio_lruvec - return lruvec for isolating/putting an LRU folio
  * @folio: Pointer to the folio.
  *
- * This function relies on folio->mem_cgroup being stable.
+ * The user should hold an rcu read lock to protect the lruvec associated
+ * with the folio from being released. But it does not prevent the binding
+ * between the folio and the returned lruvec from being changed to the
+ * parent or an ancestor (e.g. folio_lruvec_lock() holds the LRU lock to
+ * prevent such a change).
  */
 static inline struct lruvec *folio_lruvec(struct folio *folio)
 {
@@ -763,15 +767,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 						unsigned long *flags);
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio);
-#else
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-#endif
-
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -1204,11 +1199,6 @@ static inline struct lruvec *folio_lruvec(struct folio *folio)
 	return &pgdat->__lruvec;
 }
 
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
 	return NULL;
@@ -1515,17 +1505,20 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
 static inline void lruvec_unlock(struct lruvec *lruvec)
 {
 	spin_unlock(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
 static inline void lruvec_unlock_irq(struct lruvec *lruvec)
 {
 	spin_unlock_irq(&lruvec->lru_lock);
+	rcu_read_unlock();
 }
 
 static inline void lruvec_unlock_irqrestore(struct lruvec *lruvec,
 		unsigned long flags)
 {
 	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+	rcu_read_unlock();
 }
 
 /* Test requires a stable folio->memcg binding, see folio_memcg() */
diff --git a/mm/compaction.c b/mm/compaction.c
index 4dce180f699b4..0d2a0e6239eb4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -518,6 +518,24 @@ static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
 	return true;
 }
 
+static struct lruvec *
+compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags,
+				  struct compact_control *cc)
+{
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
+	compact_lock_irqsave(&lruvec->lru_lock, flags, cc);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
+
+	return lruvec;
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. The lock should be periodically unlocked to avoid
@@ -839,7 +857,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 {
 	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	unsigned long nr_scanned = 0, nr_isolated = 0;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 	struct lruvec *locked = NULL;
 	struct folio *folio = NULL;
@@ -1153,18 +1171,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		if (!folio_test_clear_lru(folio))
 			goto isolate_fail_put;
 
-		lruvec = folio_lruvec(folio);
+		if (locked)
+			lruvec = folio_lruvec(folio);
 
 		/* If we already hold the lock, we can skip some rechecking */
-		if (lruvec != locked) {
+		if (lruvec != locked || !locked) {
 			if (locked)
 				lruvec_unlock_irqrestore(locked, flags);
 
-			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
+			lruvec = compact_folio_lruvec_lock_irqsave(folio, &flags, cc);
 			locked = lruvec;
 
-			lruvec_memcg_debug(lruvec, folio);
-
 			/*
 			 * Try get exclusive access under lock. If marked for
 			 * skip, the scan is aborted unless the current context
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b3c7d4f346b5..7969dd93d858a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1184,23 +1184,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	}
 }
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	if (mem_cgroup_disabled())
-		return;
-
-	memcg = folio_memcg(folio);
-
-	if (!memcg)
-		VM_BUG_ON_FOLIO(!mem_cgroup_is_root(lruvec_memcg(lruvec)), folio);
-	else
-		VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) != memcg, folio);
-}
-#endif
-
 /**
  * folio_lruvec_lock - Lock the lruvec for a folio.
  * @folio: Pointer to the folio.
@@ -1210,14 +1193,20 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
  * - folio_test_lru false
  * - folio frozen (refcount of 0)
  *
- * Return: The lruvec this folio is on with its lock held.
+ * Return: The lruvec this folio is on with its lock held and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1232,14 +1221,20 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irq(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
 
 	return lruvec;
 }
@@ -1255,15 +1250,21 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
  * - folio frozen (refcount of 0)
  *
  * Return: The lruvec this folio is on with its lock held and interrupts
- * disabled.
+ * disabled and rcu read lock held.
  */
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 		unsigned long *flags)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
 
 	return lruvec;
 }
-- 
2.20.1