Message-ID: <77114896-5a7a-413d-afa1-7d0a17312c99@bytedance.com>
Date: Thu, 25 Sep 2025 14:29:32 +0800
Subject: Re: [PATCH v2 4/4] mm: thp: reparent the split queue during memcg offline
From: Qi Zheng
To: Harry Yoo
Cc: hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 david@redhat.com, lorenzo.stoakes@oracle.com, ziy@nvidia.com,
 baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com,
 ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org,
 lance.yang@linux.dev, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
References: <55370bda7b2df617033ac12116c1712144bb7591.1758618527.git.zhengqi.arch@bytedance.com>

Hi Harry,

On 9/24/25 10:22 PM, Harry Yoo wrote:
> On Tue, Sep 23, 2025 at 05:16:25PM +0800, Qi Zheng wrote:
>> In the future, we will reparent LRU folios during memcg offline to
>> eliminate dying memory cgroups, which requires reparenting the split queue
>> to its parent.
>>
>> Similar to list_lru, the split queue is relatively independent and does
>> not need to be reparented along with objcg and LRU folios (holding
>> objcg lock and lru lock). So let's apply the same mechanism as list_lru
>> to reparent the split queue separately when memcg is offline.
>>
>> Signed-off-by: Qi Zheng
>> ---
>>  include/linux/huge_mm.h |  2 ++
>>  include/linux/mmzone.h  |  1 +
>>  mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
>>  mm/memcontrol.c         |  1 +
>>  mm/mm_init.c            |  1 +
>>  5 files changed, 44 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index f327d62fc9852..a0d4b751974d2 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -417,6 +417,7 @@ static inline int split_huge_page(struct page *page)
>>  	return split_huge_page_to_list_to_order(page, NULL, ret);
>>  }
>>  void deferred_split_folio(struct folio *folio, bool partially_mapped);
>> +void reparent_deferred_split_queue(struct mem_cgroup *memcg);
>>
>>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>  		unsigned long address, bool freeze);
>> @@ -611,6 +612,7 @@ static inline int try_folio_split(struct folio *folio, struct page *page,
>>  }
>>
>>  static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
>> +static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
>>  #define split_huge_pmd(__vma, __pmd, __address)	\
>>  	do { } while (0)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 7fb7331c57250..f3eb81fee056a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1346,6 +1346,7 @@ struct deferred_split {
>>  	spinlock_t split_queue_lock;
>>  	struct list_head split_queue;
>>  	unsigned long split_queue_len;
>> +	bool is_dying;
>>  };
>>  #endif
>
> The scheme in Muchun's version was:
>
> retry:
> 	queue = folio_split_queue(folio);
> 	spin_lock(&queue->split_queue_lock);
> 	if (folio_memcg(folio) != folio_split_queue_memcg(folio, queue)) {
> 		/* split queue was reparented, retry */
> 		spin_unlock(&queue->split_queue_lock);
> 		goto retry;
> 	}
> 	/* now we have a stable mapping between the folio and the split queue */
> 	spin_unlock(&queue->split_queue_lock);
>
> Oh, I see. We can't use this scheme yet because we don't reparent LRU
> folios. (I was wondering why we're adding is_dying property)

Right. And reparenting THP split queue independently can avoid the
following situations:

```
acquire child and parent split_queue_lock
acquire child and parent objcg_lock
acquire child and parent lru lock

reparent THP split queue
reparent objcg
reparent LRU folios

release child and parent lru lock
release child and parent objcg_lock
release child and parent split_queue_lock
```

>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 48b51e6230a67..de7806f759cba 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1094,9 +1094,15 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio)
>>  	struct deferred_split *queue;
>
>
> For now it's safe to not call rcu_read_lock() here because memcgs won't
> disappear under us as long as there are folios to split (we don't reparent
> LRU folios), right?

Right. We will add rcu_read_lock() when reparenting LRU folios.

Thanks,
Qi

>
>>  	memcg = folio_memcg(folio);
>> +retry:
>>  	queue = memcg ? &memcg->deferred_split_queue :
>>  			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
>>  	spin_lock(&queue->split_queue_lock);
>> +	if (unlikely(queue->is_dying == true)) {
>> +		spin_unlock(&queue->split_queue_lock);
>> +		memcg = parent_mem_cgroup(memcg);
>> +		goto retry;
>> +	}
>>  	return queue;
>>  }
>
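
For anyone following the thread, the is_dying retry discussed above can be
sketched in userspace roughly as follows. This is only an illustration and
not the kernel code: struct split_queue, reparent_queue() and
queue_lock_live() are hypothetical names, a pthread mutex stands in for
split_queue_lock, a counter stands in for the folio list, and the
child-then-parent lock order is an assumption of the sketch. It also assumes
the root queue is never marked dying.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct deferred_split. */
struct split_queue {
	pthread_mutex_t lock;		/* stands in for split_queue_lock */
	size_t len;			/* stands in for split_queue_len */
	bool is_dying;			/* set once the queue has been reparented */
	struct split_queue *parent;	/* NULL for the root queue */
};

/* Offline path: move everything to the parent, then mark the child dying. */
static void reparent_queue(struct split_queue *child)
{
	struct split_queue *parent = child->parent;

	assert(parent != NULL);		/* the root queue is never reparented */

	/* Assumed lock order for this sketch: child first, then parent. */
	pthread_mutex_lock(&child->lock);
	pthread_mutex_lock(&parent->lock);
	parent->len += child->len;
	child->len = 0;
	child->is_dying = true;
	pthread_mutex_unlock(&parent->lock);
	pthread_mutex_unlock(&child->lock);
}

/* Lookup path: return the first live queue at or above q, with its lock held. */
static struct split_queue *queue_lock_live(struct split_queue *q)
{
	for (;;) {
		pthread_mutex_lock(&q->lock);
		if (!q->is_dying)
			return q;
		/* Queue was reparented under us: drop the lock and go up. */
		pthread_mutex_unlock(&q->lock);
		q = q->parent;
	}
}
```

Because is_dying is only set and checked under the queue's own lock, a caller
that sees it clear knows the queue cannot be drained until the lock is
released. A caller that sees it set moves to the parent, which is where the
entries went.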