From: Muchun Song
Date: Mon, 7 Jul 2025 17:29:22 +0800
Subject: Re: [External] Re: [PATCH RFC 26/28] mm: memcontrol: introduce memcg_reparent_ops
To: Harry Yoo
Cc: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, david@fromorbit.com, zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com
References: <20250415024532.26632-1-songmuchun@bytedance.com> <20250415024532.26632-27-songmuchun@bytedance.com>
Content-Type: text/plain; charset="UTF-8"
On Wed, Jul 2, 2025 at 6:13 AM Harry Yoo wrote:
>
> On Mon, Jun 30, 2025 at 09:47:25PM +0900, Harry Yoo wrote:
> > On Tue, Apr 15, 2025 at 10:45:30AM +0800, Muchun Song wrote:
> > > In the previous patch, we established a method to ensure the safety of the
> > > lruvec lock and the split queue lock during the reparenting
> > > of LRU folios. The process involves the following steps:
> > >
> > > memcg_reparent_objcgs(memcg)
> > > 1) lock
> > >    // lruvec belongs to memcg and lruvec_parent belongs to parent memcg.
> > >    spin_lock(&lruvec->lru_lock);
> > >    spin_lock(&lruvec_parent->lru_lock);
> > >
> > > 2) relocate from current memcg to its parent
> > >    // Move all the pages from the lruvec list to the parent lruvec list.
> > >
> > > 3) unlock
> > >    spin_unlock(&lruvec_parent->lru_lock);
> > >    spin_unlock(&lruvec->lru_lock);
> > >
> > > In addition to the folio lruvec lock, the deferred split queue lock
> > > (specific to THP) also requires a similar approach. Therefore, we abstract
> > > the three essential steps from the memcg_reparent_objcgs() function.
> > >
> > > memcg_reparent_objcgs(memcg)
> > > 1) lock
> > >    memcg_reparent_ops->lock(memcg, parent);
> > >
> > > 2) relocate
> > >    memcg_reparent_ops->relocate(memcg, parent);
> > >
> > > 3) unlock
> > >    memcg_reparent_ops->unlock(memcg, parent);
> > >
> > > Currently, two distinct locks (such as the lruvec lock and the deferred
> > > split queue lock) need to utilize this infrastructure. In the subsequent
> > > patch, we will employ these APIs to ensure the safety of these locks
> > > during the reparenting of LRU folios.
> > >
> > > Signed-off-by: Muchun Song
> > > ---
> > >  include/linux/memcontrol.h | 20 ++++++++++++
> > >  mm/memcontrol.c            | 62 ++++++++++++++++++++++++++++++--------
> > >  2 files changed, 69 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 27b23e464229..0e450623f8fa 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -311,6 +311,26 @@ struct mem_cgroup {
> > >     struct mem_cgroup_per_node *nodeinfo[];
> > >  };
> > >
> > > +struct memcg_reparent_ops {
> > > +   /*
> > > +    * Note that interrupt is disabled before calling those callbacks,
> > > +    * so the interrupt should remain disabled when leaving those callbacks.
> > > +    */
> > > +   void (*lock)(struct mem_cgroup *src, struct mem_cgroup *dst);
> > > +   void (*relocate)(struct mem_cgroup *src, struct mem_cgroup *dst);
> > > +   void (*unlock)(struct mem_cgroup *src, struct mem_cgroup *dst);
> > > +};
> > > +
> > > +#define DEFINE_MEMCG_REPARENT_OPS(name)                              \
> > > +   const struct memcg_reparent_ops memcg_##name##_reparent_ops = {   \
> > > +           .lock = name##_reparent_lock,                             \
> > > +           .relocate = name##_reparent_relocate,                     \
> > > +           .unlock = name##_reparent_unlock,                         \
> > > +   }
> > > +
> > > +#define DECLARE_MEMCG_REPARENT_OPS(name) \
> > > +   extern const struct memcg_reparent_ops memcg_##name##_reparent_ops
> > > +
> > >  /*
> > >   * size of first charge trial.
> > >   * TODO: maybe necessary to use big numbers in big irons or dynamic based of the
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 1f0c6e7b69cc..3fac51179186 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -194,24 +194,60 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
> > >     return objcg;
> > >  }
> > >
> > > -static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
> > > +static void objcg_reparent_lock(struct mem_cgroup *src, struct mem_cgroup *dst)
> > > +{
> > > +   spin_lock(&objcg_lock);
> > > +}
> > > +
> > > +static void objcg_reparent_relocate(struct mem_cgroup *src, struct mem_cgroup *dst)
> > >  {
> > >     struct obj_cgroup *objcg, *iter;
> > > -   struct mem_cgroup *parent = parent_mem_cgroup(memcg);
> > >
> > > -   objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
> > > +   objcg = rcu_replace_pointer(src->objcg, NULL, true);
> > > +   /* 1) Ready to reparent active objcg. */
> > > +   list_add(&objcg->list, &src->objcg_list);
> > > +   /* 2) Reparent active objcg and already reparented objcgs to dst. */
> > > +   list_for_each_entry(iter, &src->objcg_list, list)
> > > +           WRITE_ONCE(iter->memcg, dst);
> > > +   /* 3) Move already reparented objcgs to the dst's list */
> > > +   list_splice(&src->objcg_list, &dst->objcg_list);
> > > +}
> > >
> > > -   spin_lock_irq(&objcg_lock);
> > > +static void objcg_reparent_unlock(struct mem_cgroup *src, struct mem_cgroup *dst)
> > > +{
> > > +   spin_unlock(&objcg_lock);
> > > +}
> > >
> > > -   /* 1) Ready to reparent active objcg. */
> > > -   list_add(&objcg->list, &memcg->objcg_list);
> > > -   /* 2) Reparent active objcg and already reparented objcgs to parent.
> > > */
> > > -   list_for_each_entry(iter, &memcg->objcg_list, list)
> > > -           WRITE_ONCE(iter->memcg, parent);
> > > -   /* 3) Move already reparented objcgs to the parent's list */
> > > -   list_splice(&memcg->objcg_list, &parent->objcg_list);
> > > -
> > > -   spin_unlock_irq(&objcg_lock);
> > > +static DEFINE_MEMCG_REPARENT_OPS(objcg);
> > > +
> > > +static const struct memcg_reparent_ops *memcg_reparent_ops[] = {
> > > +   &memcg_objcg_reparent_ops,
> > > +};
> > > +
> > > +#define DEFINE_MEMCG_REPARENT_FUNC(phase)                          \
> > > +   static void memcg_reparent_##phase(struct mem_cgroup *src,      \
> > > +                                      struct mem_cgroup *dst)      \
> > > +   {                                                               \
> > > +           int i;                                                  \
> > > +                                                                   \
> > > +           for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++)    \
> > > +                   memcg_reparent_ops[i]->phase(src, dst);         \
> > > +   }
> > > +
> > > +DEFINE_MEMCG_REPARENT_FUNC(lock)
> > > +DEFINE_MEMCG_REPARENT_FUNC(relocate)
> > > +DEFINE_MEMCG_REPARENT_FUNC(unlock)
> > > +
> > > +static void memcg_reparent_objcgs(struct mem_cgroup *src)
> > > +{
> > > +   struct mem_cgroup *dst = parent_mem_cgroup(src);
> > > +   struct obj_cgroup *objcg = rcu_dereference_protected(src->objcg, true);
> > > +
> > > +   local_irq_disable();
> > > +   memcg_reparent_lock(src, dst);
> > > +   memcg_reparent_relocate(src, dst);
> > > +   memcg_reparent_unlock(src, dst);
> > > +   local_irq_enable();
> >
> > Hi,
> >
> > It seems unnecessarily complicated to 1) acquire objcg, lruvec and
> > thp_sq locks, 2) call their ->relocate() callbacks, and
> > 3) release those locks.
> >
> > Why not simply do the following instead?
> >
> > for (i = 0; i < ARRAY_SIZE(memcg_reparent_ops); i++) {
> >         local_irq_disable();
> >         memcg_reparent_ops[i]->lock(src, dst);
> >         memcg_reparent_ops[i]->relocate(src, dst);
> >         memcg_reparent_ops[i]->unlock(src, dst);
> >         local_irq_enable();
> > }
> >
> > As there is no actual lock dependency between the three.
> >
> > Or am I missing something important about the locking requirements?
>
> Hmm...
> looks like I was missing some important requirements!
>
> It seems like:
>
> 1) objcg should be reparented under lruvec locks, otherwise
>    users can observe folio_memcg(folio) != lruvec_memcg(lruvec)
>
> 2) Similarly, lruvec_reparent_relocate() should reparent all folios
>    at once under lruvec locks, otherwise users can observe
>    folio_memcg(folio) != lruvec_memcg(lruvec) for some folios.
>
> IoW, lruvec_reparent_relocate() cannot do something like this:
> while (lruvec is not empty) {
>         move some pages;
>         unlock lruvec locks;
>         cond_resched();
>         lock lruvec locks;
> }
>
> Failing to satisfy 1) and 2) means users can't rely on a stable binding
> between a folio and a memcg, which is a no-go.
>
> Also, 2) makes it quite undesirable to iterate over folios and move each
> one to the right generation in MGLRU, as this will certainly introduce
> soft lockups as the memcg size grows...

Sorry for the late reply. Yes, you are right. We should iterate over each
folio without holding any spinlocks. So I am looking into whether we could
do some preparation work to make MGLRU suitable for reparenting without
holding any spinlock. Then we could reparent those folios like the
non-MGLRU case. Thanks.

> Is my reasoning correct?
> If so, adding a brief comment about 1 and 2 wouldn't hurt ;)

OK.

> --
> Cheers,
> Harry / Hyeonggon