From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
Date: Fri, 18 Apr 2025 02:22:12 +0800
Subject: Re: [PATCH RFC 00/28] Eliminate Dying Memory Cgroup
To: Muchun Song
Cc: Muchun Song, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, akpm@linux-foundation.org, david@fromorbit.com,
 zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev, nphamcs@gmail.com,
 chengming.zhou@linux.dev, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com,
 yuzhao@google.com
References: <20250415024532.26632-1-songmuchun@bytedance.com>
Content-Type: text/plain; charset="UTF-8"

On Tue, Apr 15, 2025 at 4:02 PM Muchun Song wrote:
>
> On Apr 15, 2025, at 14:19, Kairui Song wrote:
> >
> > On Tue, Apr 15, 2025 at 10:46 AM Muchun Song wrote:
> >>
> >> This patchset is based on v6.15-rc2. It functions correctly only when
> >> CONFIG_LRU_GEN (Multi-Gen LRU) is disabled. Several issues were encountered
> >> during rebasing onto the latest code. For more details and assistance, refer
> >> to the "Challenges" section. This is the reason for adding the RFC tag.
> >>
> >> ## Introduction
> >>
> >> This patchset is intended to transfer the LRU pages to the object cgroup
> >> without holding a reference to the original memory cgroup in order to
> >> address the issue of the dying memory cgroup. A consensus has already been
> >> reached regarding this approach recently [1].
> >>
> >> ## Background
> >>
> >> The issue of a dying memory cgroup refers to a situation where a memory
> >> cgroup is no longer being used by users, but memory (the metadata
> >> associated with memory cgroups) remains allocated to it. This situation
> >> may result in memory leaks or inefficiencies in memory reclamation and
> >> has persisted as an issue for several years. Any memory allocation that
> >> endures longer than the lifespan (from the users' perspective) of a
> >> memory cgroup can lead to the issue of the dying memory cgroup. We have
> >> exerted greater efforts to tackle this problem by introducing the
> >> infrastructure of object cgroup [2].
> >>
> >> Presently, numerous types of objects (slab objects, non-slab kernel
> >> allocations, per-CPU objects) are charged to the object cgroup without
> >> holding a reference to the original memory cgroup. The final allocations
> >> for LRU pages (anonymous pages and file pages) are charged at allocation
> >> time and continue to hold a reference to the original memory cgroup
> >> until reclaimed.
> >>
> >> File pages are more complex than anonymous pages as they can be shared
> >> among different memory cgroups and may persist beyond the lifespan of
> >> the memory cgroup. The long-term pinning of file pages to memory cgroups
> >> is a widespread issue that causes recurring problems in practical
> >> scenarios [3]. File pages remain unreclaimed for extended periods.
> >> Additionally, they are accessed by successive instances (second, third,
> >> fourth, etc.) of the same job, which is restarted into a new cgroup each
> >> time. As a result, unreclaimable dying memory cgroups accumulate,
> >> leading to memory wastage and significantly reducing the efficiency
> >> of page reclamation.
> >>
> >> ## Fundamentals
> >>
> >> A folio will no longer pin its corresponding memory cgroup. It is
> >> necessary to ensure that the memory cgroup or the lruvec associated with
> >> the memory cgroup is not released when a user obtains a pointer to the
> >> memory cgroup or lruvec returned by folio_memcg() or folio_lruvec().
> >> Users are required to hold the RCU read lock or acquire a reference to
> >> the memory cgroup associated with the folio to prevent its release if
> >> they are not concerned about the binding stability between the folio and
> >> its corresponding memory cgroup. However, some users of folio_lruvec()
> >> (i.e., the lruvec lock) desire a stable binding between the folio and
> >> its corresponding memory cgroup. An approach is needed to ensure the
> >> stability of the binding while the lruvec lock is held, and to detect
> >> the situation of holding the incorrect lruvec lock when there is a race
> >> condition during memory cgroup reparenting. The following four steps are
> >> taken to achieve these goals.
> >>
> >> 1. The first step to be taken is to identify all users of both functions
> >>    (folio_memcg() and folio_lruvec()) who are not concerned about binding
> >>    stability and implement appropriate measures (such as holding an RCU
> >>    read lock or temporarily obtaining a reference to the memory cgroup
> >>    for a brief period) to prevent the release of the memory cgroup.
> >>
> >> 2. Secondly, the following refactoring of folio_lruvec_lock() demonstrates
> >>    how to ensure the binding stability from the user's perspective of
> >>    folio_lruvec().
> >>
> >>    struct lruvec *folio_lruvec_lock(struct folio *folio)
> >>    {
> >>            struct lruvec *lruvec;
> >>
> >>            rcu_read_lock();
> >>    retry:
> >>            lruvec = folio_lruvec(folio);
> >>            spin_lock(&lruvec->lru_lock);
> >>            if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
> >>                    spin_unlock(&lruvec->lru_lock);
> >>                    goto retry;
> >>            }
> >>
> >>            return lruvec;
> >>    }
> >>
> >>    From the perspective of memory cgroup removal, the entire reparenting
> >>    process (altering the binding relationship between folio and its memory
> >>    cgroup and moving the LRU lists to its parental memory cgroup) should be
> >>    carried out under both the lruvec lock of the memory cgroup being
> >>    removed and the lruvec lock of its parent.
> >>
> >> 3. Thirdly, another lock that requires the same approach is the split-queue
> >>    lock of THP.
> >>
> >> 4. Finally, transfer the LRU pages to the object cgroup without holding a
> >>    reference to the original memory cgroup.
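
Regarding step 1: just to check my understanding, for a folio_memcg()
user that only needs the memcg to stay alive for a short section (and
doesn't care about a stable folio->memcg binding), the expected pattern
would be something like the sketch below? (Not code from this series,
just my reading of the cover letter; error handling trimmed.)

    struct mem_cgroup *memcg;

    /*
     * The folio no longer pins its memcg, so pin it by hand: RCU keeps
     * the memcg structure from being freed while we look at it, and
     * css_tryget() fails if the css is already gone.
     */
    rcu_read_lock();
    memcg = folio_memcg(folio);
    if (memcg && !css_tryget(&memcg->css))
            memcg = NULL;
    rcu_read_unlock();

    if (memcg) {
            /* ... short-term use: stats, protection checks, etc. ... */
            css_put(&memcg->css);
    }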
> >
> > Hi, Muchun, thanks for the patch.
>
> Thanks for your reply and attention.
>
> >
> >> ## Challenges
> >>
> >> In a non-MGLRU scenario, each lruvec of every memory cgroup comprises four
> >> LRU lists (i.e., two active lists for anonymous and file folios, and two
> >> inactive lists for anonymous and file folios). Due to the symmetry of the
> >> LRU lists, it is feasible to transfer the LRU lists from a memory cgroup
> >> to its parent memory cgroup during the reparenting process.
> >
> > Symmetry of LRU lists doesn't mean symmetric 'hotness': it's totally
> > possible that a child's active LRU is colder and should be evicted
> > before the parent's inactive LRU (might even be a common scenario for
> > certain workloads).
>
> Yes.
>
> > This only affects performance, not correctness, though, so it's not a
> > big problem.
> >
> > So will it be easier to just assume a dying cgroup's folios are colder?
> > Simply moving them to the parent's LRU tail is OK. This will make the
> > logic applicable for both active/inactive LRU and MGLRU.
>
> I think you mean moving all child LRU lists to the parent memcg's inactive
> list. It works well for your case. But sometimes, due to shared page cache
> pages, some pages in the child list may be accessed more frequently than
> those in the parent's. Still, it's okay as they can be promoted quickly
> later. So I am fine with this change.
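
To be concrete, the simplest form of what I had in mind is roughly the
sketch below (untested, and deliberately naive: it splices each child list
onto the tail of the parent's corresponding list so the folio flags stay
consistent, and it ignores the per-lruvec size counters and the MGLRU
case; lru_reparent_lists() is a made-up name):

    static void lru_reparent_lists(struct lruvec *child, struct lruvec *parent)
    {
            enum lru_list lru;

            /* Caller holds both lru_lock's, as described in step 2 above. */
            for_each_lru(lru)
                    list_splice_tail_init(&child->lists[lru],
                                          &parent->lists[lru]);
    }

Demoting the child's active folios to the parent's inactive tail instead
would also mean clearing the active flag on each folio, which is exactly
the per-folio walk we'd like to avoid.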
> >
> >> In a MGLRU scenario, each lruvec of every memory cgroup comprises at least
> >> 2 (MIN_NR_GENS) generations and at most 4 (MAX_NR_GENS) generations.
> >>
> >> 1. The first question is how to move the LRU lists from a memory cgroup to
> >>    its parent memory cgroup during the reparenting process. This is due to
> >>    the fact that the quantity of LRU lists (aka generations) may differ
> >>    between a child memory cgroup and its parent memory cgroup.
> >>
> >> 2. The second question is how to make the process of reparenting more
> >>    efficient, since each folio charged to a memory cgroup stores its
> >>    generation counter in its ->flags, and the generation counter may
> >>    differ between a child memory cgroup and its parent memory cgroup
> >>    because the values of ->min_seq and ->max_seq are not identical.
> >>    Should those generation counters be updated correspondingly?
> >
> > I think you do have to iterate through the folios to set or clear
> > their generation flags if you want to put the folios in the right gen.
> >
> > MGLRU does a similar thing in inc_min_seq. MGLRU uses the gen flags to
> > defer the actual LRU movement of folios; that's a very important
> > optimization per my test.
>
> I noticed that, which is why I asked the second question. It's
> inefficient when dealing with numerous pages related to a memory
> cgroup.
>
> >> I am uncertain about how to handle them appropriately as I am not an
> >> expert at MGLRU. I would appreciate it if you could offer some
> >> suggestions. Moreover, if you are willing to directly provide your
> >> patches, I would be glad to incorporate them into this patchset.
> >
> > If we just follow the above idea (move them to the parent's tail), we
> > can just keep the folio's tier info untouched here.
> >
> > For mapped file folios, they will still be promoted upon eviction if
> > their access bits are set (rmap walk), and MGLRU's table walker might
> > just promote them fine.
> >
> > For unmapped file folios, if we just keep their tier info and add the
> > child's MGLRU tier PID counters back to the parent, workingset
> > protection of MGLRU should still work just fine.
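
A side note on the second question above: the gen index stored in
folio->flags only has meaning relative to that lruvec's ->min_seq/->max_seq
window, which is why it can't simply be carried over to the parent. The
helpers involved (quoted from memory from include/linux/mm_inline.h, so
please double-check against your tree) are roughly:

    /* gen index of a folio, kept in folio->flags; 0 means not on an MGLRU list */
    static inline int folio_lru_gen(struct folio *folio)
    {
            unsigned long flags = READ_ONCE(folio->flags);

            return ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
    }

    /* seq -> gen index mapping; the same index can mean different
     * generations in a child and its parent because their seqs differ */
    static inline int lru_gen_from_seq(unsigned long seq)
    {
            return seq % MAX_NR_GENS;
    }

So a child's gen index doesn't translate directly into the parent's
generation space, which is what makes the second question awkward.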
> >
> >> ## Compositions
> >>
> >> Patches 1-8 involve code refactoring and cleanup with the aim of
> >> facilitating the transfer of LRU folios to the object cgroup
> >> infrastructure.
> >>
> >> Patches 9-10 aim to allocate the object cgroup for non-kmem scenarios,
> >> enabling LRU folios to be charged to it and aligning the behavior of
> >> object-cgroup-related APIs with that of the memory cgroup.
> >>
> >> Patches 11-19 aim to prevent the memory cgroup returned by folio_memcg()
> >> from being released.
> >>
> >> Patches 20-23 aim to prevent the lruvec returned by folio_lruvec() from
> >> being released.
> >>
> >> Patches 24-25 implement the core mechanism to guarantee binding stability
> >> between the folio and its corresponding memory cgroup while holding the
> >> lruvec lock or the split-queue lock of THP.
> >>
> >> Patches 26-27 are intended to transfer the LRU pages to the object cgroup
> >> without holding a reference to the original memory cgroup in order to
> >> address the issue of the dying memory cgroup.
> >>
> >> Patch 28 aims to add VM_WARN_ON_ONCE_FOLIO to LRU maintenance helpers to
> >> ensure correct folio operations in the future.
> >>
> >> ## Effect
> >>
> >> Finally, it can be observed that the quantity of dying memory cgroups does
> >> not increase significantly if the following test script is executed to
> >> reproduce the issue.
> >>
> >> ```bash
> >> #!/bin/bash
> >>
> >> # Create a temporary file 'temp' filled with zero bytes
> >> dd if=/dev/zero of=temp bs=4096 count=1
> >>
> >> # Display memory-cgroup info from /proc/cgroups
> >> cat /proc/cgroups | grep memory
> >>
> >> for i in {0..2000}
> >> do
> >>     mkdir /sys/fs/cgroup/memory/test$i
> >>     echo $$ > /sys/fs/cgroup/memory/test$i/cgroup.procs
> >>
> >>     # Append 'temp' file content to 'log'
> >>     cat temp >> log
> >>
> >>     echo $$ > /sys/fs/cgroup/memory/cgroup.procs
> >>
> >>     # Potentially create a dying memory cgroup
> >>     rmdir /sys/fs/cgroup/memory/test$i
> >> done
> >>
> >> # Display memory-cgroup info after test
> >> cat /proc/cgroups | grep memory
> >>
> >> rm -f temp log
> >> ```
> >>
> >> ## References
> >>
> >> [1] https://lore.kernel.org/linux-mm/Z6OkXXYDorPrBvEQ@hm-sls2/
> >> [2] https://lwn.net/Articles/895431/
> >> [3] https://github.com/systemd/systemd/pull/36827
> >
> > How much overhead will this add? Objcg has some extra overhead, and we
> > now have extra conventions for retrieving the memcg of a folio; not sure
> > if this will cause an observable slowdown.
>
> I don't think there'll be an observable slowdown. I think objcg is
> more effective for slab objects as they're more sensitive than user
> pages. If it's acceptable for slab objects, it should be acceptable
> for user pages too.

We currently have some workloads running with `nokmem` due to objcg
performance issues. I know there are efforts to improve them, but so far
it's still not painless to have. So I'm a bit worried about this...

> > I'm still wondering if it would be more feasible to just migrate (NOT
> > that Cgroup V1 migrate, just set the folio's memcg to the parent for a
> > dying cgroup and update the memcg charge) and iterate the folios on
> > reparenting in a worker or something like that. There are already things
> > like the destruction workqueue and offline waitqueue. That way folios
> > will still just point to a memcg, which seems to avoid a lot of
> > complexity.
>
> I didn't adopt this approach back then for two reasons:
>
> 1) It's inefficient to change `->memcg_data` to the parent when
>    iterating through all pages associated with a memory cgroup.

This is a problem indeed, but isn't reparenting a rather rare operation?
So a slow async worker might be just fine?

> 2) During iteration, we might come across pages isolated by other
>    users. These pages aren't in any LRU list and will thus miss
>    being reparented to the parent memory cgroup.

Hmm, such pages will have to be returned at some point; adding a
convention for isolate/return seems cleaner than adding a convention for
every folio memcg retrieval?
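
Just to illustrate what I mean by that convention (purely a sketch, every
name below except folio_putback_lru() is made up, and the real fixup would
also have to move the charge under the proper locks):

    /*
     * When an isolated folio goes back to the LRU, redirect it to the
     * parent if its memcg was offlined in the meantime, so an isolated
     * folio can never keep a dying memcg pinned.
     */
    static void folio_putback_lru_reparented(struct folio *folio)
    {
            folio_reparent_if_offline(folio);   /* hypothetical helper */
            folio_putback_lru(folio);           /* existing putback path */
    }

That would confine the "is this memcg still alive" check to the
isolate/putback boundary instead of every folio_memcg() call site.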