From: Nhat Pham
Date: Sat, 18 Nov 2023 13:51:37 -0500
Subject: Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback
To: Yosry Ahmed
Cc: Andrew Morton, Chris Li, Dan Streetman, Domenico Cerasuolo, Johannes Weiner, LKML, Seth Jennings, Shakeel Butt, Vitaly Wool, cgroups@vger.kernel.org, kernel-team@meta.com, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm, mhocko@kernel.org, muchun.song@linux.dev, roman.gushchin@linux.dev, shuah@kernel.org
References: <20231106183159.3562879-1-nphamcs@gmail.com>



On Fri, Nov 17, 2023 at 11:27 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Fri, Nov 17, 2023 at 8:23 AM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Thu, Nov 16, 2023 at 4:57 PM Chris Li <chrisl@kernel.org> wrote:
> > >
> > > Hi Nhat,
> > >
> > > I want to share the high-level feedback we discussed here in the
> > > mailing list as well.
> > >
> > > It is my observation that each memcg LRU list can't compare the page
> > > time order with other memcgs.
> > > It works great when the leaf-level memcg hits the memory limit and you
> > > want to reclaim from that memcg.
> > > It works less well under global memory pressure, when you need to reclaim
> > > from all memcgs. You kind of have to
> > > scan every child memcg to find the best page to shrink from. It
> > > is less effective at getting to the most desirable page quickly.
> > >
> > > This can benefit from a design similar to MGLRU. This idea was
> > > suggested by Yu Zhao; credit goes to him, not me.
> > > In other words, the current patch is similar to the memcg page list
> > > in the pre-MGLRU world. We can have an MGLRU-like
> > > per-memcg zswap shrink list.
> >
> > I was gonna summarize the points myself :P But thanks for doing this.
> > It's your idea so you're more qualified to explain this anyway ;)
> >
> > I absolutely agree that having a generation-aware cgroup-aware
> > NUMA-aware LRU is the future way to go. Currently, IIUC, the reclaim logic
> > selects cgroups in a round-robin-ish manner. It's "fair" in this perspective,
> > but I also think it's not ideal. As we have discussed, the current list_lru
> > infrastructure only takes into account intra-cgroup relative recency, not
> > inter-cgroup relative recency. The recently proposed time-based zswap
> > reclaim mechanism will provide us with a source of information, but the
> > overhead of using this might be too high - and it's very zswap-specific.
> >
> > Maybe after this, we should improve zswap reclaim (and perhaps all
> > list_lru users) by adding generations to list_lru, then take generations
> > into account in the vmscan code. This patch series could be merged
> > as-is, and once we make list_lru generation-aware, zswap shrinker
> > will automagically be improved (along with all other list_lru/shrinker
> > users).
> >
> > I don't know enough about the current design of MGLRU to comment
> > too much further, but let me know if this makes sense, and if you have
> > objections/other ideas.
> >
> > And if you have other documentation for MGLRU than its code, could
> > you please let me know? I'm struggling to find more details about this.
> >
>
> This could be a good place to start:
> https://www.youtube.com/watch?v=9HvJfN21H9Y

Ah, I think I've seen this talk before.

I'd also like to point out that the current set of heuristics employed by the
shrinker somewhat mimics active/inactive LRUs (i.e., a two-generation
MGLRU). I'm not sure how to generalize this to more than two generations,
though.
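
To make the intra- vs inter-cgroup recency point from this thread concrete,
here is a toy user-space model in Python. It is not kernel code and does not
use the list_lru or MGLRU APIs: the function names, the integer "last access"
timestamps, and the gen_width parameter are all assumptions made up for
illustration. Each cgroup has its own list ordered oldest-first; one function
picks victims round-robin across cgroups, the other compares a coarse
generation number that is meaningful across cgroups.

```python
from collections import deque
from itertools import cycle


def round_robin_reclaim(lrus, count):
    """Evict `count` entries, taking the oldest entry of each cgroup in turn.

    `lrus` maps a cgroup name to a list of last-access timestamps
    (smaller means older). Only per-cgroup ordering is consulted.
    """
    queues = {cg: deque(sorted(times)) for cg, times in lrus.items()}
    evicted = []
    for cg in cycle(list(queues)):
        if len(evicted) == count or not any(queues.values()):
            break
        if queues[cg]:
            evicted.append((cg, queues[cg].popleft()))
    return evicted


def generation_aware_reclaim(lrus, count, gen_width=10):
    """Evict `count` entries, preferring the cgroup whose oldest entry
    falls in the oldest generation.

    The generation is a hypothetical coarse bucket (timestamp // gen_width)
    that can be compared across cgroups, unlike a position inside a
    per-cgroup list.
    """
    queues = {cg: deque(sorted(times)) for cg, times in lrus.items()}
    evicted = []
    while len(evicted) < count:
        candidates = [cg for cg, q in queues.items() if q]
        if not candidates:
            break
        victim = min(candidates, key=lambda cg: queues[cg][0] // gen_width)
        evicted.append((victim, queues[victim].popleft()))
    return evicted
```

With lrus = {"A": [1, 2, 3], "B": [50, 51, 52]} and count = 2, the round-robin
variant evicts ("A", 1) and ("B", 50), i.e. it takes a recently used entry
from the busy cgroup B, while the generation-aware variant evicts ("A", 1)
and ("A", 2) from the idle cgroup A.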