From: Nhat Pham
Date: Thu, 14 Dec 2023 18:19:05 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Chris Li
Cc: Johannes Weiner, Minchan Kim, akpm@linux-foundation.org, tj@kernel.org,
 lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, yosryahmed@google.com,
 sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
 mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
 muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
 konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, david@ixit.cz, Kairui Song, Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com>
 <20231209034229.GA1001962@cmpxchg.org>
 <20231214171137.GA261942@cmpxchg.org>
 <20231214221140.GA269753@cmpxchg.org>

On Thu, Dec 14, 2023 at 2:55 PM Chris Li wrote:
>
> On Thu, Dec 14, 2023 at 2:11 PM Johannes Weiner wrote:
> >
> > On Thu, Dec 14, 2023 at 09:34:06AM -0800, Christopher Li wrote:
> > > On Thu, Dec 14, 2023 at 9:11 AM Johannes Weiner wrote:
> > >
> > > > > Hi Johannes,
> > > > >
> > > > > I haven't been following the thread closely, but I noticed the discussion
> > > > > about potential use cases for zram with memcg.
> > > > >
> > > > > One interesting idea I have is to implement a swap controller per cgroup.
> > > > > This would allow us to tailor the zram swap behavior to the specific needs of
> > > > > different groups.
> > > > >
> > > > > For example, Group A, which is sensitive to swap latency, could use zram swap
> > > > > with a fast compression setting, even if it sacrifices some compression ratio.
> > > > > This would prioritize quick access to swapped data, even if it takes up more space.
> > > > >
> > > > > On the other hand, Group B, which can tolerate higher swap latency, could benefit
> > > > > from a slower compression setting that achieves a higher compression ratio.
> > > > > This would maximize memory efficiency at the cost of slightly slower data access.
> > > > >
> > > > > This approach could provide a more nuanced and flexible way to manage swap usage
> > > > > within different cgroups.
> > > >
> > > > That makes sense to me.
> > > >
> > > > It sounds to me like per-cgroup swapfiles would be the easiest
> > > > solution to this. Then you can create zram devices with different
> > > > configurations and assign them to individual cgroups.
> > >
> > > Ideally you need zram, then a swap file following the zram. That
> > > would be a list of the swap files rather than just one swapfile per
> > > cgroup.
> > >
> > > > This would also apply to Kairui's usecase: assign zrams and hdd backups
> > > > as needed on a per-cgroup basis.
> > >
> > > Same there, Kairui's request involves ZRAM and at least one extra swap
> > > file. In other words, you really need a per cgroup swap file list.
> >
> > Why is that a problem?
>
> It is not a problem. It is the necessary infrastructure to support the
> requirement. I am merely saying just having one swap file is not
> enough.
>
> >
> > swapon(zram, cgroup=foo)
> > swapon(hdd, cgroup=foo)
>
> Interesting idea. I assume you want to use swapon/swapoff to turn a
> device on or off for a specific cgroup.
> That seems to imply each cgroup will have a private copy of the swap
> device list.
>
> I have considered the memory.swap.tiers for the same thing, with one
> minor optimization. The list is maintained system-wide, with a name.
> Each cgroup just has a pointer to that named list. There shouldn't
> be too many such lists of swap back end combinations on the system.
>
> We are getting into the weeds. The bottom line is, we need to have a
> per cgroup swap file list. That is the necessary evil we can't get away
> from.

Highly agree. This is getting waaayyyy too deep into the weeds, and the
conversation has practically spiralled out of the original intention
of this patch - its purported problem and proposed solution.

Not to say that none of this is useful, but I sense that we first need
to do the following:

a) List out the requirements that the new interface has to support:
the tiers made available to the cgroup, hierarchical structure (i.e., do
we want a tier list to have more than 1 non-zswap level? Maybe we
won't need it after all, in which case the swapon solution is perhaps
sufficient).
b) Carefully evaluate the proposed candidates. It could be an altered
memory.swap.tiers, or an extended swapon/swapoff.

Perhaps we should organize a separate meeting or email thread to
discuss this in detail, and write out proposed solutions for everyone
to evaluate. In the meantime, I think that we should merge this new
knob as-is.

> > >
> > > > In addition, it would naturally solve scalability and isolation
> > > > problems when multiple containers would otherwise be hammering on the
> > > > same swap backends and locks.
> > > >
> > > > It would also only require one, relatively simple new interface, such
> > > > as a cgroup parameter to swapon().
> > > >
> > > > That's highly preferable over a complex configuration file like
> > > > memory.swap.tiers that needs to solve all sorts of visibility and
> > > > namespace issues and duplicate the full configuration interface of
> > > > every backend in some new, custom syntax.
> > >
> > > If you don't like the syntax of memory.swap.tiers, I am open to
> > > suggestions of your preferred syntax as well. The essence of the
> > > swap.tiers is a per cgroup list of the swap back ends. The name implies
> > > that. I am not married to any given syntax of how to specify the list.
> > > Its goal matches the above requirement pretty well.
> >
> > Except Minchan said that he would also like different zram parameters
> > depending on the cgroup.
>
> Minchan's requirement is new. We will need to expand the original
> "memory.swap.tiers" to support such usage.
>
> > There is no way we'll add a memory.swap.tiers with a new configuration
> > language for backend parameters.
> >
>
> I agree that we don't want a complicated configuration language for
> "memory.swap.tiers".
>
> Those backend parameters should be configured on the back end side.
> The "memory.swap.tiers" just references the already configured object.
> Just brainstorming:
> /dev/zram0 has compression algo1 for fast speed, low compression ratio.
> /dev/zram1 has compression algo2 for slow speed, high compression ratio.
>
> "memory.swap.tiers" points to zram0 or zram1, or to a custom list such as "zram0 + hdd"
>
> Chris
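
For whoever picks up the requirements discussion: the named-list idea
quoted above (backends configured on their own side, a system-wide table
of named lists, each cgroup holding only a reference) could be modelled
roughly as below. This is a toy Python sketch for discussion, not kernel
code and not a proposed interface. TierRegistry, Cgroup, backends_for and
the list names "fast", "dense" and "fast+hdd" are hypothetical; only
zram0, zram1, hdd and the two compression trade-offs come from the thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SwapBackend:
    # A backend that was already configured on its own side,
    # e.g. a zram device with its compression algorithm chosen.
    name: str
    note: str = ""


class TierRegistry:
    """Hypothetical system-wide table of named, ordered backend lists."""

    def __init__(self) -> None:
        self._lists: Dict[str, Tuple[SwapBackend, ...]] = {}

    def define(self, list_name: str, *backends: SwapBackend) -> None:
        if not backends:
            raise ValueError("a tier list needs at least one backend")
        # Order matters: earlier backends are tried first.
        self._lists[list_name] = backends

    def lookup(self, list_name: str) -> Tuple[SwapBackend, ...]:
        return self._lists[list_name]


@dataclass
class Cgroup:
    name: str
    # Only a reference (the list name) is stored per cgroup,
    # not a private copy of the device list.
    tier_list: Optional[str] = None


def backends_for(cg: Cgroup, registry: TierRegistry,
                 default: str) -> Tuple[SwapBackend, ...]:
    """Return the ordered backends a cgroup would swap to."""
    return registry.lookup(cg.tier_list or default)


zram0 = SwapBackend("zram0", "algo1: fast, low compression ratio")
zram1 = SwapBackend("zram1", "algo2: slow, high compression ratio")
hdd = SwapBackend("hdd", "disk-backed swap file")

registry = TierRegistry()
registry.define("fast", zram0)
registry.define("dense", zram1)
registry.define("fast+hdd", zram0, hdd)

latency_sensitive = Cgroup("A", tier_list="fast+hdd")
batch = Cgroup("B", tier_list="dense")
unconfigured = Cgroup("C")  # falls back to the default list
```

With this shape, a cgroup-parameter swapon() would correspond to each
cgroup owning its own tuple instead of a name, which is the "private copy"
trade-off mentioned in the quoted text.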