From: Nhat Pham
Date: Tue, 12 Dec 2023 13:36:23 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Johannes Weiner
Cc: Chris Li, akpm@linux-foundation.org, tj@kernel.org, lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz, Kairui Song, Minchan Kim, Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com> <20231209034229.GA1001962@cmpxchg.org>
In-Reply-To: <20231209034229.GA1001962@cmpxchg.org>

On Fri, Dec 8, 2023 at 7:42 PM Johannes Weiner wrote:
>
> On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > I can give you three usage cases right now:
> > 1) Google production kernel uses SSD only swap, it is currently on
> > pilot. This is not expressible by the memory.zswap.writeback.
> > You can set the memory.zswap.max = 0 and memory.zswap.writeback = 1,
> > then SSD backed swapfile. But the whole thing feels very clunky,
> > especially when what you really want is SSD only swap, you need to do
> > all this zswap config dance. Google has an internal memory.swapfile
> > feature implemented per cgroup swap file type by "zswap only", "real
> > swap file only", "both", "none" (the exact keyword might be different).
> > It has been running in production for almost 10 years. The need for
> > more than zswap type of per cgroup control is really there.
>
> We use regular swap on SSD without zswap just fine. Of course it's
> expressible.
>
> On dedicated systems, zswap is disabled in sysfs. On shared hosts
> where it's determined based on which workload is scheduled, zswap is
> generally enabled through sysfs, and individual cgroup access is
> controlled via memory.zswap.max - which is what this knob is for.
>
> This is analogous to enabling swap globally, and then opting
> individual cgroups in and out with memory.swap.max.
>
> So this usecase is very much already supported, and it's expressed in
> a way that's pretty natural for how cgroups express access and lack of
> access to certain resources.
>
> I don't see how memory.swap.type or memory.swap.tiers would improve
> this in any way. On the contrary, it would overlap and conflict with
> existing controls to manage swap and zswap on a per-cgroup basis.
>
> > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > and hard disk swap as overflow.
> > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > +Kairui
>
> Multiple swap devices for round robin or with different priorities
> aren't new, they have been supported for a very, very long time. So
> far nobody has proposed to control the exact behavior on a per-cgroup
> basis, and I didn't see anybody in this thread asking for it either.
>
> So I don't see how this counts as an obvious and automatic usecase for
> memory.swap.tiers.
>
> > 3) Android has some fancy swap ideas led by those patches.
> > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > It got shot down due to removal of frontswap. But the usage case and
> > product requirement is there.
> > +Minchan
>
> This looks like an optimization for zram to bypass the block layer and
> hook directly into the swap code. Correct me if I'm wrong, but this
> doesn't appear to have anything to do with per-cgroup backend control.
>
> > > zswap.writeback is a more urgent need, and does not prevent swap.tiers
> > > if we do decide to implement it.
> >
> > I respect that urgent need, that is why I Acked the V5 patch, under
> > the understanding that this zswap.writeback is not carved in stone.
> > When a better interface comes along, that interface can be made
> > obsolete. Frankly speaking I would much prefer not introducing a
> > cgroup API which will be obsolete soon.
> >
> > If you think zswap.writeback is not removable when another better
> > alternative is available, please voice it now.
> >
> > If you squash my minimal memory.swap.tiers patch, it will also address
> > your urgent need for merging the "zswap.writeback", no?
>
> We can always deprecate ABI if something better comes along.
>
> However, it's quite bold to claim that memory.swap.tiers is the best
> way to implement backend control on a per-cgroup basis, and that it'll
> definitely be needed in the future. You might see this as a foregone
> conclusion, but I very much doubt this.
>
> Even if such a file were to show up, I'm not convinced it should even
> include zswap as one of the tiers.
> Zswap isn't a regular swap backend, it doesn't show up in
> /proc/swaps, it can't be a second tier, the way it interacts with its
> backend file is very different than how two swapfiles of different
> priorities interact with each other, it's already controllable with
> memory.zswap.max, etc.

This is honestly the thing I was originally most iffy about :) zswap
is architecturally and semantically separate from other swap options.
It gets really confusing to lump it in as part of the swap tiers.

>
> I'm open to discussing usecases and proposals for more fine-grained
> per-cgroup backend control. We've had discussions about per-cgroup
> swapfiles in the past. Cgroup parameters for swapon are another
> thought. There are several options and many considerations. The
> memory.swap.tiers idea is the newest, has probably had the least
> amount of discussion among them, and looks the least convincing to me.

Definitely. zswap.writeback is a really concrete feature, with an
immediate use case, whereas swap.tiers seems a bit nebulous to me now,
the more we discuss it. I'm not against the inclusion of something
along those lines though, and I'm definitely not trying to limit the
use cases of other folks - I'd be happy to contribute my engineering
hours towards the discussion of the multi-tier swapping design (both
internal implementation and public interface), as well as actual
code, when that design is fully fleshed out :)

>
> Let's work out the requirements first.
>
> The "conflict" with memory.zswap.writeback is a red herring - it's no
> more of a conflict than setting memory.swap.tiers to "zswap" or "all"
> and then setting memory.zswap.max or memory.swap.max to 0.

Yup.

>
> So the notion that we have to rush in a minimal version of a MUCH
> bigger concept, just to support zswap writeback disabling is
> misguided. And then hope that this format works as the concept evolves
> and real usecases materialize... There is no reason to take that risk.
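
For illustration only, here is a rough sketch of how the four per-cgroup
cases mentioned earlier in the thread ("zswap only", "real swap file
only", "both", "none") might map onto the existing knobs plus
memory.zswap.writeback. The mode names and the knobs_for() helper are
hypothetical, made up for this example. It assumes zswap is enabled
globally and a swap device is configured, and it uses "max" as a
placeholder where a real deployment would pick actual limits.

```python
from typing import Dict

# Hypothetical mode labels (not a kernel interface), mapped to the cgroup v2
# files discussed in this thread. "max" is a placeholder for "no limit".
KNOBS_BY_MODE: Dict[str, Dict[str, str]] = {
    # zswap in front of the swap device, with writeback allowed.
    "both": {
        "memory.swap.max": "max",
        "memory.zswap.max": "max",
        "memory.zswap.writeback": "1",
    },
    # Opt the cgroup out of zswap; pages go straight to the swap device.
    "swapfile_only": {
        "memory.swap.max": "max",
        "memory.zswap.max": "0",
        "memory.zswap.writeback": "1",
    },
    # Keep pages in the zswap pool and never write them to the swap device.
    "zswap_only": {
        "memory.swap.max": "max",
        "memory.zswap.max": "max",
        "memory.zswap.writeback": "0",
    },
    # No swapping of any kind for this cgroup.
    "none": {
        "memory.swap.max": "0",
    },
}


def knobs_for(mode: str) -> Dict[str, str]:
    """Return a copy of the knob settings for one of the illustrative modes."""
    try:
        return dict(KNOBS_BY_MODE[mode])
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


if __name__ == "__main__":
    for mode in KNOBS_BY_MODE:
        settings = ", ".join(f"{k}={v}" for k, v in knobs_for(mode).items())
        print(f"{mode}: {settings}")
```

The only case that needs the new file is "zswap_only"; the others are
already expressible with memory.swap.max and memory.zswap.max.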