Date: Thu, 14 Dec 2023 12:11:37 -0500
From: Johannes Weiner
To: Minchan Kim
Cc: Chris Li, Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
    lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
    yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
    vitaly.wool@konsulko.com, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
    konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
    linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    david@ixit.cz, Kairui Song, Zhongkun He
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Message-ID: <20231214171137.GA261942@cmpxchg.org>
References: <20231207192406.3809579-1-nphamcs@gmail.com>
    <20231209034229.GA1001962@cmpxchg.org>

On Mon, Dec 11, 2023 at 02:55:43PM -0800, Minchan Kim wrote:
> On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> > On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > > I can give you three usage cases right now:
> > > 1) Google production kernel uses SSD only swap, it is currently on
> > > pilot. This is not expressible by the memory.zswap.writeback. You can
> > > set the memory.zswap.max = 0 and memory.zswap.writeback = 1, then SSD
> > > backed swapfile. But the whole thing feels very clunky, especially
> > > what you really want is SSD only swap, you need to do all this zswap
> > > config dance. Google has an internal memory.swapfile feature
> > > implemented per cgroup swap file type by "zswap only", "real swap file
> > > only", "both", "none" (the exact keyword might be different). running
> > > in the production for almost 10 years. The need for more than zswap
> > > type of per cgroup control is really there.
> >
> > We use regular swap on SSD without zswap just fine. Of course it's
> > expressible.
> >
> > On dedicated systems, zswap is disabled in sysfs. On shared hosts
> > where it's determined based on which workload is scheduled, zswap is
> > generally enabled through sysfs, and individual cgroup access is
> > controlled via memory.zswap.max - which is what this knob is for.
> >
> > This is analogous to enabling swap globally, and then opting
> > individual cgroups in and out with memory.swap.max.
> >
> > So this usecase is very much already supported, and it's expressed in
> > a way that's pretty natural for how cgroups express access and lack of
> > access to certain resources.
> >
> > I don't see how memory.swap.type or memory.swap.tiers would improve
> > this in any way. On the contrary, it would overlap and conflict with
> > existing controls to manage swap and zswap on a per-cgroup basis.
> >
> > > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > > and hard disk swap as overflow.
> > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > +Kairui
> >
> > Multiple swap devices for round robin or with different priorities
> > aren't new, they have been supported for a very, very long time. So
> > far nobody has proposed to control the exact behavior on a per-cgroup
> > basis, and I didn't see anybody in this thread asking for it either.
> >
> > So I don't see how this counts as an obvious and automatic usecase for
> > memory.swap.tiers.
> >
> > > 3) Android has some fancy swap ideas led by those patches.
> > > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > > It got shot down due to removal of frontswap. But the usage case and
> > > product requirement is there.
> > > +Minchan
> >
> > This looks like an optimization for zram to bypass the block layer and
> > hook directly into the swap code. Correct me if I'm wrong, but this
> > doesn't appear to have anything to do with per-cgroup backend control.
>
> Hi Johannes,
>
> I haven't been following the thread closely, but I noticed the discussion
> about potential use cases for zram with memcg.
>
> One interesting idea I have is to implement a swap controller per cgroup.
> This would allow us to tailor the zram swap behavior to the specific needs of
> different groups.
>
> For example, Group A, which is sensitive to swap latency, could use zram swap
> with a fast compression setting, even if it sacrifices some compression ratio.
> This would prioritize quick access to swapped data, even if it takes up more space.
>
> On the other hand, Group B, which can tolerate higher swap latency, could benefit
> from a slower compression setting that achieves a higher compression ratio.
> This would maximize memory efficiency at the cost of slightly slower data access.
>
> This approach could provide a more nuanced and flexible way to manage swap usage
> within different cgroups.

That makes sense to me.
It sounds to me like per-cgroup swapfiles would be the easiest
solution to this. Then you can create zram devices with different
configurations and assign them to individual cgroups.

This would also apply to Kairui's usecase: assign zrams and hdd backups
as needed on a per-cgroup basis.

In addition, it would naturally solve scalability and isolation
problems when multiple containers would otherwise be hammering on the
same swap backends and locks.

It would also only require one, relatively simple new interface, such
as a cgroup parameter to swapon().

That's highly preferable over a complex configuration file like
memory.swap.tiers that needs to solve all sorts of visibility and
namespace issues and duplicate the full configuration interface of
every backend in some new, custom syntax.
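
For reference, the existing per-cgroup controls discussed earlier in
this thread (memory.swap.max, memory.zswap.max, and the
memory.zswap.writeback knob added by this series) can be combined
roughly as sketched below. This is an illustration only, not part of
the patch: the policy names are hypothetical labels borrowed from the
four modes mentioned upthread, the helper apply_swap_policy() is made
up for the example, and the mapping assumes the memory.zswap.writeback
semantics proposed in this series.

```python
import os
import tempfile

# Hypothetical policy names (not kernel keywords). Each maps to the
# strings written into the cgroup v2 control files of one cgroup.
POLICIES = {
    # Regular swap only: zswap gets no budget for this cgroup.
    "swap-only": {
        "memory.swap.max": "max",
        "memory.zswap.max": "0",
        "memory.zswap.writeback": "1",
    },
    # zswap only: compressed pool allowed, no writeback to the swap
    # device. memory.swap.max is left nonzero on the assumption that
    # setting it to 0 would also keep pages out of zswap.
    "zswap-only": {
        "memory.swap.max": "max",
        "memory.zswap.max": "max",
        "memory.zswap.writeback": "0",
    },
    # Both: zswap in front of the swap device, writeback allowed.
    "both": {
        "memory.swap.max": "max",
        "memory.zswap.max": "max",
        "memory.zswap.writeback": "1",
    },
    # None: the cgroup is opted out of swap altogether.
    "none": {
        "memory.swap.max": "0",
    },
}


def apply_swap_policy(cgroup_dir, policy):
    """Write the control files for `policy` under `cgroup_dir`.

    Returns the settings that were written. Raises KeyError for an
    unknown policy name.
    """
    settings = POLICIES[policy]
    for name, value in settings.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)
    return settings


if __name__ == "__main__":
    # Demo against a scratch directory instead of a real cgroup.
    with tempfile.TemporaryDirectory() as scratch:
        for name, value in apply_swap_policy(scratch, "swap-only").items():
            print(name, "=", value)
```

On a real system, cgroup_dir would be a cgroup v2 directory such as
one under /sys/fs/cgroup, and writing these files requires the
appropriate privileges. The per-cgroup swapfile idea above has no
existing interface, so it is deliberately not modelled here.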