From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20231207192406.3809579-1-nphamcs@gmail.com> <20231209034229.GA1001962@cmpxchg.org> <20231214171137.GA261942@cmpxchg.org>
From: Fabian Deutsch
Date: Thu, 14 Dec 2023 19:03:28 +0100
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Yu Zhao
Cc: Johannes Weiner, Minchan Kim, Chris Li, Nhat Pham, akpm@linux-foundation.org, tj@kernel.org, lizefan.x@bytedance.com, cerasuolodomenico@gmail.com, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, hughd@google.com, corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz, Kairui Song, Zhongkun He
Content-Type: text/plain; charset="UTF-8"

On Thu, Dec 14, 2023 at 6:24 PM Yu Zhao wrote:
>
> On Thu, Dec 14, 2023 at 10:11 AM Johannes Weiner wrote:
> >
> > On Mon, Dec 11, 2023 at 02:55:43PM -0800, Minchan Kim wrote:
> > > On Fri, Dec 08, 2023 at 10:42:29PM -0500, Johannes Weiner wrote:
> > > > On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> > > > > I can give you three usage cases right now:
> > > > > 1) Google production kernel uses SSD only swap, it is currently on
> > > > > pilot. This is not expressible by the memory.zswap.writeback. You can
> > > > > set the memory.zswap.max = 0 and memory.zswap.writeback = 1, then SSD
> > > > > backed swapfile. But the whole thing feels very clunky, especially
> > > > > what you really want is SSD only swap, you need to do all this zswap
> > > > > config dance. Google has an internal memory.swapfile feature
> > > > > implemented per cgroup swap file type by "zswap only", "real swapfile
> > > > > only", "both", "none" (the exact keyword might be different), running
> > > > > in production for almost 10 years. The need for more than zswap
> > > > > type of per cgroup control is really there.
> > > >
> > > > We use regular swap on SSD without zswap just fine. Of course it's
> > > > expressible.
> > > >
> > > > On dedicated systems, zswap is disabled in sysfs. On shared hosts
> > > > where it's determined based on which workload is scheduled, zswap is
> > > > generally enabled through sysfs, and individual cgroup access is
> > > > controlled via memory.zswap.max - which is what this knob is for.
> > > >
> > > > This is analogous to enabling swap globally, and then opting
> > > > individual cgroups in and out with memory.swap.max.
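For reference, the per-cgroup opt-out described above maps onto plain sysfs and cgroup v2 files. A minimal sketch, assuming a cgroup v2 mount at /sys/fs/cgroup and a hypothetical example.slice cgroup:

```shell
# Disable zswap globally, as on dedicated hosts that want SSD-only swap:
echo 0 > /sys/module/zswap/parameters/enabled

# Or keep zswap enabled globally and opt one cgroup out of the
# compressed pool by capping its zswap usage at zero:
echo 0 > /sys/fs/cgroup/example.slice/memory.zswap.max

# The analogous per-cgroup opt-out for swap as a whole:
echo 0 > /sys/fs/cgroup/example.slice/memory.swap.max
```

These writes require root and a kernel built with CONFIG_ZSWAP and cgroup v2; the slice name is illustrative only.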
> > > >
> > > > So this usecase is very much already supported, and it's expressed in
> > > > a way that's pretty natural for how cgroups express access and lack of
> > > > access to certain resources.
> > > >
> > > > I don't see how memory.swap.type or memory.swap.tiers would improve
> > > > this in any way. On the contrary, it would overlap and conflict with
> > > > existing controls to manage swap and zswap on a per-cgroup basis.
> > > >
> > > > > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > > > > and hard disk swap as overflow.
> > > > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > > > +Kairui
> > > >
> > > > Multiple swap devices for round robin or with different priorities
> > > > aren't new, they have been supported for a very, very long time. So
> > > > far nobody has proposed to control the exact behavior on a per-cgroup
> > > > basis, and I didn't see anybody in this thread asking for it either.
> > > >
> > > > So I don't see how this counts as an obvious and automatic usecase for
> > > > memory.swap.tiers.
> > > >
> > > > > 3) Android has some fancy swap ideas led by those patches.
> > > > > https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> > > > > It got shot down due to removal of frontswap. But the usage case and
> > > > > product requirement is there.
> > > > > +Minchan
> > > >
> > > > This looks like an optimization for zram to bypass the block layer and
> > > > hook directly into the swap code. Correct me if I'm wrong, but this
> > > > doesn't appear to have anything to do with per-cgroup backend control.
> > >
> > > Hi Johannes,
> > >
> > > I haven't been following the thread closely, but I noticed the discussion
> > > about potential use cases for zram with memcg.
> > >
> > > One interesting idea I have is to implement a swap controller per cgroup.
> > > This would allow us to tailor the zram swap behavior to the specific needs of
> > > different groups.
> > >
> > > For example, Group A, which is sensitive to swap latency, could use zram swap
> > > with a fast compression setting, even if it sacrifices some compression ratio.
> > > This would prioritize quick access to swapped data, even if it takes up more space.
> > >
> > > On the other hand, Group B, which can tolerate higher swap latency, could benefit
> > > from a slower compression setting that achieves a higher compression ratio.
> > > This would maximize memory efficiency at the cost of slightly slower data access.
> > >
> > > This approach could provide a more nuanced and flexible way to manage swap usage
> > > within different cgroups.
> >
> > That makes sense to me.
> >
> > It sounds to me like per-cgroup swapfiles would be the easiest
> > solution to this.
>
> Someone posted it about 10 years ago :)
> https://lwn.net/Articles/592923/
>
> +fdeutsch@redhat.com
> Fabian recently asked me about its status.

Thanks Yu!

Yes, I was interested due to container use-cases.

Now a few thoughts in this direction:
- With swap per cgroup you lose the big "statistical" benefit of having
  swap at the node level. Well, it depends on the size of the cgroup
  (i.e. system.slice is quite large).
- With today's node-level swap, setting memory.swap.max=0 for all cgroups
  allows you to achieve a similar behavior (only opt-in cgroups will get
  swap).
- The above approach however will still have a shared swap backend for
  all cgroups.
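The opt-in scheme from the second bullet can be sketched as follows, assuming cgroup v2 at /sys/fs/cgroup and illustrative slice names: swap stays configured at the node level, every cgroup defaults to memory.swap.max=0, and selected cgroups are opted back in.

```shell
# Default all top-level slices to no swap (hypothetical layout):
for cg in /sys/fs/cgroup/*.slice; do
    echo 0 > "$cg/memory.swap.max"
done

# Opt a single cgroup back in, here with no per-cgroup cap; note it
# still shares the node-level swap backend with every other cgroup:
echo max > /sys/fs/cgroup/example.slice/memory.swap.max
```

New child cgroups would also need the default applied (e.g. by the service manager), since memory.swap.max is not inherited as 0 automatically.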