From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CDB2C77B73 for ; Tue, 30 May 2023 18:42:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5CD4F6B0072; Tue, 30 May 2023 14:42:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 57D696B0074; Tue, 30 May 2023 14:42:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44576900002; Tue, 30 May 2023 14:42:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 35ED06B0072 for ; Tue, 30 May 2023 14:42:13 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id EA7E71C719E for ; Tue, 30 May 2023 18:42:12 +0000 (UTC) X-FDA: 80847791304.19.8836E06 Received: from mail-ej1-f49.google.com (mail-ej1-f49.google.com [209.85.218.49]) by imf19.hostedemail.com (Postfix) with ESMTP id F0D131A0009 for ; Tue, 30 May 2023 18:42:10 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=69R9E8I4; spf=pass (imf19.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1685472131; a=rsa-sha256; cv=none; b=mRi8RLFdxpFsQeTX0K46wCyzl841DanPexahybg3Qv+6AJG2xoHsQwiaO3xvSbW1yu7ETC KhEay9Pi3dBhgcZbVKAeEoNlND94JqudaUc1k06PwyosFhvso8518Q4zrPp8H/SFM07zrp 1JXIvsyYLxUfV+pbeIms+stniI57w24= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=69R9E8I4; spf=pass (imf19.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.49 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1685472131; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5ptpIq+v7Ig3U5NWDKCTVGQfDx2GQloUkHm5jynI61A=; b=Fez+It9+69waj/RGzzJ7D1O3TZQm0t+b7Ld7ifidYhRD7tTrUMGYwW5xBGuw6uQq8Iv1l3 nIyyCZ9TNxUYrbXlBMFPs3FBboMEDrUE0wXjFesvlj7sWTXrMcGofw8PPpo/kd/tHxSXqi 4T5Wyn6TT6uJZ9LB+RcZFJi8TMolhCU= Received: by mail-ej1-f49.google.com with SMTP id a640c23a62f3a-96fd3a658eeso694473766b.1 for ; Tue, 30 May 2023 11:42:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1685472129; x=1688064129; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5ptpIq+v7Ig3U5NWDKCTVGQfDx2GQloUkHm5jynI61A=; b=69R9E8I4Zp6RFpZAGlOwsQWVLQqWfrPKJtVs5l1TZfr+xlKXIIr5dPnxQcm5mTwNPX 9aQkraZShT9wGUnor0gOY/VrqirwNIJ/D1hgIWEPsppFsHzoaEsHr9xBq5aW+taoO52j 9C/cRAMIcCzg1oaqE/YVKXXeGlTaLFK8TF99fAFECv+Mlh6F9xTODxSu+H0C/fwPK/at npbq4gya28vqbUrpL8vTHP5RKCoUxGchPTtX4vtuAof5D5FtCo8k8q3rjxqSul0dAIYZ daX9U9pN75/8Rd3tPiggMITaM6Vpn60awJoGh6FOltUGbz4vVp0F+lyxx93UrFsPV7Mv yx/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685472129; x=1688064129; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5ptpIq+v7Ig3U5NWDKCTVGQfDx2GQloUkHm5jynI61A=; b=GK5WxH8/7trbUUKQs3v9inGoJMej/h3FpXW7c4WElzxnfCtxcLIwbssiC1OiOKoIbJ gS1JgaF2642DoNXDctbI2WgU4e3jWCvGKO9uxJYnz+AtVEBUFQt7BcdzoBRFd8S/FA+g A3ckk64p+4x9iM8d0V8Nf+TsKCA7KoTuqO5Nkoz7caEZQ2U24nKUdWu/8KcP6Gr+yaAJ ymoUAQgtHkJnh/7zzOB7IcXMIILIAph1eyToadHrdh1IQlXdHHYydpiHGFnbgOgsTPC+ xRuIPFpLUcb/Rra7IziSVI+mL9uFkFZMnx1biZVzZ8Roqf1sEHiSxT0qUy5aVIGDHRQx 87bQ== X-Gm-Message-State: AC+VfDwDOQ6ue9JqR/xd82S85h15zEdH3N9VpRjwuCvLLGoULc+E1+MY m/6HhmAdN10L2s8Sus0etRd/IWi7QxRl8rzlaJJq6A== X-Google-Smtp-Source: ACHHUZ5gQNuclOc8mrNfT/UmaKRWUPUZUJ44atI+RCtRLdySPU7GMw83jjQ5hgHmmfySZ/T6jGbkMbbHYPXQ6CDEd4U= X-Received: by 2002:a17:907:3e10:b0:94f:73db:b390 with SMTP id hp16-20020a1709073e1000b0094f73dbb390mr3469071ejc.65.1685472129098; Tue, 30 May 2023 11:42:09 -0700 (PDT) MIME-Version: 1.0 References: <20230530162153.836565-1-nphamcs@gmail.com> <20230530180038.GC97194@cmpxchg.org> In-Reply-To: <20230530180038.GC97194@cmpxchg.org> From: Yosry Ahmed Date: Tue, 30 May 2023 11:41:32 -0700 Message-ID: Subject: Re: [PATCH] zswap: do not shrink when memory.zswap.max is 0 To: Johannes Weiner Cc: Nhat Pham , akpm@linux-foundation.org, cerasuolodomenico@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: F0D131A0009 X-Stat-Signature: 49ke88wrwwoff9zcig7iye3mizaser4p X-Rspam-User: X-HE-Tag: 1685472130-672393 X-HE-Meta: U2FsdGVkX18PkU0miB7zMPEQhlL1l19k3gNIdwubTkE7ja/Efhbvhqo8EdcH+FafRrLShtKzNm0l5x3YZRleUWu3KSa4ZhEDOPms1oYRolCz4P7j8dgmv0CfIE/ko16GTjKi7Pik0GudsXnrUOKxY/DzWDdTYHB9z7Rm0a1M74BfLbGi47En+e6iKGMvLz5g9UQx3vY5O/q2hapxMJ1vgGn34UyvZdc1gxkIfXiEsshN/QHCUnidtgXa1RUY3zUzCPo3lxdWMrvZE0zqKZU8g4o0gte4f/HO1KJDO0Xoq5THpZOMTiyqFYQcqwM7/BDSTkb2TdgNDifhImHfcXync4rALIPma1Es0ko3yAX+coPSKvQBj2gZLKxNfwj8XSg95M2CCBSx50w9TTRwBHzsrVRlsq6QgpUXfuGgUagF4lQ1un5aTNvgtQSSEkvooTyaK1oMcQtp16co9l+2ifWzaclPKrTIsKxWmW6ziVMVL8FixMMm7cEKd/8ojE37yE3LR82MTi5i2PprjylaQyGknjaTLuouzhlAavqBnos8dtZeC9Z8Sn7ClQSvcHkj4myfhzVxDwgs8SU515mSNubDf6UwHq0HAADE4e7jV6FVavDOUJ4ZBpap01nSAXUoR2DtSt9z89pR5A+Xpvbuqfi5nbKBctuSe7KsIdPO8F+sxuerpFrHw7bqnBqmJWRaoFium6+//oy8JFEam2S53ZBc7rLqfU9b42vCoDV3E1Fs2vJLauSvH4YhAxuSldToIty1IkqD8GPvrSrfjEOns0H8v7g5MNZhxvKeSDFoE7dEK+sIIsLuhZpYgMNDppPYaF4hXF6/x3Dip2u3seeNs1cyO3t0ovxu9mC1fb7TuDix+g7/MOoi0koTh5NeUyzoLcygHEfVT40zG1XUhSsbDng/szIhxwzvTEC6NnbCq9uNBt+CcL2sbV7aci6mpGXxugQ4zgogJCdwsPfQtVaRySO kdxGvLJY zl8+3ben1PtXN9hkbeTy55glff14jGxULWHSg0kPBgpmzilmuMwqU6bS9qrUzAqaZSbjLTDFqi/yjN/FKvBl+4XSX59VTueVPKQeMqvNbjeT2W38MqgS5kGfEQP1q4pKGhoW7grhqwM54odLOafhDuImtI4v7L0khB6i005Q+nhGow2UhJjq/sDLy0Y5i2qujbmbruMRFlLGIpjy1JMzN4zT4zawMiTOSSXe0NqOO5QIe9v64R9MavFxoyATpTeDKTMBg8wZgozO8ee1oiLdL7k+DDNcqD7MVpo38Fuk0XnocRNVayi2kEmyjkA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 30, 2023 at 11:00=E2=80=AFAM Johannes Weiner wrote: > > On Tue, May 30, 2023 at 09:52:36AM -0700, Yosry Ahmed wrote: > > On Tue, May 30, 2023 at 9:22=E2=80=AFAM Nhat Pham w= rote: > > > > > > Before storing a page, zswap first checks if the number of stored pag= es > > > exceeds the limit specified by memory.zswap.max, for each cgroup in t= he > > > hierarchy. If this limit is reached or exceeded, then zswap shrinking= is > > > triggered and short-circuits the store attempt. > > > > > > However, if memory.zswap.max =3D 0 for a cgroup, no amount of writeba= ck > > > will allow future store attempts from processes in this cgroup to > > > succeed. Furthermore, this create a pathological behavior in a system > > > where some cgroups have memory.zswap.max =3D 0 and some do not: the > > > processes in the former cgroups, under memory pressure, will evict pa= ges > > > stored by the latter continually, until the need for swap ceases or t= he > > > pool becomes empty. > > > > > > As a result of this, we observe a disproportionate amount of zswap > > > writeback and a perpetually small zswap pool in our experiments, even > > > though the pool limit is never hit. > > > > > > This patch fixes the issue by rejecting zswap store attempt without > > > shrinking the pool when memory.zswap.max is 0. > > > > > > Fixes: f4840ccfca25 ("zswap: memcg accounting") > > > Signed-off-by: Nhat Pham > > > --- > > > include/linux/memcontrol.h | 6 +++--- > > > mm/memcontrol.c | 8 ++++---- > > > mm/zswap.c | 9 +++++++-- > > > 3 files changed, 14 insertions(+), 9 deletions(-) > > > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > > index 222d7370134c..507bed3a28b0 100644 > > > --- a/include/linux/memcontrol.h > > > +++ b/include/linux/memcontrol.h > > > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct o= bj_cgroup *objcg, > > > #endif /* CONFIG_MEMCG_KMEM */ > > > > > > #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) > > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg); > > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg); > > > void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size); > > > void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size= ); > > > #else > > > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > > { > > > - return true; > > > + return 0; > > > } > > > static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, > > > size_t size) > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 4b27e245a055..09aad0e6f2ea 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] =3D { > > > * spending cycles on compression when there is already no room left > > > * or zswap is disabled altogether somewhere in the hierarchy. > > > */ > > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > > { > > > struct mem_cgroup *memcg, *original_memcg; > > > - bool ret =3D true; > > > + int ret =3D 0; > > > > > > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) > > > return true; > > > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *ob= jcg) > > > if (max =3D=3D PAGE_COUNTER_MAX) > > > continue; > > > if (max =3D=3D 0) { > > > - ret =3D false; > > > + ret =3D -ENODEV; > > > break; > > > } > > > > > > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *ob= jcg) > > > pages =3D memcg_page_state(memcg, MEMCG_ZSWAP_B) / PA= GE_SIZE; > > > if (pages < max) > > > continue; > > > - ret =3D false; > > > + ret =3D -ENOMEM; > > > break; > > > } > > > mem_cgroup_put(original_memcg); > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index 59da2a415fbb..7b13dc865438 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type= , pgoff_t offset, > > > } > > > > > > objcg =3D get_obj_cgroup_from_page(page); > > > - if (objcg && !obj_cgroup_may_zswap(objcg)) > > > - goto shrink; > > > + if (objcg) { > > > + ret =3D obj_cgroup_may_zswap(objcg); > > > + if (ret =3D=3D -ENODEV) > > > + goto reject; > > > + if (ret =3D=3D -ENOMEM) > > > + goto shrink; > > > + } > > > > I wonder if we should just make this: > > > > if (objcg && !obj_cgroup_may_zswap(objcg)) > > goto reject; > > > > Even if memory.zswap.max is > 0, if the limit is hit, shrinking the > > zswap pool will only help if we happen to writeback a page from the > > same memcg that hit its limit. Keep in mind that we will only > > writeback one page every time we observe that the limit is hit (even > > with Domenico's patch, because zswap_can_accept() should be true). > > > > On a system with a handful of memcgs, > > it seems likely that we wrongfully writeback pages from other memcgs > > because of this. Achieving nothing for this memcg, while hurting > > others. OTOH, without invoking writeback when the limit is hit, the > > memcg will just not be able to use zswap until some pages are > > faulted back in or invalidated. > > > > I am not sure which is better, just thinking out loud. > > You're absolutely right. > > Currently the choice is writing back either everybody or nobody, > meaning between writeback and cgroup containment. They're both so poor > that I can't say I strongly prefer one over the other. > > However, I have a lame argument in favor of this patch: > > The last few fixes from Nhat and Domenico around writeback show that > few people, if anybody, are actually using writeback. So it might not > actually matter that much in practice which way we go with this patch. > Per-memcg LRUs will be necessary for it to work right. > > However, what Nhat is proposing is how we want the behavior down the > line. So between two equally poor choices, I figure we might as well > go with the one that doesn't require another code change later on. > > Doesn't that fill you with radiant enthusiasm? If we have per-memcg LRUs, and memory.zswap.max =3D=3D 0, then we should be in one of two situations: (a) memory.zswap.max has always been 0, so the LRU for this memcg is empty, so we don't really need the special case for memory.zswap.max =3D=3D 0. (b) memory.zswap.max was reduced to 0 at some point, and some pages are already in zswap. In this case, I don't think shrinking the memcg is such a bad idea, we would be lazily enforcing the limit. In that sense I am not sure that this change won't require another code change. It feels like special casing memory.zswap.max =3D=3D 0 is only needed now due to the lack of per-memcg LRUs. > > > Seems like this can be solved by having per-memcg LRUs, or at least > > providing an argument to the shrinker of which memcg to reclaim from. > > This would only be possible when the LRU is moved to zswap. > > +1