From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 30 May 2023 11:42:34 -0700
Subject: Re: [PATCH] zswap: do not shrink when memory.zswap.max is 0
To: Nhat Pham
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com,
 ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com
References: <20230530162153.836565-1-nphamcs@gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Tue, May 30, 2023 at 11:27 AM Nhat Pham wrote:
>
> On Tue, May 30, 2023 at 9:53 AM Yosry Ahmed wrote:
> >
> > On Tue, May 30, 2023 at 9:22 AM Nhat Pham wrote:
> > >
> > > Before storing a page, zswap first checks if the number of stored pages
> > > exceeds the limit specified by memory.zswap.max, for each cgroup in the
> > > hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> > > triggered and short-circuits the store attempt.
> > >
> > > However, if memory.zswap.max = 0 for a cgroup, no amount of writeback
> > > will allow future store attempts from processes in this cgroup to
> > > succeed. Furthermore, this creates a pathological behavior in a system
> > > where some cgroups have memory.zswap.max = 0 and some do not: the
> > > processes in the former cgroups, under memory pressure, will evict pages
> > > stored by the latter continually, until the need for swap ceases or the
> > > pool becomes empty.
> > >
> > > As a result of this, we observe a disproportionate amount of zswap
> > > writeback and a perpetually small zswap pool in our experiments, even
> > > though the pool limit is never hit.
> > >
> > > This patch fixes the issue by rejecting zswap store attempts without
> > > shrinking the pool when memory.zswap.max is 0.
> > >
> > > Fixes: f4840ccfca25 ("zswap: memcg accounting")
> > > Signed-off-by: Nhat Pham
> > > ---
> > >  include/linux/memcontrol.h | 6 +++---
> > >  mm/memcontrol.c            | 8 ++++----
> > >  mm/zswap.c                 | 9 +++++++--
> > >  3 files changed, 14 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > index 222d7370134c..507bed3a28b0 100644
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
> > >  #endif /* CONFIG_MEMCG_KMEM */
> > >
> > >  #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > >  void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
> > >  void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> > >  #else
> > > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > >  {
> > > -       return true;
> > > +       return 0;
> > >  }
> > >  static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg,
> > >                                             size_t size)
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 4b27e245a055..09aad0e6f2ea 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] = {
> > >   * spending cycles on compression when there is already no room left
> > >   * or zswap is disabled altogether somewhere in the hierarchy.
> > >   */
> > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > >  {
> > >         struct mem_cgroup *memcg, *original_memcg;
> > > -       bool ret = true;
> > > +       int ret = 0;
> > >
> > >         if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > >                 return true;
> > > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > >                 if (max == PAGE_COUNTER_MAX)
> > >                         continue;
> > >                 if (max == 0) {
> > > -                       ret = false;
> > > +                       ret = -ENODEV;
> > >                         break;
> > >                 }
> > >
> > > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > >                 pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
> > >                 if (pages < max)
> > >                         continue;
> > > -               ret = false;
> > > +               ret = -ENOMEM;
> > >                 break;
> > >         }
> > >         mem_cgroup_put(original_memcg);
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 59da2a415fbb..7b13dc865438 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> > >         }
> > >
> > >         objcg = get_obj_cgroup_from_page(page);
> > > -       if (objcg && !obj_cgroup_may_zswap(objcg))
> > > -               goto shrink;
> > > +       if (objcg) {
> > > +               ret = obj_cgroup_may_zswap(objcg);
> > > +               if (ret == -ENODEV)
> > > +                       goto reject;
> > > +               if (ret == -ENOMEM)
> > > +                       goto shrink;
> > > +       }
> >
> > I wonder if we should just make this:
> >
> >         if (objcg && !obj_cgroup_may_zswap(objcg))
> >                 goto reject;
> >
> > Even if memory.zswap.max is > 0, if the limit is hit, shrinking the
> > zswap pool will only help if we happen to writeback a page from the
> > same memcg that hit its limit. Keep in mind that we will only
> > writeback one page every time we observe that the limit is hit (even
> > with Domenico's patch, because zswap_can_accept() should be true).
> >
> > On a system with a handful of memcgs, it seems likely that we
> > wrongfully write back pages from other memcgs because of this,
> > achieving nothing for this memcg while hurting others. OTOH, without
> > invoking writeback when the limit is hit, the memcg will just not be
> > able to use zswap until some pages are faulted back in or invalidated.
> >
> > I am not sure which is better, just thinking out loud.
> >
> > Seems like this can be solved by having per-memcg LRUs, or at least
> > providing an argument to the shrinker of which memcg to reclaim from.
> > This would only be possible when the LRU is moved to zswap.
>
> I totally agree! This seems like the logical next step in zswap's evolution.
> I actually proposed this fix with this future development in mind - with
> a per-memcg LRU, we can trigger memcg-specific shrinking in place of
> this indiscriminate writeback. It seems less drastic a change (compared
> to removing shrinking here now, then reintroducing it later).

As I stated in my reply to Johannes, I am just not sure that we will need
to special-case memory.zswap.max == 0 when we have proper writeback.
WDYT?

>
> Thanks for the feedback, Yosry!
>
> >
> > >
> > >         /* reclaim space if needed */
> > >         if (zswap_is_full()) {
> > > --
> > > 2.34.1
> > >
> > >
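
To make the per-memcg LRU idea discussed above concrete, here is a rough,
untested sketch of what a memcg-aware shrink hook could look like once zswap
owns the LRU. Everything below is hypothetical: zswap_shrink_memcg() and
zswap_handle_limit_hit() are made-up names, not existing kernel interfaces,
and the helper that resolves the objcg to its memcg is assumed.

	/*
	 * Untested sketch, assuming a hypothetical per-memcg LRU inside
	 * zswap: a cgroup that hits memory.zswap.max writes back its own
	 * entries instead of whatever sits at the tail of the global LRU.
	 */

	/* hypothetical: write back up to @nr entries charged to @memcg */
	int zswap_shrink_memcg(struct zswap_pool *pool,
			       struct mem_cgroup *memcg, unsigned long nr);

	static int zswap_handle_limit_hit(struct zswap_pool *pool,
					  struct obj_cgroup *objcg)
	{
		/* assumes a helper that pins the memcg behind the objcg */
		struct mem_cgroup *memcg = get_mem_cgroup_from_objcg(objcg);
		int ret;

		/* reclaim only from the cgroup that exceeded its own limit */
		ret = zswap_shrink_memcg(pool, memcg, 1);
		mem_cgroup_put(memcg);
		return ret;
	}

With something along these lines, the -ENOMEM case in zswap_frontswap_store()
could shrink the offending cgroup directly instead of kicking the global
shrink path.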