From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81904C77B7A for ; Tue, 30 May 2023 18:27:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 07C1D6B0072; Tue, 30 May 2023 14:27:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 02C3F900003; Tue, 30 May 2023 14:27:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E3622900002; Tue, 30 May 2023 14:27:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id D031A6B0072 for ; Tue, 30 May 2023 14:27:24 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 9548B1202DE for ; Tue, 30 May 2023 18:27:24 +0000 (UTC) X-FDA: 80847754008.15.2563C73 Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) by imf05.hostedemail.com (Postfix) with ESMTP id 86737100013 for ; Tue, 30 May 2023 18:27:22 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=mhdTq7lw; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.49 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1685471242; a=rsa-sha256; cv=none; b=qP3Wxgt2rKEwrANs6Qkv9sgUgRfY6jaffEApmHiyMDEgmkh+O4oE2eJbbgHhLbqyDLNuuC zVOfMUEJLfghfBORVPwqIom1cUoQYXGYtNF5OFxRlYUvLNchFByALOYirCCpmOwA88skGF LtiDjnhceOJEB1tKG8s28jx40FMX6yU= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b=mhdTq7lw; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.219.49 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1685471242; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WgX1FQ0z8o/q+NS4H/4WUbgQk3q74ANklOJUqgkv+F4=; b=LWHTQehKncrmz90f+2D5wm/Jaj/s5WgF0l/Hsq4yDF9YY4G9FIRaYw6wyzMViF1So0mtKw BCJecNeoaoI9tUWd+5SBbmCKa2+/7VORduJ4BbcgGxtlCFrk8pP9hz/OHAdtRc0qDPayJa ZrLJS3/Xrgo7OCIRB+5UBKw1EAzteBk= Received: by mail-qv1-f49.google.com with SMTP id 6a1803df08f44-6262be06e2eso10019106d6.1 for ; Tue, 30 May 2023 11:27:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1685471241; x=1688063241; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WgX1FQ0z8o/q+NS4H/4WUbgQk3q74ANklOJUqgkv+F4=; b=mhdTq7lwqRXFPgouNWrj3sd3x47u+PVzoR/3WdAaU3N+crD+MwB7Qm0kdEkPLHjqjW W4F0duuGeDlidiDPIdJ/Y+tpet57jzCFYNQys4CBusazL+U/2RosfxnMn3TZkfnXc3hb m0ngGLoWLTKG2IbAqGdSMWnNVG2q1YLH16Fmbdl2azcthxLwBE5aruMT6JPRYr4eEPRk XwqHJVbPzMoe9kQDYpGxGtAhheouckh19XowqaLvhotx98KARim8+I/KxfT+SQPS9D1K cSHjZ7NTQOLi432RnCLWNMpXNbnnVJ/jad8fTU0cNg+lVM+lqvwQrEyGtTIFPqTTpI9Q Ru5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685471241; x=1688063241; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WgX1FQ0z8o/q+NS4H/4WUbgQk3q74ANklOJUqgkv+F4=; b=bEiymSezvDxh7/S/jQppywj5psNopAfRP8MMpRnvjQDQRvbJ9xyYCQhdkQXL93vcy/ nIJYcJrJhXNXtUv5NiaUT5BQzlf2nAL3dbvy5ChtU8gAmjXnRy1CSys0efq/xEp/7iSP Mjz5GMvIl5hV6m+KqzzM8rvRF5wZWW0zcd7UMUqieLCcmiLYbdiPnkhHTq2HNdNnpHwZ J9hANp3z23sYW/C4mxP7fYguYpN91aUhBnjOGCh3XdX0aB31/ZzAPSD02Lne47hhmuUJ 9jL4mEKui9e2o0r+YgBxmq2F3qgrw8C2aGG0IxfCGJeQ7CL0WfQByXaT1kqPOLjFuQwd F15w== X-Gm-Message-State: AC+VfDwB1tayKWIrkf19IatF//27ZSgd7n/+OdRlBMaYMrsyEsWGtbuX QWsmdcsjxLdm3KyLFQib0Zl/TvnLKo0nQ0SEPt8= X-Google-Smtp-Source: ACHHUZ7JWv5UFJgNlPKuN5ECgH8o17J83GhtqrLglAJUcmV9Ngrua9ObhQyYFWsv6JmkVRvum5VYSG1wQ6L3MzlZA5g= X-Received: by 2002:a05:6214:2627:b0:623:c96a:e735 with SMTP id gv7-20020a056214262700b00623c96ae735mr3740707qvb.1.1685471241425; Tue, 30 May 2023 11:27:21 -0700 (PDT) MIME-Version: 1.0 References: <20230530162153.836565-1-nphamcs@gmail.com> In-Reply-To: From: Nhat Pham Date: Tue, 30 May 2023 11:27:10 -0700 Message-ID: Subject: Re: [PATCH] zswap: do not shrink when memory.zswap.max is 0 To: Yosry Ahmed Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, cerasuolodomenico@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 86737100013 X-Stat-Signature: fdru1exy6ayyp3ib4kgoa4y4k74qgxmh X-HE-Tag: 1685471242-239343 X-HE-Meta: U2FsdGVkX19a4c8zr6RSPO6Y/Lo3WuJXD1YMC6UoxTQYJriaj3vqGiQG5ePcnredMcwYZzItTOWHaTQrHTajjavrg9bcNH+QaSfqe9MdRSo30xY6tCoFVCvXmwvlsRIQgjxKh6XxR3B8v1SxlKmwbV1kiEAV7OkxGFBQcVegqu6YH9gUJQRD5uac6WjSrsjTp/434NET+NQ/8SQ9hi3IO+ulfSsJ8q2jeAYA+lkA6Q14g0SleCZ9tabVNHK4jq5I0RlSCyWhFLU8i70sxY8AQKSISkzeTFbsVzvGmYDBPDE/geMRdzpRrnbm8ZlZ9ytVTmVdcr3eOfC1aO89GSulpjX364hjexnUMHm9ReiT/HLZPM+Zbh5hmtQQZKAlS7UjlszCoHkPTG7nEpTG9+kvElbYTZt6RMRSsYMEd3PpNh5s7fN/xLXhv46QkO0urgQKJ0ikvv165v+y3huXrt1uqiMdOUhNbuxZ+XIqQQmvrzhmg41CrZkO5jCZglHhWaYzUnnL1ucCIH/y/X+iRNX9at3IyoupAsLuRNUYbXVr3jh3s4pKywlTDwXDnAROoUmUlzwi+zPo+TxUnkHOQnobaIlm+RwcUXMgqLAyuTYnZCM+TqFtFdYXVsDDNblIIQOs3yBn6RIEpgV0z3tZg6X1ifqi7Aa0z9tRzt6SdBUfCLFXmcidJrYg4rfZIexig/6vSHmujzFliol5BzNRsPMLmSfr0An1KxISl4CgsMa0HihRt+PVekvhhwjrA3t+NrYQKOQ6LqEByYe/qgOsNuIg4GlwRlfKWAlrhKtRSGfaJc7XKHtaDnIm8vWaltDesMgw5HMHwrm246ej3tqs5TDVQPBppm+Mzy2gA37XfF9XlZuzC79nCpgj9hnHeSVStQhvbswPWi3U515UxUeD2kutkKXv0vR1a/lxZanqAqErZlNlce7u13Yu6klCjkTBmtQV1bvhM4HBxXEJ+5jQQbC ephyetl0 +Qjn07w7zUv7vFhhq2Z6bI3hYvQOq6Ux3t3opGoWdSksADvfa0XeMrUMgOwsi1cY6wi/X8NXSsR3FVpEMfBflAlZTqrJKZ+iyPMSzC828xwPTWQqcHkrW61dW294QCDkuS+wuhtB0TqHsLSnNvw5B01QmYXNfqRzOxbX1P0Tm3Hhwc3MVOSQK6WdCbVSVrvdVLB5/yRI65Np5bDDUYKxkOR4rmvU+XEhtpM7UkgAwWvUVzpOHkhLknpkqhwvp4huN3IXXaig5wzlLTLQAQEZEelpEqJ6pqxlSU1A2hTp0VKYSctOimwpemE/Ujw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 30, 2023 at 9:53=E2=80=AFAM Yosry Ahmed = wrote: > > On Tue, May 30, 2023 at 9:22=E2=80=AFAM Nhat Pham wro= te: > > > > Before storing a page, zswap first checks if the number of stored pages > > exceeds the limit specified by memory.zswap.max, for each cgroup in the > > hierarchy. If this limit is reached or exceeded, then zswap shrinking i= s > > triggered and short-circuits the store attempt. > > > > However, if memory.zswap.max =3D 0 for a cgroup, no amount of writeback > > will allow future store attempts from processes in this cgroup to > > succeed. Furthermore, this create a pathological behavior in a system > > where some cgroups have memory.zswap.max =3D 0 and some do not: the > > processes in the former cgroups, under memory pressure, will evict page= s > > stored by the latter continually, until the need for swap ceases or the > > pool becomes empty. > > > > As a result of this, we observe a disproportionate amount of zswap > > writeback and a perpetually small zswap pool in our experiments, even > > though the pool limit is never hit. > > > > This patch fixes the issue by rejecting zswap store attempt without > > shrinking the pool when memory.zswap.max is 0. > > > > Fixes: f4840ccfca25 ("zswap: memcg accounting") > > Signed-off-by: Nhat Pham > > --- > > include/linux/memcontrol.h | 6 +++--- > > mm/memcontrol.c | 8 ++++---- > > mm/zswap.c | 9 +++++++-- > > 3 files changed, 14 insertions(+), 9 deletions(-) > > > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > > index 222d7370134c..507bed3a28b0 100644 > > --- a/include/linux/memcontrol.h > > +++ b/include/linux/memcontrol.h > > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct obj= _cgroup *objcg, > > #endif /* CONFIG_MEMCG_KMEM */ > > > > #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg); > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg); > > void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size); > > void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size); > > #else > > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > { > > - return true; > > + return 0; > > } > > static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, > > size_t size) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 4b27e245a055..09aad0e6f2ea 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] =3D { > > * spending cycles on compression when there is already no room left > > * or zswap is disabled altogether somewhere in the hierarchy. > > */ > > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg) > > { > > struct mem_cgroup *memcg, *original_memcg; > > - bool ret =3D true; > > + int ret =3D 0; > > > > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) > > return true; > > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objc= g) > > if (max =3D=3D PAGE_COUNTER_MAX) > > continue; > > if (max =3D=3D 0) { > > - ret =3D false; > > + ret =3D -ENODEV; > > break; > > } > > > > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objc= g) > > pages =3D memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE= _SIZE; > > if (pages < max) > > continue; > > - ret =3D false; > > + ret =3D -ENOMEM; > > break; > > } > > mem_cgroup_put(original_memcg); > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 59da2a415fbb..7b13dc865438 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type, = pgoff_t offset, > > } > > > > objcg =3D get_obj_cgroup_from_page(page); > > - if (objcg && !obj_cgroup_may_zswap(objcg)) > > - goto shrink; > > + if (objcg) { > > + ret =3D obj_cgroup_may_zswap(objcg); > > + if (ret =3D=3D -ENODEV) > > + goto reject; > > + if (ret =3D=3D -ENOMEM) > > + goto shrink; > > + } > > I wonder if we should just make this: > > if (objcg && !obj_cgroup_may_zswap(objcg)) > goto reject; > > Even if memory.zswap.max is > 0, if the limit is hit, shrinking the > zswap pool will only help if we happen to writeback a page from the > same memcg that hit its limit. Keep in mind that we will only > writeback one page every time we observe that the limit is hit (even > with Domenico's patch, because zswap_can_accept() should be true). > > On a system with a handful of memcgs, > it seems likely that we wrongfully writeback pages from other memcgs > because of this. Achieving nothing for this memcg, while hurting > others. OTOH, without invoking writeback when the limit is hit, the > memcg will just not be able to use zswap until some pages are > faulted back in or invalidated. > > I am not sure which is better, just thinking out loud. > > Seems like this can be solved by having per-memcg LRUs, or at least > providing an argument to the shrinker of which memcg to reclaim from. > This would only be possible when the LRU is moved to zswap. I totally agree! This seems like the logical next step in zswap's evolution= . I actually proposed this fix with this future development in mind - with a per-memcg LRU, we can trigger memcg-specific shrinking in place of this indiscriminate writeback. It seems less drastic a change (compared to removing shrinking here now, then reintroducing it later). Thanks for the feedback, Yosry! > > > > > > /* reclaim space if needed */ > > if (zswap_is_full()) { > > -- > > 2.34.1 > > > >