Date: Tue, 30 May 2023 14:00:38 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Nhat Pham, akpm@linux-foundation.org, cerasuolodomenico@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, kernel-team@meta.com
Subject: Re: [PATCH] zswap: do not shrink when memory.zswap.max is 0
Message-ID: <20230530180038.GC97194@cmpxchg.org>
References: <20230530162153.836565-1-nphamcs@gmail.com>

On Tue, May 30, 2023 at 09:52:36AM -0700, Yosry Ahmed wrote:
> On Tue, May 30, 2023 at 9:22 AM Nhat Pham wrote:
> >
> > Before storing a page, zswap first checks if the number of stored pages
> > exceeds the limit specified by memory.zswap.max, for each cgroup in the
> > hierarchy.
> > If this limit is reached or exceeded, then zswap shrinking is
> > triggered and short-circuits the store attempt.
> >
> > However, if memory.zswap.max = 0 for a cgroup, no amount of writeback
> > will allow future store attempts from processes in this cgroup to
> > succeed. Furthermore, this creates a pathological behavior in a system
> > where some cgroups have memory.zswap.max = 0 and some do not: the
> > processes in the former cgroups, under memory pressure, will evict pages
> > stored by the latter continually, until the need for swap ceases or the
> > pool becomes empty.
> >
> > As a result of this, we observe a disproportionate amount of zswap
> > writeback and a perpetually small zswap pool in our experiments, even
> > though the pool limit is never hit.
> >
> > This patch fixes the issue by rejecting zswap store attempts without
> > shrinking the pool when memory.zswap.max is 0.
> >
> > Fixes: f4840ccfca25 ("zswap: memcg accounting")
> > Signed-off-by: Nhat Pham
> > ---
> >  include/linux/memcontrol.h |  6 +++---
> >  mm/memcontrol.c            |  8 ++++----
> >  mm/zswap.c                 |  9 +++++++--
> >  3 files changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 222d7370134c..507bed3a28b0 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
> >  #endif /* CONFIG_MEMCG_KMEM */
> >
> >  #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> >  void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
> >  void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> >  #else
> > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  {
> > -	return true;
> > +	return 0;
> >  }
> >  static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg,
> >  					   size_t size)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4b27e245a055..09aad0e6f2ea 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] = {
> >   * spending cycles on compression when there is already no room left
> >   * or zswap is disabled altogether somewhere in the hierarchy.
> >   */
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  {
> >  	struct mem_cgroup *memcg, *original_memcg;
> > -	bool ret = true;
> > +	int ret = 0;
> >
> >  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> >  		return true;
> > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  		if (max == PAGE_COUNTER_MAX)
> >  			continue;
> >  		if (max == 0) {
> > -			ret = false;
> > +			ret = -ENODEV;
> >  			break;
> >  		}
> >
> > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> >  		pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
> >  		if (pages < max)
> >  			continue;
> > -		ret = false;
> > +		ret = -ENOMEM;
> >  		break;
> >  	}
> >  	mem_cgroup_put(original_memcg);
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 59da2a415fbb..7b13dc865438 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> >  	}
> >
> >  	objcg = get_obj_cgroup_from_page(page);
> > -	if (objcg && !obj_cgroup_may_zswap(objcg))
> > -		goto shrink;
> > +	if (objcg) {
> > +		ret = obj_cgroup_may_zswap(objcg);
> > +		if (ret == -ENODEV)
> > +			goto reject;
> > +		if (ret == -ENOMEM)
> > +			goto shrink;
> > +	}
>
> I wonder if we should just make this:
>
> 	if (objcg && !obj_cgroup_may_zswap(objcg))
> 		goto reject;
>
> Even if memory.zswap.max is > 0, if the limit is hit, shrinking the
> zswap pool will only help if we happen to write back a page from the
> same memcg that hit its limit.
> Keep in mind that we will only write back one page every time we
> observe that the limit is hit (even with Domenico's patch, because
> zswap_can_accept() should be true).
>
> On a system with a handful of memcgs, it seems likely that we
> wrongfully write back pages from other memcgs because of this. This
> achieves nothing for this memcg while hurting others. OTOH, without
> invoking writeback when the limit is hit, the memcg will just not be
> able to use zswap until some pages are faulted back in or invalidated.
>
> I am not sure which is better, just thinking out loud.

You're absolutely right. Currently the choice is to write back either
everybody's pages or nobody's, i.e. to pick between writeback and
cgroup containment. They're both so poor that I can't say I strongly
prefer one over the other.

However, I have a lame argument in favor of this patch:

The last few fixes from Nhat and Domenico around writeback show that
few people, if anybody, are actually using writeback. So it might not
actually matter that much in practice which way we go with this
patch. Per-memcg LRUs will be necessary for it to work right.

However, what Nhat is proposing is how we want the behavior down the
line. So between two equally poor choices, I figure we might as well
go with the one that doesn't require another code change later on.

Doesn't that fill you with radiant enthusiasm?

> Seems like this can be solved by having per-memcg LRUs, or at least
> providing an argument to the shrinker of which memcg to reclaim from.
> This would only be possible when the LRU is moved to zswap.

+1
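To make the trade-off in this thread concrete, here is a minimal user-space C sketch under simplifying assumptions. It is not kernel code: struct model_cg, struct model_entry, MODEL_NO_LIMIT, model_may_zswap(), model_store_action() and model_shrink_one() are hypothetical names. The check mirrors the tri-state return of the patched obj_cgroup_may_zswap() (0, -ENODEV, -ENOMEM) and the reject/shrink branch in zswap_frontswap_store(). The single shared LRU that writes back its oldest entry regardless of owner is an assumed stand-in for the pool-wide LRU that the thread says would need to become per-memcg.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_COUNTER_MAX, i.e. "no limit configured". */
#define MODEL_NO_LIMIT ((unsigned long)-1)

/* Hypothetical user-space model of a memcg; not struct mem_cgroup. */
struct model_cg {
	struct model_cg *parent;   /* NULL for the root cgroup */
	unsigned long zswap_max;   /* memory.zswap.max, in pages */
	unsigned long zswap_pages; /* pages charged to this cgroup (children not aggregated) */
};

/*
 * Tri-state check modelled on the patched obj_cgroup_may_zswap():
 * 0 = store allowed, -ENODEV = zswap disabled (max == 0) on the way up,
 * -ENOMEM = a limit is reached. Like the kernel loop, it stops at the
 * first cgroup that says no and never checks the root.
 */
static int model_may_zswap(const struct model_cg *cg)
{
	assert(cg != NULL); /* callers handle "no cgroup" themselves */
	for (; cg->parent != NULL; cg = cg->parent) {
		if (cg->zswap_max == MODEL_NO_LIMIT)
			continue;
		if (cg->zswap_max == 0)
			return -ENODEV;
		if (cg->zswap_pages < cg->zswap_max)
			continue;
		return -ENOMEM;
	}
	return 0;
}

enum model_action { MODEL_STORE, MODEL_REJECT, MODEL_SHRINK };

/* Models the patched branch in zswap_frontswap_store(). */
static enum model_action model_store_action(const struct model_cg *cg)
{
	int ret;

	if (cg == NULL) /* no objcg: no per-cgroup limit applies */
		return MODEL_STORE;
	ret = model_may_zswap(cg);
	if (ret == -ENODEV)
		return MODEL_REJECT; /* writeback can never make room here */
	if (ret == -ENOMEM)
		return MODEL_SHRINK; /* limit hit: write back from the shared pool */
	return MODEL_STORE;
}

/* One entry in a hypothetical single LRU shared by all cgroups. */
struct model_entry {
	struct model_cg *owner;
};

/*
 * Writes back the oldest entry, whoever owns it, and returns that owner
 * (NULL if the LRU is empty). Assumption: the shared LRU is oldest-first.
 */
static struct model_cg *model_shrink_one(struct model_entry *lru, size_t *n)
{
	struct model_cg *victim;
	size_t i;

	if (*n == 0)
		return NULL;
	victim = lru[0].owner;
	victim->zswap_pages--;
	for (i = 1; i < *n; i++) /* drop the head, keep the order */
		lru[i - 1] = lru[i];
	(*n)--;
	return victim;
}
```

In this model, a cgroup with memory.zswap.max = 0 is rejected without shrinking, as the patch proposes. A cgroup sitting at its limit gets MODEL_SHRINK, but if the oldest entry belongs to another cgroup, model_shrink_one() evicts that entry and the limited cgroup is still at its limit afterwards, which is the wasted writeback Yosry describes. A per-memcg LRU would let the shrinker pick its victim from the cgroup that actually hit the limit.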