From: Yosry Ahmed
Date: Thu, 12 Oct 2023 19:46:52 -0700
Subject: Re: [RFC PATCH] zswap: add writeback_time_threshold interface to shrink zswap pool
To: Zhongkun He
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, nphamcs@gmail.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20231011051117.2289518-1-hezhongkun.hzk@bytedance.com>

On Tue, Oct 10, 2023 at 10:11 PM Zhongkun He wrote:
>
> zswap does not have a suitable method to select objects that have not
> been accessed for a long time, and only shrinks the pool when the limit
> is hit. There is a high probability of wasting memory in zswap if the
> limit is too high.
>
> This patch adds a new interface, writeback_time_threshold, to shrink the
> zswap pool proactively based on a time threshold in seconds, e.g.::
>
>   echo 600 > /sys/module/zswap/parameters/writeback_time_threshold
>
> If zswap entries have not been accessed for more than 600 seconds, they
> will be swapped out. If set to 0, all of them will be swapped out.
>
> Signed-off-by: Zhongkun He

I would prefer that this be done through memory.reclaim once the zswap
shrinker is in place, as others have suggested. I understand that this
provides more control by specifying the age after which pages are
written out, which is similar to zram writeback AFAICT, but it is also
difficult to determine the right value to write here.

I am also not sure how you decide whether it is better to write back
cold pages in zswap or to compress cold pages in the LRUs. The pages in
zswap are obviously colder, but accessing them after they are written
back is much more expensive, to the point that it could be better to
compress more cold memory from the LRUs. This is not straightforward and
requires a fair amount of tuning to do more good than harm.

That being said, if we decide to move forward with this, I have a couple
of comments:

- I think you should check out how zram implements idle writeback and
  try to make things consistent. Zswap and zram don't really see eye to
  eye, but some consistency would be nice. Looking at zram's
  implementation, you would see that you also need to update the access
  time when a page is read (unless the load is exclusive).

- This should be behind a config option. Every word that we add to
  struct zswap_entry reduces the zswap savings by roughly 0.2%. Maybe
  this doesn't sound like much, but it adds up. Let's not opt everyone
  in unless they ask for it.
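
To make the access-time point concrete, here is a minimal userspace sketch, not zswap or zram code. `struct entry`, `entry_store()`, `entry_load()` and `entry_idle()` are hypothetical names, times are plain nanosecond counters supplied by the caller, and the idle test reuses the strict "older than the threshold" comparison from the patch. A non-exclusive load leaves the compressed copy in the pool, so it has to refresh the timestamp; an exclusive load drops the entry, so there is nothing left to update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a zswap entry that tracks its last access. */
struct entry {
	uint64_t atime_ns; /* time of the last store or non-exclusive load */
	bool present;      /* whether the compressed copy is still in the pool */
};

/* A store records when the page entered the pool. */
static void entry_store(struct entry *e, uint64_t now_ns)
{
	assert(e != NULL);
	e->present = true;
	e->atime_ns = now_ns;
}

/*
 * An exclusive load drops the entry, so there is nothing left to age.
 * A non-exclusive load keeps the copy, so the access time must be
 * refreshed; otherwise a page that was just read would still look idle.
 */
static void entry_load(struct entry *e, uint64_t now_ns, bool exclusive)
{
	assert(e != NULL && e->present);
	if (exclusive)
		e->present = false;
	else
		e->atime_ns = now_ns;
}

/* Idle means still present and not accessed for more than idle_ns. */
static bool entry_idle(const struct entry *e, uint64_t now_ns, uint64_t idle_ns)
{
	return e->present && now_ns > e->atime_ns + idle_ns;
}
```

Assuming 64-bit words and 4 KiB pages, a field like `atime_ns` costs 8 bytes for every 4096 bytes of uncompressed data an entry represents, and 8 / 4096 is about 0.195%, consistent with the roughly 0.2% per word mentioned above.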
> ---
>  Documentation/admin-guide/mm/zswap.rst |  9 +++
>  mm/zswap.c                             | 76 ++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
> index 45b98390e938..9ffaed26c3c0 100644
> --- a/Documentation/admin-guide/mm/zswap.rst
> +++ b/Documentation/admin-guide/mm/zswap.rst
> @@ -153,6 +153,15 @@ attribute, e. g.::
>
>  Setting this parameter to 100 will disable the hysteresis.
>
> +When there is a lot of cold memory according to the store time in the zswap,
> +it can be swapout and save memory in userspace proactively. User can write
> +writeback time threshold in second to enable it, e.g.::
> +
> +  echo 600 > /sys/module/zswap/parameters/writeback_time_threshold
> +
> +If zswap_entrys have not been accessed for more than 600 seconds, they will be
> +swapout. if set to 0, all of them will be swapout.
> +
>  A debugfs interface is provided for various statistic about pool size, number
>  of pages stored, same-value filled pages and various counters for the reasons
>  pages are rejected.
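
The semantics documented in this hunk (an entry qualifies once more than N seconds have passed since it was stored, and 0 selects everything) can be sketched in userspace as follows. This is an illustration, not the patch itself: `entry_expired()` is a hypothetical helper, `NSEC_PER_SEC` is defined locally for the model, timestamps are nanoseconds since boot passed in by the caller, and the strict comparison mirrors the `ktime_after()` check in `zswap_reach_timethr()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL /* local definition for this model */

/*
 * Hypothetical helper modeling the documented expiry rule: an entry
 * stored at store_ns (nanoseconds since boot) qualifies for writeback
 * once now_ns is strictly after store_ns + threshold_s seconds.
 */
static bool entry_expired(uint64_t store_ns, uint64_t now_ns,
			  unsigned int threshold_s)
{
	/* Widen to 64 bits before multiplying so the product cannot wrap. */
	uint64_t thr_ns = (uint64_t)threshold_s * NSEC_PER_SEC;

	/* Sanity check for the model: store_ns + thr_ns must not overflow. */
	assert(store_ns <= UINT64_MAX - thr_ns);
	return now_ns > store_ns + thr_ns;
}
```

With a threshold of 0, an entry stored at time t qualifies at any later time, which is why writing 0 selects every entry; an entry stored at exactly the current time does not.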
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 083c693602b8..c3a19b56a29b 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -141,6 +141,16 @@ static bool zswap_exclusive_loads_enabled = IS_ENABLED(
>  		CONFIG_ZSWAP_EXCLUSIVE_LOADS_DEFAULT_ON);
>  module_param_named(exclusive_loads, zswap_exclusive_loads_enabled, bool, 0644);
>
> +/* zswap writeback time threshold in second */
> +static unsigned int zswap_writeback_time_thr;
> +static int zswap_writeback_time_thr_param_set(const char *, const struct kernel_param *);
> +static const struct kernel_param_ops zswap_writeback_param_ops = {
> +	.set = zswap_writeback_time_thr_param_set,
> +	.get = param_get_uint,
> +};
> +module_param_cb(writeback_time_threshold, &zswap_writeback_param_ops,
> +		&zswap_writeback_time_thr, 0644);
> +
>  /* Number of zpools in zswap_pool (empirically determined for scalability) */
>  #define ZSWAP_NR_ZPOOLS 32
>
> @@ -197,6 +207,7 @@ struct zswap_pool {
>   * value - value of the same-value filled pages which have same content
>   * objcg - the obj_cgroup that the compressed memory is charged to
>   * lru - handle to the pool's lru used to evict pages.
> + * sto_time - the store time of zswap_entry.
>   */
>  struct zswap_entry {
>  	struct rb_node rbnode;
> @@ -210,6 +221,7 @@ struct zswap_entry {
>  	};
>  	struct obj_cgroup *objcg;
>  	struct list_head lru;
> +	ktime_t sto_time;
>  };
>
>  /*
> @@ -288,6 +300,31 @@ static void zswap_update_total_size(void)
>  	zswap_pool_total_size = total;
>  }
>
> +static void zswap_reclaim_entry_by_timethr(void);
> +
> +static bool zswap_reach_timethr(struct zswap_pool *pool)
> +{
> +	struct zswap_entry *entry;
> +	ktime_t expire_time = 0;
> +	bool ret = false;
> +
> +	spin_lock(&pool->lru_lock);
> +
> +	if (list_empty(&pool->lru))
> +		goto out;
> +
> +	entry = list_last_entry(&pool->lru, struct zswap_entry, lru);
> +	expire_time = ktime_add(entry->sto_time,
> +			ns_to_ktime(zswap_writeback_time_thr * NSEC_PER_SEC));
> +
> +	if (ktime_after(ktime_get_boottime(), expire_time))
> +		ret = true;
> +out:
> +	spin_unlock(&pool->lru_lock);
> +	return ret;
> +}
> +
> +
>  /*********************************
>  * zswap entry functions
>  **********************************/
> @@ -395,6 +432,7 @@ static void zswap_free_entry(struct zswap_entry *entry)
>  	else {
>  		spin_lock(&entry->pool->lru_lock);
>  		list_del(&entry->lru);
> +		entry->sto_time = 0;
>  		spin_unlock(&entry->pool->lru_lock);
>  		zpool_free(zswap_find_zpool(entry), entry->handle);
>  		zswap_pool_put(entry->pool);
> @@ -709,6 +747,28 @@ static void shrink_worker(struct work_struct *w)
>  	zswap_pool_put(pool);
>  }
>
> +static void zswap_reclaim_entry_by_timethr(void)
> +{
> +	struct zswap_pool *pool = zswap_pool_current_get();
> +	int ret, failures = 0;
> +
> +	if (!pool)
> +		return;
> +
> +	while (zswap_reach_timethr(pool)) {
> +		ret = zswap_reclaim_entry(pool);
> +		if (ret) {
> +			zswap_reject_reclaim_fail++;
> +			if (ret != -EAGAIN)
> +				break;
> +			if (++failures == MAX_RECLAIM_RETRIES)
> +				break;
> +		}
> +		cond_resched();
> +	}
> +	zswap_pool_put(pool);
> +}
> +
>  static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
>  {
>  	int i;
> @@ -1037,6 +1097,21 @@ static int zswap_enabled_param_set(const char *val,
>  	return ret;
>  }
>
> +static int zswap_writeback_time_thr_param_set(const char *val,
> +				const struct kernel_param *kp)
> +{
> +	int ret = -ENODEV;
> +
> +	/* if this is load-time (pre-init) param setting, just return. */
> +	if (system_state != SYSTEM_RUNNING)
> +		return ret;
> +
> +	ret = param_set_uint(val, kp);
> +	if (!ret)
> +		zswap_reclaim_entry_by_timethr();
> +	return ret;
> +}
> +
>  /*********************************
>  * writeback code
>  **********************************/
> @@ -1360,6 +1435,7 @@ bool zswap_store(struct folio *folio)
>  	if (entry->length) {
>  		spin_lock(&entry->pool->lru_lock);
>  		list_add(&entry->lru, &entry->pool->lru);
> +		entry->sto_time = ktime_get_boottime();
>  		spin_unlock(&entry->pool->lru_lock);
>  	}
>  	spin_unlock(&tree->lock);
> --
> 2.25.1
>
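
The stopping conditions of the `zswap_reclaim_entry_by_timethr()` loop can be modeled in userspace: keep writing back the oldest entry while it is past the threshold, count every failure, give up immediately on any error other than `-EAGAIN`, and give up once `-EAGAIN` has been returned `MAX_RECLAIM_RETRIES` times. Everything here is hypothetical: `struct pool` is a fixed-size queue ordered oldest first (standing in for the tail of the pool's LRU list), `reclaim_oldest()` stands in for `zswap_reclaim_entry()` and fails with a caller-chosen errno, and `MAX_RECLAIM_RETRIES` is set to 16 as an assumption for this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RECLAIM_RETRIES 16 /* assumed value for this model */
#define POOL_CAP 64

/* Hypothetical pool: store times in nanoseconds, oldest first. */
struct pool {
	uint64_t store_ns[POOL_CAP];
	int head, tail; /* live entries are store_ns[head] .. store_ns[tail - 1] */
	int fail_with;  /* 0, or the -errno every reclaim attempt returns */
};

static void pool_store(struct pool *p, uint64_t now_ns)
{
	assert(p->tail < POOL_CAP);
	p->store_ns[p->tail++] = now_ns;
}

/* Mirrors the zswap_reach_timethr() test: is the oldest entry past the threshold? */
static bool oldest_expired(const struct pool *p, uint64_t now_ns, uint64_t thr_ns)
{
	if (p->head == p->tail)
		return false; /* empty LRU */
	return now_ns > p->store_ns[p->head] + thr_ns;
}

/* Stand-in for zswap_reclaim_entry(): drop the oldest entry or fail. */
static int reclaim_oldest(struct pool *p)
{
	if (p->fail_with)
		return p->fail_with;
	p->head++;
	return 0;
}

/*
 * Same stopping rules as the quoted loop. Returns the number of entries
 * written back; *fail_count plays the role of zswap_reject_reclaim_fail.
 */
static int reclaim_by_time(struct pool *p, uint64_t now_ns, uint64_t thr_ns,
			   int *fail_count)
{
	int failures = 0, written = 0;

	while (oldest_expired(p, now_ns, thr_ns)) {
		int ret = reclaim_oldest(p);

		if (ret) {
			(*fail_count)++;
			if (ret != -EAGAIN)
				break;
			if (++failures == MAX_RECLAIM_RETRIES)
				break;
		} else {
			written++;
		}
	}
	return written;
}
```

Because `failures` is never reset after a successful writeback, the retry budget in this model (as in the quoted loop) is a total across the whole call rather than a count of consecutive failures.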