From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com, kernel-team@fb.com
Subject: Re: [PATCH v2] mm: zswap: shrink until can accept
Date: Fri, 26 May 2023 11:15:31 -0700
Message-ID: <CAJD7tkY55Z9n7Ue-4+a691t4YJAs+0e7gEZGocF7cp197gL+Dg@mail.gmail.com>
In-Reply-To: <20230526181023.GA49039@cmpxchg.org>
On Fri, May 26, 2023 at 11:10 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, May 26, 2023 at 07:39:55PM +0200, Domenico Cerasuolo wrote:
> > This update addresses an issue with the zswap reclaim mechanism, which
> > hinders the efficient offloading of cold pages to disk, thereby
> > compromising the preservation of the LRU order and consequently
> > diminishing, if not inverting, its performance benefits.
> >
> > The functioning of the zswap shrink worker was found to be inadequate,
> > as shown by a basic benchmark test. For the test, a kernel build was
> > utilized as a reference, with its memory confined to 1G via a cgroup and
> > a 5G swap file provided. The results are presented below; these are
> > averages of three runs without the use of zswap:
> >
> > real 46m26s
> > user 35m4s
> > sys 7m37s
> >
> > With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G
> > system), the results changed to:
> >
> > real 56m4s
> > user 35m13s
> > sys 8m43s
> >
> > written_back_pages: 18
> > reject_reclaim_fail: 0
> > pool_limit_hit: 1478
> >
> > Besides the evident regression, one thing to notice from this data is
> > the extremely low number of written_back_pages and pool_limit_hit.
> >
> > The pool_limit_hit counter, which is increased in zswap_frontswap_store
> > when zswap is completely full, doesn't account for a particular
> > scenario: once zswap hits its limit, zswap_pool_reached_full is set to
> > true; with this flag on, zswap_frontswap_store rejects pages if zswap is
> > still above the acceptance threshold. Once we include the rejections due
> > to zswap_pool_reached_full && !zswap_can_accept(), the number goes from
> > 1478 to a significant 21578266.
> >
> > Zswap is stuck in an undesirable state where it rejects pages because
> > it's above the acceptance threshold, yet fails to attempt memory
> > reclamation. This happens because the shrink work is only queued when
> > zswap_frontswap_store detects that it's full and the work itself only
> > reclaims one page per run.
> >
> > This state results in hot pages getting written directly to disk,
> > while cold ones remain in memory, waiting only to be invalidated. The LRU
> > order is completely broken and zswap ends up being just an overhead
> > without providing any benefits.
> >
> > This commit applies 2 changes: a) the shrink worker is set to reclaim
> > pages until the acceptance threshold is met and b) the task is also
> > enqueued when zswap is not full but still above the threshold.
> >
> > Testing this suggested update showed much better numbers:
> >
> > real 36m37s
> > user 35m8s
> > sys 9m32s
> >
> > written_back_pages: 10459423
> > reject_reclaim_fail: 12896
> > pool_limit_hit: 75653
> >
> > V2:
> > - loop against == -EAGAIN rather than != -EINVAL and also break the loop
> > on MAX_RECLAIM_RETRIES (thanks Yosry)
> > - cond_resched() to ensure that the loop doesn't burn the cpu (thanks
> > Vitaly)
> >
> > Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> > ---
> > mm/zswap.c | 15 ++++++++++++---
> > 1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 59da2a415fbb..f953dceaab34 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -37,6 +37,7 @@
> > #include <linux/workqueue.h>
> >
> > #include "swap.h"
> > +#include "internal.h"
> >
> > /*********************************
> > * statistics
> > @@ -587,9 +588,17 @@ static void shrink_worker(struct work_struct *w)
> >  {
> >          struct zswap_pool *pool = container_of(w, typeof(*pool),
> >                                                  shrink_work);
> > +        int ret, failures = 0;
> >
> > -        if (zpool_shrink(pool->zpool, 1, NULL))
> > -                zswap_reject_reclaim_fail++;
> > +        do {
> > +                ret = zpool_shrink(pool->zpool, 1, NULL);
> > +                if (ret) {
> > +                        zswap_reject_reclaim_fail++;
> > +                        failures++;
> > +                }
> > +                cond_resched();
> > +        } while (!zswap_can_accept() && ret == -EAGAIN &&
> > +                 failures < MAX_RECLAIM_RETRIES);
>
> It should also loop on !ret, right?
>
> AFAIU Yosry's suggestion was that instead of breaking only on -EINVAL,
> it should break on all failures but -EAGAIN. But it should still keep
> going if the shrink was successful and the pool cannot accept yet.
>
> Basically, something like this?
>
>         do {
>                 ret = zpool_shrink(pool->zpool, 1, NULL);
>                 if (ret) {
>                         zswap_reject_reclaim_fail++;
>                         if (ret != -EAGAIN)
>                                 break;
>                         if (++failures == MAX_RECLAIM_RETRIES)
>                                 break;
>                 }
>                 cond_resched();
>         } while (!zswap_can_accept());

Yes, that's what I meant. Otherwise, if the shrink is successful, the
loop exits after reclaiming only 1 page, which is exactly what we are
trying to avoid here.
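
To make the difference concrete, here is a minimal userspace sketch of the
two loop shapes. It is not kernel code: struct sim_pool, sim_shrink() and
sim_can_accept() are hypothetical stand-ins for the zswap pool,
zpool_shrink() and zswap_can_accept(), the MAX_RECLAIM_RETRIES value is
assumed for the sketch, and the counters and cond_resched() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for this sketch only. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical model of a pool that sits above the acceptance threshold. */
struct sim_pool {
	int pages_over_threshold; /* pages to reclaim before stores are accepted */
	int eagain_every;         /* every Nth attempt returns -EAGAIN; 0 = never */
	int attempts;             /* shrink attempts made so far */
};

/* Stand-in for zswap_can_accept(): true once the pool is below the threshold. */
static bool sim_can_accept(const struct sim_pool *p)
{
	return p->pages_over_threshold <= 0;
}

/* Stand-in for zpool_shrink(pool, 1, NULL): reclaims one page or fails. */
static int sim_shrink(struct sim_pool *p)
{
	p->attempts++;
	if (p->eagain_every && p->attempts % p->eagain_every == 0)
		return -EAGAIN;
	p->pages_over_threshold--;
	return 0;
}

/* Loop shape from the v2 patch: it only repeats while ret == -EAGAIN. */
int shrink_v2(struct sim_pool *p)
{
	int ret, failures = 0, reclaimed = 0;

	assert(p != NULL);
	do {
		ret = sim_shrink(p);
		if (ret)
			failures++;
		else
			reclaimed++;
	} while (!sim_can_accept(p) && ret == -EAGAIN &&
		 failures < MAX_RECLAIM_RETRIES);

	return reclaimed;
}

/* Loop shape suggested in the review: repeats on success and on -EAGAIN. */
int shrink_suggested(struct sim_pool *p)
{
	int ret, failures = 0, reclaimed = 0;

	assert(p != NULL);
	do {
		ret = sim_shrink(p);
		if (ret) {
			if (ret != -EAGAIN)
				break; /* hard failure: give up */
			if (++failures == MAX_RECLAIM_RETRIES)
				break; /* too many transient failures */
		} else {
			reclaimed++;
		}
	} while (!sim_can_accept(p));

	return reclaimed;
}
```

In this model, with a pool 100 pages over the threshold and no failures,
shrink_v2() returns after reclaiming a single page because ret == 0 ends
the loop, while shrink_suggested() keeps going until all 100 pages are
reclaimed and the pool can accept again.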