From: Nhat Pham <nphamcs@gmail.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Chris Li <chrisl@kernel.org>
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry
Date: Thu, 4 Jan 2024 11:42:58 -0800 [thread overview]
Message-ID: <CAKEwX=P5AC+ubnunnZr5vMiC6fFU+E_E7jg_FZztWwZRYSxTWQ@mail.gmail.com> (raw)
In-Reply-To: <CACSyD1O7t0+BXUujJ81RAdEys3MUnmpu0sRADLazoyvayx5DLA@mail.gmail.com>
On Wed, Jan 3, 2024 at 6:12 AM Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:
>
> > That's around a 2.7% increase in real time, no? Admittedly, this
> > micro-benchmark is too small to conclude either way, but the data
> > doesn't seem to be in your favor.
> >
> > I'm a bit concerned about the overhead here, given that with this
> > patch we will drain the per-cpu batch on every written-back entry.
> > That's quite a high frequency, especially since we're moving towards
> > more writeback (either with the new zswap shrinker, or your time
> > threshold-based writeback mechanism). For instance, there seems to be
> > some (local/per-cpu) locking involved, no? Could there be some form of
> > lock contentions there (especially since with the new shrinker, you
> > can have a lot of concurrent writeback contexts?)
> >
> > Furthermore, note that a writeback from zswap to swap saves less
> > memory than a writeback from memory to swap, so the effect of the
> > extra overhead will be even more pronounced. That is, the amount of
> > extra work done (from this change) to save one unit of memory would be
> > even larger than if we call lru_add_drain() every time we swap out a
> > page (from memory -> swap). And I'm pretty sure we don't call
> > lru_add_drain() every time we swap out a page - I believe we call
> > lru_add_drain() every time we perform a shrink action - e.g., in
> > shrink_inactive_list(). That's much coarser in granularity than here.
> >
> > Also, IIUC, the more often we perform lru_add_drain(), the less
> > batching effect we will obtain. IOW, the overhead of maintaining the
> > batch will become higher relative to the performance gains from
> > batching.
> >
> > Maybe I'm missing something - could you walk me through how
> > lru_add_drain() is fine here, from this POV? Thanks!
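(For reference, the per-cpu locking I mean is in lru_add_drain()
itself - roughly what it does, simplified from memory of mm/swap.c,
with some details elided:

	void lru_add_drain(void)
	{
		/* per-CPU local lock: tasks on the same CPU contend here */
		local_lock(&cpu_fbatches.lock);
		/* flush this CPU's pending folio batches onto the LRU lists */
		lru_add_drain_cpu(smp_processor_id());
		local_unlock(&cpu_fbatches.lock);
	}

so calling it per written-back entry pays that lock + flush every
single time.)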
> >
> > >
> > > After writeback, we perform the following step to release the memory again:
> > > echo 1g > memory.reclaim
> > >
> > > Base:
> > >            total    used   reclaim    total    used
> > > Mem:         38Gi   2.5Gi   ---->      38Gi   1.5Gi
> > > Swap:       5.0Gi   1.0Gi   ---->     5.0Gi   1.5Gi
> > > used memory: -1Gi, swap: +0.5Gi
> > > It means that half of the pages failed to move to the tail of the LRU
> > > list, so we need to reclaim an additional 0.5Gi of anon pages to swap space.
> > >
> > > With this patch:
> > >            total    used   reclaim    total    used
> > > Mem:         38Gi   2.6Gi   ---->      38Gi   1.6Gi
> > > Swap:       5.0Gi   1.0Gi   ---->     5.0Gi   1.0Gi
> > >
> > > used memory: -1Gi, swap: +0Gi
> > > It means that we released all the pages that had been added to the tail
> > > of the LRU list in zswap_writeback_entry() and folio_rotate_reclaimable().
> > >
> >
> > OTOH, this suggests that we're onto something. Swap usage seems to
> > decrease quite a bit. Sounds like a real problem that this patch is
> > tackling.
> > (Please add this benchmark result to future changelog. It'll help
> > demonstrate the problem).
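(For the record, my understanding of the failure mode: the rotation
bails on folios that are still sitting in a per-cpu add batch.
Roughly, simplified from memory of mm/swap.c:

	void folio_rotate_reclaimable(struct folio *folio)
	{
		/* folio_test_lru() is false while the folio is still in
		 * the per-cpu add batch, so freshly written-back pages
		 * are not rotated to the tail - hence the extra 0.5Gi
		 * swapped out in the Base numbers above */
		if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
		    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
			/* queue the folio for a move to the LRU tail */
			...
		}
	}
)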
>
> Yes
>
> >
> > I'm inclined to ack this patch, but it'd be nice if you can assuage my
> > concerns above (with some justification and/or larger benchmark).
> >
>
> OK, thanks.
>
> > (Or perhaps, we have to drain, but less frequently/higher up the stack?)
> >
>
> I've reviewed the code again but have no new ideas. It would be great if
> you have any suggestions.
Hmm, originally I was thinking of doing an (unconditional)
lru_add_drain() outside of zswap_writeback_entry() - once in
shrink_worker() and/or zswap_shrinker_scan(), before we write back any
of the entries. Not sure if it would work/help here though - I haven't
tested that idea yet.
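Something like this untested sketch (function names per the current mm
tree, from memory - treat it as an assumption, not a tested patch):

	static void shrink_worker(struct work_struct *w)
	{
		struct zswap_pool *pool = container_of(w, typeof(*pool),
						       shrink_work);

		/* drain once up front so previously batched pages are on
		 * the LRU before we start writing back, instead of
		 * draining per entry inside zswap_writeback_entry() */
		lru_add_drain();

		do {
			if (zswap_reclaim_entry(pool))
				break;
		} while (!zswap_can_accept());
		zswap_pool_put(pool);
	}

The open question is whether a single drain up front is enough for the
pages that the writeback loop itself creates.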
>
> New test:
> This patch adds the execution of folio_rotate_reclaimable() (not executed
> without this patch) and lru_add_drain(), including per-CPU batch lock
> competition. I bound a new task that allocates memory, and therefore uses
> the same batch lock, to compete with the target process on the same CPU.
> Context:
> 1: stress --vm 1 --vm-bytes 1g (bound to cpu0)
> 2: stress --vm 1 --vm-bytes 5g --vm-hang 0 (bound to cpu0)
> 3: reclaim pages, and write back 5G of zswap entries on cpu0 and node 0.
>
> Average time of five tests:
>
>             Base     patch    patch + compete
>             4.947    5.0676   5.1336
>                      +2.4%    +3.7%
>
> "compete" means: a new stress task runs on cpu0 to compete with the
> writeback process.
>
>   PID USER  %CPU %MEM    TIME+  COMMAND              P
>  1367 root  49.5  0.0  1:09.17  bash (writeback)     0
>  1737 root  49.5  2.2  0:27.46  stress (percpu lock) 0
>
> Around a 2.4% increase in real time, including the execution of
> folio_rotate_reclaimable() (not executed without this patch) and
> lru_add_drain(), but with no lock contention.
Hmm, looks like the regression is still there, no?
>
> Around a 1.3% additional increase in real time with lock contention on the
> same CPU.
>
> There is another option here: do not move the page to the tail of the
> inactive list after end_writeback, and delete the following code in
> zswap_writeback_entry(), which did not work properly. But then the pages
> will not be released first.
>
> /* move it to the tail of the inactive list after end_writeback */
> SetPageReclaim(page);
Or only SetPageReclaim() on pages that are already on the LRU?
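I.e., something like this (untested):

	/* only set PG_reclaim when folio_rotate_reclaimable() can
	 * actually rotate the page, i.e. it is already on the LRU
	 * rather than still sitting in a per-cpu add batch */
	if (PageLRU(page))
		SetPageReclaim(page);

That would skip the (currently broken) rotation for batched pages
without paying for a drain on every entry.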
>
> Thanks,
> Zhongkun
>
> > Thanks,
> > Nhat
> >
> > >
> > > Thanks for your time, Nhat and Andrew. Happy New Year!