From: Nhat Pham <nphamcs@gmail.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org,
	yosryahmed@google.com,  sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  Chris Li <chrisl@kernel.org>
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry
Date: Thu, 4 Jan 2024 11:42:58 -0800
Message-ID: <CAKEwX=P5AC+ubnunnZr5vMiC6fFU+E_E7jg_FZztWwZRYSxTWQ@mail.gmail.com>
In-Reply-To: <CACSyD1O7t0+BXUujJ81RAdEys3MUnmpu0sRADLazoyvayx5DLA@mail.gmail.com>

On Wed, Jan 3, 2024 at 6:12 AM Zhongkun He <hezhongkun.hzk@bytedance.com> wrote:
>
> > That's around 2.7% increase in real time, no? Admittedly, this
> > micro-benchmark is too small to conclude either way, but the data
> > doesn't seem to be in your favor.
> >
> > I'm a bit concerned about the overhead here, given that with this
> > patch we will drain the per-cpu batch on every written-back entry.
> > That's quite a high frequency, especially since we're moving towards
> > more writeback (either with the new zswap shrinker, or your time
> > threshold-based writeback mechanism). For instance, there seems to be
> > some (local/per-cpu) locking involved, no? Could there be some form of
> > lock contention there (especially since, with the new shrinker, you
> > can have a lot of concurrent writeback contexts)?
> >
> > Furthermore, note that a writeback from zswap to swap saves less
> > memory than a writeback from memory to swap, so the effect of the
> > extra overhead will be even more pronounced. That is, the amount of
> > extra work done (from this change) to save one unit of memory would be
> > even larger than if we call lru_add_drain() every time we swap out a
> > page (from memory -> swap). And I'm pretty sure we don't call
> > lru_add_drain() every time we swap out a page - I believe we call
> > lru_add_drain() every time we perform a shrink action - for example, in
> > shrink_inactive_list(). That's much coarser in granularity than here.
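
To make the granularity point concrete, here is a rough, untested sketch of
the two placements I'm contrasting. The helper names below are made up for
illustration - only where lru_add_drain() sits relative to the work matters:

/*
 * Illustration only - hypothetical helpers, not real kernel code.
 *
 * Today the drain happens roughly once per shrink pass (see
 * shrink_inactive_list()): one lru_add_drain() amortized over many folios.
 */
static void per_pass_drain_example(void)
{
	lru_add_drain();	/* once for the whole batch */
	/* ... isolate and reclaim a whole list of folios ... */
}

/*
 * The patch as posted drains once per written-back zswap entry, i.e. one
 * lru_add_drain() for every single page we try to rotate:
 */
static void per_entry_drain_example(void)
{
	/* ... write back one zswap entry into its page ... */
	lru_add_drain();	/* once per page */
}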
> >
> > Also, IIUC, the more often we perform lru_add_drain(), the less
> > batching effect we will obtain. IOW, the overhead of maintaining the
> > batch will become higher relative to the performance gains from
> > batching.
> >
> > Maybe I'm missing something - could you walk me through how
> > lru_add_drain() is fine here, from this POV? Thanks!
> >
> > >
> > > After writeback, we perform the following step to release the memory again:
> > > echo 1g > memory.reclaim
> > >
> > > Base:
> > >                   total       used                     total       used
> > > Mem:               38Gi      2.5Gi    --reclaim-->       38Gi      1.5Gi
> > > Swap:             5.0Gi      1.0Gi    --reclaim-->      5.0Gi      1.5Gi
> > >
> > > used memory -1Gi, swap +0.5Gi
> > > It means that half of the pages failed to move to the tail of the LRU list,
> > > so we need to release an additional 0.5Gi of anon pages to swap space.
> > >
> > > With this patch:
> > >                   total       used                     total       used
> > > Mem:               38Gi      2.6Gi    --reclaim-->       38Gi      1.6Gi
> > > Swap:             5.0Gi      1.0Gi    --reclaim-->      5.0Gi      1.0Gi
> > >
> > > used memory -1Gi, swap +0Gi
> > > It means that we release all the pages which have been added to the tail of
> > > the LRU list in zswap_writeback_entry() and folio_rotate_reclaimable().
> > >
> >
> > OTOH, this suggests that we're onto something. Swap usage seems to
> > decrease quite a bit. Sounds like a real problem that this patch is
> > tackling.
> > (Please add this benchmark result to future changelog. It'll help
> > demonstrate the problem).
>
> Yes
>
> >
> > I'm inclined to ack this patch, but it'd be nice if you can assuage my
> > concerns above (with some justification and/or larger benchmark).
> >
>
> OK, thanks.
>
> > (Or perhaps, we have to drain, but less frequently/higher up the stack?)
> >
>
> I've reviewed the code again but don't have a better idea. It would be
> great if you have any suggestions.

Hmm, originally I was thinking of doing an (unconditional)
lru_add_drain() outside of zswap_writeback_entry() - once in
shrink_worker() and/or zswap_shrinker_scan(), before we write back any
of the entries. Not sure if it would work/help here though - haven't
tested that idea yet.
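
Roughly what I mean is the placement below (untested sketch; the function
body is paraphrased, only the position of the drain is the point):

/*
 * Sketch only: drain once per writeback pass, e.g. at the top of
 * shrink_worker() (and similarly zswap_shrinker_scan()), instead of
 * once per entry inside zswap_writeback_entry().
 */
static void shrink_worker(struct work_struct *w)
{
	/*
	 * Flush the per-CPU LRU-add batches up front so the folios we are
	 * about to write back are actually on an LRU list, and
	 * folio_rotate_reclaimable() can find them later.
	 */
	lru_add_drain();

	/* ... existing loop selecting entries and writing them back ... */
}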

>
> New test:
> This patch adds the execution of folio_rotate_reclaimable() (not executed
> without this patch) and lru_add_drain(), including per-CPU lock competition.
> I bind a new task that allocates memory and uses the same per-CPU batch lock
> to compete with the target process on the same CPU.
> Context:
> 1: stress --vm 1 --vm-bytes 1g             (bound to cpu0)
> 2: stress --vm 1 --vm-bytes 5g --vm-hang 0 (bound to cpu0)
> 3: reclaim pages and write back 5G of zswap entries on cpu0 and node 0.
>
> Average time of five tests:
>
> Base       patch      patch + compete
> 4.947      5.0676     5.1336
>            +2.4%      +3.7%
>
> "compete" means: a new stress run on cpu0 to compete with the writeback process.
>
>  PID USER   %CPU  %MEM     TIME+ COMMAND                       P
> 1367 root   49.5   0.0   1:09.17 bash   (writeback)            0
> 1737 root   49.5   2.2   0:27.46 stress (uses per-CPU lock)    0
>
> Around 2.4% increase in real time, including the execution of
> folio_rotate_reclaimable() (not executed without this patch) and
> lru_add_drain(), but no lock contention.

Hmm, looks like the regression is still there, no?

>
> Around 1.3% additional increase in real time with lock contention on the
> same CPU.
>
> There is another option here: do not move the page to the tail of the
> inactive list after end_writeback, and delete the following code in
> zswap_writeback_entry(), which did not work properly. But then these
> pages will not be released first.
>
> /* move it to the tail of the inactive list after end_writeback */
> SetPageReclaim(page);

Or only SetPageReclaim() on pages that are actually on the LRU?
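I.e., something along these lines (untested, just to spell out the idea):

	/*
	 * Only request rotation if the page is actually on an LRU list;
	 * a page still sitting in a per-CPU lru_add batch cannot be moved
	 * to the tail by folio_rotate_reclaimable() anyway.
	 */
	if (PageLRU(page))
		SetPageReclaim(page);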

>
> Thanks,
> Zhongkun
>
> > Thanks,
> > Nhat
> >
> > >
> > > Thanks for your time Nhat and Andrew. Happy New Year!

