linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kairui Song <ryncsn@gmail.com>
To: Wei Xu <weixugc@google.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Yu Zhao <yuzhao@google.com>, Chris Li <chrisl@kernel.org>,
	 Matthew Wilcox <willy@infradead.org>,
	linux-kernel@vger.kernel.org,  Greg Thelen <gthelen@google.com>
Subject: Re: [PATCH v2 1/3] mm, lru_gen: batch update counters on againg
Date: Tue, 16 Jan 2024 01:09:11 +0800	[thread overview]
Message-ID: <CAMgjq7C01uT_wASNor+F-sNEupcc3_fYd6rpmfmzQh3rCqutyg@mail.gmail.com> (raw)
In-Reply-To: <CAMgjq7Bv7L_J4pfbHnopsbOzrqZfgfqTYuJUMXo94Z6vp+74-Q@mail.gmail.com>

Kairui Song <ryncsn@gmail.com> 于2024年1月15日周一 01:42写道:
>
> Wei Xu <weixugc@google.com> 于2024年1月13日周六 05:01写道:
> >
> > On Thu, Jan 11, 2024 at 10:33 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > When lru_gen is aging, it will update mm counters page by page,
> > > which causes a higher overhead if age happens frequently or there
> > > are a lot of pages in one generation getting moved.
> > > Optimize this by doing the counter update in batch.
> > >
> > > Although most __mod_*_state has its own caches the overhead
> > > is still observable.
> > >
> > > Tested in a 4G memcg on a EPYC 7K62 with:
> > >
> > >   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
> > >     -a 0766 -t 16 -B binary &
> > >
> > >   memtier_benchmark -S /tmp/memcached.socket \
> > >     -P memcache_binary -n allkeys \
> > >     --key-minimum=1 --key-maximum=16000000 -d 1024 \
> > >     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
> > >
> > > Average result of 18 test runs:
> > >
> > > Before: 44017.78 Ops/sec
> > > After:  44687.08 Ops/sec (+1.5%)
> >
> > I see the same performance numbers get quoted in all the 3 patches.
> > How much performance improvements does this particular patch provide
> > (the same for the other 2 patches)? If as the cover letter says, the
> > most performance benefits come from patch 3 (prefetching), can we just
> > have that patch alone to avoid the extra complexities.
>
> Hi Wei,
>
> Indeed these are two different optimization technique, I can reorder
> the series, prefetch is the first one and should be more acceptable,
> other optimizations can come later. And add standalone info about
> improvement of batch operations.
>
> >
> > > Signed-off-by: Kairui Song <kasong@tencent.com>
> > > ---
> > >  mm/vmscan.c | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
> > >  1 file changed, 55 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 4f9c854ce6cc..185d53607c7e 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -3113,9 +3113,47 @@ static int folio_update_gen(struct folio *folio, int gen)
> > >         return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > >  }
> > >
> > > +/*
> > > + * Update LRU gen in batch for each lru_gen LRU list. The batch is limited to
> > > + * each gen / type / zone level LRU. Batch is applied after finished or aborted
> > > + * scanning one LRU list.
> > > + */
> > > +struct gen_update_batch {
> > > +       int delta[MAX_NR_GENS];
> > > +};
> > > +
> > > +static void lru_gen_update_batch(struct lruvec *lruvec, int type, int zone,
> > > +                                struct gen_update_batch *batch)
> > > +{
> > > +       int gen;
> > > +       int promoted = 0;
> > > +       struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > > +       enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
> > > +
> > > +       for (gen = 0; gen < MAX_NR_GENS; gen++) {
> > > +               int delta = batch->delta[gen];
> > > +
> > > +               if (!delta)
> > > +                       continue;
> > > +
> > > +               WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
> > > +                          lrugen->nr_pages[gen][type][zone] + delta);
> > > +
> > > +               if (lru_gen_is_active(lruvec, gen))
> > > +                       promoted += delta;
> > > +       }
> > > +
> > > +       if (promoted) {
> > > +               __update_lru_size(lruvec, lru, zone, -promoted);
> > > +               __update_lru_size(lruvec, lru + LRU_ACTIVE, zone, promoted);
> > > +       }
> > > +}
> > > +
> > >  /* protect pages accessed multiple times through file descriptors */
> > > -static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> > > +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio,
> > > +                        bool reclaiming, struct gen_update_batch *batch)
> > >  {
> > > +       int delta = folio_nr_pages(folio);
> > >         int type = folio_is_file_lru(folio);
> > >         struct lru_gen_folio *lrugen = &lruvec->lrugen;
> > >         int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > > @@ -3138,7 +3176,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
> > >                         new_flags |= BIT(PG_reclaim);
> > >         } while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
> > >
> > > -       lru_gen_update_size(lruvec, folio, old_gen, new_gen);
> > > +       batch->delta[old_gen] -= delta;
> > > +       batch->delta[new_gen] += delta;
> > >
> > >         return new_gen;
> > >  }
> > > @@ -3672,6 +3711,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
> > >  {
> > >         int zone;
> > >         int remaining = MAX_LRU_BATCH;
> > > +       struct gen_update_batch batch = { };
> >
> > Can this batch variable be moved away from the stack?  We (Google) use
> > a much larger value for MAX_NR_GENS internally. This large stack
> > allocation from "struct gen_update_batch batch" can significantly
> > increase the risk of stack overflow for our use cases.
> >
>
> Thanks for the info.
> Do you have any suggestion about where we should put the batch info? I
> though about merging it with lru_gen_mm_walk but that depend on
> kzalloc and not useable for slow allocation path, the overhead could
> be larger than benefit in many cases.
>
> Not sure if we can use some thing like a preallocated per-cpu cache
> here to avoid all the issues.

Hi Wei,

After second thought, the batch is mostly used together with
folio_inc_gen which means most pages are only being moved between two
gens (being protected/unreclaimable), so I think only one counter int
is needed in the batch, I'll update this patch and do some test based
on this.


  reply	other threads:[~2024-01-15 17:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-11 18:33 [PATCH v2 0/3] mm, lru_gen: batch update pages when aging Kairui Song
2024-01-11 18:33 ` [PATCH v2 1/3] mm, lru_gen: batch update counters on againg Kairui Song
2024-01-11 18:37   ` Kairui Song
2024-01-12 21:01   ` Wei Xu
2024-01-14 17:42     ` Kairui Song
2024-01-15 17:09       ` Kairui Song [this message]
2024-01-11 18:33 ` [PATCH v2 2/3] mm, lru_gen: move pages in bulk when aging Kairui Song
2024-01-11 18:33 ` [PATCH v2 3/3] mm, lru_gen: try to prefetch next page when canning LRU Kairui Song
2024-01-11 18:35   ` Kairui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMgjq7C01uT_wASNor+F-sNEupcc3_fYd6rpmfmzQh3rCqutyg@mail.gmail.com \
    --to=ryncsn@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisl@kernel.org \
    --cc=gthelen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox