linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kairui Song <ryncsn@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] mm/filemap: optimize filemap folio adding
Date: Wed, 20 Mar 2024 17:06:55 +0800	[thread overview]
Message-ID: <CAMgjq7BGN9SknnXF+doO-3p18OOYRzB0D5PMdwwstpCXtxj47g@mail.gmail.com> (raw)
In-Reply-To: <ZfoPjUyPiXpFSxA4@casper.infradead.org>

On Wed, Mar 20, 2024 at 6:20 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Mar 19, 2024 at 05:27:33PM +0800, Kairui Song wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > Instead of doing multiple tree walks, do one optimism range check
> > with lock hold, and exit if raced with another insertion. If a shadow
> > exists, check it with a new xas_get_order helper before releasing the
> > lock to avoid redundant tree walks for getting its order.
> >
> > Drop the lock and do the allocation only if a split is needed.
> >
> > In the best case, it only need to walk the tree once. If it needs
> > to alloc and split, 3 walks are issued (One for first ranced
> > conflict check and order retrieving, one for the second check after
> > allocation, one for the insert after split).
> >
> > Testing with 4k pages, in an 8G cgroup, with 20G brd as block device:
> >
> > fio -name=cached --numjobs=16 --filename=/mnt/test.img \
> >   --buffered=1 --ioengine=mmap --rw=randread --time_based \
> >   --ramp_time=30s --runtime=5m --group_reporting
> >
> > Before:
> > bw (  MiB/s): min=  790, max= 3665, per=100.00%, avg=2499.17, stdev=20.64, samples=8698
> > iops        : min=202295, max=938417, avg=639785.81, stdev=5284.08, samples=8698
> >
> > After (+4%):
> > bw (  MiB/s): min=  451, max= 3868, per=100.00%, avg=2599.83, stdev=23.39, samples=8653
> > iops        : min=115596, max=990364, avg=665556.34, stdev=5988.20, samples=8653
> >
> > Test result with THP (do a THP randread then switch to 4K page in hope it
> > issues a lot of splitting):
> >
> > fio -name=cached --numjobs=16 --filename=/mnt/test.img \
> >   --buffered=1 --ioengine mmap -thp=1 --readonly \
> >   --rw=randread --random_distribution=random \
> >   --time_based --runtime=5m --group_reporting
> >
> > fio -name=cached --numjobs=16 --filename=/mnt/test.img \
> >   --buffered=1 --ioengine mmap --readonly \
> >   --rw=randread --random_distribution=random \
> >   --time_based --runtime=5s --group_reporting
> >
> > Before:
> > bw (  KiB/s): min=28071, max=62359, per=100.00%, avg=53542.44, stdev=179.77, samples=9520
> > iops        : min= 7012, max=15586, avg=13379.39, stdev=44.94, samples=9520
> > bw (  MiB/s): min= 2457, max= 6193, per=100.00%, avg=3923.21, stdev=82.48, samples=144
> > iops        : min=629220, max=1585642, avg=1004340.78, stdev=21116.07, samples=144
> >
> > After (+-0.0%):
> > bw (  KiB/s): min=30561, max=63064, per=100.00%, avg=53635.82, stdev=177.21, samples=9520
> > iops        : min= 7636, max=15762, avg=13402.82, stdev=44.29, samples=9520
> > bw (  MiB/s): min= 2449, max= 6145, per=100.00%, avg=3914.68, stdev=81.15, samples=144
> > iops        : min=627106, max=1573156, avg=1002158.11, stdev=20774.77, samples=144
> >
> > The performance is better (+4%) for 4K cached read and unchanged for THP.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> >  mm/filemap.c | 127 ++++++++++++++++++++++++++++++---------------------
> >  1 file changed, 76 insertions(+), 51 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 6bbec8783793..c1484bcdbddb 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -848,12 +848,77 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
> >  }
> >  EXPORT_SYMBOL_GPL(replace_page_cache_folio);
> >
> > +static int __split_add_folio_locked(struct xa_state *xas, struct folio *folio,
> > +                                 pgoff_t index, gfp_t gfp, void **shadowp)
>

Thanks for the very helpful review!

> I don't love the name of this function.  Splitting is a rare thing that
> it does.  I'd suggest it's more filemap_store().

Yes, the function name is a bit misleading indeed, I can rename it as
you suggested, eg. __filemap_store_locked ?

>
> > +{
> > +     void *entry, *shadow, *alloced_shadow = NULL;
> > +     int order, alloced_order = 0;
> > +
> > +     gfp &= GFP_RECLAIM_MASK;
> > +     for (;;) {
> > +             shadow = NULL;
> > +             order = 0;
> > +
> > +             xas_for_each_conflict(xas, entry) {
> > +                     if (!xa_is_value(entry))
> > +                             return -EEXIST;
> > +                     shadow = entry;
> > +             }
> > +
> > +             if (shadow) {
> > +                     if (shadow == xas_reload(xas)) {
>
> Why do you need the xas_reload here?

This part of code works based on the guarantee that If there is a
larger entry, it will be the first and only entry iterated by
xas_for_each_conflict/xas_find_conflict. I added an xas_reload is here
to ensure that. But on second thought, this seems not needed indeed.

Will it be better if I write this part this way?

+ shadow = NULL;
+ order = -1;
+ xas_for_each_conflict(xas, entry) {
+           if (!xa_is_value(entry))
+                    return -EEXIST;
+          /*
+          * If there is a larger entry, it will be the first
+          * and only entry iterated.
+          */
+         if (order == -1)
+                  order = xas_get_order(xas);
+         shadow = entry;
+ }
+
+ if (shadow) {
+          /* check if alloc & split need, or if previous alloc is
still valid */
+         if (order > 0 && order > folio_order(folio)) {
+                   if (shadow != alloced_shadow || order != alloced_order)
+                            goto unlock;
+                   xas_split(xas, shadow, order);
+                   xas_reset(xas);
+          }
+          order = -1;
+          if (shadowp)
+                   *shadowp = shadow;
+ }

>
> > +                             order = xas_get_order(xas);
> > +                             if (order && order > folio_order(folio)) {
> > +                                     /* entry may have been split before we acquired lock */
> > +                                     if (shadow != alloced_shadow || order != alloced_order)
> > +                                             goto unlock;
> > +                                     xas_split(xas, shadow, order);
> > +                                     xas_reset(xas);
> > +                             }
> > +                             order = 0;
> > +                     }
>
> I don't think this is right.  I think we can end up skipping a split
> and storing a folio into a slot which is of greater order than the folio
> we're storing.

If there is a larger slot, xas_for_each_conflict and check above
should catch that?

>
> > +                     if (shadowp)
> > +                             *shadowp = shadow;
> > +             }
> > +
> > +             xas_store(xas, folio);
> > +             /* Success, return with mapping locked */
> > +             if (!xas_error(xas))
> > +                     return 0;
> > +unlock:
> > +             /*
> > +              * Unlock path, if errored, return unlocked.
> > +              * If allocation needed, alloc and retry.
> > +              */
> > +             xas_unlock_irq(xas);
> > +             if (order) {
> > +                     if (unlikely(alloced_order))
> > +                             xas_destroy(xas);
> > +                     xas_split_alloc(xas, shadow, order, gfp);
> > +                     if (!xas_error(xas)) {
> > +                             alloced_shadow = shadow;
> > +                             alloced_order = order;
> > +                     }
> > +                     goto next;
> > +             }
> > +             /* xas_nomem result checked by xas_error below */
> > +             xas_nomem(xas, gfp);
> > +next:
> > +             xas_lock_irq(xas);
> > +             if (xas_error(xas))
> > +                     return xas_error(xas);
> > +
> > +             xas_reset(xas);
> > +     }
> > +}
>
> Splitting this out into a different function while changing the logic
> really makes this hard to review ;-(

Sorry about this :(

This patch basically rewrites the logic of __filemap_add_folio and the
function is getting long, so I thought it would be easier to
understand if we split it out.
I initially updated the code in place but that change diff seems more
messy to me.

>
> I don't object to splitting the function, but maybe two patches; one
> to move the logic and a second to change it?
>

I can keep it in place in V2.


  reply	other threads:[~2024-03-20  9:07 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-19  9:27 [PATCH 0/4] mm/filemap: optimize folio adding and splitting Kairui Song
2024-03-19  9:27 ` [PATCH 1/4] mm/filemap: return early if failed to allocate memory for split Kairui Song
2024-03-19 16:46   ` Matthew Wilcox
2024-03-19  9:27 ` [PATCH 2/4] mm/filemap: clean up hugetlb exclusion code Kairui Song
2024-03-19 16:48   ` Matthew Wilcox
2024-03-19  9:27 ` [PATCH 3/4] lib/xarray: introduce a new helper xas_get_order Kairui Song
2024-03-19 16:52   ` Matthew Wilcox
2024-03-19  9:27 ` [PATCH 4/4] mm/filemap: optimize filemap folio adding Kairui Song
2024-03-19 22:19   ` Matthew Wilcox
2024-03-20  9:06     ` Kairui Song [this message]
2024-03-21 18:35       ` Kairui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMgjq7BGN9SknnXF+doO-3p18OOYRzB0D5PMdwwstpCXtxj47g@mail.gmail.com \
    --to=ryncsn@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox