From: Mateusz Guzik <mjguzik@gmail.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Liam.Howlett@oracle.com, pedro.falcato@gmail.com,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [RFC PATCH] vm: align vma allocation and move the lock back into the struct
Date: Mon, 12 Aug 2024 06:29:38 +0200 [thread overview]
Message-ID: <CAGudoHF=oPXU1RaCn3G0Scqw8+yr_0-Mj4ENZSYMyyGwc5Cgcg@mail.gmail.com> (raw)
In-Reply-To: <CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com>
On Mon, Aug 12, 2024 at 12:50 AM Suren Baghdasaryan <surenb@google.com> wrote:
> Ok, disabling adjacent cacheline prefetching seems to do the trick (or
> at least cuts down the regression drastically):
>
> Hmean faults/cpu-1 470577.6434 ( 0.00%) 470745.2649 * 0.04%*
> Hmean faults/cpu-4 445862.9701 ( 0.00%) 445572.2252 * -0.07%*
> Hmean faults/cpu-7 422516.4002 ( 0.00%) 422677.5591 * 0.04%*
> Hmean faults/cpu-12 344483.7047 ( 0.00%) 330476.7911 * -4.07%*
> Hmean faults/cpu-21 192836.0188 ( 0.00%) 195266.8071 * 1.26%*
> Hmean faults/cpu-30 140745.9472 ( 0.00%) 140655.0459 * -0.06%*
> Hmean faults/cpu-48 110507.4310 ( 0.00%) 103802.1839 * -6.07%*
> Hmean faults/cpu-56 93507.7919 ( 0.00%) 95105.1875 * 1.71%*
> Hmean faults/sec-1 470232.3887 ( 0.00%) 470404.6525 * 0.04%*
> Hmean faults/sec-4 1757368.9266 ( 0.00%) 1752852.8697 * -0.26%*
> Hmean faults/sec-7 2909554.8150 ( 0.00%) 2915885.8739 * 0.22%*
> Hmean faults/sec-12 4033840.8719 ( 0.00%) 3845165.3277 * -4.68%*
> Hmean faults/sec-21 3845857.7079 ( 0.00%) 3890316.8799 * 1.16%*
> Hmean faults/sec-30 3838607.4530 ( 0.00%) 3838861.8142 * 0.01%*
> Hmean faults/sec-48 4882118.9701 ( 0.00%) 4608985.0530 * -5.59%*
> Hmean faults/sec-56 4933535.7567 ( 0.00%) 5004208.3329 * 1.43%*
>
> Now, how do we disable prefetching extra cachelines for vm_area_structs only?
I'm unaware of any mechanism of the sort.
The good news is that Broadwell is an old yeller and if memory serves
right the impact is not anywhere near this bad on newer
microarchitectures, making "merely" 64 alignment (used all over in the
kernel for amd64) a practical choice (not just for vma).
Also note that in your setup you are losing out on performance in
other multithreaded cases, unrelated to anything vma.
That aside as I mentioned earlier the dedicated vma lock cache results
in false sharing between separate vmas, except this particular
benchmark does not test for it (which in your setup should be visible
even if the cache grows the SLAB_HWCACHE_ALIGN flag).
I think the thing to do here is to bench on other cpus and ignore the
Broadwell + adjacent cache line prefetcher result if they come back
fine -- the code should not be held hostage by an old yeller.
To that end I think it would be best to ask the LKP folks at Intel.
They are very approachable so there should be no problem arranging it
provided they have some spare capacity. I believe grabbing the From
person and the cc list from this thread will do it:
https://lore.kernel.org/oe-lkp/ZriCbCPF6I0JnbKi@xsang-OptiPlex-9020/ .
By default they would run their own suite, which presumably has some
overlap with this particular benchmark in terms of generated workload
(but I don't think they run *this* particular benchmark itself,
perhaps it would make sense to ask them to add it?). It's your call
here.
If there are still problems and the lock needs to remain separate, the
bare minimum damage-controlling measure would be to hwalign the vma
lock cache -- it wont affect the pts benchmark, but it should help
others.
Should the decision be to bring the lock back into the struct, I'll
note my patch is merely slapped together to a state where it can be
benchmarked and I have no interest in beating it into a committable
shape. You stated you already had an equivalent (modulo keeping
something in a space previously occupied by the pointer to the vma
lock), so as far as I'm concerned you can submit that with your
authorship.
--
Mateusz Guzik <mjguzik gmail.com>
next prev parent reply other threads:[~2024-08-12 4:29 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-08 18:59 Mateusz Guzik
2024-08-08 19:38 ` Suren Baghdasaryan
2024-08-08 20:03 ` Mateusz Guzik
2024-08-08 21:19 ` Suren Baghdasaryan
2024-08-09 3:57 ` Suren Baghdasaryan
2024-08-09 8:14 ` Mateusz Guzik
2024-08-09 14:59 ` Suren Baghdasaryan
2024-08-09 15:09 ` Vlastimil Babka
2024-08-09 16:56 ` Suren Baghdasaryan
2024-08-11 22:50 ` Suren Baghdasaryan
2024-08-12 4:29 ` Mateusz Guzik [this message]
2024-08-12 15:27 ` Suren Baghdasaryan
2024-08-12 16:30 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGudoHF=oPXU1RaCn3G0Scqw8+yr_0-Mj4ENZSYMyyGwc5Cgcg@mail.gmail.com' \
--to=mjguzik@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=pedro.falcato@gmail.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox