From: Uladzislau Rezki <urezki@gmail.com>
To: Adrian Huang <adrianhuang0701@gmail.com>
Cc: urezki@gmail.com, ahuang12@lenovo.com, akpm@linux-foundation.org,
hch@infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 1/1] mm: vmalloc: Optimize vmap_lazy_nr arithmetic when purging each vmap_area
Date: Mon, 2 Sep 2024 19:00:44 +0200
Message-ID: <ZtXvPAoAo7UooJoV@pc636>
In-Reply-To: <20240902120046.26478-1-ahuang12@lenovo.com>
On Mon, Sep 02, 2024 at 08:00:46PM +0800, Adrian Huang wrote:
> On Fri, Aug 30, 2024 at 3:00 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > atomic_long_add_return() might also introduce a high contention. We can
> > optimize by splitting into more light atomics. Can you check it on your
> > 448-cores system?
>
> Interestingly, the following result shows the average latency of
> free_vmap_area_noflush() is just 26 usecs (the worst case falls in the
> 16ms-32ms bucket).
>
> /home/git-repo/bcc/tools/funclatency.py -u free_vmap_area_noflush & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1
>
> usecs : count distribution
> 0 -> 1 : 18166 | |
> 2 -> 3 : 41929818 |** |
> 4 -> 7 : 181203439 |*********** |
> 8 -> 15 : 464242836 |***************************** |
> 16 -> 31 : 620077545 |****************************************|
> 32 -> 63 : 442133041 |**************************** |
> 64 -> 127 : 111432597 |******* |
> 128 -> 255 : 3441649 | |
> 256 -> 511 : 302655 | |
> 512 -> 1023 : 738 | |
> 1024 -> 2047 : 73 | |
> 2048 -> 4095 : 0 | |
> 4096 -> 8191 : 0 | |
> 8192 -> 16383 : 0 | |
> 16384 -> 32767 : 196 | |
>
> avg = 26 usecs, total: 49415657269 usecs, count: 1864782753
>
>
> free_vmap_area_noflush() executes the lock prefix just one time, so the
> worst case might be just about a hundred clock cycles.
>
> The problem with purge_vmap_node() is that some cores are busy purging
> each vmap_area of the *long* purge_list, executing atomic_long_sub()
> for each vmap_area, while other cores free vmalloc allocations and execute
> atomic_long_add_return() in free_vmap_area_noflush(). The following crash
> log shows that 22 cores are busy purging vmap_area structs [1]:
>
> crash> bt -a | grep "purge_vmap_node+291" | wc -l
> 22
>
> So, the latency of purge_vmap_node() dramatically increases because it
> executes the lock prefix over 6,000,000 times. The issue is easier to
> reproduce if more cores execute purge_vmap_node() simultaneously.
>
Right. This is clear to me. Under heavy stressing in a tight loop we
invoke atomic_long_sub() once per freed VA. With 448 cores and one
stress job per CPU, we end up with a high-contention spot: every access
to the atomic requires a cache-line lock.
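
To make the hot spot concrete, below is a rough sketch of the batching
direction your patch takes: count pages locally while walking the
purge_list and publish the total with a single atomic_long_sub() per
purge pass, instead of one per vmap_area. The helper name is made up
and the loop is simplified for illustration; it is not the exact
mm/vmalloc.c code:

#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/vmalloc.h>

/* Illustrative helper: one atomic update per purge pass. */
static void purge_list_batched(struct list_head *purge_list,
			       atomic_long_t *vmap_lazy_nr)
{
	struct vmap_area *va, *n;
	unsigned long nr_purged = 0;

	list_for_each_entry_safe(va, n, purge_list, list) {
		/* Accumulate locally; no lock prefix in the loop. */
		nr_purged += (va->va_end - va->va_start) >> PAGE_SHIFT;
		/* ... detach and free the area ... */
	}

	/* One cache-line lock per pass instead of millions. */
	atomic_long_sub(nr_purged, vmap_lazy_nr);
}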
>
>
> Tested the suggested patch with the light atomics. However, the average
> latency did not improve (although the worst case did):
>
> usecs : count distribution
> 0 -> 1 : 7146 | |
> 2 -> 3 : 31734187 |** |
> 4 -> 7 : 161408609 |*********** |
> 8 -> 15 : 461411377 |********************************* |
> 16 -> 31 : 557005293 |****************************************|
> 32 -> 63 : 435518485 |******************************* |
> 64 -> 127 : 175033097 |************ |
> 128 -> 255 : 42265379 |*** |
> 256 -> 511 : 399112 | |
> 512 -> 1023 : 734 | |
> 1024 -> 2047 : 72 | |
>
> avg = 32 usecs, total: 59952713176 usecs, count: 1864783491
>
Thank you for checking this! So there is no real difference on average.
As for the worst case, it might be a measurement error: the measured
time includes a context switch, because the context that triggers
free_vmap_area_noflush() can easily be preempted.
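
For reference, one plausible shape of the "lighter atomics" split (your
tested diff is not quoted above, so this is an assumption about it)
replaces the single atomic_long_add_return() with a plain atomic add
followed by a separate read. The read is not atomic with the add, so it
may observe a slightly stale value, which is tolerable for a heuristic
threshold check:

#include <linux/atomic.h>

/* Illustrative helper, not the mm/vmalloc.c function itself. */
static bool lazy_threshold_crossed(atomic_long_t *vmap_lazy_nr,
				   unsigned long nr,
				   unsigned long nr_lazy_max)
{
	unsigned long nr_lazy;

	/*
	 * Before: one lock-prefixed read-modify-write that also
	 * returns the new value:
	 *
	 *	nr_lazy = atomic_long_add_return(nr, vmap_lazy_nr);
	 *
	 * After: a plain atomic add (no return value needed), then
	 * an ordinary load that may be slightly stale.
	 */
	atomic_long_add(nr, vmap_lazy_nr);
	nr_lazy = atomic_long_read(vmap_lazy_nr);

	return nr_lazy > nr_lazy_max;
}

Either way, both variants still bounce the same cache line, which is
consistent with the average not moving in your measurement.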
--
Uladzislau Rezki