From: Uladzislau Rezki <urezki@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
zuoze <zuoze1@huawei.com>,
gustavoars@kernel.org, akpm@linux-foundation.org,
linux-hardening@vger.kernel.org, linux-mm@kvack.org,
keescook@chromium.org
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.
Date: Mon, 16 Dec 2024 20:18:24 +0100 [thread overview]
Message-ID: <Z2B9AMOSgTPboo2U@pc636> (raw)
In-Reply-To: <Z1-riu65--CviPba@casper.infradead.org>
Hello, Matthew!
> On Wed, Dec 04, 2024 at 09:51:07AM +0100, Uladzislau Rezki wrote:
> > I think, when i have more free cycles, i will check it from performance
> > point of view. Because i do not know how much a maple tree is efficient
> > when it comes to lookups, insert and removing.
>
> Maple tree has a fanout of around 8-12 at each level, while an rbtree has
> a fanout of two (arguably 3, since we might find the node). Let's say you
> have 1000 vmalloc areas. A perfectly balanced rbtree would have 9 levels
> (and might well be 11+ levels if imperfectly balanced -- and part of the
> advantage of rbtrees over AVL trees is that they can be less balanced
> so need fewer rotations). A perfectly balanced maple tree would have
> only 3 levels.
>
Thank you for the explanation and input on this topic. Density, tree
height and branching factor should indeed make lookups faster :)
>
> Addition/removal is more expensive. We biased the implementation heavily
> towards lookup, so we chose to keep it very compact. Most users (and
> particularly the VMA tree which was our first client) do more lookups
> than modifications; a real application takes many more pagefaults than
> it does calls to mmap/munmap/mprotect/etc.
>
This is what I see as well. Some use cases are degraded: for example the
stress-ng forking benchmark is worse, and test_vmalloc.sh also reports a
regression. See the figures below:
<snip>
# Default
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+ 59.52% 7.15% [kernel] [k] __vmalloc_node_range_noprof
+ 37.98% 0.22% [test_vmalloc] [k] fix_size_alloc_test
+ 37.32% 8.56% [kernel] [k] vfree.part.0
+ 35.31% 0.00% [kernel] [k] ret_from_fork_asm
+ 35.31% 0.00% [kernel] [k] ret_from_fork
+ 35.31% 0.00% [kernel] [k] kthread
+ 35.05% 0.00% [test_vmalloc] [k] test_func
+ 34.16% 0.06% [test_vmalloc] [k] long_busy_list_alloc_test
+ 32.10% 0.12% [kernel] [k] __get_vm_area_node
+ 31.69% 1.82% [kernel] [k] alloc_vmap_area
+ 27.24% 5.01% [kernel] [k] _raw_spin_lock
+ 25.45% 0.15% [test_vmalloc] [k] full_fit_alloc_test
+ 23.57% 0.03% [kernel] [k] remove_vm_area
+ 22.23% 22.23% [kernel] [k] native_queued_spin_lock_slowpath
+ 14.34% 0.94% [kernel] [k] alloc_pages_bulk_noprof
+ 10.80% 7.51% [kernel] [k] free_vmap_area_noflush
+ 10.59% 10.59% [kernel] [k] clear_page_rep
+ 9.52% 8.96% [kernel] [k] insert_vmap_area
+ 7.39% 2.82% [kernel] [k] find_unlink_vmap_area
# Maple-tree
time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+ 74.33% 1.50% [kernel] [k] __vmalloc_node_range_noprof
+ 55.73% 0.06% [kernel] [k] __get_vm_area_node
+ 55.53% 1.07% [kernel] [k] alloc_vmap_area
+ 53.78% 0.09% [test_vmalloc] [k] long_busy_list_alloc_test
+ 53.75% 1.76% [kernel] [k] _raw_spin_lock
+ 52.81% 51.80% [kernel] [k] native_queued_spin_lock_slowpath
+ 28.93% 0.09% [test_vmalloc] [k] full_fit_alloc_test
+ 23.29% 2.43% [kernel] [k] vfree.part.0
+ 20.29% 0.01% [kernel] [k] mt_insert_vmap_area
+ 20.27% 0.34% [kernel] [k] mtree_insert_range
+ 15.30% 0.05% [test_vmalloc] [k] fix_size_alloc_test
+ 14.06% 0.05% [kernel] [k] remove_vm_area
+ 13.73% 0.00% [kernel] [k] ret_from_fork_asm
+ 13.73% 0.00% [kernel] [k] ret_from_fork
+ 13.73% 0.00% [kernel] [k] kthread
+ 13.51% 0.00% [test_vmalloc] [k] test_func
+ 13.15% 0.87% [kernel] [k] alloc_pages_bulk_noprof
+ 9.92% 9.54% [kernel] [k] clear_page_rep
+ 9.62% 0.07% [kernel] [k] find_unlink_vmap_area
+ 9.55% 0.04% [kernel] [k] mtree_erase
+ 5.92% 1.44% [kernel] [k] free_unref_page
+ 4.92% 0.24% [kernel] [k] mas_insert.isra.0
+ 4.69% 0.93% [kernel] [k] mas_erase
+ 4.47% 0.02% [kernel] [k] rcu_do_batch
+ 3.35% 2.10% [kernel] [k] __vmap_pages_range_noflush
+ 3.00% 2.81% [kernel] [k] mas_wr_store_type
<snip>
i.e. insert/remove are more expensive, at least my tests show this.
It looks like mtree_insert() uses a range variant, which implies a
tree update after an insert operation completes, and that is probably
where the overhead comes from. If I use a B+tree (my own implementation),
it is, as expected, better than the rb-tree because of B+tree properties.
I have composed some data; you can find more benchmark figures here:
wget ftp://vps418301.ovh.net/incoming/Maple_tree_comparison_with_rb_tree_in_vmalloc.pdf
>> That's what maple trees do; they store non-overlapping ranges. So you
>> can look up any address in a range and it will return you the pointer
>> associated with that range. Just like you'd want for a page fault ;-)
Thank you, I see. I thought that it could also work as a regular B+ or
B-tree, so we would not spend cycles on updates to track ranges. Like
the code below:

int ret = mtree_insert(t, va->va_start, va, GFP_KERNEL);

I do not store a range here, I store a key -> value pair, but the maple
tree treats it as the range [va_start:va_start]. Maybe we can improve
this case when no range is passed?

These are just my thoughts :)
--
Uladzislau Rezki
Thread overview: 24+ messages
2024-12-03 2:31 Ze Zuo
2024-12-03 4:11 ` Matthew Wilcox
2024-12-03 11:23 ` zuoze
2024-12-03 12:39 ` Uladzislau Rezki
2024-12-03 13:10 ` zuoze
2024-12-03 13:25 ` Uladzislau Rezki
2024-12-03 13:30 ` Kefeng Wang
2024-12-03 13:39 ` Uladzislau Rezki
2024-12-03 13:45 ` Kefeng Wang
2024-12-03 13:51 ` Uladzislau Rezki
2024-12-03 14:10 ` Kefeng Wang
2024-12-03 14:20 ` Uladzislau Rezki
2024-12-03 19:02 ` Uladzislau Rezki
2024-12-03 19:56 ` Matthew Wilcox
2024-12-04 1:38 ` zuoze
2024-12-04 4:43 ` Kees Cook
2024-12-04 7:55 ` Uladzislau Rezki
2024-12-04 9:21 ` zuoze
2024-12-04 9:27 ` Uladzislau Rezki
2024-12-04 8:51 ` Uladzislau Rezki
2024-12-16 4:24 ` Matthew Wilcox
2024-12-16 19:18 ` Uladzislau Rezki [this message]
2024-12-04 1:21 ` zuoze
2024-12-03 6:12 ` Uladzislau Rezki