From: Uladzislau Rezki <urezki@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
zuoze <zuoze1@huawei.com>,
gustavoars@kernel.org, akpm@linux-foundation.org,
linux-hardening@vger.kernel.org, linux-mm@kvack.org,
keescook@chromium.org
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.
Date: Mon, 16 Dec 2024 20:18:24 +0100 [thread overview]
Message-ID: <Z2B9AMOSgTPboo2U@pc636> (raw)
In-Reply-To: <Z1-riu65--CviPba@casper.infradead.org>
Hello, Matthew!
> On Wed, Dec 04, 2024 at 09:51:07AM +0100, Uladzislau Rezki wrote:
> > I think, when i have more free cycles, i will check it from performance
> > point of view. Because i do not know how much a maple tree is efficient
> > when it comes to lookups, insert and removing.
>
> Maple tree has a fanout of around 8-12 at each level, while an rbtree has
> a fanout of two (arguably 3, since we might find the node). Let's say you
> have 1000 vmalloc areas. A perfectly balanced rbtree would have 9 levels
> (and might well be 11+ levels if imperfectly balanced -- and part of the
> advantage of rbtrees over AVL trees is that they can be less balanced
> so need fewer rotations). A perfectly balanced maple tree would have
> only 3 levels.
>
Thank you for the explanation and input on this topic. Density, tree
height and branching factor should indeed make lookups faster :)
>
> Addition/removal is more expensive. We biased the implementation heavily
> towards lookup, so we chose to keep it very compact. Most users (and
> particularly the VMA tree which was our first client) do more lookups
> than modifications; a real application takes many more pagefaults than
> it does calls to mmap/munmap/mprotect/etc.
>
This is what I see as well. Some use cases are degraded: for example the
stress-ng forking benchmark is worse, and test_vmalloc.sh also reports a
regression. See the figures below:
<snip>
# Default
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+ 59.52% 7.15% [kernel] [k] __vmalloc_node_range_noprof
+ 37.98% 0.22% [test_vmalloc] [k] fix_size_alloc_test
+ 37.32% 8.56% [kernel] [k] vfree.part.0
+ 35.31% 0.00% [kernel] [k] ret_from_fork_asm
+ 35.31% 0.00% [kernel] [k] ret_from_fork
+ 35.31% 0.00% [kernel] [k] kthread
+ 35.05% 0.00% [test_vmalloc] [k] test_func
+ 34.16% 0.06% [test_vmalloc] [k] long_busy_list_alloc_test
+ 32.10% 0.12% [kernel] [k] __get_vm_area_node
+ 31.69% 1.82% [kernel] [k] alloc_vmap_area
+ 27.24% 5.01% [kernel] [k] _raw_spin_lock
+ 25.45% 0.15% [test_vmalloc] [k] full_fit_alloc_test
+ 23.57% 0.03% [kernel] [k] remove_vm_area
+ 22.23% 22.23% [kernel] [k] native_queued_spin_lock_slowpath
+ 14.34% 0.94% [kernel] [k] alloc_pages_bulk_noprof
+ 10.80% 7.51% [kernel] [k] free_vmap_area_noflush
+ 10.59% 10.59% [kernel] [k] clear_page_rep
+ 9.52% 8.96% [kernel] [k] insert_vmap_area
+ 7.39% 2.82% [kernel] [k] find_unlink_vmap_area
# Maple-tree
time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+ 74.33% 1.50% [kernel] [k] __vmalloc_node_range_noprof
+ 55.73% 0.06% [kernel] [k] __get_vm_area_node
+ 55.53% 1.07% [kernel] [k] alloc_vmap_area
+ 53.78% 0.09% [test_vmalloc] [k] long_busy_list_alloc_test
+ 53.75% 1.76% [kernel] [k] _raw_spin_lock
+ 52.81% 51.80% [kernel] [k] native_queued_spin_lock_slowpath
+ 28.93% 0.09% [test_vmalloc] [k] full_fit_alloc_test
+ 23.29% 2.43% [kernel] [k] vfree.part.0
+ 20.29% 0.01% [kernel] [k] mt_insert_vmap_area
+ 20.27% 0.34% [kernel] [k] mtree_insert_range
+ 15.30% 0.05% [test_vmalloc] [k] fix_size_alloc_test
+ 14.06% 0.05% [kernel] [k] remove_vm_area
+ 13.73% 0.00% [kernel] [k] ret_from_fork_asm
+ 13.73% 0.00% [kernel] [k] ret_from_fork
+ 13.73% 0.00% [kernel] [k] kthread
+ 13.51% 0.00% [test_vmalloc] [k] test_func
+ 13.15% 0.87% [kernel] [k] alloc_pages_bulk_noprof
+ 9.92% 9.54% [kernel] [k] clear_page_rep
+ 9.62% 0.07% [kernel] [k] find_unlink_vmap_area
+ 9.55% 0.04% [kernel] [k] mtree_erase
+ 5.92% 1.44% [kernel] [k] free_unref_page
+ 4.92% 0.24% [kernel] [k] mas_insert.isra.0
+ 4.69% 0.93% [kernel] [k] mas_erase
+ 4.47% 0.02% [kernel] [k] rcu_do_batch
+ 3.35% 2.10% [kernel] [k] __vmap_pages_range_noflush
+ 3.00% 2.81% [kernel] [k] mas_wr_store_type
<snip>
i.e. insert/remove are more expensive, at least my tests show this.
It looks like mtree_insert() uses a range variant, which implies a
tree update after an insert operation completes, and that is probably
where the overhead comes from. If I use a B+tree (my own implementation),
it is, as expected, better than the rb-tree because of B+tree properties.
I have composed some data; you can find more benchmark figures here:
wget ftp://vps418301.ovh.net/incoming/Maple_tree_comparison_with_rb_tree_in_vmalloc.pdf
>> That's what maple trees do; they store non-overlapping ranges. So you
>> can look up any address in a range and it will return you the pointer
>> associated with that range. Just like you'd want for a page fault ;-)
Thank you, I see. I thought that it could also work as a regular B+ or
B-tree, so we would not spend cycles on updates to track ranges. Like
the code below:

int ret = mtree_insert(t, va->va_start, va, GFP_KERNEL);

I do not store a range here, I store a key -> value pair, but the maple
tree treats it as the range [va_start:va_start]. Maybe we can improve
this case when no range is passed?

These are just my thoughts :)
--
Uladzislau Rezki
Thread overview: 24+ messages
2024-12-03 2:31 Ze Zuo
2024-12-03 4:11 ` Matthew Wilcox
2024-12-03 11:23 ` zuoze
2024-12-03 12:39 ` Uladzislau Rezki
2024-12-03 13:10 ` zuoze
2024-12-03 13:25 ` Uladzislau Rezki
2024-12-03 13:30 ` Kefeng Wang
2024-12-03 13:39 ` Uladzislau Rezki
2024-12-03 13:45 ` Kefeng Wang
2024-12-03 13:51 ` Uladzislau Rezki
2024-12-03 14:10 ` Kefeng Wang
2024-12-03 14:20 ` Uladzislau Rezki
2024-12-03 19:02 ` Uladzislau Rezki
2024-12-03 19:56 ` Matthew Wilcox
2024-12-04 1:38 ` zuoze
2024-12-04 4:43 ` Kees Cook
2024-12-04 7:55 ` Uladzislau Rezki
2024-12-04 9:21 ` zuoze
2024-12-04 9:27 ` Uladzislau Rezki
2024-12-04 8:51 ` Uladzislau Rezki
2024-12-16 4:24 ` Matthew Wilcox
2024-12-16 19:18 ` Uladzislau Rezki [this message]
2024-12-04 1:21 ` zuoze
2024-12-03 6:12 ` Uladzislau Rezki