From: Wen Gu <guwen@linux.alibaba.com>
To: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: shaozhengchao <shaozhengchao@huawei.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 04/11] mm: vmalloc: Remove global vmap_area_root rb-tree
Date: Fri, 5 Jan 2024 16:10:32 +0800 [thread overview]
Message-ID: <238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com> (raw)
In-Reply-To: <20240102184633.748113-5-urezki@gmail.com>
On 2024/01/03 02:46, Uladzislau Rezki wrote:
> Store allocated objects in separate nodes. A va->va_start
> address is converted into the correct node where the VA should
> be placed and reside. An addr_to_node() function is used
> to perform the address conversion and determine the node that
> contains a VA.
>
> Such an approach balances VAs across nodes; as a result, access
> becomes scalable. The number of nodes in a system depends on the
> number of CPUs.
>
> Please note:
>
> 1. As of now allocated VAs are bound to node 0. This means the
> patch makes no difference compared with the current
> behavior;
>
> 2. The global vmap_area_lock and vmap_area_root are removed as there
> is no need for them anymore. The vmap_area_list is still kept and
> is _empty_. It is exported for kexec only;
>
> 3. The vmallocinfo and vread() have to be reworked to be able to
> handle multiple nodes.
>
> Reviewed-by: Baoquan He <bhe@redhat.com>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
<...>
> struct vmap_area *find_vmap_area(unsigned long addr)
> {
> + struct vmap_node *vn;
> struct vmap_area *va;
> + int i, j;
>
> - spin_lock(&vmap_area_lock);
> - va = __find_vmap_area(addr, &vmap_area_root);
> - spin_unlock(&vmap_area_lock);
> + /*
> + * An addr_to_node_id(addr) converts an address to a node index
> + * where a VA is located. If the VA spans several nodes and the
> + * passed addr is not the same as va->va_start, which is not
> + * common, we may need to scan extra nodes. See an example:
> + *
> + *      <----va---->
> + * -|-----|-----|-----|-----|-
> + *     1     2     0     1
> + *
> + * The VA resides in node 1 whereas it spans nodes 1 and 2. If
> + * the passed addr is within the second node, we should do extra
> + * work. It is rare and a corner case, but on the other hand it
> + * has to be covered.
> + */
> + i = j = addr_to_node_id(addr);
> + do {
> + vn = &vmap_nodes[i];
>
> - return va;
> + spin_lock(&vn->busy.lock);
> + va = __find_vmap_area(addr, &vn->busy.root);
> + spin_unlock(&vn->busy.lock);
> +
> + if (va)
> + return va;
> + } while ((i = (i + 1) % nr_vmap_nodes) != j);
> +
> + return NULL;
> }
>
Hi Uladzislau Rezki,
I really like your work, it is great and helpful!
Currently, I am working on using shared memory communication (SMC [1])
to transparently accelerate TCP communication between two peers within
the same OS instance[2].
In this scenario, a vzalloced kernel buffer acts as shared memory and is
simultaneously read and written by two SMC sockets, thus forming an
SMC connection.
  socket1                          socket2
     |                                ^
     |                                |     userspace
---- write ------------------------ read -------------
     |      +-----------------+      |      kernel
     +----->|  shared memory  |------+
            | (vzalloced now) |
            +-----------------+
Then I encountered the performance regression caused by lock contention
in find_vmap_area() when multiple threads transfer data through multiple
SMC connections on machines with many CPUs[3].
According to perf, the performance bottleneck is caused by the global
vmap_area_lock contention[4]:
- writer:
  smc_tx_sendmsg
    -> memcpy_from_msg
      -> copy_from_iter
        -> check_copy_size
          -> check_object_size
            -> if (CONFIG_HARDENED_USERCOPY is set) check_heap_object
              -> if (vm) find_vmap_area
                -> try to hold vmap_area_lock
- reader:
  smc_rx_recvmsg
    -> memcpy_to_msg
      -> copy_to_iter
        -> check_copy_size
          -> check_object_size
            -> if (CONFIG_HARDENED_USERCOPY is set) check_heap_object
              -> if (vm) find_vmap_area
                -> try to hold vmap_area_lock
Fortunately, thanks to this patch set, the global vmap_area_lock was
removed and a per-node lock, vn->busy.lock, was introduced. It is really helpful:
In a 48-CPU qemu environment, Requests/sec increased by a factor of 5:
- nginx
- wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80

               vzalloced shmem   vzalloced shmem (with this patch set)
Requests/sec   113536.56         583729.93
But it still has some overhead compared to using kzalloced shared memory
or unsetting CONFIG_HARDENED_USERCOPY, neither of which involves finding the vmap area:

               kzalloced shmem   vzalloced shmem (CONFIG_HARDENED_USERCOPY unset)
Requests/sec   831950.39         805164.78
So, as a newbie in Linux-mm, I would like to ask for some suggestions:
Is it possible to further eliminate the overhead caused by lock contention
in find_vmap_area() in this scenario (maybe this is asking too much), or is the
only way out to unset CONFIG_HARDENED_USERCOPY or to avoid vzalloced
buffers in situations where concurrent kernel-userspace copies happen?
Any feedback will be appreciated. Thanks again for your time.
[1] Shared Memory Communications (SMC) enables two SMC capable peers to
communicate by using memory buffers that each peer allocates for the
partner's use. It improves throughput, lowers latency and cost, and
maintains existing functions. See details in https://www.ibm.com/support/pages/node/7009315
[2] https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
[3] issues: https://lore.kernel.org/all/1fbd6b74-1080-923a-01c1-689c3d65f880@huawei.com/
analysis: https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
[4] Some flamegraphs are attached,
- SMC using vzalloced buffer: vzalloc_t96.svg
- SMC using vzalloced buffer and with this patchset: vzalloc_t96_improve.svg
- SMC using vzalloced buffer and unset CONFIG_HARDENED_USERCOPY: vzalloc_t96_nocheck.svg
- SMC using kzalloced buffer: kzalloc_t96.svg
Best regards,
Wen Gu