From: Uladzislau Rezki <urezki@gmail.com>
Date: Mon, 16 Dec 2024 20:18:24 +0100
To: Matthew Wilcox
Cc: Uladzislau Rezki, Kefeng Wang, zuoze, gustavoars@kernel.org, akpm@linux-foundation.org, linux-hardening@vger.kernel.org, linux-mm@kvack.org, keescook@chromium.org
Subject: Re: [PATCH -next] mm: usercopy: add a debugfs interface to bypass the vmalloc check.
References: <76995749-1c2e-4f78-9aac-a4bff4b8097f@huawei.com>

Hello,
Matthew!

> On Wed, Dec 04, 2024 at 09:51:07AM +0100, Uladzislau Rezki wrote:
> > I think, when i have more free cycles, i will check it from performance
> > point of view. Because i do not know how much a maple tree is efficient
> > when it comes to lookups, insert and removing.
>
> Maple tree has a fanout of around 8-12 at each level, while an rbtree has
> a fanout of two (arguably 3, since we might find the node). Let's say you
> have 1000 vmalloc areas. A perfectly balanced rbtree would have 9 levels
> (and might well be 11+ levels if imperfectly balanced -- and part of the
> advantage of rbtrees over AVL trees is that they can be less balanced
> so need fewer rotations). A perfectly balanced maple tree would have
> only 3 levels.
>
Thank you for the explanation and the input on this topic. The density,
the tree height and the branching factor should make it work better :)

> Addition/removal is more expensive. We biased the implementation heavily
> towards lookup, so we chose to keep it very compact. Most users (and
> particularly the VMA tree which was our first client) do more lookups
> than modifications; a real application takes many more pagefaults than
> it does calls to mmap/munmap/mprotect/etc.
>
This is what I see. Some use cases are degraded.
For example, the stress-ng forking bench is worse, and test_vmalloc.sh
also reports a degradation. See the figures below:

# Default
urezki@pc638:~$ time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+   59.52%  7.15%  [kernel]        [k] __vmalloc_node_range_noprof
+   37.98%  0.22%  [test_vmalloc]  [k] fix_size_alloc_test
+   37.32%  8.56%  [kernel]        [k] vfree.part.0
+   35.31%  0.00%  [kernel]        [k] ret_from_fork_asm
+   35.31%  0.00%  [kernel]        [k] ret_from_fork
+   35.31%  0.00%  [kernel]        [k] kthread
+   35.05%  0.00%  [test_vmalloc]  [k] test_func
+   34.16%  0.06%  [test_vmalloc]  [k] long_busy_list_alloc_test
+   32.10%  0.12%  [kernel]        [k] __get_vm_area_node
+   31.69%  1.82%  [kernel]        [k] alloc_vmap_area
+   27.24%  5.01%  [kernel]        [k] _raw_spin_lock
+   25.45%  0.15%  [test_vmalloc]  [k] full_fit_alloc_test
+   23.57%  0.03%  [kernel]        [k] remove_vm_area
+   22.23% 22.23%  [kernel]        [k] native_queued_spin_lock_slowpath
+   14.34%  0.94%  [kernel]        [k] alloc_pages_bulk_noprof
+   10.80%  7.51%  [kernel]        [k] free_vmap_area_noflush
+   10.59% 10.59%  [kernel]        [k] clear_page_rep
+    9.52%  8.96%  [kernel]        [k] insert_vmap_area
+    7.39%  2.82%  [kernel]        [k] find_unlink_vmap_area

# Maple-tree
time sudo ./test_vmalloc.sh run_test_mask=7 nr_threads=64
+   74.33%  1.50%  [kernel]        [k] __vmalloc_node_range_noprof
+   55.73%  0.06%  [kernel]        [k] __get_vm_area_node
+   55.53%  1.07%  [kernel]        [k] alloc_vmap_area
+   53.78%  0.09%  [test_vmalloc]  [k] long_busy_list_alloc_test
+   53.75%  1.76%  [kernel]        [k] _raw_spin_lock
+   52.81% 51.80%  [kernel]        [k] native_queued_spin_lock_slowpath
+   28.93%  0.09%  [test_vmalloc]  [k] full_fit_alloc_test
+   23.29%  2.43%  [kernel]        [k] vfree.part.0
+   20.29%  0.01%  [kernel]        [k] mt_insert_vmap_area
+   20.27%  0.34%  [kernel]        [k] mtree_insert_range
+   15.30%  0.05%  [test_vmalloc]  [k] fix_size_alloc_test
+   14.06%  0.05%  [kernel]        [k] remove_vm_area
+   13.73%  0.00%  [kernel]        [k] ret_from_fork_asm
+   13.73%  0.00%  [kernel]        [k] ret_from_fork
+   13.73%  0.00%  [kernel]        [k] kthread
+   13.51%  0.00%  [test_vmalloc]  [k] test_func
+   13.15%  0.87%  [kernel]        [k] alloc_pages_bulk_noprof
+    9.92%  9.54%  [kernel]        [k] clear_page_rep
+    9.62%  0.07%  [kernel]        [k] find_unlink_vmap_area
+    9.55%  0.04%  [kernel]        [k] mtree_erase
+    5.92%  1.44%  [kernel]        [k] free_unref_page
+    4.92%  0.24%  [kernel]        [k] mas_insert.isra.0
+    4.69%  0.93%  [kernel]        [k] mas_erase
+    4.47%  0.02%  [kernel]        [k] rcu_do_batch
+    3.35%  2.10%  [kernel]        [k] __vmap_pages_range_noflush
+    3.00%  2.81%  [kernel]        [k] mas_wr_store_type

i.e. insert/remove are more expensive, at least my tests show this. It
looks like mtree_insert() uses the range variant, which implies a tree
update after an insert operation completes; that is probably where the
overhead comes from.

If I use a b+tree (my own implementation), as expected, it is better
than an rb-tree because of b+tree properties. I have composed some
data; you can find more bench data there:

wget ftp://vps418301.ovh.net/incoming/Maple_tree_comparison_with_rb_tree_in_vmalloc.pdf

> > That's what maple trees do; they store non-overlapping ranges. So you
> > can look up any address in a range and it will return you the pointer
> > associated with that range. Just like you'd want for a page fault ;-)
>
Thank you, I see. I thought that it could also work as a regular b+ or
b-tree, so we do not spend cycles on updates to track ranges. Like the
code below:

int ret = mtree_insert(t, va->va_start, va, GFP_KERNEL);

I do not store a range here, I store a key -> value pair, but the maple
tree treats it as the range [va_start:va_start]. Maybe we can improve
this case when a single key, not a range, is passed? These are just my
thoughts :)

--
Uladzislau Rezki