From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5DF3CE7A89 for ; Mon, 25 Sep 2023 03:57:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C56C06B0141; Sun, 24 Sep 2023 23:57:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BDFBA6B0146; Sun, 24 Sep 2023 23:57:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA79B6B0148; Sun, 24 Sep 2023 23:57:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 94AF06B0141 for ; Sun, 24 Sep 2023 23:57:54 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 4FE8C1CA61E for ; Mon, 25 Sep 2023 03:57:54 +0000 (UTC) X-FDA: 81273761268.30.EC58E59 Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) by imf08.hostedemail.com (Postfix) with ESMTP id 031AA160005 for ; Mon, 25 Sep 2023 03:57:50 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b="GFB63/Iu"; spf=pass (imf08.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695614272; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=ZL8oS6h/SMWhCHncu4CSFCmNusM3zUoiub5E91Pn6TA=; b=WOnnvoclphXw19LSfd8jHuiRJ/1+e5pPbhbgI5etEvxUsfdPuhvV7gQ82jwqmpuiqvWsv7 1pnE/SNTTX578MIZTgluh2yU0p3aI/tRgbSE/WPUUcgLv7pNAqqKO7z8//KSr5KkzJe2xt idAs/IqILAznKO4v8bCNme7eKgsywxY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695614272; a=rsa-sha256; cv=none; b=wEA8g6Ed64x6PdxeeVB1ks5IhWMGZN1wQ0vQtk5x3wN/FtvSGb1WwFo1dM5elLhUBSbO+e o//qxTJxAOHO7DPKSiNingkL31e1zFK9LgxhgfwTfjKAcRb44EcUKtQxnJbxJ3Bo0N2Fpa eSG4z9ep0w52R/0TueOqAXYsCX+DaI0= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b="GFB63/Iu"; spf=pass (imf08.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-690bccb0d8aso3896419b3a.0 for ; Sun, 24 Sep 2023 20:57:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1695614265; x=1696219065; darn=kvack.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=ZL8oS6h/SMWhCHncu4CSFCmNusM3zUoiub5E91Pn6TA=; b=GFB63/IuDlxCeEZNdYD/0mv6OBZtsunHwncwo807SsmMuYZTDg7vCLZWJwGU6Bxs/z cDXoN7rclSv6RfeKXo/v2goyaMXJQxe5o14mOFWUSuyhDzIcu9rUL/fWc5xjskDvdCup qgUh26K/u7+lQmm9r1q+7B1Z/9grBa+cF5v5XMEJvxpiumCFW+9etoJF48lyv4nnuL+q jLrIBPZYebwWeI0Zz8Vruy9XJ4djIewNVrwD4+aSVTXuv/WU8KWmmyZrJQvsqg9Dkbr7 G2RmbZzz9VP9cSYQUQPO5xjO/et5Fop1tjtbeVULKtOkEaN7um/TJVJkaY7Oq3yA9YTg c7Mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695614265; x=1696219065; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZL8oS6h/SMWhCHncu4CSFCmNusM3zUoiub5E91Pn6TA=; b=CCcUoq+GPKvH+r3NaLROCjZPfbFiJk49+kx0w/mugUFya728aLN2CaIOTDWV44wXHe KRgAJcJ/XK8ML6FytU7jGbJcrrv54aRezjfpnkCgd9hKza6qUaKUSJiCkVR7kuxVxDHx U7U8es5K8CjjkV2tV8qpCgW5ra0U4FnQujPUFJe9oZm1El08UW3B6zK+/AOMdIwP+DVu 0awqpskOLUbPaE240IkmE7FhaezH+cV4DW1OYnANuxPUItZFAVyLgsB7MizrDo167sb8 DCC3CoxgTda/xI6NramKaM1+ZtyoXFsi9Bp7vwkeyXhClnxDpnAuUov8fDCZCWSxmHe7 RRvQ== X-Gm-Message-State: AOJu0Yw5/UkgPSEjjleAueHiBz52X2sAsg3uu8CBNXqVWFMF5/PNIvbj cLLkyglcWmodWmzfvXPT+n6FWQ== X-Google-Smtp-Source: AGHT+IFP1hfquosoj31qZFLIMz20MNEzlSyLXJoKN6SESAMuexDGVnApQHVkteHWucUZMl8Lpc79LA== X-Received: by 2002:a05:6a00:1990:b0:68b:fdfe:76c2 with SMTP id d16-20020a056a00199000b0068bfdfe76c2mr4715925pfl.20.1695614264958; Sun, 24 Sep 2023 20:57:44 -0700 (PDT) Received: from GL4FX4PXWL.bytedance.net ([203.208.167.146]) by smtp.gmail.com with ESMTPSA id fm1-20020a056a002f8100b00679a4b56e41sm7025387pfb.43.2023.09.24.20.57.37 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Sun, 24 Sep 2023 20:57:43 -0700 (PDT) From: Peng Zhang To: Liam.Howlett@oracle.com, corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org, surenb@google.com, michael.christie@oracle.com, mjguzik@gmail.com, mathieu.desnoyers@efficios.com, npiggin@gmail.com, peterz@infradead.org, oliver.sang@intel.com Cc: zhangpeng.00@bytedance.com, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v3 0/9] Introduce __mt_dup() to improve the performance of fork() Date: Mon, 25 Sep 2023 11:56:08 +0800 Message-Id: <20230925035617.84767-1-zhangpeng.00@bytedance.com> X-Mailer: git-send-email 2.37.0 (Apple Git-136) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 031AA160005 X-Rspam-User: X-Stat-Signature: s8hrhhebtrc57ups4hwi6memjjmrytem X-Rspamd-Server: rspam03 X-HE-Tag: 1695614270-842850 X-HE-Meta: U2FsdGVkX181dy1YgM80wGHJD7iu1EoR+J4iSQIQ9lQVo8hDe46OT2TBsh5iN1qa7zgUheg742FErA1FEWvMR5yOZOtrZDx/IFuOAtFnVmX5SwgV7BY9SvF/M0YnzFZUPEFp3GFWYh9WCY3Zr7f6MX6o9PpJ2GVylxFnKMA6uvULmWXqUvWI7APGQgnRjQNUS6WR73O3Y6tT6R2AtWgoMnHjDDhNOP3uVbqeap9RFDFINwQCqJ+FMruxKaIAZ/wCpxOx+lj3qdseHEiTW5Qx7ol5q0fNNav4zQQWa0eJeOOrVtJ2hWCsSx0XZYDpWbZJ1L8nwD8Yhfer12yZVVpImDdsK3rcaacU2BDMEXPpnA/xCMaLN/hYmaHwR1fXYYdNp6V+WqC1ya1l8/5qIuMd/sp6XgZbspqc3vXtvqQuI42MsQxrBWm0ljHoriD3kgMgpzUnRP3wlOHmjIjBTz2wifE7BIwVQnCH0JQ4BkogfqIYO9lPXfegZL//MQ3EgriUL/q3LAVfgETicuzFYvpNps2K1V+MfG0Kox/YXgcdJRlSLyC6Hy4U7oEGzAglChNlloSj/nOAWeM/MgWAScqAZJlsvxIm37pKAiiVckQakAj+qqnf9EYnwDvC6dMxtdpcLHgZegcz+51eRKrHAgrDUf+tnDhIiOmrwX7GhZN5dW4Ju0+LceOZ12fFDXEzEFIr1hbZ9RsNDcM+s+qzwTZxm/TtVYSsnrTRXajER5t4qGiEIWhTnE9Wh1STg1ta6SRCxNZU5Q+hH835MRzyHcPexx2CFS1gkeTlPxJ0yV6FV3krq5MCD6duLqAg3O5jQ1E5aY9bL4agOgIpGCjN8yiMUNt95HHmC2tMsbHKJvPVDQ/0B1MN2JPHtJxMOyVbblqH4Iy9P0lOJsFX5HQ1lX5iIhfIZJHk3TUMggZmIRfIusfLJi/N4XIuaJaNDUhXmerLZEFr3xCFS0wx4sczlb/ F7pd5baW 3ISL4iJipzrDgE+eEr0sJy44AkhTPbIRUWHXQZR4sHgtGOPt6WJnhGgGcoslSLsyumNkeB9JMccKHVAxYKC6jthLGsJxC5NAqnUvpYzUgPrb3ZjPLAWPS/U0jPu3UvbIGe7Hli1DlmSEnHaSRExiwEGa3sllXpV152NIPSs00utq0tTMOJbbFsnhoqUzhefVN5MPuXlPFz3vrBZeqeEnHpf9rXg0Q72pWXAKHnw9aAp70KxQ0D7y+pMSmyc1ETrzK6UiRbJWmHqA3l9Zcf2hJgKl4KUe+q+sPORYAGK13ZvQ0Y+OKvPjKt0TZJ2/f0FTNL+fUpQ6ZVJzZc+vG7l/8tIt3p+M2g5+xc5LvBoKalsSzoJuruJEqQziI0GiQE3pqMCxeeTi9CLKbxir4U/QO1kbcHx0wLtiH+5dEJBEWgp9VVvafpT7/H+m5+PqfOuWqXJnmikna+RFb4pz0FCZ1mtyrXr3YO4RWbv4lVjMw84h2GUtC6qXKjNEPztrYy5l1J5K1zNn/hhE8jvv9/rWGvrx8n2h6/lYiSLGoUY854XH7iTgef1Bw+I5v2l1/pf/oJO6YvfyYzUAjpN7IR8X/C55TlA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi all, This series introduces __mt_dup() to improve the performance of fork(). During the duplication process of mmap, all VMAs are traversed and inserted one by one into the new maple tree, causing the maple tree to be rebalanced multiple times. Balancing the maple tree is a costly operation. To duplicate VMAs more efficiently, mtree_dup() and __mt_dup() are introduced for the maple tree. They can efficiently duplicate a maple tree. By applying __mt_dup() to dup_mmap(), better performance is achieved compared to the original method. After using this method, the average time complexity decreases from O(n * log(n)) to O(n). Here are some algorithmic details about {mtree, __mt}_dup(). We perform a DFS pre-order traversal of all nodes in the source maple tree. During this process, we fully copy the nodes from the source tree to the new tree. This involves memory allocation, and when encountering a new node, if it is a non-leaf node, all its child nodes are allocated at once. Some previous discussions can be referred to as [1]. There is a "spawn" in byte-unixbench[2], which can be used to test the performance of fork(). I modified it slightly to make it work with different number of VMAs. Below are the test results. By default, there are 21 VMAs. The first row shows the number of additional VMAs added on top of the default. The last two rows show the number of fork() calls per ten seconds. The test results were obtained with CPU binding to avoid scheduler load balancing that could cause unstable results. There are still some fluctuations in the test results, but at least they are better than the original performance. Increment of VMAs: 0 100 200 400 800 1600 3200 6400 next-20230921: 112326 75469 54529 34619 20750 11355 6115 3183 Apply this: 116505 85971 67121 46080 29722 16665 9050 4805 +3.72% +13.92% +23.09% +33.11% +43.24% +46.76% +48.00% +50.96% Thanks to kernel test robot for reporting the warning about nested locks. Thanks to Liam for all the suggestions. Changes since v2: - Some minor modifications to mtree_dup(), __mt_dup() and their test code. - Introduce {mtree, mas}_lock_nested() to address lockdep warnings. - Update the documentation for maple tree. - Introduce undo_dup_mmap() to address the failure of dup_mmap(). - Performance data was retested based on the latest next-20230921, and there were some fluctuations in the results which were expected. [1] https://lore.kernel.org/lkml/463899aa-6cbd-f08e-0aca-077b0e4e4475@bytedance.com/ [2] https://github.com/kdlucas/byte-unixbench/tree/master v1: https://lore.kernel.org/lkml/20230726080916.17454-1-zhangpeng.00@bytedance.com/ v2: https://lore.kernel.org/lkml/20230830125654.21257-1-zhangpeng.00@bytedance.com/ Peng Zhang (9): maple_tree: Add mt_free_one() and mt_attr() helpers maple_tree: Introduce {mtree,mas}_lock_nested() maple_tree: Introduce interfaces __mt_dup() and mtree_dup() maple_tree: Add test for mtree_dup() maple_tree: Update the documentation of maple tree maple_tree: Skip other tests when BENCH is enabled maple_tree: Update check_forking() and bench_forking() maple_tree: Preserve the tree attributes when destroying maple tree fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Documentation/core-api/maple_tree.rst | 4 + include/linux/maple_tree.h | 7 + include/linux/mm.h | 1 + kernel/fork.c | 34 ++- lib/maple_tree.c | 300 ++++++++++++++++++++- lib/test_maple_tree.c | 69 +++-- mm/internal.h | 3 +- mm/memory.c | 7 +- mm/mmap.c | 52 +++- tools/include/linux/spinlock.h | 1 + tools/testing/radix-tree/maple.c | 363 ++++++++++++++++++++++++++ 11 files changed, 787 insertions(+), 54 deletions(-) -- 2.20.1