From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF72EC001DE for ; Mon, 31 Jul 2023 09:53:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 78A28280022; Mon, 31 Jul 2023 05:53:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7399228001A; Mon, 31 Jul 2023 05:53:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 600F6280022; Mon, 31 Jul 2023 05:53:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 4DD4428001A for ; Mon, 31 Jul 2023 05:53:11 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id F260EC01AE for ; Mon, 31 Jul 2023 09:53:10 +0000 (UTC) X-FDA: 81071443740.04.7CFBB69 Received: from mail-pf1-f173.google.com (mail-pf1-f173.google.com [209.85.210.173]) by imf07.hostedemail.com (Postfix) with ESMTP id 6EAEA40003 for ; Mon, 31 Jul 2023 09:53:06 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=NmnMeTCp; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf07.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690797187; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=T+JtLVpE1c+T8lTzPshdsq+FsKMITmkXRw9+xohDxso=; b=8FxKZwHHwegVc8HIAookNQGp/GLYJba/9ejoFj60fDmHcmG4cdm6JgY+1FZdTm5GUS1g80 HGDPzGdO8426Hx2HCqf6JGBGlHzfdC0H03fI+CsCSTP3AnS0DiRMDfhoeEYniGYp8Qb3aO 6ELbeLcvlBxCdyf5joWCmDYXoeRFXfY= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=NmnMeTCp; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf07.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.173 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690797187; a=rsa-sha256; cv=none; b=Y/Mq5GApCW8IftFLiEHWzaPSSN4SmqLOxMHSvzdQPrheyvjcPSdI1cBJKu+GyPMJKKNe/2 J4Y9Vp5Gwcd2+TB22shRj9CjNw/nyDR8iLf1tOafFPr9ViWRHeeyDGh1EkVD0FuU+BZo5b xAXH+gyPN0kCKkLF9mdpzs7sh0ewfP0= Received: by mail-pf1-f173.google.com with SMTP id d2e1a72fcca58-686f1240a22so3960716b3a.0 for ; Mon, 31 Jul 2023 02:53:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690797185; x=1691401985; h=content-transfer-encoding:in-reply-to:from:cc:references:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=T+JtLVpE1c+T8lTzPshdsq+FsKMITmkXRw9+xohDxso=; b=NmnMeTCpG45ADvgRQUxLTcuLKqzReilHwsUkdyHXbO5BjaQY9fLOW0be7wwGBPt3Fd GtY/CoLQ5ulVLRqMsuir9CEI9RtZz58XDfHbyqVYPt/JSyD/PDgQwC+TE4n49JEu/3si GmRdVnhQQ5F0iLdVKapxuAKSQqvT2jS4qIS4h1lP92DqyFxGRX1xvvdaJ+PEOsZ7I9xI b7m60FRxUpqEmjmgFNS6l6MvWDoeEO/XpNBBCtoVuZuJAX+BZU8AgpQL0u+gXPXMOtsh sDFGkL7zoHzSDOhvSXsxLhkLURuBtZlD6Z5vLcQSjK98oJEFTuJz36IF5tC26NhHAqec LUwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690797185; x=1691401985; h=content-transfer-encoding:in-reply-to:from:cc:references:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=T+JtLVpE1c+T8lTzPshdsq+FsKMITmkXRw9+xohDxso=; b=dMVhcEXOprbj6LQGN4HCee8SW1STb9vNHkY4YWXC+LDcmuK/2PyJ8OKnzYKsUVXR0/ 0jYh4Rm3wMh10gr5hlQnYxHymWZZPdP+4jOYh1rOvDBVyeCqUuW5XErzd0yPw+3dOItG mp6vSkEabmQqX+zqNEoEm/yM4fXWEQjXx+0KiBKv3AkzBu/X8Y1l1rl2uEdF2XGscEBp W6FioE+MPda//mNjH2zDg8+QgQGNOHPZJGOuoHuszU629nMQvkchiGGbhqDO5QvCaCI0 CYUz6GtOplsDalmISmY8M2hyQdGmFGKrnsXBhbX80EYdLtgDgfeI/O1RklT0/jc2rYOI 1Qyg== X-Gm-Message-State: ABy/qLa3ZdFVFxEbvG3jh4EOBBIwVRB135+cQEXfI0f+cyhXOCNai7hq Vsq7M2ewu1QlStLPednIgQD9+g== X-Google-Smtp-Source: APBJJlFSLuhbiNV50ZKuLiiTHQLOYesPds3YhLzlcaKUyv6zvLWtkd1WU763/LuyNYaf6VVfyIALvQ== X-Received: by 2002:a05:6a20:32aa:b0:137:3b34:93e5 with SMTP id g42-20020a056a2032aa00b001373b3493e5mr7988841pzd.59.1690797184901; Mon, 31 Jul 2023 02:53:04 -0700 (PDT) Received: from ?IPV6:fdbd:ff1:ce00:1c2a:1cd4:8b91:108f:bf15? ([2408:8000:b001:1:1f:58ff:f102:103]) by smtp.gmail.com with ESMTPSA id c13-20020aa7880d000000b00640f51801e6sm5823982pfo.159.2023.07.31.02.53.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 31 Jul 2023 02:53:04 -0700 (PDT) Message-ID: <36f2d5d1-00ff-d8a7-ed40-15eb5d929224@bytedance.com> Date: Mon, 31 Jul 2023 17:52:59 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.1 Subject: Re: [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}() To: "Liam R. Howlett" References: <20230726080916.17454-1-zhangpeng.00@bytedance.com> <20230726080916.17454-2-zhangpeng.00@bytedance.com> <20230726145825.2fufoujgc5faiszq@revolver> Cc: corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org, surenb@google.com, michael.christie@oracle.com, mathieu.desnoyers@efficios.com, peterz@infradead.org, npiggin@gmail.com, avagin@gmail.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org From: Peng Zhang In-Reply-To: <20230726145825.2fufoujgc5faiszq@revolver> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 6EAEA40003 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: fea4pjq69tsw4q51tdkc3goorttt1y1b X-HE-Tag: 1690797186-634051 X-HE-Meta: U2FsdGVkX1/tPoVfH0nMOzI1FeGV07rHnk9OT8s6xB6raNPwlpbDsuSFy4ImGvO0bg/iQcgT1D+aqXHWfqsIypZMKhWoX5L6CQm3GChIz1qwmj13X41N4x4MkQTttxA74g/eg1mC3rfFW9SMCKOPXzRAGoBe+anX4L83iPppP92pgw5b449WFQu9p6jlc0h6gF2v6aTFRMLa+CljLJrUq1N3Hmrqb9dmpmOkUaYvtkokWgkRmRbC47XfjS3fK3R8mCGIcX8XkI9BTPPFT/q0w0MiNxpNOyInuhDkc8/DbVqq5v4riYyWw7ewR3/5hl3nTZ0RWmILtPWRLQw/gYb8zMrW3uKP99ZSH1XjaWAXWmO2pshUGuhp9HEwZWrTK7vp6d8GV+33xCTUNoUPvS7QpVladpCFMjwEGzjkSim/EATTLyLI1aiDHM0zF2JUVE/N4NdEmuWnoEay+WPy2qbWmcqzXETlkyOnusBjqPQXT824hfTlKmypx7cXsx5sB8PXizBOr7mNQnjSFmfR6sjKt9spcQ5D47CoCWoeDOeCLOXsmae/XupY3g4yM52ruAipk38hFAqZl8Q8PsLcqeVtIusgu7m3bFuaWqF2yUZcly6veX2VmUU/o7ZxLWUgPryo2XzCEbr4/DCSVWVghfd1Fe3XYnoedjvlyrE6VCOqFYLUuogJ2TkntXgISGKXI6AyWcvrxNhs8JKb8WnsCQaFrRJ6xtxcI1EUPykKXwcAoLt1jyEu9s1BfUbrQtm/ieM6FNjJ84SLgGPpS7aQ/h867EEH+s+nJqjnhhLtm5OxGjw9hHHbtjsSB+gSykcZ9/DX7JVARGmAXKAt3NmDrlZFj9FUFr//T7ZtXTeoyfDXiStPTqnF2oCI9y1v2cv0x185b7VveH+dtVX3YwYVCugcyge9JwG8/yXYmbzk7OgDiXAZSMrCqARDweQ7SCORFRPqlY19K3pSDv8uRDctRSj yHMbJjU1 vzF6gzBzVI6Fh3bYCcbJOS/4XCoGr4CWbdLD59j+RbPNHWZlW2pjr4NPhP6+rCyQG3o4Enw2Ux7MPgKtZkrjkFXKwqyVbxKXDqWyqfp4GDoLwf7G1uN1mon9foS1Wo8+AAmZqv6nOVEUh7TlHOUXGqt7mPqV5TK2d0FRubsZ6fAKmM8gzcqT23Mdr1zb1liPa0jeaKV/J2NiisQyi4w3C+a+xu5kyTDOPjwxWWGd9rlNDOcBR9UdEuorwSdd7DRoMtdWRAyIzT2CCAiGOUPOYNFYGSJ4dYdjuoTcBsr6yUjRQiFe0RHtlSBHfACA2mlHKu2NNYcDl+6A+7yXXuoFR7aMCFZD1TPsJtpWdWNiFC6XeVkeDrxm+wfNIKDwDBv7ObgHOsalZEGCrNbMi7yNOeUT34+VwlnXhzWCuoDanbw9qsf2nI5R75MFf9QwF3J62l1ZTbC4h2x4vBDKftR10WjXwjGigkvdvHGHWfv12o95/oAo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 在 2023/7/26 22:58, Liam R. Howlett 写道: > * Peng Zhang [230726 04:10]: >> Introduce ma_nonleaf_data_end{_nocheck}() to get the data end of >> non-leaf nodes without knowing the maximum value of nodes, so that any >> ascending can be avoided even if the maximum value of nodes is not known. >> >> The principle is that we introduce MAPLE_ENODE to mark an ENODE, which >> cannot be used by metadata, so we can distinguish whether it is ENODE or >> metadata. >> >> The nocheck version is to avoid lockdep complaining in some scenarios >> where no locks are held. >> >> Signed-off-by: Peng Zhang >> --- >> lib/maple_tree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 68 insertions(+), 2 deletions(-) >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index a3d602cfd030..98e4fdf6f4b9 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -310,12 +310,19 @@ static inline void mte_set_node_dead(struct maple_enode *mn) >> #define MAPLE_ENODE_TYPE_SHIFT 0x03 >> /* Bit 2 means a NULL somewhere below */ >> #define MAPLE_ENODE_NULL 0x04 >> +/* Bit 7 means this is an ENODE, instead of metadata */ >> +#define MAPLE_ENODE 0x80 > > We were saving this bit for more node types. I don't want to use this > bit for this reason since you could have done BFS to duplicate the tree > using the existing way to find the node end. We have reserved 4 bits for the node type. I don't think there will be more than 16 node types going forward. Even the DFS that has been implemented can use the existing way to get the data end. I didn't use it because when walking up the tree, we don't know the maximum value of the node, and the continuous upward walk will introduce more overhead, which is what mas_ascend() does. Doing BFS cannot avoid this problem also. The reason I don't do BFS is that it has more overhead than DFS. If you think of a tree as a graph, doing DFS will only walk each edge twice. What if it is BFS? Since we can't use queues, we can only emulate BFS, which additionally does something like mas_next_node() does, which introduces more overhead than DFS. Considering only the layer of leaf nodes, it needs to walk each edge twice. So the overhead of doing BFS is more than DFS. > > Bits are highly valuable and this is the only remaining bit. I had > thought about using this in Feb 2021 to see if there was metadata or > not, but figured a way around it (using the max trick) and thus saved > this bit for potential expansion of node types. I thought of another way to get the maximum value of a node without doing any extra upward walk. When doing DFS, we can use a stack to save the maximum value of ancestor nodes. The stack size can be set to MAPLE_HEIGHT_MAX. In this way, this bit can be reserved, and there is no need to do a loop like mas_ascend() in order to get the maximum value. > >> + >> +static inline bool slot_is_mte(unsigned long slot) >> +{ >> + return slot & MAPLE_ENODE; >> +} >> >> static inline struct maple_enode *mt_mk_node(const struct maple_node *node, >> enum maple_type type) >> { >> - return (void *)((unsigned long)node | >> - (type << MAPLE_ENODE_TYPE_SHIFT) | MAPLE_ENODE_NULL); >> + return (void *)((unsigned long)node | (type << MAPLE_ENODE_TYPE_SHIFT) | >> + MAPLE_ENODE_NULL | MAPLE_ENODE); >> } >> >> static inline void *mte_mk_root(const struct maple_enode *node) >> @@ -1411,6 +1418,65 @@ static inline struct maple_enode *mas_start(struct ma_state *mas) >> return NULL; >> } >> >> +/* >> + * ma_nonleaf_data_end() - Find the end of the data in a non-leaf node. >> + * @mt: The maple tree >> + * @node: The maple node >> + * @type: The maple node type >> + * >> + * Uses metadata to find the end of the data when possible without knowing the >> + * node maximum. >> + * >> + * Return: The zero indexed last slot with child. >> + */ >> +static inline unsigned char ma_nonleaf_data_end(struct maple_tree *mt, >> + struct maple_node *node, >> + enum maple_type type) >> +{ >> + void __rcu **slots; >> + unsigned long slot; >> + >> + slots = ma_slots(node, type); >> + slot = (unsigned long)mt_slot(mt, slots, mt_pivots[type]); >> + if (unlikely(slot_is_mte(slot))) >> + return mt_pivots[type]; >> + >> + return ma_meta_end(node, type); >> +} >> + >> +/* >> + * ma_nonleaf_data_end_nocheck() - Find the end of the data in a non-leaf node. >> + * @node: The maple node >> + * @type: The maple node type >> + * >> + * Uses metadata to find the end of the data when possible without knowing the >> + * node maximum. This is the version of ma_nonleaf_data_end() that does not >> + * check for lock held. This particular version is designed to avoid lockdep >> + * complaining in some scenarios. >> + * >> + * Return: The zero indexed last slot with child. >> + */ >> +static inline unsigned char ma_nonleaf_data_end_nocheck(struct maple_node *node, >> + enum maple_type type) >> +{ >> + void __rcu **slots; >> + unsigned long slot; >> + >> + slots = ma_slots(node, type); >> + slot = (unsigned long)rcu_dereference_raw(slots[mt_pivots[type]]); >> + if (unlikely(slot_is_mte(slot))) >> + return mt_pivots[type]; >> + >> + return ma_meta_end(node, type); >> +} >> + >> +/* See ma_nonleaf_data_end() */ >> +static inline unsigned char mte_nonleaf_data_end(struct maple_tree *mt, >> + struct maple_enode *enode) >> +{ >> + return ma_nonleaf_data_end(mt, mte_to_node(enode), mte_node_type(enode)); >> +} >> + >> /* >> * ma_data_end() - Find the end of the data in a node. >> * @node: The maple node >> -- >> 2.20.1 >> >>