From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 154A8C001DF for ; Mon, 31 Jul 2023 12:24:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 64945280036; Mon, 31 Jul 2023 08:24:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D467280023; Mon, 31 Jul 2023 08:24:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44C62280036; Mon, 31 Jul 2023 08:24:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 2FB60280023 for ; Mon, 31 Jul 2023 08:24:39 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id E4E2A12023F for ; Mon, 31 Jul 2023 12:24:38 +0000 (UTC) X-FDA: 81071825436.15.8487B1F Received: from mail-pf1-f175.google.com (mail-pf1-f175.google.com [209.85.210.175]) by imf09.hostedemail.com (Postfix) with ESMTP id 4E856140003 for ; Mon, 31 Jul 2023 12:24:36 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=K6GPcQAx; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf09.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690806277; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=n33i2l+o13R2vXkTMdoXjMS5PBDW3Nc5tow2fit7wXw=; b=qoO4GpD/OBO6drFv/Xd4CuSqOcbihidWCQvCT1FmBGat8YDcwQmiMkIo8d9YjzgpEr56Pg hwV7pIaFsdhbpw9BuyFqc20SGruQ50p+yQyS4s/OtD0A2d1ZAxBkeNE17kzhz5bWOoBoDw o0aBNbWZuotDkRekUrAG9gv+5P/x73U= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=K6GPcQAx; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf09.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.175 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690806277; a=rsa-sha256; cv=none; b=TndpQPK0IwxcDRfjIpIOpht7g0SqEuARsI5tf0tdZKWPrtYkbWX8J1xm4FS2AEOdF5Leqq ropBQEABsDr22itmjgXVwCOdqB+XCB59XF3Wfz7Q1oSnop5JvH+XjlV8FuoMvRZ4uas1Dm YbEpMH/hC5CMxMAYU5kn2kNE8T3Rxvs= Received: by mail-pf1-f175.google.com with SMTP id d2e1a72fcca58-686f1240a22so4076812b3a.0 for ; Mon, 31 Jul 2023 05:24:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690806275; x=1691411075; h=content-transfer-encoding:in-reply-to:cc:from:references:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=n33i2l+o13R2vXkTMdoXjMS5PBDW3Nc5tow2fit7wXw=; b=K6GPcQAxCt1vZzjj5YetA6l5koeAHwxlWkzP07rifIlXI502MaLVpwE/AyfjICJ6mS LvLbBpX160QFnZ0voELDsIi71o5vomh39WjAEwINwscs7Und+M4Gl/vm5CFpbXG7U52b wm+PC/JqkBZzyjLc/B4BuCb5EpPihhAFtIIEu3laGBI4PmV2NOtXojtIXiWPPiMFNUR2 p3yL08t3+JrqnXJIsDMxvavpQeplUZjhPacMtSme5Fuvo1jkjJiDeCqGBiDOLIZRrX4P Syt2whJ0qVwEz8cEUZLVEqETAFZp25dRSmf1JHROH3XZztTAnvZ0CEkhqB0m6YLJz6YD GVCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690806275; x=1691411075; h=content-transfer-encoding:in-reply-to:cc:from:references:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=n33i2l+o13R2vXkTMdoXjMS5PBDW3Nc5tow2fit7wXw=; b=JLIbHwM+W9CVn4eZ9BVnGxv+B8Q+4Z3h0X54Tti9LrbRL8x+n04rGM6FGgTyBVsEBz dfAoOcvh3CinMMOln2IMGWS/UzIZ2ER2C+p2NYyRp/gcO35fWt8xVFPg1sD8jexH6JqK 3r+Z2n5x9krLbZRE3Cv+IHueGGMXyaQ1QX6QIE4754oNl1VD62sY1EHsDQB9/0fedLyD tFQMYFBGeIPH+mtNX1XuZY6X7XGTqzALWbgFFKa68MJ7x6oG5eJc7xmZnAKP7sGv/F+m CG/HXCeqxvVc4b035pEGRX7GUuhWl8YF3vDyl7aCZocGMu7EeFZ0UPPtXEK6HXz/8OHj k7Wg== X-Gm-Message-State: ABy/qLY1JxeQlpwx3KEvdqTbiXic7pOkBu53vogXTtjgoj9EAE0Y5mo5 i7M9ezxQZC0p4lh1510LMuvdiQ== X-Google-Smtp-Source: APBJJlHIW1GlcpJxqQYz6lvbRT5uLUG/tKcYmNusnJKwbG9AYQELc26f4iscaquGrENqZMvdri//EA== X-Received: by 2002:a05:6a00:9a2:b0:676:76ea:e992 with SMTP id u34-20020a056a0009a200b0067676eae992mr10589145pfg.5.1690806274821; Mon, 31 Jul 2023 05:24:34 -0700 (PDT) Received: from [10.90.34.137] ([203.208.167.147]) by smtp.gmail.com with ESMTPSA id bn9-20020a056a00324900b0064398fe3451sm7465987pfb.217.2023.07.31.05.24.29 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 31 Jul 2023 05:24:34 -0700 (PDT) Message-ID: Date: Mon, 31 Jul 2023 20:24:26 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.1 Subject: Re: [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup() To: "Liam R. Howlett" References: <20230726080916.17454-1-zhangpeng.00@bytedance.com> <20230726080916.17454-5-zhangpeng.00@bytedance.com> <20230726160354.konsgq6hidj7gr5u@revolver> From: Peng Zhang Cc: Peng Zhang , willy@infradead.org, michael.christie@oracle.com, surenb@google.com, npiggin@gmail.com, corbet@lwn.net, mathieu.desnoyers@efficios.com, avagin@gmail.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, brauner@kernel.org, peterz@infradead.org In-Reply-To: <20230726160354.konsgq6hidj7gr5u@revolver> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 4E856140003 X-Stat-Signature: 7yiebn7gto8ciossf7j7mweaoqg5nspp X-HE-Tag: 1690806276-65600 X-HE-Meta: U2FsdGVkX1+36HM7gRBTOpJKfR3lHHm/QKNJBiAkO5IU8bqRgKaSO3xCs89GZSeZldv+NPEFs0XA3o+rNMjwjHxmMxERp1op2cR24AQ6y4dRgFLXbncq08fEY4zSZ0I6DrUqIZDI0ALsXEKzxNgCBaQE1QVJBe5sWIVSMOztZKSnTqjrhosomo4XhtfNkqNE6Nj0/huz60N5JYKhL/q+2oKYNhrnijdvwSOBbwqIrrB82bSUEnyQP7Pu+ZCYEGB4E+DSqem1q1EqcB8Xc9ZRZ15jJZTx0Vu1TX6SahEq/8SyuILKpwA+8UcqseZrf8q9bCixHLWTOUPwOzuZia9mu6qx/uFfPe4NEY3wVDIuJKAgJt14a1fbkIhbmRZnzj2z4pGXkswLGWxC65GvWnIocy78HXXENYFJrwqhsUIF0TRe151+i8b/PnUWK4hYVrFq1EGWI+hWnSoGqoCblqVI/KaTMUHc/Ej5RSI/RythOQcSEDmyFvMCTUqm4ffPPNUYfspjJ3TPT2+Om9Amm1e4NfWtRE3og/Wz1ZE0jI+UJ7nSUcviQoOq7H44Oe5Kp9nCCyuIPS4ZtaFgJrcMqBopz1q4VX4TVcJ/sq+TIx30qYvSnx/iiCoXMv6ChjizBUDfPVwtSiRdQOADrDC86fS+evf0L/J5PMzQKhH5BbE7cdNjJkWFhtzrleXWUxW1mV86txsuqRAyZpKZ5KTKoxCWy7j/fE7yqirfiFbCsirezXurmDM1tBIFw41+otRrRoRM26Yr58eU3evBBbFywuA0j90JZvGP33Eh0LUKSdXkb4zVzl2hYIZNf4ksDvJ0MFFzGWq4aQIg/Tyhr1Zy5fiwyutlY4n1jhc0WrisuBBJdFDGeRW5Qpe48/DCA85zt8fuHICY7QxfZze4OtpLigxJ4O7PiWWMw7zmaHaGB6dwal5lzugJ8HXUTOb+pSfntMM6fTo2KxSwOYNAeAJhcV0 ahVv1gvV gs9G+z3bFAQQ4tYuO5bAAnsRDYaHldC4LuXPFAKku7JmQUHgGMbYx88LENWLty4xI2a/KqexEjT0JAmBmb5Y/GWaZzowBRjH9CAaNa1Kg/nm962tCaXmOJzcH9qlyrp71tJk0jpGgTD3CW834GRVvj/Tjp10Gs1D9dUVdfviaj7ivAwwlhFDTbBLGKHnhC+8BduzRM3t1OKD9Hxk6qFApqqoKjY4QAVFyp9wq+thCJa/P4QHoSSERXdfMWnwljpklTVxxWQyAcS0w/eg2o4McWZjdera6hnHhe/+m1bnJOiMjddzxbParvVfolsLANt6/yjJ0GabF4ZHfl7lJ+6E13GUt83fgE2Ve64jbts6i2NURmXpz9gzSrMFpU8P3X/nhpG9yJpZfsAMNs0friLpZAxWZp11VWS398VlahlYm4gOwBEjRPH16zicpW2/ObD2C/NgIcpVmDH7yFWSVzyz+sZiMdrfT7sF7L8tJ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 在 2023/7/27 00:03, Liam R. Howlett 写道: > * Peng Zhang [230726 04:10]: >> Introduce interfaces __mt_dup() and mt_dup(), which are used to >> duplicate a maple tree. Compared with traversing the source tree and >> reinserting entry by entry in the new tree, it has better performance. >> The difference between __mt_dup() and mt_dup() is that mt_dup() holds >> an internal lock. >> >> Signed-off-by: Peng Zhang >> --- >> include/linux/maple_tree.h | 3 + >> lib/maple_tree.c | 211 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 214 insertions(+) >> >> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h >> index c962af188681..229fe78e4c89 100644 >> --- a/include/linux/maple_tree.h >> +++ b/include/linux/maple_tree.h >> @@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, >> void *entry, gfp_t gfp); >> void *mtree_erase(struct maple_tree *mt, unsigned long index); >> >> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> + >> void mtree_destroy(struct maple_tree *mt); >> void __mt_destroy(struct maple_tree *mt); >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index da3a2fb405c0..efac6761ae37 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) >> } >> EXPORT_SYMBOL(mtree_erase); >> >> +/* >> + * mt_dup_free() - Free the nodes of a incomplete maple tree. >> + * @mt: The incomplete maple tree >> + * @node: Free nodes from @node >> + * >> + * This function frees all nodes starting from @node in the reverse order of >> + * mt_dup_build(). At this point we don't need to hold the source tree lock. >> + */ >> +static void mt_dup_free(struct maple_tree *mt, struct maple_node *node) >> +{ >> + void **slots; >> + unsigned char offset; >> + struct maple_enode *enode; >> + enum maple_type type; >> + unsigned char count = 0, i; >> + > > Can we make these labels inline functions and try to make this a loop? I did this just to make things easier. Refer to the implementation of walk_tg_tree_from() in sched/core.c. Using some loops and inline functions probably doesn't simplify things. I'll try to do that and give up if it complicates things. > >> +try_ascend: >> + if (ma_is_root(node)) { >> + mt_free_one(node); >> + return; >> + } >> + >> + offset = ma_parent_slot(node); >> + type = ma_parent_type(mt, node); >> + node = ma_parent(node); >> + if (!offset) >> + goto free; >> + >> + offset--; >> + >> +descend: >> + slots = (void **)ma_slots(node, type); >> + enode = slots[offset]; >> + if (mte_is_leaf(enode)) >> + goto free; >> + >> + type = mte_node_type(enode); >> + node = mte_to_node(enode); >> + offset = ma_nonleaf_data_end_nocheck(node, type); >> + goto descend; >> + >> +free: >> + slots = (void **)ma_slots(node, type); >> + count = ma_nonleaf_data_end_nocheck(node, type) + 1; >> + for (i = 0; i < count; i++) >> + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; >> + >> + /* Cast to __rcu to avoid sparse checker complaining. */ >> + mt_free_bulk(count, (void __rcu **)slots); >> + goto try_ascend; >> +} >> + >> +/* >> + * mt_dup_build() - Build a new maple tree from a source tree >> + * @mt: The source maple tree to copy from >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * @to_free: Free nodes starting from @to_free if the build fails >> + * >> + * This function builds a new tree in DFS preorder. If it fails due to memory >> + * allocation, @to_free will store the last failed node to free the incomplete >> + * tree. Use mt_dup_free() to free nodes. >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. >> + */ >> +static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new, >> + gfp_t gfp, struct maple_node **to_free) > > I am trying to change the functions to be two tabs of indent for > arguments from now on. It allows for more to fit on a single line and > still maintains a clear separation between code and argument lists. I'm not too concerned about code formatting. . . At least in this patchset. > >> +{ >> + struct maple_enode *enode; >> + struct maple_node *new_node, *new_parent = NULL, *node; >> + enum maple_type type; >> + void __rcu **slots; >> + void **new_slots; >> + unsigned char count, request, i, offset; >> + unsigned long *set_parent; >> + unsigned long new_root; >> + >> + mt_init_flags(new, mt->ma_flags); >> + enode = mt_root_locked(mt); >> + if (unlikely(!xa_is_node(enode))) { >> + rcu_assign_pointer(new->ma_root, enode); >> + return 0; >> + } >> + >> + new_node = mt_alloc_one(gfp); >> + if (!new_node) >> + return -ENOMEM; >> + >> + new_root = (unsigned long)new_node; >> + new_root |= (unsigned long)enode & MAPLE_NODE_MASK; >> + >> +copy_node: > > Can you make copy_node, descend, ascend inline functions instead of the > goto jumping please? It's better to have loops over jumping around a > lot. Gotos are good for undoing things and retry, but constructing > loops with them makes it difficult to follow. Same as above. > >> + node = mte_to_node(enode); >> + type = mte_node_type(enode); >> + memcpy(new_node, node, sizeof(struct maple_node)); >> + >> + set_parent = (unsigned long *)&(new_node->parent); >> + *set_parent &= MAPLE_NODE_MASK; >> + *set_parent |= (unsigned long)new_parent; > > Maybe make a small inline to set the parent instead of this? > > There are some defined helpers for setting the types like > ma_parent_ptr() and ma_enode_ptr() to make casting more type-safe. Ok, I'll try to do that. > >> + if (ma_is_leaf(type)) >> + goto ascend; >> + >> + new_slots = (void **)ma_slots(new_node, type); >> + slots = ma_slots(node, type); >> + request = ma_nonleaf_data_end(mt, node, type) + 1; >> + count = mt_alloc_bulk(gfp, request, new_slots); >> + if (!count) { >> + *to_free = new_node; >> + return -ENOMEM; >> + } >> + >> + for (i = 0; i < count; i++) >> + ((unsigned long *)new_slots)[i] |= >> + ((unsigned long)mt_slot_locked(mt, slots, i) & >> + MAPLE_NODE_MASK); >> + offset = 0; >> + >> +descend: >> + new_parent = new_node; >> + enode = mt_slot_locked(mt, slots, offset); >> + new_node = mte_to_node(new_slots[offset]); >> + goto copy_node; >> + >> +ascend: >> + if (ma_is_root(node)) { >> + new_node = mte_to_node((void *)new_root); >> + new_node->parent = ma_parent_ptr((unsigned long)new | >> + MA_ROOT_PARENT); >> + rcu_assign_pointer(new->ma_root, (void *)new_root); >> + return 0; >> + } >> + >> + offset = ma_parent_slot(node); >> + type = ma_parent_type(mt, node); >> + node = ma_parent(node); >> + new_node = ma_parent(new_node); >> + if (offset < ma_nonleaf_data_end(mt, node, type)) { >> + offset++; >> + new_slots = (void **)ma_slots(new_node, type); >> + slots = ma_slots(node, type); >> + goto descend; >> + } >> + >> + goto ascend; >> +} >> + >> +/** >> + * __mt_dup(): Duplicate a maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree using a faster method than traversing >> + * the source tree and inserting entries into the new tree one by one. The user >> + * needs to lock the source tree manually. Before calling this function, @new >> + * must be an empty tree or an uninitialized tree. If @mt uses an external lock, >> + * we may also need to manually set @new's external lock using >> + * mt_set_external_lock(). >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. >> + */ >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > > We use mas_ for things that won't handle the locking and pass in a maple > state. Considering the leaves need to be altered once this is returned, > I would expect passing in a maple state should be feasible? But we don't really need mas here. What do you think the state of mas should be when this function returns? Make it point to the first entry, or the last entry? > >> +{ >> + int ret; >> + struct maple_node *to_free = NULL; >> + >> + ret = mt_dup_build(mt, new, gfp, &to_free); >> + >> + if (unlikely(ret == -ENOMEM)) { > > On other errors, will the half constructed tree be returned? Is this > safe? Of course, mt_dup_free() is carefully designed to handle this. > >> + if (to_free) >> + mt_dup_free(new, to_free); >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(__mt_dup); >> + >> +/** >> + * mt_dup(): Duplicate a maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree using a faster method than traversing >> + * the source tree and inserting entries into the new tree one by one. The >> + * function will lock the source tree with an internal lock, and the user does >> + * not need to manually handle the lock. Before calling this function, @new must >> + * be an empty tree or an uninitialized tree. If @mt uses an external lock, we >> + * may also need to manually set @new's external lock using >> + * mt_set_external_lock(). >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. >> + */ >> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > > mtree_ ususually used to indicate locking is handled. Before unifying mtree_* and mt_*, I don't think I can see any difference between them. At least mt_set_in_rcu() and mt_clear_in_rcu() will hold the lock. > >> +{ >> + int ret; >> + struct maple_node *to_free = NULL; >> + >> + mtree_lock(mt); >> + ret = mt_dup_build(mt, new, gfp, &to_free); >> + mtree_unlock(mt); >> + >> + if (unlikely(ret == -ENOMEM)) { >> + if (to_free) >> + mt_dup_free(new, to_free); > > Again, is a half constructed tree safe to return? Since each caller > checks to_free is NULL, could that be in mt_dup_free() instead? Yes, this check can be put in mt_dup_free(). > >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(mt_dup); >> + >> /** >> * __mt_destroy() - Walk and free all nodes of a locked maple tree. >> * @mt: The maple tree >> -- >> 2.20.1 >> >>