From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC8CCCFD2E4 for ; Fri, 11 Oct 2024 08:39:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 748846B00A4; Fri, 11 Oct 2024 04:39:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6F94D6B00A5; Fri, 11 Oct 2024 04:39:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 599416B00A6; Fri, 11 Oct 2024 04:39:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 3B4936B00A4 for ; Fri, 11 Oct 2024 04:39:52 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id EBAE580E8E for ; Fri, 11 Oct 2024 08:39:47 +0000 (UTC) X-FDA: 82660673340.03.310DD60 Received: from mail-ed1-f52.google.com (mail-ed1-f52.google.com [209.85.208.52]) by imf17.hostedemail.com (Postfix) with ESMTP id 9C87440005 for ; Fri, 11 Oct 2024 08:39:47 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=exBo4Q2T; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf17.hostedemail.com: domain of richard.weiyang@gmail.com designates 209.85.208.52 as permitted sender) smtp.mailfrom=richard.weiyang@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728635961; a=rsa-sha256; cv=none; b=X6jANzx9uyk3fGVa/W7iaZ3/RGSwcmkXoYLwvFoUvIaYXMdhWgom3vemMCoVtrcZi1otZn XU5op2+TDCZMvB5WJY9yqv3qoxIMHlxNeTbUjf0zsMLvQudCKbsPn2Vp7rkLh1+Md4b9H+ PrSG0QMDZ7EDhKZNIvs26AyYUAtTTdY= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=exBo4Q2T; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf17.hostedemail.com: domain of richard.weiyang@gmail.com designates 209.85.208.52 as permitted sender) smtp.mailfrom=richard.weiyang@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728635961; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jQzLRSOUJVDYXV5FlqvwVfbjfzwXvPn7jKX4d5M9SJ0=; b=D4oifi6Ci7SDx65FiRQlnKPMz8+BGt/pMN0HBJhu9xksAdyo7E6azuVpeSToEdA7eVuFMq 4uBGUT8nrWMjd/xRrLuMZhbLAIduArNSbCC9QAgrMGvqLxOpm9toUEK20PAANzGiU2A55Q ERCNN1mo0KVY2lKxtl0aXPG5ecEcqAM= Received: by mail-ed1-f52.google.com with SMTP id 4fb4d7f45d1cf-5c5cf26b95aso2224512a12.3 for ; Fri, 11 Oct 2024 01:39:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1728635988; x=1729240788; darn=kvack.org; h=user-agent:in-reply-to:content-disposition:mime-version:references :reply-to:message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=jQzLRSOUJVDYXV5FlqvwVfbjfzwXvPn7jKX4d5M9SJ0=; b=exBo4Q2T5jwv5VLzKB/9hPv4K5q0vActdeu+NN0+rl9caWxLM9Ry+h05D256blNuRc 5BK12f0Cz1xmx1rnQweXSvHhKf/lKhP2pMsq7XzpbKMkIGOLFPwrZ1CZSoAEhq1KhrDX 75Fhlf528kmnR3E4vE2VxW+s8FdT7atUOL8T4n8mtISXJLJgkMihwcGGzfynrYkuPl2n wurKmlxnH1KFETNZVCEl2enqWYnozdy9BmVX73VrSDMGhSuwBp+4vnPh0X60Sa1BzMbb oXHDo8uw27bh6h6U3TVXLzHBEqVKGQ+zxyoaN3Rwqg7j4cKGcMeKO25V2IunE7dYWZQn vGBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728635988; x=1729240788; h=user-agent:in-reply-to:content-disposition:mime-version:references :reply-to:message-id:subject:cc:to:from:date:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=jQzLRSOUJVDYXV5FlqvwVfbjfzwXvPn7jKX4d5M9SJ0=; b=QV5LJaKYE1GqLbiCACBRJfErtRyRf4qjcALvzdRwqiGZ9PKMnPY1qGG3gMxo3YWOj0 krUlNSZTuaCQIDWtbrgdQwejoA2d5ACULBmvqTlxXWdOMC5mBfglu3JciqgD6hsnqYq9 oZBZbBbzXkCAEaxjBB/4B5L5n2aMA+/NK3TKIQrVhD/xhEXwpVaH7UWgjR4FviYUT9Bn 3hH1DmAbRMhPqXZ3baah5sEYWWsvIaE/w2kIhYOTMHDWp0JURIm3AEkYvcBx/dgK+pIJ QXQqXiNgDh2ucHvfLRtRxgXtiTnS5z6+OOziIAAsFc8rLlOS3pACEbkf6OdbTsYpEDyt 9OoQ== X-Forwarded-Encrypted: i=1; AJvYcCWzXd/pKhwvl3YhuNdATfgtYmg5JZhnzGrLmQWdZYAmjtw0G/ksHwMv4RpXUcZUqSKy3ActQ1ZBsQ==@kvack.org X-Gm-Message-State: AOJu0Yx0M5u3x9E3XiIEfpP1oiS8vyxFdpIAHOSjZYkEiGk9aewHD2Qb znuL8PNMlRieSyG2MNoqf7O3phlWvDv9UEGnmLXUETjfjuX0CFrf X-Google-Smtp-Source: AGHT+IEIho5nttJ6WjZ7zRjMXHUbxFjeSz/Fn8mbvW+6INfFIzc+90bMeayuWcrCvN/e0t/5ZN8E6w== X-Received: by 2002:a17:907:e208:b0:a99:8ed2:7e41 with SMTP id a640c23a62f3a-a99b943f8ecmr145623366b.11.1728635987609; Fri, 11 Oct 2024 01:39:47 -0700 (PDT) Received: from localhost ([185.92.221.13]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a99a80dcdc3sm190750366b.172.2024.10.11.01.39.46 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 11 Oct 2024 01:39:46 -0700 (PDT) Date: Fri, 11 Oct 2024 08:39:45 +0000 From: Wei Yang To: Lorenzo Stoakes Cc: Andrew Morton , "Liam R . Howlett" , Matthew Wilcox , Vlastimil Babka , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Sidhartha Kumar , Bert Karwatzki , Mikhail Gavrilov , maple-tree@lists.infradead.org Subject: Re: [PATCH hotfix 6.12 v3 1/2] maple_tree: correct tree corruption on spanning store Message-ID: <20241011083945.z6a3y22pszyq6hse@master> Reply-To: Wei Yang References: <48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com> User-Agent: NeoMutt/20170113 (1.7.2) X-Rspam-User: X-Stat-Signature: 11t6npu54iekt6brazjtftanccqwp3pb X-Rspamd-Queue-Id: 9C87440005 X-Rspamd-Server: rspam02 X-HE-Tag: 1728635987-488397 X-HE-Meta: U2FsdGVkX1+norAhBbaqOdo1bwLQLeI3mXDE6poMtQZEIyvV+q3Xkx3wRXA+uFhx8NmJ4RsQzvtlPC3Z83hHeL9UO7kSMt4pSqist7ZgUfOJax9I8jmY/ZkQF6whMEkZddMldUPgIapOg9Ui4wNHaZ3FOtZ+tAItQI4sy9kZVg3r0ebuukUnSJl++Ify8NJz9f6rTHPACuGHgjPB9TiTp7UbeOpTaMP8aECWg3BINEDkw7f8UGEkpDuaycLEmiRhk8bw48ryTsEW14he1WGpYd6RcnaYqWXq2YXbokCE3lxVLrplqQR2xE3cSP48BgjIHLPeomYbXr1cQs6NaRGPJQLWT8EpOEyjwb2SJC23DiADrYFjzpyXDmSZL/wLLjf7GRqpAdUCmWK3kBNhGGuFd91c4F05tmkIew0b79DuYpPkCwjUumE7Jsi9Bw4s6sOn22f+0L2f/nRb35mi+I5g/Kwj/rwKgw8tzPVvLXywbmWdp368uFQhulN84bBdnwjrXRlSyNqr5KIjEmJekGhK70tR1oQD2InnvS+a7sRr+Phn+id2Cu/B7lePQjpSFXI6arL37+3eDa6wQ6ll5fzrl5AyU9IiSzJncSHBIhCjTUweb89/Vy27nfVwSiAicbfbJ+ma4XcXOC2JL3zOi/2Skf73h0Yb13MJ3cC8SOCWyDhOClLBGYyM+EMhbcE5hBQDHP49VvtWG0qUmPgxAqOwZMF/2rBIJXKveQnJD6CbCbpxZyUZnZW2TqfrfwXc3/jyW27lNnPROknPH5XUZ2ohng1Wdf1AxoQFWj60qROoWiHlyZxqPvuOglVyKBkADQV07kuSCmV5ozRACXTVdsoWn06RGm/++K9nYKMX9BJYMFQDrj7tMx5XCNCGTnilLqXLAYQrt8m6Rk8vcAddi4SXipzJE/uOLSP5JdEo45u0jhkifIqLyfjxKNhbf2IByvIuCQdyQCpm40pQ+M6OlxY 7UyK2bVI 9XHUwBzqlQeNOz/A2wmRGGlXuBwYrJ59t22xCeK9kmU2gE3bF2FlMvbAiHJmeOUcDD+FVFhpU7mZP40eF2vIrwC6W7xPaTeRljMWD/5ilzMdhn5yLkUXFNFFAe/txphraoAK6ktIlldC5IXZUm3G9zfrUsuXugtoWXVleBiR5K5jqOYbmwcvJb4TBc+6S+lF7prv8kkHnxn0kulK2GMhP5yQW6V1LJCyoz6gorfecBclwULkpdeuxohDdF+1KrG6JRIP++ulF/YrSU/iJBcCXiWXelcaWP0Orj7JakPPp952jrVexASOWQD7tUqIENK0tRlbXnHWBgT2VxeoJUj49BkZSB3QbrLcwSQ23M5WLweS8tb/yEYoWVLuB8y0xDAffZoas+QVIqPVKaJWR6qc5IMSOQDoA2Jhro/BWzpqqxs8FmsUToF9TgrFNaJA1kmB67QfKyhUE2tjeU0jUUK7mjqR+gge7fx//nkE7K3UYe5XQxFSnnVRxQem6CuLvF7y8u9da4PAvkBlqNEcFeI3wcCs86BVxjgwzlLHwhLOcMzQoZ2jVeENEV3qauZke2Gy/gl2rPcGkL2c4+OOfKJSf2SgrDbTaJmX4rWHiPVP1UKFvxO8LkKCsfkU+zkWTK/hffj3CgLTxZt0S809T6/qxFXj7hzSaFwkPNA09oMcocmJMTJsi8cDpR4nn6QtdzSnwr0b9DQkYWuJxmz0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Oct 07, 2024 at 04:28:32PM +0100, Lorenzo Stoakes wrote: >There has been a subtle bug present in the maple tree implementation from >its inception. > >This arises from how stores are performed - when a store occurs, it will >overwrite overlapping ranges and adjust the tree as necessary to >accommodate this. > >A range may always ultimately span two leaf nodes. In this instance we walk >the two leaf nodes, determine which elements are not overwritten to the >left and to the right of the start and end of the ranges respectively and >then rebalance the tree to contain these entries and the newly inserted >one. > >This kind of store is dubbed a 'spanning store' and is implemented by >mas_wr_spanning_store(). > >In order to reach this stage, mas_store_gfp() invokes mas_wr_preallocate(), >mas_wr_store_type() and mas_wr_walk() in turn to walk the tree and update >the object (mas) to traverse to the location where the write should be >performed, determining its store type. > >When a spanning store is required, this function returns false stopping at >the parent node which contains the target range, and mas_wr_store_type() >marks the mas->store_type as wr_spanning_store to denote this fact. > >When we go to perform the store in mas_wr_spanning_store(), we first >determine the elements AFTER the END of the range we wish to store (that >is, to the right of the entry to be inserted) - we do this by walking to >the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we >have just determined contains the range over which we intend to write. > >We then turn our attention to the entries to the left of the entry we are >inserting, whose state is represented by l_mas, and copy these into a 'big >node', which is a special node which contains enough slots to contain two >leaf node's worth of data. > >We then copy the entry we wish to store immediately after this - the copy >and the insertion of the new entry is performed by mas_store_b_node(). > >After this we copy the elements to the right of the end of the range which >we are inserting, if we have not exceeded the length of the node >(i.e. r_mas.offset <= r_mas.end). > >Herein lies the bug - under very specific circumstances, this logic can >break and corrupt the maple tree. > >Consider the following tree: > >Height > 0 Root Node > / \ > pivot = 0xffff / \ pivot = ULONG_MAX > / \ > 1 A [-----] ... > / \ > pivot = 0x4fff / \ pivot = 0xffff > / \ > 2 (LEAVES) B [-----] [-----] C > ^--- Last pivot 0xffff. > >Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note >that all ranges expressed in maple tree code are inclusive): > >1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then > determines that this is a spanning store across nodes B and C. The mas > state is set such that the current node from which we traverse further > is node A. > >2. In mas_wr_spanning_store() we try to find elements to the right of pivot > 0xffff by searching for an index of 0x10000: > > - mas_wr_walk_index() invokes mas_wr_walk_descend() and > mas_wr_node_walk() in turn. > > - mas_wr_node_walk() loops over entries in node A until EITHER it > finds an entry whose pivot equals or exceeds 0x10000 OR it > reaches the final entry. > > - Since no entry has a pivot equal to or exceeding 0x10000, pivot > 0xffff is selected, leading to node C. > > - mas_wr_walk_traverse() resets the mas state to traverse node C. We > loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk() > in turn once again. > > - Again, we reach the last entry in node C, which has a pivot of > 0xffff. > >3. We then copy the elements to the left of 0x4000 in node B to the big > node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry > too. > >4. We determine whether we have any entries to copy from the right of the > end of the range via - and with r_mas set up at the entry at pivot > 0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at > pivot 0xffff. > >5. BUG! The maple tree is corrupted with a duplicate entry. > >This requires a very specific set of circumstances - we must be spanning >the last element in a leaf node, which is the last element in the parent >node. > >spanning store across two leaf nodes with a range that ends at that shared >pivot. > >A potential solution to this problem would simply be to reset the walk each >time we traverse r_mas, however given the rarity of this situation it seems >that would be rather inefficient. > >Instead, this patch detects if the right hand node is populated, i.e. has >anything we need to copy. > >We do so by only copying elements from the right of the entry being inserted >when the maximum value present exceeds the last, rather than basing this on >offset position. > >The patch also updates some comments and eliminates the unused bool return >value in mas_wr_walk_index(). > >The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma tree >in mmap_region()") seems to have made the probability of this event much >more likely, which is the point at which reports started to be submitted >concerning this bug. > >The motivation for this change arose from Bert Karwatzki's report of >encountering mm instability after the release of kernel v6.12-rc1 which, >after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration >options, was identified as maple tree corruption. > >After Bert very generously provided his time and ability to reproduce this >event consistently, I was able to finally identify that the issue discussed >in this commit message was occurring for him. > >Reported-and-tested-by: Bert Karwatzki >Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/ >Reported-by: Mikhail Gavrilov >Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/ >Fixes: 54a611b60590 ("Maple Tree: add new data structure") >Signed-off-by: Lorenzo Stoakes >Acked-by: Vlastimil Babka Reviewed-by: Wei Yang -- Wei Yang Help you, Help me