From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF716C3600C for ; Fri, 4 Apr 2025 00:50:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6AF7E6B0006; Thu, 3 Apr 2025 20:50:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 637226B0007; Thu, 3 Apr 2025 20:50:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B61E6B0008; Thu, 3 Apr 2025 20:50:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 2A9086B0006 for ; Thu, 3 Apr 2025 20:50:38 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id E1293818E8 for ; Fri, 4 Apr 2025 00:50:38 +0000 (UTC) X-FDA: 83294530956.14.0869AAF Received: from mail-pg1-f175.google.com (mail-pg1-f175.google.com [209.85.215.175]) by imf19.hostedemail.com (Postfix) with ESMTP id EF5A51A0003 for ; Fri, 4 Apr 2025 00:50:36 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=cajGoBGK; spf=pass (imf19.hostedemail.com: domain of vishal.moola@gmail.com designates 209.85.215.175 as permitted sender) smtp.mailfrom=vishal.moola@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743727837; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Th78P7lLc34hMpgoaKVJWgxxETQOSCyPOGjwMuTxpqE=; b=Nq9w2BDEr/wd13Q0gyu7H5D4+eJ6jOEAUNhKxsjplNRSYWzRwMkrlM6MTg0bjIGkL1CQzf CwcNHujLg1R8+fDl639g4NWg1uOiZqYm/dO1TrRLY32zP0H6hNnXlfOCvKi2qCvAu/s8I/ YavBj9AZ3BhdAd61zHb5p3SHXPjBRIs= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=cajGoBGK; spf=pass (imf19.hostedemail.com: domain of vishal.moola@gmail.com designates 209.85.215.175 as permitted sender) smtp.mailfrom=vishal.moola@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743727837; a=rsa-sha256; cv=none; b=V0YhYWUoy4IfYJ9fciwY6BW4F4ZiuDqDrRnLUZYxNHfLvonT3ndKRjpeYy3nrQ3edfcM7t Zow+fQr2BWcr8D2r4KPJVtR3JTXl0GMTOYYFI8qbluvUpkINF1mc4mECP5H3cg+tFLcxi4 V7rcMgK4K0GuBJeiw2prbEKMNoTA5ME= Received: by mail-pg1-f175.google.com with SMTP id 41be03b00d2f7-7fd581c2bf4so1254654a12.3 for ; Thu, 03 Apr 2025 17:50:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743727836; x=1744332636; darn=kvack.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=Th78P7lLc34hMpgoaKVJWgxxETQOSCyPOGjwMuTxpqE=; b=cajGoBGKofc9d54kqwG8Z9nqYXAv3VODIi8V97m3r4aWZ8emhNrL1qYu3bDgIxvIBB 9qlyplN1i8bi8rzBxdkslSYv3hGDIlKWWgv6TOcJlljdjTe/rPCCTEDH7ZbEz4A1kzQH uVHvLFkKbLmpNXnOOFnaQM0TOt6ec/IxQfq56wX4hCU7ect3pEznbdeCoUmaTt6J4HCk aTYjFmLzdhYL2Ek7V19wBT3c2PqWV9NCtpLNItQbmVaWyv8pmTJn3lRJ/fCmRKBkhkjJ tBiJpo3IdKTedhB3V/JO9Wt3iZjZ1IPsDN6NSLznOX4N8470B0dtvJm98k2XtNG6fndx 2+5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743727836; x=1744332636; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Th78P7lLc34hMpgoaKVJWgxxETQOSCyPOGjwMuTxpqE=; b=JmKAJYq4lnBG8u4xaX89Lm0X9jjc9FKp6ZDep1QqLrxh9otbp26CEX/pKuy+59zSCV f7DmrpZKjMRQxgHJsqHfWpxojGAIosjsxRmkBynYzCrY8zBqSvpNGP4oo8NU5JwEGYUU bBAqc98rlABoghXPdEA0CNFU+iWxwP92Mq4/e7ycxwqWdzdeDJNiq0kVUBH9ldI8+4Ba HuoJkpidG90eb1fZGXajkNKLNV7nSU2w9SpCcBlvqKxJvFXNYyX2poOXpQBfZFgQGNkD R5q85Xf3Z2AB/mptz2Z7pcjMSYmMTZl7wxsvksTopKoLH+6FQYCJEtaTee4T2lwY8hSY XGew== X-Forwarded-Encrypted: i=1; AJvYcCUIZPa7MFGtkWgRpdrYgo/K2/tbhGBFbU+GOnRvxPtLFtFP4cyhxB6AufG4J77YeTAoNwNQdTZ6Qg==@kvack.org X-Gm-Message-State: AOJu0YzNrpzwsVEfQHddFvlcW7f5b+JDiKI+5Zxj0k6Ybi3cjSEVRDZ4 cwaD7+skRkv/6Ldw9QxmgxGzGfkQhhxntBw50GbHeUFZI08zWLlQ X-Gm-Gg: ASbGnctkKt//ophbvzfvluicKH2Rx9MjHhl7Ni5QOhOBFbVBNFK81pWbnt34PYsiYqR WqtqvnXBlpNfbLLljWjTt0D/e2N+ICJPISX2kGtU52fzKiGFtV4uk8+r0ud+zZSPDjqpcfTtPup CjuntVGbKyKiO20fBrxjuPZHn7Zjt5GmUcj4wz3o7ZYZnX/lj0KbAjJtg0q/h2MLSaDkm9F6/3y WRnJ2fAetbjfR4p5aR1aTtq4RompULghTQMost571d1hTN23Lduo3blx57TBz8Jk3A5paFis3eK VnOo3LRYO0ADpyRkrhRPZyh4wWBg+JaF/fc9l8ioC7vpXvfDWXUOKNkCAc/uGYQBQRA3vRFVYpj A X-Google-Smtp-Source: AGHT+IE0OK+ZG86CXfS8cxl7/u4OcTyr0/sVE0W60caXee3AGQFNLpQ/eKk0mGeHGmFqwPk/RLsUqQ== X-Received: by 2002:a17:90b:5188:b0:2ea:5dea:eb0a with SMTP id 98e67ed59e1d1-306a60e4ac0mr894543a91.4.1743727835579; Thu, 03 Apr 2025 17:50:35 -0700 (PDT) Received: from fedora (c-67-164-59-41.hsd1.ca.comcast.net. [67.164.59.41]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-3057ca8768fsm2391935a91.27.2025.04.03.17.50.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 03 Apr 2025 17:50:34 -0700 (PDT) Date: Thu, 3 Apr 2025 17:50:31 -0700 From: "Vishal Moola (Oracle)" To: Qu Wenruo Cc: Matthew Wilcox , Qu Wenruo , Linux Memory Management List , "linux-fsdevel@vger.kernel.org" , linux-btrfs , vivek.kasireddy@intel.com, Andrew Morton Subject: Re: Large folios and filemap_get_folios_contig() Message-ID: References: <59539c02-d353-4811-bcbe-080b408f445e@suse.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="jdO5iHBsjAZEVpV9" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <59539c02-d353-4811-bcbe-080b408f445e@suse.com> X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: EF5A51A0003 X-Stat-Signature: xy96oz7foq1o33thk9bmiyiwywpwx51g X-HE-Tag: 1743727836-233046 X-HE-Meta: U2FsdGVkX19M0UXNjeu3isFkPGlSSvIoX3+7EGD0hX97rz1tsiFNka45wCrYD5+HDPCtNyULQgiN8vyutctHdDvsLl4b4XdA3cxwLjOYsfXcH4vmivee1yiLovmZNOTLqCPLCTG+44DV6GctUKUVLlhROIYDa+HkfFG4fKJ2AaJJ767QlfJe9T5EodE9DD97i3MwlyiSIBWvCW8rtSizjexKKi+29fp6N/zR4Wo13f8uTIJ0DUtmCASmhwMCI9fzULF1F5F43+27lomhQkPxuU+gmsPth6nnmF9b+Tbr6RujwKUTITTLWuwb3+Wa93jBpXeALgKZcoQFZpnf3MSZ1DAUeIyq0/WivMEMrXmKIF0fPbpVDW4ughJIaERa+xCcScsPqzPGkNIM+7STMQtNddlrKEb3MgH4E2zRBHuqEvvVcdqbF93KCIJVNxmzgbJRVGs2a06GZYndx3S8z2X9ku6WEZZLsM7JgAJ/Kc6Wmcj5ljrfUUc7uJMoLdBdo+5PoLEhQz2WvxBw6Rl2VMVoJ2o/MRIjgyjBzySi+r5ynCEi3QwnccBgiNuxRbbAcGKULT8kVeb7Eu+ib8mK7HjmV6l/lvmKL23BF8PbIVDkETYGAeJTfvDqug8mKwleh8M+P9zDYx2R1ai39D9i0adDeW2eSfOC3nwM1/PvBHQ/wLjn1Zi/HyGIb44Y2OMRS+8KZGbHuGZE5whzANMhCs/aQuvmPgK92LIXYtYKFDCYX9k9ISsfFm00sXepJ7r+3slKzc9feXC7MX1i8Z/ERlLgMFIrU+PaqO28bgbS605fxUcvr51qhS2ko4YwtMvDBJ4o2WhL1gi/R3snOFDHKofPd1uvrIYUXCXd5VWdtgrccpSoiJ0sPoTa8HxMu+djpqyBJHKZoB2qlW/GKPvUfHvak+/pwM3MgZ0263ZIkSoxPuPSP3vcMOp5gbmzPSf1y1GziPVx/xO8fP9cEZOrIzP FMjexoLF 4QQHYSz5o28iORGRsA7obYYvZDZb6L/fcfe3tSElfUr69sQcsjqzMviHQRa7fgfeEWNWj4L2GAnjo29RiClZHEsM8FbIsABjKBIOzzdZbt8NjC8KxJB7yZwzAEFT8Y/jIKCDSo1KerlGFC4R15yt150u53CPZ9GeJ1lb8T0Eu4bhRUmlNYxcRMixYKr7M6GxtiW3GLz3hem4zYjiBeefGGniKrQvTlhcBtwIkrXszJCcuJhhKI5Jr7Luk+TqM0rZoptYZhzzuI1OvZQbQt51dh9H5Y1V1D6umpmxEBPGDkmwlaQiXXRU2AUZhTTP0czBgpjBBRU4ZcXPMHec5oTNP3qy0LoqF1WkEAnoVstut70qK17cTV/e9yb2mKd7mXz0sReLn9qZZmYCkCVYzzju129RKZ05dgshFKPSlksL2w1/pK6YAThxyYDEgf1SkKhg4VDhGBSOVLhkBEp+ODBjeGvC+oZ5lOuZAG5ls27NWn/bnS2od3EiP1DGN/oPJa91IxenjHTIDPdVeALRx/8PxRqdOvbmWZY+8BV1AQGdqXovJSKnwNJMeg4BkaG8GLkxpsBbI8EIlVkZwBGiZgSc0wOO/1mGAToIsugUwAF0L0/Fr0ZDT7tDudfwjpaGC5PuDuW/M5qPfP9sMm+NAH2prxf0/EK+7I9YwT8fkzbUYrDk/ZmEG7n1kTkbtew== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: --jdO5iHBsjAZEVpV9 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit On Fri, Apr 04, 2025 at 07:46:59AM +1030, Qu Wenruo wrote: > > > 在 2025/4/3 23:05, Matthew Wilcox 写道: > > On Thu, Apr 03, 2025 at 08:06:53PM +1030, Qu Wenruo wrote: > > > Recently I hit a bug when developing the large folios support for btrfs. > > > > > > That we call filemap_get_folios_contig(), then lock each returned folio. > > > (We also have a case where we unlock each returned folio) > > > > > > However since a large folio can be returned several times in the batch, > > > this obviously makes a deadlock, as btrfs is trying to lock the same > > > folio more than once. > > > > Sorry, what? A large folio should only be returned once. xas_next() > > moves to the next folio. How is it possible that > > filemap_get_folios_contig() returns the same folio more than once? > > But that's exactly what I got from filemap_get_folios_contig(): > > lock_delalloc_folios: r/i=5/260 locked_folio=720896(65536) start=782336 > end=819199(36864) > lock_delalloc_folios: r/i=5/260 found_folios=1 > lock_delalloc_folios: r/i=5/260 i=0 folio=720896(65536) > lock_delalloc_folios: r/i=5/260 found_folios=8 > lock_delalloc_folios: r/i=5/260 i=0 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=1 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=2 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=3 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=4 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=5 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=6 folio=786432(262144) > lock_delalloc_folios: r/i=5/260 i=7 folio=786432(262144) > > r/i is the root and inode number from btrfs, and you can completely ignore > it. > > @locked_folio is the folio we're already holding a lock, the value inside > the brackets is the folio size. > > @start and @end is the range we're searching for, the value inside the > brackets is the search range length. > > The first iteration returns the current locked folio, and since the range > inside that folio is only 4K, thus it's only returned once. > > The next 8 slots are all inside the same large folio at 786432, resulting > duplicated entries. > > > > > > Then I looked into the caller of filemap_get_folios_contig() inside > > > mm/gup, and it indeed does the correct skip. > > > > ... that code looks wrong to me. > > It looks like it's xas_find() is doing the correct skip by calling > xas_next_offset() -> xas_move_index() to skip the next one. > > But the filemap_get_folios_contig() only calls xas_next() by increasing the > index, not really skip to the next folio. > > Although I can be totally wrong as I'm not familiar with the xarray > internals at all. Thanks for bringing this to my attention, it looks like this is due to a mistake during my folio conversion for this function. The xas_next() call tries to go to the next index, but if that index is part of a multi-index entry things get awkward if we don't manually account for the index shift of a large folio (which I missed). Can you please try out this attached patch and see if it gets rid of the duplicate problem? > However I totally agree the duplicated behavior (and the extra handling of > duplicated entries) looks very wrong. > > Thanks, > Qu --jdO5iHBsjAZEVpV9 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=0001-Fix-filemap_get_folios_contig-returning-batches-of-i.patch >From 91e8353cee38b1624fc13587f6db5058d764d983 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Thu, 3 Apr 2025 16:54:17 -0700 Subject: [PATCH] Fix filemap_get_folios_contig returning batches of identical folios filemap_get_folios_contig() is supposed to return distinct folios found within [start, end]. Large folios in the Xarray become multi-index entries. xas_next() can iterate through the sub-indexes before finding a sibling entry and breaking out of the loop. This can result in a returned folio_batch containing an indeterminate number of duplicate folios, which forces the callers to skeptically handle the returned batch. This is inefficient and incurs a large maintenance overhead. We can fix this by calling xas_advance() after we have successfully adding a folio to the batch to ensure our Xarray is positioned such that it will correctly find the next folio - similar to filemap_get_read_batch(). Fixes: 35b471467f88 ("filemap: add filemap_get_folios_contig()") Signed-off-by: Vishal Moola (Oracle) Cc: --- mm/filemap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/filemap.c b/mm/filemap.c index cc69f174f76b..bc7b28dfba3c 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2233,6 +2233,7 @@ unsigned filemap_get_folios_contig(struct address_space *mapping, *start = folio->index + nr; goto out; } + xas_advance(&xas, folio_next_index(folio) - 1); continue; put_folio: folio_put(folio); -- 2.48.1 --jdO5iHBsjAZEVpV9--