From: Joanne Koong
Date: Fri, 8 Nov 2024 14:33:50 -0800
Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
To: Jingbo Xu
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, shakeel.butt@linux.dev, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
References: <20241107235614.3637221-1-joannelkoong@gmail.com> <20241107235614.3637221-7-joannelkoong@gmail.com> <1b3a36fe-1f62-410c-97fa-d59e7385f683@linux.alibaba.com>
In-Reply-To: <1b3a36fe-1f62-410c-97fa-d59e7385f683@linux.alibaba.com>

On Fri, Nov 8, 2024 at 12:49 AM Jingbo Xu wrote:
>
> Hi, Joanne,
>
> Thanks for the continuing work!
>
> On 11/8/24 7:56 AM, Joanne Koong wrote:
> > Currently, we allocate and copy data to a temporary folio when
> > handling writeback in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > To work around this, we allocate a temporary folio and copy over the
> > original folio to the temporary folio so that writeback can be
> > immediately cleared on the original folio. This additionally requires us
> > to maintain an internal rb tree to keep track of writeback state on the
> > temporary folios.
> >
> > A recent change prevents reclaim logic from waiting on writeback for
> > folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it.
> > This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
> > will prevent FUSE folios from running into the reclaim deadlock described
> > above) and removes the temporary folio + extra copying and the internal
> > rb tree.
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> >         bs =  1k          4k            1M
> > Before    351 MiB/s   1818 MiB/s    1851 MiB/s
> > After     341 MiB/s   2246 MiB/s    2685 MiB/s
> > % diff     -3%          23%           45%
> >
> > Signed-off-by: Joanne Koong
> >
> > @@ -1622,7 +1543,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> >                       return res;
> >               }
> >       }
> > -     if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > +     if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
>
> filemap_range_has_writeback() is not equivalent to
> fuse_range_is_writeback(), as it will return true as long as there's any
> locked or dirty page?  I can't find an equivalent helper function at
> hand though.
>

Hi Jingbo,

I couldn't find an equivalent helper function either. My guess is that
filemap_range_has_writeback() returns true if the page is locked
because it doesn't have a way of determining if the page is dirty or
not (it seems like if a page is locked, then we can't read the
writeback bit on it), so it errs on the side of assuming yes.

For this case, it seems okay to me to use
filemap_range_has_writeback() because if we get back a false positive
(e.g. filemap_range_has_writeback() returns true when no page in the
range is actually under writeback), the only cost is the overhead of
an additional fuse_sync_writes() call, and fuse_sync_writes() will
return immediately from the wait().
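To illustrate the false-positive argument, here is a minimal userspace
sketch (not kernel code). Every name in it (struct page_state,
range_is_writeback(), range_maybe_writeback(), sync_writes(),
direct_io_prepare()) is a hypothetical stand-in, and it assumes the
conservative check treats locked or dirty pages as possibly under
writeback, as discussed above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of per-page state; not kernel code. */
struct page_state {
	bool locked;
	bool dirty;
	bool writeback;
};

/* Exact check: true only if a page in [first, last] is under writeback. */
bool range_is_writeback(const struct page_state *pages,
			size_t first, size_t last)
{
	for (size_t i = first; i <= last; i++)
		if (pages[i].writeback)
			return true;
	return false;
}

/* Conservative check: locked or dirty pages also count as a hit. */
bool range_maybe_writeback(const struct page_state *pages,
			   size_t first, size_t last)
{
	for (size_t i = first; i <= last; i++)
		if (pages[i].locked || pages[i].dirty || pages[i].writeback)
			return true;
	return false;
}

/*
 * Model of waiting for in-flight writes: clears every writeback bit and
 * returns how many writes it had to wait on. With nothing in flight the
 * loop finds no work and the return value is 0.
 */
int sync_writes(struct page_state *pages, size_t npages)
{
	int waited = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (pages[i].writeback) {
			pages[i].writeback = false;
			waited++;
		}
	}
	return waited;
}

/* Model of the patched call site: sync whenever the conservative check hits. */
int direct_io_prepare(struct page_state *pages, size_t npages,
		      size_t first, size_t last)
{
	if (range_maybe_writeback(pages, first, last))
		return sync_writes(pages, npages);
	return 0;
}
```

In this model, a range containing only a locked page makes
range_maybe_writeback() return true while range_is_writeback() returns
false, and direct_io_prepare() then returns 0: the extra sync is called
but has nothing to wait on.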
> >
> > @@ -3423,7 +3143,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
> >       fi->iocachectr = 0;
> >       init_waitqueue_head(&fi->page_waitq);
> >       init_waitqueue_head(&fi->direct_io_waitq);
> > -     fi->writepages = RB_ROOT;
>
> It seems that 'struct rb_root writepages' is not removed from fuse_inode
> structure.
>

Nice catch! I'll remove this from the fuse_inode struct in v5.

>
> Besides, I also looked through the former 5 patches and can't find any
> obvious errors at the very first glance.  Hopefully the MM guys could
> offer more professional reviews.
>

Thanks for looking through this code in this version and the past
versions of this patchset too. It's much appreciated!

> --
> Thanks,
> Jingbo