From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ADE1D637A7 for ; Thu, 14 Nov 2024 00:39:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D80956B0095; Wed, 13 Nov 2024 19:39:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D31316B0098; Wed, 13 Nov 2024 19:39:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BF82D6B0099; Wed, 13 Nov 2024 19:39:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A84646B0095 for ; Wed, 13 Nov 2024 19:39:23 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 4ED21808FF for ; Thu, 14 Nov 2024 00:39:23 +0000 (UTC) X-FDA: 82782841386.28.52D1A41 Received: from mail-qt1-f178.google.com (mail-qt1-f178.google.com [209.85.160.178]) by imf23.hostedemail.com (Postfix) with ESMTP id 0357E140010 for ; Thu, 14 Nov 2024 00:38:52 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="HpjwmOk/"; spf=pass (imf23.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731544567; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ui92GwdJMLYEeuUgce0xtSp+y54z2Swd5voIaLF3cpw=; b=lYEiw2ah4e/tPNhveCHB1Br6R1RQYLk/dVcVjCTBU/j4EQj7hmoUJ4N55DAigglYSuBA+B fKHAfzQMoKGs017Wbej1SOIzaEdI/1iRBkhhvgFBuY3OdQsIHfaJ0uCFQ2hCvCVU3FlqaM SJ9Vt7qkcNS03K5TY6erfsqlusdNpck= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="HpjwmOk/"; spf=pass (imf23.hostedemail.com: domain of joannelkoong@gmail.com designates 209.85.160.178 as permitted sender) smtp.mailfrom=joannelkoong@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731544567; a=rsa-sha256; cv=none; b=P4zB8YW4FelxqCbUO2CQOhydsheL1dWBb5fIT5QidqDV2+c59RPImSM31SaYHygBym3k+V ucKEE30hFLBYb7Tx8yhg3uNLm7LAtyWdy4NmGyt+1iobXAD5LKrk82aZM75uVwFBQ3wATM +lOHGk/E6NcFZqRjKr179HFtJSMCq68= Received: by mail-qt1-f178.google.com with SMTP id d75a77b69052e-4608e389407so429951cf.2 for ; Wed, 13 Nov 2024 16:39:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731544760; x=1732149560; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ui92GwdJMLYEeuUgce0xtSp+y54z2Swd5voIaLF3cpw=; b=HpjwmOk/Yaa6VM4pFu37YmrW6Yw8i3kGoNbIZ/ajZhk91mPvDw15NPhHnivfT2e2Yt nN7vn/IJpSfdNCPPkO/ueOOF1cQ65AQ9oSJldx1EAQaxmhhaMtr1J8bZVCqG7glf7Qyc m8VuanEgSvXWCZp5It/I1kO8hyjziKA1+qekWgpUNWnAq+ihawAemH3x1+IwYqJ5H+4O FuS7g70GHMc2Q4uAQh/1lg7sl8fhWcfdUgbbOYYBUlEcJvUw/jcLAaONXbOAmNIjW4vi P6GrqLGF0fiJtDie/deTIFH/2aRf599DGDYafOwR7hy3LUI2XERBUCDEVlYvJYyufsU+ 5wwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731544760; x=1732149560; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ui92GwdJMLYEeuUgce0xtSp+y54z2Swd5voIaLF3cpw=; b=FUQe9HHUGJ5DX1Nks2mweJssUIDUQl/lqCkdJ9Rgpbo3oOxPiywU4VgcDQFKhm42YB jbcQhE4M9PtD7ZD707LdfS+LdZPe2WXGgRdCf948PC25jbqOCb39g3mMktimU3imJsbB 9Bx/pTUYq8RR/MBw9UTXmAqhJ8+uiP4N/CISxXA3hwWdlxEhpNsXZec9PIKviey3k1yT DWlX12Jfar2x5nQCo9C5lwxcsDy5E2+mxQpQg6+RQQ/em3vjz/Y0SexqMKZNEAsrOsKQ np/CRO9uGdsuTlAVH+aXQLngzrG8C9Z8LMSDC58+DEoG+1CvWfciG7LQXYqvTv4kf5sJ +PKQ== X-Forwarded-Encrypted: i=1; AJvYcCVE9H7hp8Q/ZqG9NIfal5Mvd1yL0JMsU2idFB+yI7G4nU6oAoFi2szicgaV02eTtyh4C1lsNqzbnQ==@kvack.org X-Gm-Message-State: AOJu0YwOCzNzzI0CDljaf70ZbttKGMtBTNottfI/MUZvGrf/atx5Myuq ra/LnH6t7EozPTgmKN4wwglZ4zl1c1KeICFfon3ONeuPKZPFTGfuuKPJ5ZqyJfKHoBHxnfw0VyK BXe4d4dnFML3QtIPEorAaGNs6oaE= X-Google-Smtp-Source: AGHT+IEFBXoeA3UddAD4FhSSfshE8uACpQYHlp1nJYMICNz3NabjlaCML/zHdAQgVLMjzCFG5b61laABmgl8G6dWPuY= X-Received: by 2002:a05:622a:18a4:b0:462:f5ca:5b06 with SMTP id d75a77b69052e-4630930ed42mr321207931cf.3.1731544760444; Wed, 13 Nov 2024 16:39:20 -0800 (PST) MIME-Version: 1.0 References: <20241107235614.3637221-1-joannelkoong@gmail.com> <20241107235614.3637221-7-joannelkoong@gmail.com> In-Reply-To: From: Joanne Koong Date: Wed, 13 Nov 2024 16:39:09 -0800 Message-ID: Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree To: Jingbo Xu Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, shakeel.butt@linux.dev, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0357E140010 X-Stat-Signature: hxs3ps93rao1dpn4m88zmoqyq7u38j7b X-Rspam-User: X-HE-Tag: 1731544732-470645 X-HE-Meta: U2FsdGVkX19Bx+mlFjYRZ/w97LuJDQ3IBQ+uUPFMzWkqtoLHb89FovWj1HsWXp0hCpHLkHvRTAc/NfEEgEMYgpSJI+XoQyL4sDCoQfJUxlwdFLE05usHRqustiLhPwBY/GfYOhXAnEKRYqBJ8Ztv9/n66fDsA0msKvJWMVXSEuThcSd0g+iL52bj52cYrSuYdaKNAntc5JjCzTxTVfY17b47yiTBwA2wHKlZtiE5hN7rV7zLcZZAMIgILxSRyYj8Xs7FtCmmmL4niDTjkTE5e6TOCM4M5fxEzjprOYw7VCG2UJqQPW1vHj04CM5AlTgRxvepRKHVbuqG4jYx6AB3DG9wn4QlilL8C5APZShWh4Ej15Axi3UqAeIny+L+LfFDCHae09hpXYZ9R86S4NZI0AnXlVlbn77s1irMj1W21dRsyd3gERK2rP61lKfsIX5PYHZm5hsMi6QJG3roS+pnIQzamoN+D1Zs/4pad4p03RhCHa9sTiInOpud5K2k1bm7LWksQiR0C7v/hs4DbM3oL9Xl6WlN8BGCjrarWXsDC+MXjrxWsnIe+vkrIhrvwqG/pj3+4UzrCfdvF+oxLLdfEDk06cggT7GIEoau4HNk12b/Vd+3/P0lk/RcdeEWEI1fBIEC9smjY81gCf/ONqgJnHaOW6l+6PymIi0cehs+IsiSXxDMB3ca/QycgYFtyxSOiNvW+P+4WnpQbl4v4lMeSowvZP3T4k1u8XTrTIatMiPZQfkAl8TKv74rI5bqDgnE+oL04hWwESczYtg31wdvmU46Jhp6eWFxpnJCyeLK8R8GbeiWpad8oOfnZy/TSz4du4VBsOIVfd9d7M7xmYMwg+UAX/XhKJfFSISklS35FNsNfDtYwKw7ukCG1WTyOpsA/pEZdDpP05YyIk0P4R8FbPFVqj7xY4/yciGeOh77so03Mre/Vm8ZnNtWaT+c4/f+IBalXdQeO4PGvEFIZn3 llYIy4SV z1xPu95zK1gM7R6zipzwZDirnfgINnddL3HQJ4LV8nM2GRo8D70M4lrvPP+prgx/tcThQ7srxE4CoO5zQnRI+uy+T4/6/eoE5g7afcXQx5SKzTBn/bj64SJJ+pxaRzJmJoQPCN20XqXxRCGexa83nbgjW+JuoGjZqkqEfLSZQS/DVmullR8zBsFzsHBacegq8r0TWPjGykctNhBZv5+pnsVmHR7YAVTNRi03Jd/ZoB9wGqx8k4U5JRGiCZM7XOs2W/KOrJRoiyQBp7f1/b6dsSLMiJyyVwknCk1qZEs224ums5ht6XL32QZbqdzlc4jxHgeZW8PFdBmSWX6JDtNZEvWNohucgSFsIUDNS8HQxL2D4IvhCBvkEkS8JgA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 12, 2024 at 1:25=E2=80=AFAM Jingbo Xu wrote: > > Hi Joanne, > > On 11/8/24 7:56 AM, Joanne Koong wrote: > > Currently, we allocate and copy data to a temporary folio when > > handling writeback in order to mitigate the following deadlock scenario > > that may arise if reclaim waits on writeback to complete: > > * single-threaded FUSE server is in the middle of handling a request > > that needs a memory allocation > > * memory allocation triggers direct reclaim > > * direct reclaim waits on a folio under writeback > > * the FUSE server can't write back the folio since it's stuck in > > direct reclaim > > > > To work around this, we allocate a temporary folio and copy over the > > original folio to the temporary folio so that writeback can be > > immediately cleared on the original folio. This additionally requires u= s > > to maintain an internal rb tree to keep track of writeback state on the > > temporary folios. > > > > A recent change prevents reclaim logic from waiting on writeback for > > folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it. > > This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which > > will prevent FUSE folios from running into the reclaim deadlock describ= ed > > above) and removes the temporary folio + extra copying and the internal > > rb tree. > > > > fio benchmarks -- > > (using averages observed from 10 runs, throwing away outliers) > > > > Setup: > > sudo mount -t tmpfs -o size=3D30G tmpfs ~/tmp_mount > > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=3D4= -o source=3D~/tmp_mount ~/fuse_mount > > > > fio --name=3Dwriteback --ioengine=3Dsync --rw=3Dwrite --bs=3D{1k,4k,1M}= --size=3D2G > > --numjobs=3D2 --ramp_time=3D30 --group_reporting=3D1 --directory=3D/roo= t/fuse_mount > > > > bs =3D 1k 4k 1M > > Before 351 MiB/s 1818 MiB/s 1851 MiB/s > > After 341 MiB/s 2246 MiB/s 2685 MiB/s > > % diff -3% 23% 45% > > > > Signed-off-by: Joanne Koong > > I think there are some places checking or waiting for writeback could be > reconsidered if they are still needed or not after we dropping the temp > page design. > > As they are inherited from the original implementation, at least they > are harmless. I think they could be remained in this patch, and could > be cleaned up later if really needed. > Thank you for the thorough inspection! > > > @@ -891,7 +813,7 @@ static int fuse_do_readfolio(struct file *file, str= uct folio *folio) > > * have writeback that extends beyond the lifetime of the folio. = So > > * make sure we read a properly synced folio. > > */ > > - fuse_wait_on_folio_writeback(inode, folio); > > + folio_wait_writeback(folio); > > I doubt if wait-on-writeback is needed here, as now page cache won't be > freed until the writeback IO completes. > > The routine attempts to free page cache, e.g. invalidate_inode_pages2() > (generally called by distributed filesystems when the file content has > been modified from remote) or truncate_inode_pages() (called from > truncate(2) or inode eviction) will wait for writeback completion (if > any) before freeing page. > > Thus I don't think there's any possibility that .read_folio() or > .readahead() will be called when the writeback has not completed. > Great point. I'll remove this line and the comment above it too. > > > @@ -1172,7 +1093,7 @@ static ssize_t fuse_send_write_pages(struct fuse_= io_args *ia, > > int err; > > > > for (i =3D 0; i < ap->num_folios; i++) > > - fuse_wait_on_folio_writeback(inode, ap->folios[i]); > > + folio_wait_writeback(ap->folios[i]); > > Ditto. > Why did we need this fuse_wait_on_folio_writeback() even when we had the temp pages? If I'm understanding it correctly, fuse_send_write_pages() is only called for the writethrough case (by fuse_perform_write()), so these folios would never even be under writeback, no? > > > > static void fuse_writepage_args_page_fill(struct fuse_writepage_args *= wpa, struct folio *folio, > > - struct folio *tmp_folio, uint32= _t folio_index) > > + uint32_t folio_index) > > { > > struct inode *inode =3D folio->mapping->host; > > struct fuse_args_pages *ap =3D &wpa->ia.ap; > > > > - folio_copy(tmp_folio, folio); > > - > > - ap->folios[folio_index] =3D tmp_folio; > > + folio_get(folio); > > I still think this folio_get() here is harmless but redundant. > > Ditto page cache won't be freed before writeback completes. > > Besides, other .writepages() implementaions e.g. iomap_writepages() also > doen't get the refcount when constructing the writeback IO. > Point taken. I'll remove this then, since other .writepages() also don't obtain a refcount. > > > @@ -2481,7 +2200,7 @@ static int fuse_write_begin(struct file *file, st= ruct address_space *mapping, > > if (IS_ERR(folio)) > > goto error; > > > > - fuse_wait_on_page_writeback(mapping->host, folio->index); > > + folio_wait_writeback(folio); > > I also doubt if wait_on_writeback() is needed here, as now there won't > be duplicate writeback IOs for the same page. What prevents there from being a duplicate writeback write for the same page? This is the path I'm looking at: ksys_write() vfs_write() new_sync_write() op->write_iter() fuse_file_write_iter() fuse_cache_write_iter() generic_file_write_iter() __generic_file_write_iter() generic_perform_write() op->write_begin() fuse_write_begin() but I'm not seeing where there's anything that prevents a duplicate write from happening. > > > > @@ -2545,13 +2264,11 @@ static int fuse_launder_folio(struct folio *fol= io) > > { > > int err =3D 0; > > if (folio_clear_dirty_for_io(folio)) { > > - struct inode *inode =3D folio->mapping->host; > > - > > /* Serialize with pending writeback for the same page */ > > - fuse_wait_on_page_writeback(inode, folio->index); > > + folio_wait_writeback(folio); > > I think folio_wait_writeback() is unneeded after dropping the temp page > copying. This is introduced in commit > 3993382bb3198cc5e263c3519418e716bd57b056 ("fuse: launder page should > wait for page writeback") since .launder_page() could be called when the > previous writeback of the same page has not completed yet. Since now we > won't clear PG_writeback until the writeback completes, .launder_page() > won't be called on the same page when the corresponding writeback IO is > still inflight. > Nice catch, I'll remove this in v4. Thanks, Joanne > > -- > Thanks, > Jingbo