References: <20241107235614.3637221-1-joannelkoong@gmail.com> <20241107235614.3637221-7-joannelkoong@gmail.com> <9c0dbdac-0aed-467c-86c7-5b9a9f96d89d@linux.alibaba.com>
In-Reply-To: <9c0dbdac-0aed-467c-86c7-5b9a9f96d89d@linux.alibaba.com>
From: Joanne Koong
Date: Mon, 11 Nov 2024 13:30:37 -0800
Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
To: Jingbo Xu
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, shakeel.butt@linux.dev, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com

On Mon, Nov 11, 2024 at 12:32 AM Jingbo Xu wrote:
>
> Hi, Joanne and Miklos,
>
> On 11/8/24 7:56 AM, Joanne Koong wrote:
> > Currently, we allocate and copy data to a temporary folio when
> > handling writeback in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > To work around this, we allocate a temporary folio and copy over the
> > original folio to the temporary folio so that writeback can be
> > immediately cleared on the original folio. This additionally requires us
> > to maintain an internal rb tree to keep track of writeback state on the
> > temporary folios.
> >
> > A recent change prevents reclaim logic from waiting on writeback for
> > folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it.
> > This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
> > will prevent FUSE folios from running into the reclaim deadlock described
> > above) and removes the temporary folio + extra copying and the internal
> > rb tree.
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> > bs =       1k           4k           1M
> > Before     351 MiB/s    1818 MiB/s   1851 MiB/s
> > After      341 MiB/s    2246 MiB/s   2685 MiB/s
> > % diff     -3%          23%          45%
> >
> > Signed-off-by: Joanne Koong
> >
>
> IIUC this patch seems to break commit
> 8b284dc47291daf72fe300e1138a2e7ed56f38ab ("fuse: writepages: handle same
> page rewrites").
>

Interesting! My understanding was that we only needed that commit
because we were clearing writeback on the original folio before
writeback had actually finished.
Now that folio writeback state is accounted for normally (e.g. through
writeback being set/cleared on the original folio), does the
folio_wait_writeback() call we do in fuse_page_mkwrite() not mitigate
this?

> > -       /*
> > -        * Being under writeback is unlikely but possible. For example direct
> > -        * read to an mmaped fuse file will set the page dirty twice; once when
> > -        * the pages are faulted with get_user_pages(), and then after the read
> > -        * completed.
> > -        */
>
> In short, the target scenario is like:
>
> ```
> # open a fuse file and mmap
> fd1 = open("fuse-file-path", ...)
> uaddr = mmap(fd1, ...)
>
> # DIRECT read to the mmaped fuse file
> fd2 = open("ext4-file-path", O_DIRECT, ...)
> read(fd2, uaddr, ...)
>     # get_user_pages() of uaddr, and triggers faultin
>     # a_ops->dirty_folio() <--- mark PG_dirty
>
>     # when DIRECT IO completed:
>     # a_ops->dirty_folio() <--- mark PG_dirty

If you have the direct I/O function call stack at hand, could you point
me to the function where the direct I/O completion marks this folio as
dirty?

> ```
>
> The auxiliary write request list was introduced to fix this.
>
> I'm not sure if there's an alternative other than the auxiliary list to
> fix it, e.g. calling folio_wait_writeback() in a_ops->dirty_folio() so
> that the same folio won't get dirtied when the writeback has not
> completed yet?
>

I'm curious how other filesystems solve this; it seems like a generic
situation that other filesystems would run into as well.

Thanks,
Joanne

>
>
> --
> Thanks,
> Jingbo
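A minimal user-space sketch of the alternative raised above, assuming the dirtying path simply blocks until writeback on the folio has ended (roughly what calling folio_wait_writeback() from a_ops->dirty_folio() would do). All names here (struct folio_model, model_dirty_folio(), model_start_writeback(), model_end_writeback(), simulate_same_folio_rewrite()) are hypothetical, and a pthread mutex plus condition variable stand in for the kernel's writeback wait; this is not FUSE or page-cache code. It replays the O_DIRECT scenario: the folio is dirtied on fault-in, writeback starts, and a second thread tries to re-dirty it the way the direct I/O completion would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of one page-cache folio (not kernel code). */
struct folio_model {
	pthread_mutex_t lock;
	pthread_cond_t wb_done;  /* broadcast whenever writeback ends */
	bool dirty;
	bool writeback;
	int wb_in_flight;        /* writebacks of this folio currently running */
	int max_wb_in_flight;    /* largest overlap ever observed */
};

/*
 * Models the proposed alternative: the dirtying path waits for writeback
 * to finish (as folio_wait_writeback() would) before marking it dirty.
 */
static void model_dirty_folio(struct folio_model *f)
{
	pthread_mutex_lock(&f->lock);
	while (f->writeback)  /* loop guards against spurious wakeups */
		pthread_cond_wait(&f->wb_done, &f->lock);
	f->dirty = true;
	pthread_mutex_unlock(&f->lock);
}

/* Clears dirty and sets writeback; returns false if nothing was started. */
static bool model_start_writeback(struct folio_model *f)
{
	bool started = false;

	pthread_mutex_lock(&f->lock);
	if (f->dirty && !f->writeback) {
		f->dirty = false;
		f->writeback = true;
		if (++f->wb_in_flight > f->max_wb_in_flight)
			f->max_wb_in_flight = f->wb_in_flight;
		started = true;
	}
	pthread_mutex_unlock(&f->lock);
	return started;
}

/* Ends writeback and wakes any dirtier blocked in model_dirty_folio(). */
static void model_end_writeback(struct folio_model *f)
{
	pthread_mutex_lock(&f->lock);
	assert(f->writeback);  /* ending a writeback that never started is a bug */
	f->writeback = false;
	f->wb_in_flight--;
	pthread_cond_broadcast(&f->wb_done);
	pthread_mutex_unlock(&f->lock);
}

/* Plays the second dirtying (direct I/O completion) from another thread. */
static void *redirty_thread(void *arg)
{
	model_dirty_folio(arg);
	return NULL;
}

/*
 * Runs dirty -> writeback -> re-dirty -> writeback. Returns the maximum
 * number of overlapping writebacks seen; *redirtied_early is set if the
 * folio became dirty again while the first writeback was still running.
 */
int simulate_same_folio_rewrite(bool *redirtied_early)
{
	struct folio_model f = { .dirty = false };
	pthread_t t;
	int max;

	pthread_mutex_init(&f.lock, NULL);
	pthread_cond_init(&f.wb_done, NULL);

	model_dirty_folio(&f);      /* first dirtying: fault-in */
	model_start_writeback(&f);  /* writeback of the folio begins */

	pthread_create(&t, NULL, redirty_thread, &f);

	pthread_mutex_lock(&f.lock);
	*redirtied_early = f.dirty; /* re-dirtier cannot pass the wait yet */
	pthread_mutex_unlock(&f.lock);

	model_end_writeback(&f);    /* unblocks the re-dirtier */
	pthread_join(t, NULL);

	model_start_writeback(&f);  /* second writeback, strictly after the first */
	model_end_writeback(&f);

	max = f.max_wb_in_flight;
	pthread_cond_destroy(&f.wb_done);
	pthread_mutex_destroy(&f.lock);
	return max;
}
```

In this model, calling simulate_same_folio_rewrite(&early) returns 1 and leaves early false regardless of thread scheduling: the re-dirtying is held back until the first writeback ends, so two writebacks of the same folio never overlap and there is nothing for an auxiliary request list to track. The model says nothing about whether sleeping inside dirty_folio is acceptable in the contexts the kernel calls it from, which is the part still open in this thread.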