From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9c0dbdac-0aed-467c-86c7-5b9a9f96d89d@linux.alibaba.com>
Date: Mon, 11 Nov 2024 16:32:20 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
To: Joanne Koong, miklos@szeredi.hu, linux-fsdevel@vger.kernel.org
Cc: shakeel.butt@linux.dev, josef@toxicpanda.com, linux-mm@kvack.org, bernd.schubert@fastmail.fm, kernel-team@meta.com
References: <20241107235614.3637221-1-joannelkoong@gmail.com> <20241107235614.3637221-7-joannelkoong@gmail.com>
Content-Language: en-US
From: Jingbo Xu
In-Reply-To: <20241107235614.3637221-7-joannelkoong@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hi, Joanne and Miklos,

On 11/8/24 7:56 AM, Joanne Koong wrote:
> Currently, we allocate and copy data to a temporary folio when
> handling writeback in order to mitigate the following deadlock scenario
> that may arise if reclaim waits on writeback to complete:
> * single-threaded FUSE server is in the middle of handling a request
>   that needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback
> * the FUSE server can't write back the folio since it's stuck in
>   direct reclaim
>
> To work around this, we allocate a temporary folio and copy over the
> original folio to the temporary folio so that writeback can be
> immediately cleared on the original folio. This additionally requires us
> to maintain an internal rb tree to keep track of writeback state on the
> temporary folios.
>
> A recent change prevents reclaim logic from waiting on writeback for
> folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it.
> This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
> will prevent FUSE folios from running into the reclaim deadlock described
> above) and removes the temporary folio + extra copying and the internal
> rb tree.
>
> fio benchmarks --
> (using averages observed from 10 runs, throwing away outliers)
>
> Setup:
> sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
>
> fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
>
>   bs =   1k           4k            1M
> Before   351 MiB/s    1818 MiB/s    1851 MiB/s
> After    341 MiB/s    2246 MiB/s    2685 MiB/s
> % diff   -3%          23%           45%
>
> Signed-off-by: Joanne Koong

IIUC this patch seems to break commit
8b284dc47291daf72fe300e1138a2e7ed56f38ab ("fuse: writepages: handle same
page rewrites").

> -  /*
> -   * Being under writeback is unlikely but possible. For example direct
> -   * read to an mmaped fuse file will set the page dirty twice; once when
> -   * the pages are faulted with get_user_pages(), and then after the read
> -   * completed.
> -   */

In short, the target scenario is like:

```
# open a fuse file and mmap
fd1 = open("fuse-file-path", ...)
uaddr = mmap(fd1, ...)

# DIRECT read to the mmaped fuse file
fd2 = open("ext4-file-path", O_DIRECT, ...)
read(fd2, uaddr, ...)
    # get_user_pages() of uaddr, which triggers fault-in
    # a_ops->dirty_folio() <--- mark PG_dirty

    # when the DIRECT IO completes:
    # a_ops->dirty_folio() <--- mark PG_dirty
```

The auxiliary write request list was introduced to fix this.

I'm not sure whether there's an alternative to the auxiliary list for
fixing it, e.g. calling folio_wait_writeback() in a_ops->dirty_folio() so
that the same folio won't get dirtied again while its writeback has not
completed yet?

--
Thanks,
Jingbo
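
For anyone who wants to poke at this from userspace, the scenario above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a verified reproducer: the helper name direct_read_into_mapping and its extra_flags parameter are made up for illustration, it is Linux-only, and whether it actually hits the double-dirty window depends on writeback timing. It assumes the mapped file is at least `length` bytes long and that `length` satisfies the O_DIRECT alignment rules of the source filesystem.

```python
import mmap
import os


def direct_read_into_mapping(mapped_path, source_path, length, extra_flags=None):
    """Read `length` bytes from source_path straight into a shared mapping
    of mapped_path and return the number of bytes read.

    Hypothetical helper for illustration only. By default the source file
    is opened with O_DIRECT, as in the scenario described above.
    """
    if extra_flags is None:
        extra_flags = os.O_DIRECT  # assumption: Linux, O_DIRECT available

    # Open the (FUSE) file and create a shared, writable mapping of it.
    fd1 = os.open(mapped_path, os.O_RDWR)
    try:
        mm = mmap.mmap(fd1, length, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd1)  # the mapping stays valid after the fd is closed

    try:
        # Read from the other file with the mapping itself as the user
        # buffer. The mapping is page-aligned, which O_DIRECT needs.
        fd2 = os.open(source_path, os.O_RDONLY | extra_flags)
        try:
            nread = os.readv(fd2, [mm])
        finally:
            os.close(fd2)
        mm.flush()  # msync() the mapping so the dirty pages get written back
    finally:
        mm.close()
    return nread
```

Usage would be something like direct_read_into_mapping("fuse-file-path", "ext4-file-path", mmap.PAGESIZE), with both paths being placeholders as in the pseudocode above. Passing extra_flags=0 falls back to a buffered read, which is handy for checking the helper itself on filesystems that reject O_DIRECT.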