From: Joanne Koong <joannelkoong@gmail.com>
To: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
shakeel.butt@linux.dev, josef@toxicpanda.com,
linux-mm@kvack.org, bernd.schubert@fastmail.fm,
kernel-team@meta.com
Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
Date: Mon, 11 Nov 2024 13:30:37 -0800 [thread overview]
Message-ID: <CAJnrk1YUPZhCUhGqu+bBngzrG-yCCRLZc7fiOfXQZ0dxCHJV8Q@mail.gmail.com> (raw)
In-Reply-To: <9c0dbdac-0aed-467c-86c7-5b9a9f96d89d@linux.alibaba.com>
On Mon, Nov 11, 2024 at 12:32 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
> Hi, Joanne and Miklos,
>
> On 11/8/24 7:56 AM, Joanne Koong wrote:
> > Currently, we allocate and copy data to a temporary folio when
> > handling writeback in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> > that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> > direct reclaim
> >
> > To work around this, we allocate a temporary folio and copy over the
> > original folio to the temporary folio so that writeback can be
> > immediately cleared on the original folio. This additionally requires us
> > to maintain an internal rb tree to keep track of writeback state on the
> > temporary folios.
> >
> > A recent change prevents reclaim logic from waiting on writeback for
> > folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it.
> > This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
> > will prevent FUSE folios from running into the reclaim deadlock described
> > above) and removes the temporary folio + extra copying and the internal
> > rb tree.
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> > bs = 1k 4k 1M
> > Before 351 MiB/s 1818 MiB/s 1851 MiB/s
> > After 341 MiB/s 2246 MiB/s 2685 MiB/s
> > % diff -3% 23% 45%
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
>
> IIUC this patch seems to break commit
> 8b284dc47291daf72fe300e1138a2e7ed56f38ab ("fuse: writepages: handle same
> page rewrites").
>
Interesting! My understanding was that we only needed that commit
because we were clearing writeback on the original folio before
writeback had actually finished.
Now that folio writeback state is accounted for normally (eg through
writeback being set/cleared on the original folio), does the
folio_wait_writeback() call we do in fuse_page_mkwrite() not mitigate
this?
> > - /*
> > - * Being under writeback is unlikely but possible. For example direct
> > - * read to an mmaped fuse file will set the page dirty twice; once when
> > - * the pages are faulted with get_user_pages(), and then after the read
> > - * completed.
> > - */
>
> In short, the target scenario is like:
>
> ```
> # open a fuse file and mmap
> fd1 = open("fuse-file-path", ...)
> uaddr = mmap(fd1, ...)
>
> # DIRECT read to the mmaped fuse file
> fd2 = open("ext4-file-path", O_DIRECT, ...)
> read(fd2, uaddr, ...)
> # get_user_pages() of uaddr, and triggers faultin
> # a_ops->dirty_folio() <--- mark PG_dirty
>
> # when DIRECT IO completed:
> # a_ops->dirty_folio() <--- mark PG_dirty
If you have the direct io function call stack at hand, could you point
me to the function where the direct io completion marks this folio as
dirty?
> ```
>
> The auxiliary write request list was introduced to fix this.
>
> I'm not sure if there's an alternative other than the auxiliary list to
> fix it, e.g. calling folio_wait_writeback() in a_ops->dirty_folio() so
> that the same folio won't get dirtied when the writeback has not
> completed yet?
>
I'm curious how other filesystems solve for this - this seems like a
generic situation other filesystems would run into as well.
Thanks,
Joanne
>
>
> --
> Thanks,
> Jingbo
next prev parent reply other threads:[~2024-11-11 21:30 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-07 23:56 [PATCH v4 0/6] fuse: remove temp page copies in writeback Joanne Koong
2024-11-07 23:56 ` [PATCH v4 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag Joanne Koong
2024-11-09 0:10 ` Shakeel Butt
2024-11-11 21:11 ` Joanne Koong
2024-11-15 19:33 ` Joanne Koong
2024-11-15 20:17 ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block Joanne Koong
2024-11-09 0:16 ` Shakeel Butt
2024-11-07 23:56 ` [PATCH v4 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails Joanne Koong
2024-11-08 17:33 ` SeongJae Park
2024-11-08 18:56 ` David Hildenbrand
2024-11-08 19:00 ` David Hildenbrand
2024-11-08 21:27 ` Shakeel Butt
2024-11-08 21:42 ` Joanne Koong
2024-11-08 22:16 ` Shakeel Butt
2024-11-08 22:20 ` Joanne Koong
2024-11-08 21:59 ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 5/6] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2024-11-08 8:48 ` Jingbo Xu
2024-11-08 22:33 ` Joanne Koong
2024-11-11 8:32 ` Jingbo Xu
2024-11-11 21:30 ` Joanne Koong [this message]
2024-11-12 2:31 ` Jingbo Xu
2024-11-13 19:11 ` Joanne Koong
2024-11-12 9:25 ` Jingbo Xu
2024-11-14 0:39 ` Joanne Koong
2024-11-14 1:46 ` Jingbo Xu
2024-11-14 18:19 ` Joanne Koong
2024-11-15 2:18 ` Jingbo Xu
2024-11-15 18:29 ` Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJnrk1YUPZhCUhGqu+bBngzrG-yCCRLZc7fiOfXQZ0dxCHJV8Q@mail.gmail.com \
--to=joannelkoong@gmail.com \
--cc=bernd.schubert@fastmail.fm \
--cc=jefflexu@linux.alibaba.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
--cc=shakeel.butt@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox