linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Joanne Koong <joannelkoong@gmail.com>
To: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	shakeel.butt@linux.dev,  josef@toxicpanda.com,
	linux-mm@kvack.org, bernd.schubert@fastmail.fm,
	 kernel-team@meta.com
Subject: Re: [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree
Date: Fri, 8 Nov 2024 14:33:50 -0800	[thread overview]
Message-ID: <CAJnrk1YDUEkKRDetmX_5Dw4HkJ+qCkG5WCUHGvp-SWzCo4WLpA@mail.gmail.com> (raw)
In-Reply-To: <1b3a36fe-1f62-410c-97fa-d59e7385f683@linux.alibaba.com>

On Fri, Nov 8, 2024 at 12:49 AM Jingbo Xu <jefflexu@linux.alibaba.com> wrote:
>
> Hi, Joanne,
>
> Thanks for the continuing work!
>
> On 11/8/24 7:56 AM, Joanne Koong wrote:
> > Currently, we allocate and copy data to a temporary folio when
> > handling writeback in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > To work around this, we allocate a temporary folio and copy over the
> > original folio to the temporary folio so that writeback can be
> > immediately cleared on the original folio. This additionally requires us
> > to maintain an internal rb tree to keep track of writeback state on the
> > temporary folios.
> >
> > A recent change prevents reclaim logic from waiting on writeback for
> > folios whose mappings have the AS_WRITEBACK_MAY_BLOCK flag set in it.
> > This commit sets AS_WRITEBACK_MAY_BLOCK on FUSE inode mappings (which
> > will prevent FUSE folios from running into the reclaim deadlock described
> > above) and removes the temporary folio + extra copying and the internal
> > rb tree.
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> >  ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> >         bs =  1k          4k            1M
> > Before  351 MiB/s     1818 MiB/s     1851 MiB/s
> > After   341 MiB/s     2246 MiB/s     2685 MiB/s
> > % diff        -3%          23%         45%
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
>
> > @@ -1622,7 +1543,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> >                       return res;
> >               }
> >       }
> > -     if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > +     if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
>
> filemap_range_has_writeback() is not equivalent to
> fuse_range_is_writeback(), as it will return true as long as there's any
> locked or dirty page?  I can't find an equivalent helper function at
> hand though.
>

Hi Jingbo,

I couldn't find an equivalent helper function either. My guess is that
filemap_range_has_writeback() returns true if the page is locked
because it doesn't have a way of determining if the page is dirty or
not (it seems like if a page is locked, then we can't read the
writeback bit on it) so it errs on the side of assuming yes.
For this case, it seems okay to me to use
filemap_range_has_writeback() because if we get back a false positive
(eg filemap_range_has_writeback() returns true when it's actually
false), the only cost is the overhead of an additional
fuse_sync_writes() call but fuse_sync_writes() will return immediately
from the wait().

>
>
> > @@ -3423,7 +3143,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
> >       fi->iocachectr = 0;
> >       init_waitqueue_head(&fi->page_waitq);
> >       init_waitqueue_head(&fi->direct_io_waitq);
> > -     fi->writepages = RB_ROOT;
>
> It seems that 'struct rb_root writepages' is not removed from fuse_inode
> structure.
>

Nice catch! I'll remove this from the fuse_inode struct in v5.

>
> Besides, I also looked through the former 5 patches and can't find any
> obvious errors at the very first glance.  Hopefully the MM guys could
> offer more professional reviews.
>

Thanks for looking through this code in this version and the past
versions of this patchset too. It's much appreciated!

> --
> Thanks,
> Jingbo


  reply	other threads:[~2024-11-08 22:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-07 23:56 [PATCH v4 0/6] fuse: remove temp page copies in writeback Joanne Koong
2024-11-07 23:56 ` [PATCH v4 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag Joanne Koong
2024-11-09  0:10   ` Shakeel Butt
2024-11-11 21:11     ` Joanne Koong
2024-11-15 19:33       ` Joanne Koong
2024-11-15 20:17         ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block Joanne Koong
2024-11-09  0:16   ` Shakeel Butt
2024-11-07 23:56 ` [PATCH v4 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails Joanne Koong
2024-11-08 17:33   ` SeongJae Park
2024-11-08 18:56     ` David Hildenbrand
2024-11-08 19:00       ` David Hildenbrand
2024-11-08 21:27         ` Shakeel Butt
2024-11-08 21:42           ` Joanne Koong
2024-11-08 22:16             ` Shakeel Butt
2024-11-08 22:20               ` Joanne Koong
2024-11-08 21:59     ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 5/6] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2024-11-08  8:48   ` Jingbo Xu
2024-11-08 22:33     ` Joanne Koong [this message]
2024-11-11  8:32   ` Jingbo Xu
2024-11-11 21:30     ` Joanne Koong
2024-11-12  2:31       ` Jingbo Xu
2024-11-13 19:11         ` Joanne Koong
2024-11-12  9:25   ` Jingbo Xu
2024-11-14  0:39     ` Joanne Koong
2024-11-14  1:46       ` Jingbo Xu
2024-11-14 18:19         ` Joanne Koong
2024-11-15  2:18           ` Jingbo Xu
2024-11-15 18:29             ` Joanne Koong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJnrk1YDUEkKRDetmX_5Dw4HkJ+qCkG5WCUHGvp-SWzCo4WLpA@mail.gmail.com \
    --to=joannelkoong@gmail.com \
    --cc=bernd.schubert@fastmail.fm \
    --cc=jefflexu@linux.alibaba.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@meta.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=shakeel.butt@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox