From: Joanne Koong
Date: Mon, 14 Oct 2024 10:24:17 -0700
Subject: Re: [PATCH 2/2] fuse: remove tmp folio for writebacks and internal rb tree
To: Shakeel Butt
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org, josef@toxicpanda.com, bernd.schubert@fastmail.fm, jefflexu@linux.alibaba.com, hannes@cmpxchg.org, linux-mm@kvack.org, kernel-team@meta.com
References: <20241011223434.1307300-1-joannelkoong@gmail.com> <20241011223434.1307300-3-joannelkoong@gmail.com>

On Sat, Oct 12, 2024 at 9:56 PM Shakeel Butt wrote:
>
> On Fri, Oct 11, 2024 at 03:34:34PM GMT, Joanne Koong wrote:
> > Currently, we allocate and copy data to a temporary folio when
> > handling writeback in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > To work around this, we allocate a temporary folio and copy over the
> > original folio to the temporary folio so that writeback can be
> > immediately cleared on the original folio. This additionally requires us
> > to maintain an internal rb tree to keep track of writeback state on the
> > temporary folios.
> >
> > This change sets the address space operations flag
> > ASOP_NO_RECLAIM_IN_WRITEBACK so that FUSE folios are not reclaimed and
>
> I couldn't find where ASOP_NO_RECLAIM_IN_WRITEBACK is being set for
> fuse.

Agh this patch was from an earlier branch in my tree and I forgot to
add this line when I patched it in:

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9fee9f3062db..192b9f5f6b25 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3094,6 +3094,7 @@ static const struct address_space_operations fuse_file_aops = {
 	.direct_IO	= fuse_direct_IO,
 	.write_begin	= fuse_write_begin,
 	.write_end	= fuse_write_end,
+	.asop_flags	= ASOP_NO_RECLAIM_IN_WRITEBACK,
 };

sorry about that. With your suggestion of adding the flag instead to
the "enum mapping_flags", I'll be setting it in v2 in the
fuse_init_file_inode() function.

Thanks,
Joanne

>
> > waited on while in writeback, and removes the temporary folio +
> > extra copying and the internal rb tree.
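
A rough userspace model of the two schemes described above, for readers
who want to see the difference in when the writeback flag clears. This is
only a sketch: toy_folio, toy_request and the toy_* helpers are made-up
names, not kernel types or functions, and nothing here models reclaim,
locking or the rb tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a page cache folio; not a kernel type. */
struct toy_folio {
	char data[TOY_PAGE_SIZE];
	bool writeback;		/* models the folio writeback flag */
};

/* Hypothetical in-flight write request handed to the FUSE server. */
struct toy_request {
	struct toy_folio *payload;	/* folio whose data the server reads */
	struct toy_folio tmp;		/* used only by the old scheme */
};

/* Old scheme: copy into a temporary folio, clear writeback right away. */
static void toy_submit_with_tmp(struct toy_request *req, struct toy_folio *folio)
{
	folio->writeback = true;
	memcpy(req->tmp.data, folio->data, TOY_PAGE_SIZE);
	req->tmp.writeback = true;
	req->payload = &req->tmp;
	folio->writeback = false;	/* original is never left in writeback */
}

/* New scheme: send the original folio, keep writeback set until the reply. */
static void toy_submit_direct(struct toy_request *req, struct toy_folio *folio)
{
	folio->writeback = true;
	req->payload = folio;
}

/* Server replied: end writeback on whichever folio carried the data. */
static void toy_complete(struct toy_request *req)
{
	assert(req->payload->writeback);
	req->payload->writeback = false;
}
```

In the old scheme the original folio leaves writeback as soon as the copy
is made, so reclaim has nothing to wait on. In the new scheme the flag
stays set until the server replies, which is why reclaim has to be told
not to wait on these folios.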
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> > ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> >         bs =  1k          4k            1M
> > Before  351 MiB/s     1818 MiB/s     1851 MiB/s
> > After   341 MiB/s     2246 MiB/s     2685 MiB/s
> > % diff        -3%          23%         45%
> >
> > Signed-off-by: Joanne Koong
> > ---
> >  fs/fuse/file.c | 321 +++++-------------------------------------------
> >  1 file changed, 27 insertions(+), 294 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 4304e44f32e6..9fee9f3062db 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -415,74 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id)
> >
> >  struct fuse_writepage_args {
> >  	struct fuse_io_args ia;
> > -	struct rb_node writepages_entry;
> >  	struct list_head queue_entry;
> > -	struct fuse_writepage_args *next;
> >  	struct inode *inode;
> >  	struct fuse_sync_bucket *bucket;
> >  };
> >
> > -static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
> > -					    pgoff_t idx_from, pgoff_t idx_to)
> > -{
> > -	struct rb_node *n;
> > -
> > -	n = fi->writepages.rb_node;
> > -
> > -	while (n) {
> > -		struct fuse_writepage_args *wpa;
> > -		pgoff_t curr_index;
> > -
> > -		wpa = rb_entry(n, struct fuse_writepage_args, writepages_entry);
> > -		WARN_ON(get_fuse_inode(wpa->inode) != fi);
> > -		curr_index = wpa->ia.write.in.offset >> PAGE_SHIFT;
> > -		if (idx_from >= curr_index + wpa->ia.ap.num_pages)
> > -			n = n->rb_right;
> > -		else if (idx_to < curr_index)
> > -			n = n->rb_left;
> > -		else
> > -			return wpa;
> > -	}
> > -	return NULL;
> > -}
> > -
> > -/*
> > - * Check if any page in a range is under writeback
> > - */
> > -static bool fuse_range_is_writeback(struct inode *inode, pgoff_t idx_from,
> > -				    pgoff_t idx_to)
> > -{
> > -	struct fuse_inode *fi = get_fuse_inode(inode);
> > -	bool found;
> > -
> > -	if (RB_EMPTY_ROOT(&fi->writepages))
> > -		return false;
> > -
> > -	spin_lock(&fi->lock);
> > -	found = fuse_find_writeback(fi, idx_from, idx_to);
> > -	spin_unlock(&fi->lock);
> > -
> > -	return found;
> > -}
> > -
> > -static inline bool fuse_page_is_writeback(struct inode *inode, pgoff_t index)
> > -{
> > -	return fuse_range_is_writeback(inode, index, index);
> > -}
> > -
> > -/*
> > - * Wait for page writeback to be completed.
> > - *
> > - * Since fuse doesn't rely on the VM writeback tracking, this has to
> > - * use some other means.
> > - */
> > -static void fuse_wait_on_page_writeback(struct inode *inode, pgoff_t index)
> > -{
> > -	struct fuse_inode *fi = get_fuse_inode(inode);
> > -
> > -	wait_event(fi->page_waitq, !fuse_page_is_writeback(inode, index));
> > -}
> > -
> >  /*
> >   * Wait for all pending writepages on the inode to finish.
> >   *
> > @@ -876,7 +813,7 @@ static int fuse_do_readpage(struct file *file, struct page *page)
> >  	 * page-cache page, so make sure we read a properly synced
> >  	 * page.
> >  	 */
> > -	fuse_wait_on_page_writeback(inode, page->index);
> > +	folio_wait_writeback(page_folio(page));
> >
> >  	attr_ver = fuse_get_attr_version(fm->fc);
> >
> > @@ -1024,8 +961,7 @@ static void fuse_readahead(struct readahead_control *rac)
> >  		ap = &ia->ap;
> >  		nr_pages = __readahead_batch(rac, ap->pages, nr_pages);
> >  		for (i = 0; i < nr_pages; i++) {
> > -			fuse_wait_on_page_writeback(inode,
> > -						    readahead_index(rac) + i);
> > +			folio_wait_writeback(page_folio(ap->pages[i]));
> >  			ap->descs[i].length = PAGE_SIZE;
> >  		}
> >  		ap->num_pages = nr_pages;
> > @@ -1147,7 +1083,7 @@ static ssize_t fuse_send_write_pages(struct fuse_io_args *ia,
> >  	int err;
> >
> >  	for (i = 0; i < ap->num_pages; i++)
> > -		fuse_wait_on_page_writeback(inode, ap->pages[i]->index);
> > +		folio_wait_writeback(page_folio(ap->pages[i]));
> >
> >  	fuse_write_args_fill(ia, ff, pos, count);
> >  	ia->write.in.flags = fuse_write_flags(iocb);
> > @@ -1583,7 +1519,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> >  			return res;
> >  		}
> >  	}
> > -	if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
> > +	if (!cuse && filemap_range_has_writeback(mapping, pos, (pos + count - 1))) {
> >  		if (!write)
> >  			inode_lock(inode);
> >  		fuse_sync_writes(inode);
> > @@ -1780,13 +1716,17 @@ static ssize_t fuse_splice_write(struct pipe_inode_info *pipe, struct file *out,
> >  static void fuse_writepage_free(struct fuse_writepage_args *wpa)
> >  {
> >  	struct fuse_args_pages *ap = &wpa->ia.ap;
> > +	struct folio *folio;
> >  	int i;
> >
> >  	if (wpa->bucket)
> >  		fuse_sync_bucket_dec(wpa->bucket);
> >
> > -	for (i = 0; i < ap->num_pages; i++)
> > -		__free_page(ap->pages[i]);
> > +	for (i = 0; i < ap->num_pages; i++) {
> > +		folio = page_folio(ap->pages[i]);
> > +		folio_end_writeback(folio);
> > +		folio_put(folio);
> > +	}
> >
> >  	fuse_file_put(wpa->ia.ff, false);
> >
> > @@ -1799,7 +1739,7 @@ static void fuse_writepage_finish_stat(struct inode *inode, struct page *page)
> >  	struct backing_dev_info *bdi = inode_to_bdi(inode);
> >
> >  	dec_wb_stat(&bdi->wb, WB_WRITEBACK);
> > -	dec_node_page_state(page, NR_WRITEBACK_TEMP);
> > +	dec_node_page_state(page, NR_WRITEBACK);
> >  	wb_writeout_inc(&bdi->wb);
> >  }
> >
> > @@ -1822,7 +1762,6 @@ static void fuse_send_writepage(struct fuse_mount *fm,
> >  __releases(fi->lock)
> >  __acquires(fi->lock)
> >  {
> > -	struct fuse_writepage_args *aux, *next;
> >  	struct fuse_inode *fi = get_fuse_inode(wpa->inode);
> >  	struct fuse_write_in *inarg = &wpa->ia.write.in;
> >  	struct fuse_args *args = &wpa->ia.ap.args;
> > @@ -1858,18 +1797,8 @@ __acquires(fi->lock)
> >
> >   out_free:
> >  	fi->writectr--;
> > -	rb_erase(&wpa->writepages_entry, &fi->writepages);
> >  	fuse_writepage_finish(wpa);
> >  	spin_unlock(&fi->lock);
> > -
> > -	/* After rb_erase() aux request list is private */
> > -	for (aux = wpa->next; aux; aux = next) {
> > -		next = aux->next;
> > -		aux->next = NULL;
> > -		fuse_writepage_finish_stat(aux->inode, aux->ia.ap.pages[0]);
> > -		fuse_writepage_free(aux);
> > -	}
> > -
> >  	fuse_writepage_free(wpa);
> >  	spin_lock(&fi->lock);
> >  }
> > @@ -1897,43 +1826,6 @@ __acquires(fi->lock)
> >  	}
> >  }
> >
> > -static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
> > -						struct fuse_writepage_args *wpa)
> > -{
> > -	pgoff_t idx_from = wpa->ia.write.in.offset >> PAGE_SHIFT;
> > -	pgoff_t idx_to = idx_from + wpa->ia.ap.num_pages - 1;
> > -	struct rb_node **p = &root->rb_node;
> > -	struct rb_node *parent = NULL;
> > -
> > -	WARN_ON(!wpa->ia.ap.num_pages);
> > -	while (*p) {
> > -		struct fuse_writepage_args *curr;
> > -		pgoff_t curr_index;
> > -
> > -		parent = *p;
> > -		curr = rb_entry(parent, struct fuse_writepage_args,
> > -				writepages_entry);
> > -		WARN_ON(curr->inode != wpa->inode);
> > -		curr_index = curr->ia.write.in.offset >> PAGE_SHIFT;
> > -
> > -		if (idx_from >= curr_index + curr->ia.ap.num_pages)
> > -			p = &(*p)->rb_right;
> > -		else if (idx_to < curr_index)
> > -			p = &(*p)->rb_left;
> > -		else
> > -			return curr;
> > -	}
> > -
> > -	rb_link_node(&wpa->writepages_entry, parent, p);
> > -	rb_insert_color(&wpa->writepages_entry, root);
> > -	return NULL;
> > -}
> > -
> > -static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
> > -{
> > -	WARN_ON(fuse_insert_writeback(root, wpa));
> > -}
> > -
> >  static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
> >  			       int error)
> >  {
> > @@ -1953,41 +1845,6 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
> >  	if (!fc->writeback_cache)
> >  		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODIFY);
> >  	spin_lock(&fi->lock);
> > -	rb_erase(&wpa->writepages_entry, &fi->writepages);
> > -	while (wpa->next) {
> > -		struct fuse_mount *fm = get_fuse_mount(inode);
> > -		struct fuse_write_in *inarg = &wpa->ia.write.in;
> > -		struct fuse_writepage_args *next = wpa->next;
> > -
> > -		wpa->next = next->next;
> > -		next->next = NULL;
> > -		tree_insert(&fi->writepages, next);
> > -
> > -		/*
> > -		 * Skip fuse_flush_writepages() to make it easy to crop requests
> > -		 * based on primary request size.
> > -		 *
> > -		 * 1st case (trivial): there are no concurrent activities using
> > -		 * fuse_set/release_nowrite. Then we're on safe side because
> > -		 * fuse_flush_writepages() would call fuse_send_writepage()
> > -		 * anyway.
> > -		 *
> > -		 * 2nd case: someone called fuse_set_nowrite and it is waiting
> > -		 * now for completion of all in-flight requests. This happens
> > -		 * rarely and no more than once per page, so this should be
> > -		 * okay.
> > -		 *
> > -		 * 3rd case: someone (e.g. fuse_do_setattr()) is in the middle
> > -		 * of fuse_set_nowrite..fuse_release_nowrite section. The fact
> > -		 * that fuse_set_nowrite returned implies that all in-flight
> > -		 * requests were completed along with all of their secondary
> > -		 * requests. Further primary requests are blocked by negative
> > -		 * writectr. Hence there cannot be any in-flight requests and
> > -		 * no invocations of fuse_writepage_end() while we're in
> > -		 * fuse_set_nowrite..fuse_release_nowrite section.
> > -		 */
> > -		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
> > -	}
> >  	fi->writectr--;
> >  	fuse_writepage_finish(wpa);
> >  	spin_unlock(&fi->lock);
> > @@ -2074,19 +1931,18 @@ static void fuse_writepage_add_to_bucket(struct fuse_conn *fc,
> >  }
> >
> >  static void fuse_writepage_args_page_fill(struct fuse_writepage_args *wpa, struct folio *folio,
> > -					  struct folio *tmp_folio, uint32_t page_index)
> > +					  uint32_t page_index)
> >  {
> >  	struct inode *inode = folio->mapping->host;
> >  	struct fuse_args_pages *ap = &wpa->ia.ap;
> >
> > -	folio_copy(tmp_folio, folio);
> > -
> > -	ap->pages[page_index] = &tmp_folio->page;
> > +	folio_get(folio);
> > +	ap->pages[page_index] = &folio->page;
> >  	ap->descs[page_index].offset = 0;
> >  	ap->descs[page_index].length = PAGE_SIZE;
> >
> >  	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_WRITEBACK);
> > -	inc_node_page_state(&tmp_folio->page, NR_WRITEBACK_TEMP);
> > +	inc_node_page_state(&folio->page, NR_WRITEBACK);
> >  }
> >
> >  static struct fuse_writepage_args *fuse_writepage_args_setup(struct folio *folio,
> > @@ -2121,18 +1977,12 @@ static int fuse_writepage_locked(struct folio *folio)
> >  	struct fuse_inode *fi = get_fuse_inode(inode);
> >  	struct fuse_writepage_args *wpa;
> >  	struct fuse_args_pages *ap;
> > -	struct folio *tmp_folio;
> >  	struct fuse_file *ff;
> > -	int error = -ENOMEM;
> > -
> > -	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
> > -	if (!tmp_folio)
> > -		goto err;
> > +	int error = -EIO;
> >
> > -	error = -EIO;
> >  	ff = fuse_write_file_get(fi);
> >  	if (!ff)
> > -		goto err_nofile;
> > +		goto err;
> >
> >  	wpa = fuse_writepage_args_setup(folio, ff);
> >  	error = -ENOMEM;
> > @@ -2143,22 +1993,17 @@ static int fuse_writepage_locked(struct folio *folio)
> >  	ap->num_pages = 1;
> >
> >  	folio_start_writeback(folio);
> > -	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, 0);
> > +	fuse_writepage_args_page_fill(wpa, folio, 0);
> >
> >  	spin_lock(&fi->lock);
> > -	tree_insert(&fi->writepages, wpa);
> >  	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
> >  	fuse_flush_writepages(inode);
> >  	spin_unlock(&fi->lock);
> >
> > -	folio_end_writeback(folio);
> > -
> >  	return 0;
> >
> >  err_writepage_args:
> >  	fuse_file_put(ff, false);
> > -err_nofile:
> > -	folio_put(tmp_folio);
> >  err:
> >  	mapping_set_error(folio->mapping, error);
> >  	return error;
> > @@ -2168,7 +2013,6 @@ struct fuse_fill_wb_data {
> >  	struct fuse_writepage_args *wpa;
> >  	struct fuse_file *ff;
> >  	struct inode *inode;
> > -	struct page **orig_pages;
> >  	unsigned int max_pages;
> >  };
> >
> > @@ -2203,68 +2047,11 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
> >  	struct fuse_writepage_args *wpa = data->wpa;
> >  	struct inode *inode = data->inode;
> >  	struct fuse_inode *fi = get_fuse_inode(inode);
> > -	int num_pages = wpa->ia.ap.num_pages;
> > -	int i;
> >
> >  	spin_lock(&fi->lock);
> >  	list_add_tail(&wpa->queue_entry, &fi->queued_writes);
> >  	fuse_flush_writepages(inode);
> >  	spin_unlock(&fi->lock);
> > -
> > -	for (i = 0; i < num_pages; i++)
> > -		end_page_writeback(data->orig_pages[i]);
> > -}
> > -
> > -/*
> > - * Check under fi->lock if the page is under writeback, and insert it onto the
> > - * rb_tree if not. Otherwise iterate auxiliary write requests, to see if there's
> > - * one already added for a page at this offset. If there's none, then insert
> > - * this new request onto the auxiliary list, otherwise reuse the existing one by
> > - * swapping the new temp page with the old one.
> > - */
> > -static bool fuse_writepage_add(struct fuse_writepage_args *new_wpa,
> > -			       struct page *page)
> > -{
> > -	struct fuse_inode *fi = get_fuse_inode(new_wpa->inode);
> > -	struct fuse_writepage_args *tmp;
> > -	struct fuse_writepage_args *old_wpa;
> > -	struct fuse_args_pages *new_ap = &new_wpa->ia.ap;
> > -
> > -	WARN_ON(new_ap->num_pages != 0);
> > -	new_ap->num_pages = 1;
> > -
> > -	spin_lock(&fi->lock);
> > -	old_wpa = fuse_insert_writeback(&fi->writepages, new_wpa);
> > -	if (!old_wpa) {
> > -		spin_unlock(&fi->lock);
> > -		return true;
> > -	}
> > -
> > -	for (tmp = old_wpa->next; tmp; tmp = tmp->next) {
> > -		pgoff_t curr_index;
> > -
> > -		WARN_ON(tmp->inode != new_wpa->inode);
> > -		curr_index = tmp->ia.write.in.offset >> PAGE_SHIFT;
> > -		if (curr_index == page->index) {
> > -			WARN_ON(tmp->ia.ap.num_pages != 1);
> > -			swap(tmp->ia.ap.pages[0], new_ap->pages[0]);
> > -			break;
> > -		}
> > -	}
> > -
> > -	if (!tmp) {
> > -		new_wpa->next = old_wpa->next;
> > -		old_wpa->next = new_wpa;
> > -	}
> > -
> > -	spin_unlock(&fi->lock);
> > -
> > -	if (tmp) {
> > -		fuse_writepage_finish_stat(new_wpa->inode, new_ap->pages[0]);
> > -		fuse_writepage_free(new_wpa);
> > -	}
> > -
> > -	return false;
> >  }
> >
> >  static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
> > @@ -2273,15 +2060,6 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
> >  {
> >  	WARN_ON(!ap->num_pages);
> >
> > -	/*
> > -	 * Being under writeback is unlikely but possible. For example direct
> > -	 * read to an mmaped fuse file will set the page dirty twice; once when
> > -	 * the pages are faulted with get_user_pages(), and then after the read
> > -	 * completed.
> > -	 */
> > -	if (fuse_page_is_writeback(data->inode, page->index))
> > -		return true;
> > -
> >  	/* Reached max pages */
> >  	if (ap->num_pages == fc->max_pages)
> >  		return true;
> > @@ -2291,7 +2069,7 @@ static bool fuse_writepage_need_send(struct fuse_conn *fc, struct page *page,
> >  		return true;
> >
> >  	/* Discontinuity */
> > -	if (data->orig_pages[ap->num_pages - 1]->index + 1 != page->index)
> > +	if (ap->pages[ap->num_pages - 1]->index + 1 != page->index)
> >  		return true;
> >
> >  	/* Need to grow the pages array? If so, did the expansion fail? */
> > @@ -2308,9 +2086,7 @@ static int fuse_writepages_fill(struct folio *folio,
> >  	struct fuse_writepage_args *wpa = data->wpa;
> >  	struct fuse_args_pages *ap = &wpa->ia.ap;
> >  	struct inode *inode = data->inode;
> > -	struct fuse_inode *fi = get_fuse_inode(inode);
> >  	struct fuse_conn *fc = get_fuse_conn(inode);
> > -	struct folio *tmp_folio;
> >  	int err;
> >
> >  	if (wpa && fuse_writepage_need_send(fc, &folio->page, ap, data)) {
> > @@ -2318,54 +2094,23 @@ static int fuse_writepages_fill(struct folio *folio,
> >  		data->wpa = NULL;
> >  	}
> >
> > -	err = -ENOMEM;
> > -	tmp_folio = folio_alloc(GFP_NOFS | __GFP_HIGHMEM, 0);
> > -	if (!tmp_folio)
> > -		goto out_unlock;
> > -
> > -	/*
> > -	 * The page must not be redirtied until the writeout is completed
> > -	 * (i.e. userspace has sent a reply to the write request). Otherwise
> > -	 * there could be more than one temporary page instance for each real
> > -	 * page.
> > -	 *
> > -	 * This is ensured by holding the page lock in page_mkwrite() while
> > -	 * checking fuse_page_is_writeback(). We already hold the page lock
> > -	 * since clear_page_dirty_for_io() and keep it held until we add the
> > -	 * request to the fi->writepages list and increment ap->num_pages.
> > -	 * After this fuse_page_is_writeback() will indicate that the page is
> > -	 * under writeback, so we can release the page lock.
> > -	 */
> >  	if (data->wpa == NULL) {
> >  		err = -ENOMEM;
> >  		wpa = fuse_writepage_args_setup(folio, data->ff);
> > -		if (!wpa) {
> > -			folio_put(tmp_folio);
> > +		if (!wpa)
> >  			goto out_unlock;
> > -		}
> >  		fuse_file_get(wpa->ia.ff);
> >  		data->max_pages = 1;
> >  		ap = &wpa->ia.ap;
> >  	}
> >  	folio_start_writeback(folio);
> >
> > -	fuse_writepage_args_page_fill(wpa, folio, tmp_folio, ap->num_pages);
> > -	data->orig_pages[ap->num_pages] = &folio->page;
> > +	fuse_writepage_args_page_fill(wpa, folio, ap->num_pages);
> >
> >  	err = 0;
> > -	if (data->wpa) {
> > -		/*
> > -		 * Protected by fi->lock against concurrent access by
> > -		 * fuse_page_is_writeback().
> > -		 */
> > -		spin_lock(&fi->lock);
> > -		ap->num_pages++;
> > -		spin_unlock(&fi->lock);
> > -	} else if (fuse_writepage_add(wpa, &folio->page)) {
> > +	ap->num_pages++;
> > +	if (!data->wpa)
> >  		data->wpa = wpa;
> > -	} else {
> > -		folio_end_writeback(folio);
> > -	}
> >  out_unlock:
> >  	folio_unlock(folio);
> >
> > @@ -2394,21 +2139,12 @@ static int fuse_writepages(struct address_space *mapping,
> >  	if (!data.ff)
> >  		return -EIO;
> >
> > -	err = -ENOMEM;
> > -	data.orig_pages = kcalloc(fc->max_pages,
> > -				  sizeof(struct page *),
> > -				  GFP_NOFS);
> > -	if (!data.orig_pages)
> > -		goto out;
> > -
> >  	err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
> >  	if (data.wpa) {
> >  		WARN_ON(!data.wpa->ia.ap.num_pages);
> >  		fuse_writepages_send(&data);
> >  	}
> >
> > -	kfree(data.orig_pages);
> > -out:
> >  	fuse_file_put(data.ff, false);
> >  	return err;
> >  }
> > @@ -2433,7 +2169,7 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
> >  	if (IS_ERR(folio))
> >  		goto error;
> >
> > -	fuse_wait_on_page_writeback(mapping->host, folio->index);
> > +	folio_wait_writeback(folio);
> >
> >  	if (folio_test_uptodate(folio) || len >= folio_size(folio))
> >  		goto success;
> > @@ -2497,13 +2233,11 @@ static int fuse_launder_folio(struct folio *folio)
> >  {
> >  	int err = 0;
> >  	if (folio_clear_dirty_for_io(folio)) {
> > -		struct inode *inode = folio->mapping->host;
> > -
> >  		/* Serialize with pending writeback for the same page */
> > -		fuse_wait_on_page_writeback(inode, folio->index);
> > +		folio_wait_writeback(folio);
> >  		err = fuse_writepage_locked(folio);
> >  		if (!err)
> > -			fuse_wait_on_page_writeback(inode, folio->index);
> > +			folio_wait_writeback(folio);
> >  	}
> >  	return err;
> >  }
> > @@ -2547,7 +2281,7 @@ static vm_fault_t fuse_page_mkwrite(struct vm_fault *vmf)
> >  		return VM_FAULT_NOPAGE;
> >  	}
> >
> > -	fuse_wait_on_page_writeback(inode, page->index);
> > +	folio_wait_writeback(page_folio(page));
> >  	return VM_FAULT_LOCKED;
> >  }
> >
> > @@ -3375,7 +3109,6 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
> >  	fi->iocachectr = 0;
> >  	init_waitqueue_head(&fi->page_waitq);
> >  	init_waitqueue_head(&fi->direct_io_waitq);
> > -	fi->writepages = RB_ROOT;
> >
> >  	if (IS_ENABLED(CONFIG_FUSE_DAX))
> >  		fuse_dax_inode_init(inode, flags);
> > --
> > 2.43.5
> >