* [PATCH v2 0/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Joanne Koong @ 2025-12-15 3:00 UTC (permalink / raw)
To: akpm
Cc: david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
This patch restores fuse's original behavior, in which sync is a no-op.
It fixes the userspace regression reported upstream by Athul and J. in
[1][2]: if a bug in a fuse server causes the server to never complete
writeback, wait_sb_inodes() waits forever.
Thanks,
Joanne
[1] https://lore.kernel.org/regressions/CAJnrk1ZjQ8W8NzojsvJPRXiv9TuYPNdj8Ye7=Cgkj=iV_i8EaA@mail.gmail.com/T/#t
[2] https://lore.kernel.org/linux-fsdevel/aT7JRqhUvZvfUQlV@eldamar.lan/
Changelog:
v1: https://lore.kernel.org/linux-mm/20251120184211.2379439-1-joannelkoong@gmail.com/
* Change AS_WRITEBACK_MAY_HANG to AS_NO_DATA_INTEGRITY and keep
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM as is.
Joanne Koong (1):
fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
fs/fs-writeback.c | 3 ++-
fs/fuse/file.c | 4 +++-
include/linux/pagemap.h | 11 +++++++++++
3 files changed, 16 insertions(+), 2 deletions(-)
--
2.47.3
* [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Joanne Koong @ 2025-12-15 3:00 UTC (permalink / raw)
To: akpm
Cc: david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
Skip waiting on writeback for inodes that belong to mappings that do not
have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
mapping flag).
This restores fuse back to prior behavior where syncs are no-ops. This
is needed because otherwise, if a system is running a faulty fuse
server that does not reply to issued write requests, this will cause
wait_sb_inodes() to wait forever.
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fs-writeback.c | 3 ++-
fs/fuse/file.c | 4 +++-
include/linux/pagemap.h | 11 +++++++++++
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..ab2e279ed3c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
 		 * do not have the mapping lock. Skip it here, wb completion
 		 * will remove it.
 		 */
-		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+		    mapping_no_data_integrity(mapping))
 			continue;

 		spin_unlock_irq(&sb->s_inode_wblist_lock);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 01bc894e9c2b..3b2a171e652f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)

 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
-	if (fc->writeback_cache)
+	if (fc->writeback_cache) {
 		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_no_data_integrity(&inode->i_data);
+	}

 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..ec442af3f886 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
+	AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
@@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
 	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
 }

+static inline void mapping_set_no_data_integrity(struct address_space *mapping)
+{
+	set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
+static inline bool mapping_no_data_integrity(const struct address_space *mapping)
+{
+	return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
 {
 	return mapping->gfp_mask;
--
2.47.3
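
For readers following the thread, the skip condition added above can be
modelled in plain userspace C. This is only a sketch under stated
assumptions: the struct layout, the tagged_writeback field (standing in for
mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) and the helper
sync_would_wait() are hypothetical and exist purely for illustration; only
the flag name, its bit number and the two accessor names come from the patch.

```c
#include <stdbool.h>

/* Userspace stand-ins; bit numbers mirror the patch, nothing else is kernel code. */
enum mapping_flags {
	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
	AS_KERNEL_FILE = 10,
	AS_NO_DATA_INTEGRITY = 11,	/* no data integrity guarantees */
};

struct address_space {
	unsigned long flags;
	bool tagged_writeback;	/* assumption: models the PAGECACHE_TAG_WRITEBACK check */
};

/* Plain bit operations replace the kernel's atomic set_bit()/test_bit(). */
static inline void mapping_set_no_data_integrity(struct address_space *mapping)
{
	mapping->flags |= 1UL << AS_NO_DATA_INTEGRITY;
}

static inline bool mapping_no_data_integrity(const struct address_space *mapping)
{
	return (mapping->flags & (1UL << AS_NO_DATA_INTEGRITY)) != 0;
}

/*
 * Hypothetical helper: returns true if a wait_sb_inodes()-style loop would
 * wait on this mapping, i.e. if it would NOT hit the "continue" in the patch.
 */
static bool sync_would_wait(const struct address_space *mapping)
{
	if (!mapping->tagged_writeback || mapping_no_data_integrity(mapping))
		return false;
	return true;
}
```

In this model, a mapping with pages under writeback is waited on unless it
has been marked with mapping_set_no_data_integrity(), which matches what the
patch does for fuse inodes when writeback_cache is enabled.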
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Bernd Schubert @ 2025-12-15 17:09 UTC (permalink / raw)
To: Joanne Koong, akpm
Cc: david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
On 12/15/25 04:00, Joanne Koong wrote:
> Skip waiting on writeback for inodes that belong to mappings that do not
> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> mapping flag).
>
> This restores fuse back to prior behavior where syncs are no-ops. This
> is needed because otherwise, if a system is running a faulty fuse
> server that does not reply to issued write requests, this will cause
> wait_sb_inodes() to wait forever.
>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> fs/fs-writeback.c | 3 ++-
> fs/fuse/file.c | 4 +++-
> include/linux/pagemap.h | 11 +++++++++++
> 3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 6800886c4d10..ab2e279ed3c2 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
> * do not have the mapping lock. Skip it here, wb completion
> * will remove it.
> */
> - if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> + if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
> + mapping_no_data_integrity(mapping))
> continue;
>
> spin_unlock_irq(&sb->s_inode_wblist_lock);
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 01bc894e9c2b..3b2a171e652f 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
>
> inode->i_fop = &fuse_file_operations;
> inode->i_data.a_ops = &fuse_file_aops;
> - if (fc->writeback_cache)
> + if (fc->writeback_cache) {
> mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
> + mapping_set_no_data_integrity(&inode->i_data);
> + }
For a future commit, maybe we could add a FUSE_INIT flag that allows a
privileged fuse server to not set this? Maybe even in combination with an
enforced request timeout?
>
> INIT_LIST_HEAD(&fi->write_files);
> INIT_LIST_HEAD(&fi->queued_writes);
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 31a848485ad9..ec442af3f886 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -210,6 +210,7 @@ enum mapping_flags {
> AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
> account usage to user cgroups */
> + AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
> /* Bits 16-25 are used for FOLIO_ORDER */
> AS_FOLIO_ORDER_BITS = 5,
> AS_FOLIO_ORDER_MIN = 16,
> @@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
> return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
> }
>
> +static inline void mapping_set_no_data_integrity(struct address_space *mapping)
> +{
> + set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
> +}
> +
> +static inline bool mapping_no_data_integrity(const struct address_space *mapping)
> +{
> + return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
> +}
> +
> static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
> {
> return mapping->gfp_mask;
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Joanne Koong @ 2025-12-16 7:07 UTC (permalink / raw)
To: Bernd Schubert
Cc: akpm, david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer,
carnil, linux-fsdevel, stable
On Tue, Dec 16, 2025 at 1:09 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>
> On 12/15/25 04:00, Joanne Koong wrote:
> > Skip waiting on writeback for inodes that belong to mappings that do not
> > have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> > mapping flag).
> >
> > This restores fuse back to prior behavior where syncs are no-ops. This
> > is needed because otherwise, if a system is running a faulty fuse
> > server that does not reply to issued write requests, this will cause
> > wait_sb_inodes() to wait forever.
> >
> > Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> > Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> > Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > fs/fs-writeback.c | 3 ++-
> > fs/fuse/file.c | 4 +++-
> > include/linux/pagemap.h | 11 +++++++++++
> > 3 files changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 6800886c4d10..ab2e279ed3c2 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
> > * do not have the mapping lock. Skip it here, wb completion
> > * will remove it.
> > */
> > - if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> > + if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
> > + mapping_no_data_integrity(mapping))
> > continue;
> >
> > spin_unlock_irq(&sb->s_inode_wblist_lock);
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 01bc894e9c2b..3b2a171e652f 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -3200,8 +3200,10 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
> >
> > inode->i_fop = &fuse_file_operations;
> > inode->i_data.a_ops = &fuse_file_aops;
> > - if (fc->writeback_cache)
> > + if (fc->writeback_cache) {
> > mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
> > + mapping_set_no_data_integrity(&inode->i_data);
> > + }
>
> For a future commit, maybe we could add a FUSE_INIT flag that allows a
> privileged fuse server to not set this? Maybe even in combination with an
> enforced request timeout?
That sounds good, thanks for reviewing this, Bernd!
>
> >
> > INIT_LIST_HEAD(&fi->write_files);
> > INIT_LIST_HEAD(&fi->queued_writes);
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 31a848485ad9..ec442af3f886 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -210,6 +210,7 @@ enum mapping_flags {
> > AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
> > AS_KERNEL_FILE = 10, /* mapping for a fake kernel file that shouldn't
> > account usage to user cgroups */
> > + AS_NO_DATA_INTEGRITY = 11, /* no data integrity guarantees */
> > /* Bits 16-25 are used for FOLIO_ORDER */
> > AS_FOLIO_ORDER_BITS = 5,
> > AS_FOLIO_ORDER_MIN = 16,
> > @@ -345,6 +346,16 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct addres
> > return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
> > }
> >
> > +static inline void mapping_set_no_data_integrity(struct address_space *mapping)
> > +{
> > + set_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
> > +}
> > +
> > +static inline bool mapping_no_data_integrity(const struct address_space *mapping)
> > +{
> > + return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);
> > +}
> > +
> > static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
> > {
> > return mapping->gfp_mask;
>
>
> Reviewed-by: Bernd Schubert <bschubert@ddn.com>
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: J. Neuschäfer @ 2025-12-16 18:13 UTC (permalink / raw)
To: Joanne Koong
Cc: akpm, david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer,
carnil, linux-fsdevel, stable
On Sun, Dec 14, 2025 at 07:00:43PM -0800, Joanne Koong wrote:
> Skip waiting on writeback for inodes that belong to mappings that do not
> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> mapping flag).
>
> This restores fuse back to prior behavior where syncs are no-ops. This
> is needed because otherwise, if a system is running a faulty fuse
> server that does not reply to issued write requests, this will cause
> wait_sb_inodes() to wait forever.
>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
I can confirm that this patch fixes the issue I reported.
(Tested by applying it on top of v6.19-rc1)
Tested-by: J. Neuschäfer <j.neuschaefer@gmx.net>
Thank you very much!
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Joanne Koong @ 2026-01-02 17:42 UTC (permalink / raw)
To: akpm
Cc: david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
On Sun, Dec 14, 2025 at 7:05 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> Skip waiting on writeback for inodes that belong to mappings that do not
> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> mapping flag).
>
> This restores fuse back to prior behavior where syncs are no-ops. This
> is needed because otherwise, if a system is running a faulty fuse
> server that does not reply to issued write requests, this will cause
> wait_sb_inodes() to wait forever.
>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Hi Andrew,
This patch fixes a user regression that's been reported a few times
upstream [1][2]. Bernd (who works on fuse) has given his Reviewed-by
for the changes and J. has verified that it fixes the issues he saw.
Is there anything else needed to move this patch forward?
Thanks,
Joanne
[1] https://lore.kernel.org/regressions/mwBOip3XK77dn-UJtlk-uQ1N6i3nwsKticZyQdPYzQcsk0dsjXl4oOAh-Neoxv-0TlpKnt_FEJwx8ses5VJglGLJUW-bIG8KWchtoDwCnnA=@protonmail.com/
[2] https://lore.kernel.org/linux-fsdevel/aT7JRqhUvZvfUQlV@eldamar.lan/
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Andrew Morton @ 2026-01-03 18:03 UTC (permalink / raw)
To: Joanne Koong
Cc: david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> Skip waiting on writeback for inodes that belong to mappings that do not
> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> mapping flag).
>
> This restores fuse back to prior behavior where syncs are no-ops. This
> is needed because otherwise, if a system is running a faulty fuse
> server that does not reply to issued write requests, this will cause
> wait_sb_inodes() to wait forever.
>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> ..
>
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
> * do not have the mapping lock. Skip it here, wb completion
> * will remove it.
> */
> - if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> + if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
> + mapping_no_data_integrity(mapping))
> continue;
It's not obvious why a no-data-integrity mapping would want to skip
writeback - what do these things have to do with each other?
So can we please have a v3 which has a comment here explaining this to the
reader?
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: David Hildenbrand (Red Hat) @ 2026-01-04 18:54 UTC (permalink / raw)
To: Andrew Morton, Joanne Koong
Cc: miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
On 1/3/26 19:03, Andrew Morton wrote:
> On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
>
>> Skip waiting on writeback for inodes that belong to mappings that do not
>> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
>> mapping flag).
>>
>> This restores fuse back to prior behavior where syncs are no-ops. This
>> is needed because otherwise, if a system is running a faulty fuse
>> server that does not reply to issued write requests, this will cause
>> wait_sb_inodes() to wait forever.
>>
>> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
>> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
>> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>
>> ..
>>
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
>> * do not have the mapping lock. Skip it here, wb completion
>> * will remove it.
>> */
>> - if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
>> + if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
>> + mapping_no_data_integrity(mapping))
>> continue;
>
> It's not obvious why a no-data-integrity mapping would want to skip
> writeback - what do these things have to do with each other?
>
> So can we please have a v3 which has a comment here explaining this to the
> reader?
Sorry for not replying earlier, I missed a couple of mails sent to my
@redhat address due to @gmail being force-unsubscribed from linux-mm ...
Probably sufficient to add at the beginning of the commit:
"Above the while() loop in wait_sb_inodes(), we document that we must
wait for all pages under writeback for data integrity. Consequently, if
a mapping, like fuse, traditionally does not have data integrity
semantics, there is no need to wait at all; we can simply skip these inodes.
So skip ..."
--
Cheers
David
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Joanne Koong @ 2026-01-05 19:55 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Andrew Morton, miklos, linux-mm, athul.krishna.kr, j.neuschaefer,
carnil, linux-fsdevel, stable
On Sun, Jan 4, 2026 at 10:54 AM David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> On 1/3/26 19:03, Andrew Morton wrote:
> > On Sun, 14 Dec 2025 19:00:43 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> >> Skip waiting on writeback for inodes that belong to mappings that do not
> >> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> >> mapping flag).
> >>
> >> This restores fuse back to prior behavior where syncs are no-ops. This
> >> is needed because otherwise, if a system is running a faulty fuse
> >> server that does not reply to issued write requests, this will cause
> >> wait_sb_inodes() to wait forever.
> >>
> >> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> >> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> >> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> >>
> >> ..
> >>
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -2751,7 +2751,8 @@ static void wait_sb_inodes(struct super_block *sb)
> >> * do not have the mapping lock. Skip it here, wb completion
> >> * will remove it.
> >> */
> >> - if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> >> + if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
> >> + mapping_no_data_integrity(mapping))
> >> continue;
> >
> > It's not obvious why a no-data-integrity mapping would want to skip
> > writeback - what do these things have to do with each other?
> >
> > So can we please have a v3 which has a comment here explaining this to the
> > reader?
>
> Sorry for not replying earlier, I missed a couple of mails sent to my
> @redhat address due to @gmail being force-unsubscribed from linux-mm ...
>
> Probably sufficient to add at the beginning of the commit:
>
> "Above the while() loop in wait_sb_inodes(), we document that we must
> wait for all pages under writeback for data integrity. Consequently, if
> a mapping, like fuse, traditionally does not have data integrity
> semantics, there is no need to wait at all; we can simply skip these inodes.
>
> So skip ..."
Sounds good, I'll send out v3 with these changes. Thanks for the
feedback, Andrew and David.
>
> --
> Cheers
>
> David
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
From: Jan Kara @ 2026-01-06 9:33 UTC (permalink / raw)
To: Joanne Koong
Cc: akpm, david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer,
carnil, linux-fsdevel, stable
[Thanks to Andrew for CCing me on patch commit]
On Sun 14-12-25 19:00:43, Joanne Koong wrote:
> Skip waiting on writeback for inodes that belong to mappings that do not
> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> mapping flag).
>
> This restores fuse back to prior behavior where syncs are no-ops. This
> is needed because otherwise, if a system is running a faulty fuse
> server that does not reply to issued write requests, this will cause
> wait_sb_inodes() to wait forever.
>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
OK, but the difference 0c58a97f919c introduced goes much further than just
wait_sb_inodes(). Before 0c58a97f919c, filemap_fdatawait() (and all the
other variants waiting for folio_writeback() to clear) also returned
immediately, because folio writeback was done as soon as we had copied the
content into the temporary page. Now they will block waiting for the server
to finish the IO. So e.g. fsync() will now block waiting for the server in
file_write_and_wait_range(), instead of blocking in fuse_fsync_common()
-> fuse_simple_request(). Similarly, e.g. truncate(2) will now block waiting
for the server so that folio_writeback can be cleared.

So I understand your patch fixes the regression with suspend blocking, but I
am not confident that we are not just starting a whack-a-mole game of
catching all the places that previously depended, implicitly, on
folio_writeback getting cleared without any involvement of the untrusted fuse
server, which is no longer the case. So do we have some higher-level idea of
what is / is not guaranteed with a stuck fuse server?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 9:33 ` Jan Kara
@ 2026-01-06 10:05 ` David Hildenbrand (Red Hat)
2026-01-06 13:13 ` Miklos Szeredi
2026-01-06 23:30 ` Joanne Koong
1 sibling, 1 reply; 19+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 10:05 UTC (permalink / raw)
To: Jan Kara, Joanne Koong
Cc: akpm, miklos, linux-mm, athul.krishna.kr, j.neuschaefer, carnil,
linux-fsdevel, stable
On 1/6/26 10:33, Jan Kara wrote:
> [Thanks to Andrew for CCing me on patch commit]
>
> On Sun 14-12-25 19:00:43, Joanne Koong wrote:
>> Skip waiting on writeback for inodes that belong to mappings that do not
>> have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
>> mapping flag).
>>
>> This restores fuse back to prior behavior where syncs are no-ops. This
>> is needed because otherwise, if a system is running a faulty fuse
>> server that does not reply to issued write requests, this will cause
>> wait_sb_inodes() to wait forever.
>>
>> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
>> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
>> Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> OK, but the difference 0c58a97f919c introduced goes much further than just
> wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the
> other variants waiting for folio_writeback() to clear) returned immediately
> because folio writeback was done as soon as we've copied the content into
> the temporary page. Now they will block waiting for the server to finish
> the IO. So e.g. fsync() will block waiting for the server in
> file_write_and_wait_range() now, instead of blocking in fuse_fsync_common()
> -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting
> for the server so that folio_writeback can be cleared.
>
> So I understand your patch fixes the regression with suspend blocking but I
> don't have a high confidence we are not just starting a whack-a-mole game
Yes, I think so, and I think it is [1] not even only limited to
writeback [2].
> catching all the places that previously hiddenly depended on
> folio_writeback getting cleared without any involvement of untrusted fuse
> server and now this changed.
Even worse, it's not only untrusted fuse servers, but also
trusted-but-buggy fuse servers, unfortunately. As Joanne wrote in v1:
"
As reported by Athul upstream in [1], there is a userspace regression
caused by commit 0c58a97f919c ("fuse: remove tmp folio for writebacks
and internal rb tree") where if there is a bug in a fuse server that
causes the server to never complete writeback, it will make
wait_sb_inodes() wait forever, causing sync paths to hang.
"
> So do we have some higher-level idea what is /
> is not guaranteed with stuck fuse server?
Joanne first proposed AS_WRITEBACK_MAY_HANG, which I disliked [2] for
various reasons because the semantics are weird. I am strongly against
using such a flag to arbitrarily skip waiting for writeback on folios in
the tree.
The patch here is at least logically the right thing to do when only
looking at the wait_sb_inodes() writeback situation [3] and why it is
even ok to skip waiting for writeback, and the fix Joanne originally
proposed.
To handle the bigger picture (I raised another problematic instance in
[4]): I don't know how to handle that without properly fixing fuse. Fuse
folks should really invest some time to solve this problem for good.
As a big temporary kernel hack, we could add an
AS_ANY_WAITING_UTTERLY_BROKEN and simply refuse to wait for writeback
directly inside folio_wait_writeback() -- not arbitrarily skipping it in
callers -- and possibly other places (readahead, not sure). That would
restore the old behavior.
Well, not quite, because the semantics that folio_wait_writeback()
promises -- writeback flag at least cleared once, like required here for
data integrity -- are just not true anymore.
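To make the trade-off concrete, here is a minimal userspace sketch of
the idea, not kernel code. The names model_folio, waiting_broken and
model_wait_writeback() are hypothetical and exist only for this
illustration; waiting_broken stands in for the proposed
AS_ANY_WAITING_UTTERLY_BROKEN flag, and "hanging" is modeled as a return
value instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a folio; not the kernel's struct folio. */
struct model_folio {
	bool under_writeback;
	bool waiting_broken;	/* stands in for AS_ANY_WAITING_UTTERLY_BROKEN */
};

enum wait_outcome {
	WAIT_DONE,	/* writeback was seen cleared; the promise holds */
	WAIT_SKIPPED,	/* returned early; folio may still be under writeback */
	WAIT_HANGS,	/* would block until the server replies, maybe forever */
};

/* Model of a wait primitive that refuses to wait when the flag is set. */
static enum wait_outcome model_wait_writeback(struct model_folio *folio,
					      bool server_replies)
{
	assert(folio != NULL);

	if (!folio->under_writeback)
		return WAIT_DONE;
	if (folio->waiting_broken)
		return WAIT_SKIPPED;	/* no hang, but no guarantee either */
	if (!server_replies)
		return WAIT_HANGS;

	folio->under_writeback = false;	/* server completed the write */
	return WAIT_DONE;
}
```

In this model a caller that needs data integrity cannot tell WAIT_SKIPPED
apart from success unless it checks the return value, which is the
broken promise described above.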
And it would still break migration of folios that are under writeback
even though waiting for writeback even for migration even though in
99.9999% of all cases with trusted fuse server will do the right thing.
Just nasty.
Of course, we could set AS_ANY_WAITING_UTTERLY_BROKEN in fuse only
conditionally, but given that buggy trusted fuse servers are now a
thing, it all stops making any sense because we would have to set that
flag always.
There is no easy way to get back the old behavior without reverting to
the old way of using buffer pages, I guess.
[1] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel.org/
[2] https://lore.kernel.org/linux-mm/f8da9ee0-f136-4366-b63a-1812fda11304@kernel.org/
[3] https://lore.kernel.org/linux-mm/6d0948f5-e739-49f3-8e23-359ddbf3da8f@kernel.org/
[4] https://lore.kernel.org/linux-mm/504d100d-b8f3-475b-b575-3adfd17627b5@kernel.org/
--
Cheers
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 10:05 ` David Hildenbrand (Red Hat)
@ 2026-01-06 13:13 ` Miklos Szeredi
2026-01-06 13:55 ` Jan Kara
2026-01-06 14:33 ` David Hildenbrand (Red Hat)
0 siblings, 2 replies; 19+ messages in thread
From: Miklos Szeredi @ 2026-01-06 13:13 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
> > So I understand your patch fixes the regression with suspend blocking but I
> > don't have a high confidence we are not just starting a whack-a-mole game
Joanne did a thorough analysis, so I still have hope. Missing a case
in such a complex thing is not unexpected.
> Yes, I think so, and I think it is [1] not even only limited to
> writeback [2].
You are referring to DoS against compaction?
It is a much more benign issue, since compaction will just skip locked
pages, AFAIU (wasn't always so:
https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).
Not saying it shouldn't be fixed, but it should be a separate discussion.
> To handle the bigger picture (I raised another problematic instance in
> [4]): I don't know how to handle that without properly fixing fuse. Fuse
> folks should really invest some time to solve this problem for good.
Fixing it generically in fuse would necessarily involve bringing back
some sort of temp buffer. The performance penalty could be minimized,
but complexity is what really hurts.
Maybe doing whack-a-mole results in less mess overall :-/
> As a big temporary kernel hack, we could add a
> AS_ANY_WAITING_UTTERLY_BROKEN and simply refuse to wait for writeback
> directly inside folio_wait_writeback() -- not arbitrarily skipping it in
> callers -- and possibly other places (readahead, not sure). That would
> restore the old behavior.
No it wouldn't, since the old code had surrogate methods for waiting
on outstanding writes, which were called on fsync, etc.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 13:13 ` Miklos Szeredi
@ 2026-01-06 13:55 ` Jan Kara
2026-01-06 14:33 ` David Hildenbrand (Red Hat)
1 sibling, 0 replies; 19+ messages in thread
From: Jan Kara @ 2026-01-06 13:55 UTC (permalink / raw)
To: Miklos Szeredi
Cc: David Hildenbrand (Red Hat),
Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On Tue 06-01-26 14:13:55, Miklos Szeredi wrote:
> On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>
> > > So I understand your patch fixes the regression with suspend blocking but I
> > > don't have a high confidence we are not just starting a whack-a-mole game
>
> Joanne did a thorough analysis, so I still have hope. Missing a case
> in such a complex thing is not unexpected.
>
> > Yes, I think so, and I think it is [1] not even only limited to
> > writeback [2].
>
> You are referring to DoS against compaction?
>
> It is a much more benign issue, since compaction will just skip locked
> pages, AFAIU (wasn't always so:
> https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).
>
> Not saying it shouldn't be fixed, but it should be a separate discussion.
>
> > To handle the bigger picture (I raised another problematic instance in
> > [4]): I don't know how to handle that without properly fixing fuse. Fuse
> > folks should really invest some time to solve this problem for good.
>
> Fixing it generically in fuse would necessarily involve bringing back
> some sort of temp buffer. The performance penalty could be minimized,
> but complexity is what really hurts.
>
> Maybe doing whack-a-mole results in less mess overall :-/
OK, I was wondering about the bigger picture and now I see there's none :)
I can live with this workaround for now as its blast radius is relatively
small and we can see if some other practical issues appear in the future
(in which case I'll probably push for a more systemic solution).
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 13:13 ` Miklos Szeredi
2026-01-06 13:55 ` Jan Kara
@ 2026-01-06 14:33 ` David Hildenbrand (Red Hat)
2026-01-06 15:21 ` Miklos Szeredi
1 sibling, 1 reply; 19+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 14:33 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On 1/6/26 14:13, Miklos Szeredi wrote:
> On Tue, 6 Jan 2026 at 11:05, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>
>>> So I understand your patch fixes the regression with suspend blocking but I
>>> don't have a high confidence we are not just starting a whack-a-mole game
>
> Joanne did a thorough analysis, so I still have hope. Missing a case
> in such a complex thing is not unexpected.
>
>> Yes, I think so, and I think it is [1] not even only limited to
>> writeback [2].
>
> You are referring to DoS against compaction?
In previous discussions it was raised that readahead runs into similar
problems.
I don't recall all the details, but I think that we might end up holding
the folio lock forever while the fuse user space daemon is supposed to
fill the page with data; anybody trying to lock the folio would
similarly deadlock.
Maybe only compaction/migration is affected by that, hard to tell.
>
> It is a much more benign issue, since compaction will just skip locked
> pages, AFAIU (wasn't always so:
> https://lore.kernel.org/all/1288817005.4235.11393.camel@nimitz/).
>
> Not saying it shouldn't be fixed, but it should be a separate discussion.
Right. But as I pointed out in [4], there are other call paths where we
might end up waiting for writeback unless I am missing something.
So it has whack-a-mole smell to it.
>
>> To handle the bigger picture (I raised another problematic instance in
>> [4]): I don't know how to handle that without properly fixing fuse. Fuse
>> folks should really invest some time to solve this problem for good.
>
> Fixing it generically in fuse would necessarily involve bringing back
> some sort of temp buffer. The performance penalty could be minimized,
> but complexity is what really hurts.
I'm not sure about temp buffers. During early discussions there were
ideas about canceling writeback and instead marking the folio dirty
again. I assume there is a non-trivial solution space left unexplored
for now.
>
> Maybe doing whack-a-mole results in less mess overall :-/
>
Maybe :) I'm fine with the patch as is as well.
>> As a big temporary kernel hack, we could add a
>> AS_ANY_WAITING_UTTERLY_BROKEN and simply refuse to wait for writeback
>> directly inside folio_wait_writeback() -- not arbitrarily skipping it in
>> callers -- and possibly other places (readahead, not sure). That would
>> restore the old behavior.
>
> No it wouldn't, since the old code had surrogate methods for waiting
> on outstanding writes, which were called on fsync, etc.
Yeah, I raised some "except" below, I assume there are more. Not that I
would want to go down that path :)
--
Cheers
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 14:33 ` David Hildenbrand (Red Hat)
@ 2026-01-06 15:21 ` Miklos Szeredi
2026-01-06 15:41 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2026-01-06 15:21 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On Tue, 6 Jan 2026 at 15:34, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
> I don't recall all the details, but I think that we might end up holding
> the folio lock forever while the fuse user space daemon is supposed to
> fill the page with data; anybody trying to lock the folio would
> similarly deadlock.
Right.
> Maybe only compaction/migration is affected by that, hard to tell.
Can't imagine anything beyond actual I/O and folio logistics
(reclaim/compaction) that would want to touch the page lock.
I/O has the right to wait forever on the folio if the server is stuck,
that doesn't count as a deadlock.
The logistics functions are careful to use folio_trylock(), but they
could give a hint to fuse via a callback that they'd like to have this
particular folio. In that case fuse would be free to cancel the read
and let the whole thing be retried with a new folio.
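A rough userspace sketch of that callback idea, purely for illustration.
Nothing here is an existing kernel interface: model_folio, want_folio,
model_fuse_want_folio() and model_isolate_folio() are hypothetical
names, and a plain bool stands in for the folio lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a folio; not the kernel's struct folio. */
struct model_folio {
	bool locked;		/* held while a read is in flight */
	bool read_cancelled;
	/* Hypothetical hint: "someone would like to have this folio". */
	void (*want_folio)(struct model_folio *folio);
};

/* Non-blocking lock attempt, in the spirit of folio_trylock(). */
static bool model_folio_trylock(struct model_folio *folio)
{
	if (folio->locked)
		return false;
	folio->locked = true;
	return true;
}

/* Hypothetical fuse-side handler: cancel the read and drop the lock. */
static void model_fuse_want_folio(struct model_folio *folio)
{
	folio->read_cancelled = true;
	folio->locked = false;
}

/* Compaction-style caller: never sleeps on the lock, only hints. */
static bool model_isolate_folio(struct model_folio *folio)
{
	assert(folio != NULL);

	if (model_folio_trylock(folio))
		return true;
	if (folio->want_folio)
		folio->want_folio(folio);	/* caller retries later */
	return false;
}
```

The first attempt on a folio locked by a stuck read fails and only
delivers the hint; a later retry can succeed once the read has been
cancelled. How a cancelled read would be reissued with a new folio is
left out of the sketch.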
What we really need is a failing test case, the rest should be easy ;-)
> I'm not sure about temp buffers. During early discussions there were
> ideas about canceling writeback and instead marking the folio dirty
> again. I assume there is a non-trivial solution space left unexplored
> for now.
That might work combined with the suggested callback to fix the
compaction issue.
But I don't see how it would be a generic replacement for the tmp page code.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 15:21 ` Miklos Szeredi
@ 2026-01-06 15:41 ` David Hildenbrand (Red Hat)
2026-01-06 16:05 ` Miklos Szeredi
0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 15:41 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On 1/6/26 16:21, Miklos Szeredi wrote:
> On Tue, 6 Jan 2026 at 15:34, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>
>> I don't recall all the details, but I think that we might end up holding
>> the folio lock forever while the fuse user space daemon is supposed to
>> fill the page with data; anybody trying to lock the folio would
>> similarly deadlock.
>
> Right.
>
>> Maybe only compaction/migration is affected by that, hard to tell.
>
> Can't imagine anything beyond actual I/O and folio logistics
> (reclaim/compaction) that would want to touch the page lock.
I assume the usual suspects, including mm/memory-failure.c.
memory_failure() not only contains a folio_wait_writeback() but also a
folio_lock(), so twice the fun :)
--
Cheers
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 15:41 ` David Hildenbrand (Red Hat)
@ 2026-01-06 16:05 ` Miklos Szeredi
2026-01-06 17:54 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2026-01-06 16:05 UTC (permalink / raw)
To: David Hildenbrand (Red Hat)
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On Tue, 6 Jan 2026 at 16:41, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
> I assume the usual suspects, including mm/memory-failure.c.
>
> memory_failure() not only contains a folio_wait_writeback() but also a
> folio_lock(), so twice the fun :)
As long as it's run from a workqueue it shouldn't affect the rest of
the system, right? The wq thread will consume a nontrivial amount of
resources, I suppose, so it would be better to implement those waits
asynchronously.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 16:05 ` Miklos Szeredi
@ 2026-01-06 17:54 ` David Hildenbrand (Red Hat)
0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-06 17:54 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Jan Kara, Joanne Koong, akpm, linux-mm, athul.krishna.kr,
j.neuschaefer, carnil, linux-fsdevel, stable
On 1/6/26 17:05, Miklos Szeredi wrote:
> On Tue, 6 Jan 2026 at 16:41, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>
>> I assume the usual suspects, including mm/memory-failure.c.
>>
>> memory_failure() not only contains a folio_wait_writeback() but also a
>> folio_lock(), so twice the fun :)
>
> As long as it's run from a workqueue it shouldn't affect the rest of
> the system, right? The wq thread will consume a nontrivial amount of
> resources, I suppose, so it would be better to implement those waits
> asynchronously.
Good question. I know that memory_failure() can be triggered from
various contexts, but I never traced it back to its origin.
--
Cheers
David
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 1/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()
2026-01-06 9:33 ` Jan Kara
2026-01-06 10:05 ` David Hildenbrand (Red Hat)
@ 2026-01-06 23:30 ` Joanne Koong
1 sibling, 0 replies; 19+ messages in thread
From: Joanne Koong @ 2026-01-06 23:30 UTC (permalink / raw)
To: Jan Kara
Cc: akpm, david, miklos, linux-mm, athul.krishna.kr, j.neuschaefer,
carnil, linux-fsdevel, stable
On Tue, Jan 6, 2026 at 1:34 AM Jan Kara <jack@suse.cz> wrote:
>
Hi Jan,
> [Thanks to Andrew for CCing me on patch commit]
Sorry, I didn't mean to exclude you. I hadn't realized the
fs-writeback.c file had maintainers/reviewers listed for it. I'll make
sure to cc you next time.
>
> On Sun 14-12-25 19:00:43, Joanne Koong wrote:
> > Skip waiting on writeback for inodes that belong to mappings that do not
> > have data integrity guarantees (denoted by the AS_NO_DATA_INTEGRITY
> > mapping flag).
> >
> > This restores fuse back to prior behavior where syncs are no-ops. This
> > is needed because otherwise, if a system is running a faulty fuse
> > server that does not reply to issued write requests, this will cause
> > wait_sb_inodes() to wait forever.
> >
> > Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> > Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> > Reported-by: J. Neuschäfer <j.neuschaefer@gmx.net>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>
> OK, but the difference 0c58a97f919c introduced goes much further than just
> wait_sb_inodes(). Before 0c58a97f919c also filemap_fdatawait() (and all the
> other variants waiting for folio_writeback() to clear) returned immediately
> because folio writeback was done as soon as we've copied the content into
> the temporary page. Now they will block waiting for the server to finish
> the IO. So e.g. fsync() will block waiting for the server in
> file_write_and_wait_range() now, instead of blocking in fuse_fsync_common()
> -> fuse_simple_request(). Similarly e.g. truncate(2) will now block waiting
> for the server so that folio_writeback can be cleared.
>
> So I understand your patch fixes the regression with suspend blocking but I
> don't have a high confidence we are not just starting a whack-a-mole game
> catching all the places that previously hiddenly depended on
> folio_writeback getting cleared without any involvement of untrusted fuse
> server and now this changed. So do we have some higher-level idea what is /
> is not guaranteed with stuck fuse server?
The implications of 0c58a97f919c (eg clearing folio writeback only
when the server has completed writeback instead of clearing writeback
and returning immediately) had some analysis and discussion in this
prior thread [1]. Copying/pasting a snippet from the cover letter:
"With removing the temp page, writeback state is now only cleared on the dirty
page after the server has written it back to disk. This may take an
indeterminate amount of time. As well, there is also the possibility of
malicious or well-intentioned but buggy servers where writeback may in the
worst case scenario, never complete. This means that any
folio_wait_writeback() on a dirty page belonging to a FUSE filesystem needs to
be carefully audited.
In particular, these are the cases that need to be accounted for:
* potentially deadlocking in reclaim, as mentioned above
* potentially stalling sync(2)
* potentially stalling page migration / compaction
This patchset adds a new mapping flag, AS_WRITEBACK_INDETERMINATE, which
filesystems may set on its inode mappings to indicate that writeback
operations may take an indeterminate amount of time to complete. FUSE will set
this flag on its mappings. This patchset adds checks to the critical parts of
reclaim, sync, and page migration logic where writeback may be waited on.
Please note the following:
* For sync(2), waiting on writeback will be skipped for FUSE, but this has no
effect on existing behavior. Dirty FUSE pages are already not guaranteed to
be written to disk by the time sync(2) returns (eg writeback is cleared on
the dirty page but the server may not have written out the temp page to disk
yet). If the caller wishes to ensure the data has actually been synced to
disk, they should use fsync(2)/fdatasync(2) instead.
* AS_WRITEBACK_INDETERMINATE does not indicate that the folios should never be
waited on when in writeback. There are some cases where the wait is
desirable. For example, for the sync_file_range() syscall, it is fine to
wait on the writeback since the caller passes in a fd for the operation."
That was from v6 of the patchset and some things were changed between
that and the final version that landed in v8 [2] (most notably, changing
AS_WRITEBACK_INDETERMINATE to AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM and
dropping the sync + page migration skips), but I think that analysis
of what cases need to be accounted for / audited remains the same. I
don't think there are any places beyond those 3 listed above that have
a core intrinsic dependency on folio writeback being cleared cleanly
(eg without any involvement of an untrusted fuse server).
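For the sync(2) case specifically, the skip that this patch adds to
wait_sb_inodes() can be modeled in userspace roughly as below. This is a
simplified sketch, not the kernel code: the flag bit numbers and the two
helper names are taken from the diff, while the cut-down struct
address_space, the tagged_writeback field and count_waited_mappings()
are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers as in the patch; all other flags omitted. */
enum mapping_flags {
	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
	AS_NO_DATA_INTEGRITY = 11,
};

/* Cut-down stand-in for the kernel's struct address_space. */
struct address_space {
	unsigned long flags;
	bool tagged_writeback;	/* models PAGECACHE_TAG_WRITEBACK being set */
};

static inline void mapping_set_no_data_integrity(struct address_space *mapping)
{
	mapping->flags |= 1UL << AS_NO_DATA_INTEGRITY;
}

static inline bool mapping_no_data_integrity(const struct address_space *mapping)
{
	return (mapping->flags & (1UL << AS_NO_DATA_INTEGRITY)) != 0;
}

/* Hypothetical helper: how many mappings would a sync-style walk wait on? */
static size_t count_waited_mappings(const struct address_space *maps, size_t n)
{
	size_t waited = 0;

	assert(maps != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Same condition as the patched check in wait_sb_inodes(). */
		if (!maps[i].tagged_writeback ||
		    mapping_no_data_integrity(&maps[i]))
			continue;
		waited++;	/* the kernel would block here on writeback */
	}
	return waited;
}
```

Only mappings that have writeback in flight and do not carry
AS_NO_DATA_INTEGRITY are waited on; the fsync() and truncate() paths are
not touched by this check.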
For the fsync() and truncate() examples you mentioned, I don't think
it's an issue that these now wait for the server to finish the I/O and
hang if the server doesn't. I think it's actually more correct
behavior than what we had with temp pages, eg imo these actually ought
to wait for the writeback to have been completed by the server. If the
server is malicious / buggy and fsync/truncate hangs, I think that's
fine given that fsync/truncate is initiated by the user on a specific
file descriptor (as opposed to the generic sync()) (and imo it should
hang if it can't actually be executed correctly because the server is
malfunctioning).
As for why this sync user regression has surfaced and now needs to be
addressed, I don't think it's because there's a whack-a-mole game
where we're ad-hoc having to patch up places we didn't realize could
be broken by folio writeback potentially hanging. The original
patchset [1] contained patches that addressed the sync and compaction
case (eg maintaining the original behavior that the temp pages had),
so I don't think this is something that was missed. These patches were
dropped because in the discussion in [1], they seemed pointless to
mitigate / guard against when there already exist other ways
migration/sync could be stalled by a malicious/buggy fuse server. What
I missed was that it's more common than I had thought for
well-intentioned servers to not correctly implement writeback
handling, and that even if it's userspace's "fault", it's still
considered a kernel regression if buggy code previously sufficed but
now doesn't.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@gmail.com/T/#u
[2] https://lore.kernel.org/linux-fsdevel/CAJfpegveOFoL-XzDKQZZ4U6UF_AetNwTUDbfmf7rdJasRFm3xA@mail.gmail.com/T/#m56255519bf9af421ae07014208ccd68a96e72d52
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-01-06 23:30 UTC | newest]
Thread overview: 19+ messages
2025-12-15 3:00 [PATCH v2 0/1] fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes() Joanne Koong
2025-12-15 3:00 ` [PATCH v2 1/1] " Joanne Koong
2025-12-15 17:09 ` Bernd Schubert
2025-12-16 7:07 ` Joanne Koong
2025-12-16 18:13 ` J. Neuschäfer
2026-01-02 17:42 ` Joanne Koong
2026-01-03 18:03 ` Andrew Morton
2026-01-04 18:54 ` David Hildenbrand (Red Hat)
2026-01-05 19:55 ` Joanne Koong
2026-01-06 9:33 ` Jan Kara
2026-01-06 10:05 ` David Hildenbrand (Red Hat)
2026-01-06 13:13 ` Miklos Szeredi
2026-01-06 13:55 ` Jan Kara
2026-01-06 14:33 ` David Hildenbrand (Red Hat)
2026-01-06 15:21 ` Miklos Szeredi
2026-01-06 15:41 ` David Hildenbrand (Red Hat)
2026-01-06 16:05 ` Miklos Szeredi
2026-01-06 17:54 ` David Hildenbrand (Red Hat)
2026-01-06 23:30 ` Joanne Koong