Re: [syzbot] [btrfs?] kernel BUG in __folio_start_writeback


From: Qu Wenruo <wqu@suse.com>
To: Matthew Wilcox <willy@infradead.org>,
	syzbot <syzbot+aac7bff85be224de5156@syzkaller.appspotmail.com>
Cc: akpm@linux-foundation.org, clm@fb.com, dsterba@suse.com,
	josef@toxicpanda.com, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [btrfs?] kernel BUG in __folio_start_writeback
Date: Mon, 25 Nov 2024 11:00:40 +1030
Message-ID: <b57b3d18-7a70-4efa-a356-809c6ab29c02@suse.com>
In-Reply-To: <Z0OaHcMWcRtohZfz@casper.infradead.org>



On 2024/11/25 07:56, Matthew Wilcox wrote:
> On Sun, Nov 24, 2024 at 05:45:18AM -0800, syzbot wrote:
>>
>>   __fput+0x5ba/0xa50 fs/file_table.c:458
>>   task_work_run+0x24f/0x310 kernel/task_work.c:239
>>   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
>>   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
>>   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>   syscall_exit_to_user_mode+0x13f/0x340 kernel/entry/common.c:218
>>   do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
>>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> This is:
> 
>          VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
> 
> ie we've called __folio_start_writeback() on a folio which is already
> under writeback.
> 
> Higher up in the trace, we have the useful information:
> 
>   page: refcount:6 mapcount:0 mapping:ffff888077139710 index:0x3 pfn:0x72ae5
>   memcg:ffff888140adc000
>   aops:btrfs_aops ino:105 dentry name(?):"file2"
>   flags: 0xfff000000040ab(locked|waiters|uptodate|lru|private|writeback|node=0|zone=1|lastcpupid=0x7ff)
>   raw: 00fff000000040ab ffffea0001c8f408 ffffea0000939708 ffff888077139710
>   raw: 0000000000000003 0000000000000001 00000006ffffffff ffff888140adc000
>   page dumped because: VM_BUG_ON_FOLIO(folio_test_writeback(folio))
>   page_owner tracks the page as allocated
> 
> The interesting part of the page_owner stacktrace is:
> 
>    filemap_alloc_folio_noprof+0xdf/0x500
>    __filemap_get_folio+0x446/0xbd0
>    prepare_one_folio+0xb6/0xa20
>    btrfs_buffered_write+0x6bd/0x1150
>    btrfs_direct_write+0x52d/0xa30
>    btrfs_do_write_iter+0x2a0/0x760
>    do_iter_readv_writev+0x600/0x880
>    vfs_writev+0x376/0xba0
> 
> (ie not very interesting)
> 
>> Workqueue: btrfs-delalloc btrfs_work_helper
>> RIP: 0010:__folio_start_writeback+0xc06/0x1050 mm/page-writeback.c:3119
>> Call Trace:
>>   <TASK>
>>   process_one_folio fs/btrfs/extent_io.c:187 [inline]
>>   __process_folios_contig+0x31c/0x540 fs/btrfs/extent_io.c:216
>>   submit_one_async_extent fs/btrfs/inode.c:1229 [inline]
>>   submit_compressed_extents+0xdb3/0x16e0 fs/btrfs/inode.c:1632
>>   run_ordered_work fs/btrfs/async-thread.c:245 [inline]
>>   btrfs_work_helper+0x56b/0xc50 fs/btrfs/async-thread.c:324
>>   process_one_work kernel/workqueue.c:3229 [inline]
> 
> This looks like a race?
> 
> process_one_folio() calls
> btrfs_folio_clamp_set_writeback calls
> btrfs_subpage_set_writeback:
> 
>          spin_lock_irqsave(&subpage->lock, flags);
>          bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
>          if (!folio_test_writeback(folio))
>                  folio_start_writeback(folio);
>          spin_unlock_irqrestore(&subpage->lock, flags);
> 
> so somebody else set writeback after we tested for writeback here.

The test VM is x86_64, so we don't go into the subpage routine;
we call folio_start_writeback() directly.
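
As a rough illustration of why x86_64 skips the subpage helper: the subpage path only matters when the filesystem sector size is smaller than the page size. The sketch below is a userspace model, not kernel code. `model_folio`, `model_needs_subpage()` and `model_set_writeback()` are hypothetical names, and the 4 KiB sector and page sizes used with it are assumptions about a default x86_64 setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a folio: only the writeback flag is modelled. */
struct model_folio {
    bool writeback;
};

/* Subpage handling is only needed when one page holds several sectors. */
static bool model_needs_subpage(uint32_t sectorsize, uint32_t page_size)
{
    return sectorsize < page_size;
}

/*
 * Returns true if the direct (non-subpage) path was taken.
 * On the direct path the flag is set with no prior test, so the caller
 * must guarantee the folio is not already under writeback.
 */
static bool model_set_writeback(struct model_folio *folio,
                                uint32_t sectorsize, uint32_t page_size)
{
    if (!model_needs_subpage(sectorsize, page_size)) {
        assert(!folio->writeback); /* models the VM_BUG_ON_FOLIO check */
        folio->writeback = true;
        return true;
    }
    /* Subpage path: test first, as in the quoted snippet. */
    if (!folio->writeback)
        folio->writeback = true;
    return false;
}
```

With 4096-byte sectors and 4096-byte pages the model takes the direct path, which is the one where a folio that is already under writeback trips the assertion.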

> 
> One thing that comes to mind is that _usually_ we take folio_lock()
> first, then start writeback, then call folio_unlock() and btrfs isn't
> doing that here (afaict).  Maybe that's not the source of the bug?

We still hold the folio lock, do the submission, then unlock.

You can check extent_writepage(): on entry we check that the folio is
still locked.
Then inside extent_writepage_io() we do the submission, setting the
folio writeback flag inside submit_one_sector().
Finally we unlock the folio at the end of extent_writepage(). That is
the path for uncompressed writes.

There is a lot of special handling for async submission (compression),
but it still keeps the folio locked, does the compression and submission,
and then unlocks, just all in another thread (which is the case here).
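
The ordering described above (folio locked, writeback started during submission, then unlock) can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `model_*` name is hypothetical, the real kernel helpers are not called, and the model returns an error code where the kernel would hit the VM_BUG_ON_FOLIO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical folio model: two of the flags seen in the page dump. */
struct model_folio {
    bool locked;
    bool writeback;
};

enum model_result { MODEL_OK = 0, MODEL_ALREADY_WRITEBACK = -1 };

static void model_lock(struct model_folio *f)
{
    assert(!f->locked);
    f->locked = true;
}

static void model_unlock(struct model_folio *f)
{
    assert(f->locked);
    f->locked = false;
}

/* Reports the condition that the kernel would turn into a BUG. */
static enum model_result model_start_writeback(struct model_folio *f)
{
    if (f->writeback)
        return MODEL_ALREADY_WRITEBACK;
    f->writeback = true;
    return MODEL_OK;
}

static void model_end_writeback(struct model_folio *f)
{
    assert(f->writeback);
    f->writeback = false;
}

/*
 * Submission step: the folio arrives locked (the lock was taken earlier
 * and ownership handed over), writeback is started, then the folio is
 * unlocked. The same order applies whichever thread runs it.
 */
static enum model_result model_submit(struct model_folio *f)
{
    enum model_result ret;

    assert(f->locked);
    ret = model_start_writeback(f);
    model_unlock(f);
    return ret;
}
```

In this model a second `model_submit()` on the same folio before `model_end_writeback()` returns `MODEL_ALREADY_WRITEBACK`, which is the situation the syzbot report shows.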

So it looks like something goes wrong when transferring ownership of
the page cache folios to the compression path, or there is an error
path that is not handled properly.

Unfortunately, I have not been able to reproduce this with the
reproducer...

Thanks,
Qu



> 
> If it is, should we have a VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio)
> in __folio_start_writeback()?  Or is there somewhere that can't lock the
> folio before starting writeback?
>
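
For reference, the extra check proposed in the quoted question can be modelled in userspace next to the existing one. This is a hedged sketch, not a kernel patch: `model_folio`, `model_check` and `model_check_start_writeback()` are hypothetical names, and only the two flags relevant to the discussion are represented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical folio model with the two flags under discussion. */
struct model_folio {
    bool locked;
    bool writeback;
};

enum model_check {
    MODEL_CHECK_OK,
    MODEL_CHECK_NOT_LOCKED,    /* would be caught by the proposed assertion */
    MODEL_CHECK_WRITEBACK_SET, /* caught by the existing assertion */
};

/* Evaluates both preconditions before writeback would be started. */
static enum model_check
model_check_start_writeback(const struct model_folio *f)
{
    if (!f->locked)
        return MODEL_CHECK_NOT_LOCKED;
    if (f->writeback)
        return MODEL_CHECK_WRITEBACK_SET;
    return MODEL_CHECK_OK;
}
```

With the flags from the page dump above (both locked and writeback set), this model reports `MODEL_CHECK_WRITEBACK_SET`, so the proposed lock check on its own would not have fired for this report.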
