From: Aleksandr Nogikh <nogikh@google.com>
Date: Mon, 25 Nov 2024 11:44:48 +0100
Subject: Re: [syzbot] [btrfs?] kernel BUG in __folio_start_writeback
To: Qu Wenruo
Cc: Matthew Wilcox, syzbot, akpm@linux-foundation.org, clm@fb.com, dsterba@suse.com, josef@toxicpanda.com, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, syzkaller-bugs@googlegroups.com
References: <67432dee.050a0220.1cc393.0041.GAE@google.com>

On Mon, Nov 25, 2024 at 1:30 AM 'Qu Wenruo' via syzkaller-bugs wrote:
>
>
>
> On 2024/11/25 07:56, Matthew Wilcox wrote:
> > On Sun, Nov 24, 2024 at 05:45:18AM -0800, syzbot wrote:
> >>
> >>   __fput+0x5ba/0xa50 fs/file_table.c:458
> >>   task_work_run+0x24f/0x310 kernel/task_work.c:239
> >>   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
> >>   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
> >>   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
> >>   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> >>   syscall_exit_to_user_mode+0x13f/0x340 kernel/entry/common.c:218
> >>   do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
> >>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
> > This is:
> >
> >         VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
> >
> > ie we've called __folio_start_writeback() on a folio which is already
> > under writeback.
> >
> > Higher up in the trace, we have the useful information:
> >
> > page: refcount:6 mapcount:0 mapping:ffff888077139710 index:0x3 pfn:0x72ae5
> > memcg:ffff888140adc000
> > aops:btrfs_aops ino:105 dentry name(?):"file2"
> > flags: 0xfff000000040ab(locked|waiters|uptodate|lru|private|writeback|node=0|zone=1|lastcpupid=0x7ff)
> > raw: 00fff000000040ab ffffea0001c8f408 ffffea0000939708 ffff888077139710
> > raw: 0000000000000003 0000000000000001 00000006ffffffff ffff888140adc000
> > page dumped because: VM_BUG_ON_FOLIO(folio_test_writeback(folio))
> > page_owner tracks the page as allocated
> >
> > The interesting part of the page_owner stacktrace is:
> >
> >   filemap_alloc_folio_noprof+0xdf/0x500
> >   __filemap_get_folio+0x446/0xbd0
> >   prepare_one_folio+0xb6/0xa20
> >   btrfs_buffered_write+0x6bd/0x1150
> >   btrfs_direct_write+0x52d/0xa30
> >   btrfs_do_write_iter+0x2a0/0x760
> >   do_iter_readv_writev+0x600/0x880
> >   vfs_writev+0x376/0xba0
> >
> > (ie not very interesting)
> >
> >> Workqueue: btrfs-delalloc btrfs_work_helper
> >> RIP: 0010:__folio_start_writeback+0xc06/0x1050 mm/page-writeback.c:3119
> >> Call Trace:
> >>
> >>   process_one_folio fs/btrfs/extent_io.c:187 [inline]
> >>   __process_folios_contig+0x31c/0x540 fs/btrfs/extent_io.c:216
> >>   submit_one_async_extent fs/btrfs/inode.c:1229 [inline]
> >>   submit_compressed_extents+0xdb3/0x16e0 fs/btrfs/inode.c:1632
> >>   run_ordered_work fs/btrfs/async-thread.c:245 [inline]
> >>   btrfs_work_helper+0x56b/0xc50 fs/btrfs/async-thread.c:324
> >>   process_one_work kernel/workqueue.c:3229 [inline]
> >
> > This looks like a race?
> >
> > process_one_folio() calls
> > btrfs_folio_clamp_set_writeback calls
> > btrfs_subpage_set_writeback:
> >
> >         spin_lock_irqsave(&subpage->lock, flags);
> >         bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
> >         if (!folio_test_writeback(folio))
> >                 folio_start_writeback(folio);
> >         spin_unlock_irqrestore(&subpage->lock, flags);
> >
> > so somebody else set writeback after we tested for writeback here.
>
> The test VM is using X86_64, thus we won't go into the subpage routine,
> but directly call folio_start_writeback().
>
> >
> > One thing that comes to mind is that _usually_ we take folio_lock()
> > first, then start writeback, then call folio_unlock() and btrfs isn't
> > doing that here (afaict).  Maybe that's not the source of the bug?
>
> We still hold the folio locked, do submission then unlock.
>
> You can check extent_writepage(), where at the entrance we check if the
> folio is still locked.
> Then inside extent_writepage_io() we do the submission, setting the
> folio writeback inside submit_one_sector().
> Eventually unlock the folio at the end of extent_writepage(), that's for
> the uncompressed writes.
>
> There are a lot of special handling for async submission (compression),
> but it still holds the folio locked, do compression and submission, and
> unlock, just all in another thread (this case).
>
> So it looks like something is wrong when transferring the ownership of
> the page cache folios to the compression path, or some not properly
> handled error path.
>
> Unfortunately I'm not really able to reproduce the case using the
> reproducer...

I've just tried to reproduce this locally using the downloadable assets,
and the kernel crashed after about 1 minute of running the attached C repro.

[   87.616440][ T9044] ------------[ cut here ]------------
[   87.617126][ T9044] kernel BUG at mm/page-writeback.c:3119!
[   87.619308][ T9044] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[   87.620174][ T9044] CPU: 1 UID: 0 PID: 9044 Comm: kworker/u10:6 Not tainted 6.12.0-syzkaller-08446-g228a1157fb9f #0

Here are the instructions I followed:
https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md#run-a-c-reproducer

--
Aleksandr

>
> Thanks,
> Qu
>
> >
> >
> > If it is, should we have a VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio)
> > in __folio_start_writeback()?  Or is there somewhere that can't lock the
> > folio before starting writeback?
> >
>
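
For illustration only: the invariant discussed in this thread (take the folio
lock first, then start writeback, and never start writeback on a folio that is
already under writeback) can be modelled in a few lines of userspace C. This is
a sketch under stated assumptions, not kernel code. struct model_folio,
enum wb_result, model_start_writeback() and model_end_writeback() are
hypothetical names made up for this example, and the model returns an error
code where the kernel would trigger VM_BUG_ON_FOLIO().

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	bool locked;    /* models the "locked" page flag */
	bool writeback; /* models the "writeback" page flag */
};

/* Hypothetical result codes; the kernel would BUG instead of returning. */
enum wb_result {
	WB_OK = 0,
	WB_ERR_NOT_LOCKED,        /* the extra check proposed in the thread */
	WB_ERR_ALREADY_WRITEBACK, /* the check that fired in the report */
};

/* Start writeback only on a locked folio that is not already under writeback. */
static enum wb_result model_start_writeback(struct model_folio *f)
{
	if (!f->locked)
		return WB_ERR_NOT_LOCKED;
	if (f->writeback)
		return WB_ERR_ALREADY_WRITEBACK;
	f->writeback = true;
	return WB_OK;
}

/* Clear the writeback flag once the (modelled) I/O has completed. */
static void model_end_writeback(struct model_folio *f)
{
	f->writeback = false;
}
```

In this model, a second model_start_writeback() call without an intervening
model_end_writeback() returns WB_ERR_ALREADY_WRITEBACK, which corresponds to
the condition the report hit. The model is single-threaded and says nothing
about where the real race is.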