From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B9FDCEDDBD for ; Fri, 11 Oct 2024 07:27:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2795D6B00A3; Fri, 11 Oct 2024 03:27:38 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2297A6B00A4; Fri, 11 Oct 2024 03:27:38 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C9516B00A5; Fri, 11 Oct 2024 03:27:38 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D8F7A6B00A3 for ; Fri, 11 Oct 2024 03:27:37 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 3254BAC6AE for ; Fri, 11 Oct 2024 07:27:29 +0000 (UTC) X-FDA: 82660491354.28.63960E7 Received: from mail.flyingcircus.io (mail.flyingcircus.io [212.122.41.197]) by imf25.hostedemail.com (Postfix) with ESMTP id 68FD7A0013 for ; Fri, 11 Oct 2024 07:27:34 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=mO4khwbU; dmarc=pass (policy=reject) header.from=flyingcircus.io; spf=pass (imf25.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728631627; a=rsa-sha256; cv=none; b=fGb824L/6ReLLfuvBXD71HDQjhWKTEnkXUuqrIYglIh9tC52wYhYKP8yMCoMGDgE7OEI9s iZt7xs+jjx/iX6p3Iw92t5mATJ2ogNZLI4ORxoDGbiRxh1T6IDG/e934rTOHsJwS9dY8zj qgNCVQTuBkFlLyQXZW+Mfeb/LpzBFWc= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=mO4khwbU; dmarc=pass (policy=reject) header.from=flyingcircus.io; spf=pass (imf25.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728631627; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ht6MJDY3Fy1CYRDV0JIK9gBKG7L45D1Bq8tOJWehI3s=; b=rkz+TWe+wv0pjdX0aVXJJ1R7FNzqvu8HpYHz0/nAg1w1WGbg54GcfQiVQecpVZ/z8f/AOt 2DGWkejyNS7dMsyZdG/qHJU8RSzQzCLlbAsQz/Xga9er5X5OXZBWMbJMAmsj1W9G7HX8Ee JMpWC8tM+mfX/5TMXkdlWzDdoY+hRfE= Content-Type: text/plain; charset=utf-8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=flyingcircus.io; s=mail; t=1728631651; bh=ht6MJDY3Fy1CYRDV0JIK9gBKG7L45D1Bq8tOJWehI3s=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=mO4khwbUxEnu4l2dyNWJXue8da8L8Yn/aSjtiP2qwAm8zB34elmAzZx7DsyisDY6O KXA6p4A5My7kw+7+hfnDGO5qaVi+ZoZe2k45KtnF9mvzRhNwpuiBCFC5iIvQ7ghj7o a8yoEMynJU15/73u0XbsH3a8wt0as9J41bfh+7uM= Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3818.100.11.1.3\)) Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) From: Christian Theune In-Reply-To: Date: Fri, 11 Oct 2024 09:27:08 +0200 Cc: Linus Torvalds , Dave Chinner , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, "linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , regressions@lists.linux.dev, regressions@leemhuis.info Content-Transfer-Encoding: quoted-printable Message-Id: <381863DE-17A7-4D4E-8F28-0F18A4CEFC31@flyingcircus.io> References: <74cceb67-2e71-455f-a4d4-6c5185ef775b@meta.com> <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com> <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io> To: Chris Mason X-Rspam-User: X-Stat-Signature: knk5q8w34rtdqtoqpjzjgcgqadrmcx7d X-Rspamd-Queue-Id: 68FD7A0013 X-Rspamd-Server: rspam02 X-HE-Tag: 1728631654-152416 X-HE-Meta: U2FsdGVkX18DB2qP8vS8HGq3QYQ9qzzvKn413QTJn2gneZY7b1UqCrkxpYKb4eYIIdyZXAgiXtt9/KptcSfhh4jT3J4/VyTiozGFzENmqhXMTf69+0x3zeTu9Zn8bzJX55VTJNs7Agc9joK+oCNV5/0/+ID6CjGA4OTJdHeL1t8e5NkiJZLSsnIGJFiN1Pz6DVHOTBZW4nDcU86LWNuYYRkinqI3jbiGuOSbRjiXr+z5W9/l+JleS+dKiHZsZncBU5s9CJg+VUSO/SO6QlMjaClh51qiMLf/OGOV2F8hCqBY+vmXjiLBNcTJDaUCIRp5dzom856CokzuhsY4opB+70kBngJpmnyhbDnLyWwxG5WcTrEY5Rg2Dic8L7Aef9DVM6tIWBIp5LE+Aj917Ks7epddS8ztC/ZsIDDMbUyPC8bv3Jrz1OwcVJNaG6TqyDhFcWCe6CTZ2QZP4ioy7PInlWF2SrxTOXRAu9IWGm09v4zqrk7SJDa9TdUTKp7WOGKBDPCfza0FdCv2qWryacY8LM+40Cj2VtI/DHd0u0kmrl1heyZzqfzfz3J4z7LFzlK1yexdPAyiU3HBhBvrgKfQDB7qqaUccMRPcNmIn8i3weFscL64TrFuwn98+J9jSNlh/p7JAKQPoYu/soUjgQNIp0TZuYni+J99LCC6eGkxolJQdvSpcb80YBGkw77GIRIo1vmzOSq9wr3QdVlxpmSou10uCr3zR10RnQ+KX+y1OT1TafNRx6aMHlHHAgqL9btiPaCUAXoVDelXI/gbna9zPn/66Td327FVB/vH5Lgxvm9JJAKQqPOeiC996i5RI98ZTZ6Dz0hPO1hRZjfyLXBYL2RXrlBPZ+EUmafGPcxIYRjrDV8d9erGu/V+HxX5E52CzQ8JHOkpTA0mf5bWxcYb0OOJnhY34BN4scbC/EEbe+cWb+MPuv0xZjZplNVcWLBYy9G/ZgGnqCtsFkm2SoV bwzrjpR1 mepdroyyZTwQWKPfSTycSnY1jU8kg2nBr46kq99YXwWiT/3/5SlC3FaMuEwrwzumzw/0VwmaIQWnh/7JNfwdIezvcTtbV9jLlan2aTGYLDlz/7h7asu44Dafi4WZME7R99vl3yWMvWOZRpaWBq26PdKjisrx1bNec40paQW4RoukdNEgxgovcgBtC+cWXHnE1gNrEyUmz4YhLn6/p3+IdqPrnxRfz9JptixxRoUvdRaZDtF4hRUA8Ugyhi+yJn6gb4VpFDzbWgkezYimQkGqH8L7k88wZazv8y+vc/dzdTuYIDUo2hfyt4hE501SqgEGUnRPv9KTXHw2NxQ8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi, > On 10. Oct 2024, at 08:29, Christian Theune = wrote: >=20 >=20 >> On 1. Oct 2024, at 02:56, Chris Mason wrote: >>=20 >> Not disagreeing with Linus at all, but given that you've got IO >> throttling too, we might really just be waiting. It's hard to tell >> because the hung task timeouts only give you information about one = process. >>=20 >> I've attached a minimal version of a script we use here to show all = the >> D state processes, it might help explain things. The only problem is >> you have to actually ssh to the box and run it when you're stuck. >>=20 >> The idea is to print the stack trace of every D state process, and = then >> also print out how often each unique stack trace shows up. When = we're >> deadlocked on something, there are normally a bunch of the same stack >> (say waiting on writeback) and then one jerk sitting around in a >> different stack who is causing all the trouble. >=20 > I think I should be able to trigger this. I=E2=80=99ve seen around a = 100 of those issues over the last week and the chance of it happening = correlates with a certain workload that should be easy to trigger. Also, = the condition remains for at around 5 minutes, so I should be able to = trace it when I see the alert in an interactive session. >=20 > I=E2=80=99ve verified I can run your script and I=E2=80=99ll get back = to you in the next days. I wasn=E2=80=99t able to create a reproducer after all so I=E2=80=99ve = set up alerting. I just caught one right away, but it unblocked quickly after I logged = in: The original message that triggered the alert was: [Oct11 09:18] INFO: task nix-build:157920 blocked for more than 122 = seconds. [ +0.000937] Not tainted 6.11.0 #1-NixOS [ +0.000540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" = disables this message. [ +0.000902] task:nix-build state:D stack:0 pid:157920 = tgid:157920 ppid:157919 flags:0x00000002 [ +0.001098] Call Trace: [ +0.000306] [ +0.000279] __schedule+0x3a3/0x1300 [ +0.000478] schedule+0x27/0xf0 [ +0.000392] io_schedule+0x46/0x70 [ +0.000436] folio_wait_bit_common+0x13f/0x340 [ +0.000572] ? __pfx_wake_page_function+0x10/0x10 [ +0.000592] folio_wait_writeback+0x2b/0x80 [ +0.000466] truncate_inode_partial_folio+0x5e/0x1b0 [ +0.000586] truncate_inode_pages_range+0x1de/0x400 [ +0.000595] evict+0x29f/0x2c0 [ +0.000396] ? iput+0x6e/0x230 [ +0.000408] ? _atomic_dec_and_lock+0x39/0x50 [ +0.000542] do_unlinkat+0x2de/0x330 [ +0.000402] __x64_sys_unlink+0x3f/0x70 [ +0.000419] do_syscall_64+0xb7/0x200 [ +0.000407] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ +0.000556] RIP: 0033:0x7f2bb5d1056b [ +0.000473] RSP: 002b:00007ffc013c8588 EFLAGS: 00000206 ORIG_RAX: = 0000000000000057 [ +0.000942] RAX: ffffffffffffffda RBX: 000055963c267500 RCX: = 00007f2bb5d1056b [ +0.000859] RDX: 0000000000000000 RSI: 0000000000000000 RDI: = 000055963c268c80 [ +0.000800] RBP: 000055963c267690 R08: 0000000000016020 R09: = 0000000000000000 [ +0.000977] R10: 00000000000000f0 R11: 0000000000000206 R12: = 00007ffc013c85c8 [ +0.000826] R13: 00007ffc013c85ac R14: 00007ffc013c8ed0 R15: = 00005596441e42b0 [ +0.000833] Then after logging in I caught it once with walker.py - this was about a = minute after the alert triggered I think. I=E2=80=99ll add timestamps to = walker.py in the next instances: 157920 nix-build D [<0>] folio_wait_bit_common+0x13f/0x340 [<0>] folio_wait_writeback+0x2b/0x80 [<0>] truncate_inode_partial_folio+0x5e/0x1b0 [<0>] truncate_inode_pages_range+0x1de/0x400 [<0>] evict+0x29f/0x2c0 [<0>] do_unlinkat+0x2de/0x330 [<0>] __x64_sys_unlink+0x3f/0x70 [<0>] do_syscall_64+0xb7/0x200 [<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f ----- stack summary 1 hit: [<0>] folio_wait_bit_common+0x13f/0x340 [<0>] folio_wait_writeback+0x2b/0x80 [<0>] truncate_inode_partial_folio+0x5e/0x1b0 [<0>] truncate_inode_pages_range+0x1de/0x400 [<0>] evict+0x29f/0x2c0 [<0>] do_unlinkat+0x2de/0x330 [<0>] __x64_sys_unlink+0x3f/0x70 [<0>] do_syscall_64+0xb7/0x200 [<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f I tried once again after 1-2 seconds and got this: 157920 nix-build D [<0>] xlog_wait_on_iclog+0x167/0x180 [xfs] [<0>] xfs_log_force_seq+0x8d/0x150 [xfs] [<0>] xfs_file_fsync+0x195/0x2a0 [xfs] [<0>] __x64_sys_fdatasync+0x52/0x90 [<0>] do_syscall_64+0xb7/0x200 [<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f ----- stack summary 1 hit: [<0>] xlog_wait_on_iclog+0x167/0x180 [xfs] [<0>] xfs_log_force_seq+0x8d/0x150 [xfs] [<0>] xfs_file_fsync+0x195/0x2a0 [xfs] [<0>] __x64_sys_fdatasync+0x52/0x90 [<0>] do_syscall_64+0xb7/0x200 [<0>] entry_SYSCALL_64_after_hwframe+0x77/0x7f and after that the process was done and exited. The last traceback looks = unlocked already. I=E2=80=99m going to gather a few more instances during the day and will = post them as a batch later. Christian --=20 Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 Flying Circus Internet Operations GmbH =C2=B7 https://flyingcircus.io Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian Theune, = Christian Zagrodnick