From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 926CCD0D795 for ; Fri, 11 Oct 2024 13:50:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 990656B00A0; Fri, 11 Oct 2024 09:50:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 940616B00A2; Fri, 11 Oct 2024 09:50:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 807A36B00A3; Fri, 11 Oct 2024 09:50:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 623346B00A0 for ; Fri, 11 Oct 2024 09:50:42 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B51E91A13BB for ; Fri, 11 Oct 2024 13:50:32 +0000 (UTC) X-FDA: 82661456598.15.7131051 Received: from mail.flyingcircus.io (mail.flyingcircus.io [212.122.41.197]) by imf11.hostedemail.com (Postfix) with ESMTP id 2BE2040011 for ; Fri, 11 Oct 2024 13:50:35 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=DRMwYd+b; spf=pass (imf11.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io; dmarc=pass (policy=reject) header.from=flyingcircus.io ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728654500; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=KvNGlemhaNExRrqlaQX0CrDFggZF6YQd0fZVnjowWsM=; b=UETxstyrPTGGp3NA6F7yxPyIa7AQUQJhlybHxmJ/Lc5tvgZGzh06vEfawCSgjQLGAKmnIQ AFJaoj1+RHvDPMCDTdK2vg0qormedu7bO4q1kYZY0QGWbMVrhosCQZaD3b0amGzgJ/8LGp XDepuNOljwiuo9/1LZqoL0h6Sss9PtU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728654500; a=rsa-sha256; cv=none; b=bUVKutdCZlYd74uSWZjlAmBwQZ5r+zLJeGYoz2OM4Cli2TLj7848I4Dashpek3ZdOmDurc hVi1IVEExcBYrfX5xB2E7YcuYUk+0fJ7cW9cEltWGZmPG5/cHf/2MGUU91ZgnBRXUo5Q+K Isp4DnaKce/Arou/a+mLyRL/Y3idQMY= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=flyingcircus.io header.s=mail header.b=DRMwYd+b; spf=pass (imf11.hostedemail.com: domain of ct@flyingcircus.io designates 212.122.41.197 as permitted sender) smtp.mailfrom=ct@flyingcircus.io; dmarc=pass (policy=reject) header.from=flyingcircus.io Content-Type: text/plain; charset=utf-8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=flyingcircus.io; s=mail; t=1728654635; bh=KvNGlemhaNExRrqlaQX0CrDFggZF6YQd0fZVnjowWsM=; h=Subject:From:In-Reply-To:Date:Cc:References:To; b=DRMwYd+bbgVy+htDpLRrpNmovu27Zg0dxG7AOb97R+c4AJMEVG7JtuUnp+AolUgjv ONa2do625TUZFB7b7qLtIESnLirPx0g4oqwY6gNqwRAmMLaIiDZK4iz6TFZ2X8WAZX zEZePLfyx26unk2qsQTvoyHz51ToIUE8oIpKUzxQ= Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3818.100.11.1.3\)) Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) From: Christian Theune In-Reply-To: Date: Fri, 11 Oct 2024 15:50:12 +0200 Cc: Linus Torvalds , Dave Chinner , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, "linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , regressions@lists.linux.dev, regressions@leemhuis.info Content-Transfer-Encoding: quoted-printable Message-Id: References: <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com> <02121707-E630-4E7E-837B-8F53B4C28721@flyingcircus.io> <381863DE-17A7-4D4E-8F28-0F18A4CEFC31@flyingcircus.io> <0A480EBE-9B4D-49CC-9A32-3526F32426E6@flyingcircus.io> To: Chris Mason X-Stat-Signature: dznazi5ybhrft86dxegirdjkcwfwhp4b X-Rspamd-Queue-Id: 2BE2040011 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1728654635-73790 X-HE-Meta: U2FsdGVkX1+0fh3mEtxETHjYrCci6LHbyP2BQY1CMEHwY/gDv1QDSO0cQPlIsoCs7RIGVdO5/LZHI2yZyuqVuEbVVifpyAFaRxAqRp8F5LeTugB3BFvLipejTcoF71Vi1Jx97679sPZn4zdhekO5TpY2gZRvF922ojipXwMK0oPeapPyRyeQLSbP4Dmanu1C0dgFxeovXXq4ucKB5VuoqhHIPALP3yErwx6+0rbp43FFci0T425ZdWLadLYk40JrIDcg9S61wDtRc632ZLywP5fNemF9sSdwPuw2zDQWE1N9TdGN2/s16j4IsXwvRz8N4LUeWGctf7dNatXzpprGkCCfxkXGEkjhzD8aXyzoG+h/Ybs8VR7XJA8ECKenjQDKj5dOnD0QYcltSKRrrUG3UIFA7HbDbqkMmhxJFe7lSP6aMFoxXCi0470jW+Yp2zcRG4PyQ5yLOldm78p478TA7Z+KsXNM+IL52FEv2HYtfbFBgjjEDvBppmWCEirq+La1swMQ2olRaOUUTTgsPvG/u0dfwo+3/1srFr7DaJ77TtV6KO2ORWoL5byyxHd00Gua39boXSVVEN0stbZ0L2ltEOnpvgVhcseiGgQZ/tqAf7so0ZDF7g+eEnSGojnfVQW+xCgokQfYuoSFgbd7Mf+Y0XxwnrRhx0BlTeEjeUmrVo8zYEsaa3drPgZNuurS0PvK3FtIA0W0CWq9b28EEt+hq4riXzExPVN7/pHQMIaPuxRk7ui4CbXSugHrJyGIFbth4A3dXfNz3ObiKXXcfaUBSVCwGGH9e32d0f1PsIeAGcgWQGRE4brpItgrIEc2ZQ4W9owbTQ2Duy2B1F2Hh0JtvjNsmVD00G0rbqSCdk7pyFwK89FMd5t77FHC2k1SvV1Be5Pz+MOv25dnuniC5FNHKpYvY0U3if+xlcHS7jt8y0vjZuRepSW/ngtbjdUNjoFBqKZdJD2UL7a0X+J+tZd dK1kQtqV gfzOS0AJxyFNfmH+8vLeAA9IFQPyejBPX0p72UsV58xc1ejSgMEfczeLCIAce9QFvqjkc/1NxcV3/GE81Dm38NCB5J+drPNx7I0h5bFgVWtBqi3ddlGqentwXSDZRNoB25Wee3N2O/r8CdEw3DY85is/dyZCUbODfZVdytnrE2VvswEXws4BkYSVKLIoynDIsFtNvbfrURf3LtC3tgAO28xSay5Y8aLsFfHaFKl5hG1DC5CY0RbSVkHAnmGqgPBhqI+PwnmtC7V/gqY+Ca+AC7u6C5KJc/acTUd9ZxL7F+hFlsh1iMXAZ/ZmG8R1v365LDXpx6au2+BLsaOIgT/RPec4G1UzNX3waLAZlR6+JzyaAo8wYiBqIyx2QFoC9Yv19ZpU3vV/QGvl78ja2jZmbuGvxTYZ91ODamdsB X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi, > On 11. Oct 2024, at 15:06, Chris Mason wrote: >=20 > - It's actually taking the IO a long time to finish. We can poke at = the > pending requests, how does the device look in the VM? (virtio, scsi = etc). I _think_ that=E2=80=99s not it. This is a Qemu w/ virtio-block + Ceph = stack with 2x10G and fully SSD backed. The last 24 hours show operation = latency at less than 0.016ms. Ceph=E2=80=99s slow request warning (30s = limit) has not triggered in the last 24 hours. Also, aside from a VM that was exhausting its Qemu io throttling for a = minute (and stuck in completely different tracebacks) the only blocked = task reports from the last 48 hours was this specific process. I=E2=80=99d expect that we=E2=80=99d see a lot more reports about IO = issues from multiple VMs and multiple loads at the same time when the = storage misbehaves (we did experience those in the long long past in = older Ceph versions and with spinning rust, so I=E2=80=99m pretty = confident (at the moment) this isn=E2=80=99t a storage issue per se). Incidentally this now reminds me of a different (maybe not?) issue that = I=E2=80=99ve been trying to track down with mdraid/xfs: https://marc.info/?l=3Dlinux-raid&m=3D172295385102939&w=3D2 This is only tested on an older kernel so far (5.15.138) and we ended up = seeing IOPS stuck in the md device but not below it. However, MD isn=E2=80= =99t involved here. I made the connection because the original traceback = also shows it stuck in =E2=80=9Cwait_on_page_writeback=E2=80=9D, but = maybe that=E2=80=99s a red herring: [Aug 6 09:35] INFO: task .backy-wrapped:2615 blocked for more than 122 = seconds. [ +0.008130] Not tainted 5.15.138 #1-NixOS [ +0.005194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables = this message. [ +0.008895] task:.backy-wrapped state:D stack: 0 pid: 2615 ppid: 1 = flags:0x00000002 [ +0.000005] Call Trace: [ +0.000002] [ +0.000004] __schedule+0x373/0x1580 [ +0.000009] ? xlog_cil_commit+0x559/0x880 [xfs] [ +0.000041] schedule+0x5b/0xe0 [ +0.000001] io_schedule+0x42/0x70 [ +0.000001] wait_on_page_bit_common+0x119/0x380 [ +0.000005] ? __page_cache_alloc+0x80/0x80 [ +0.000002] wait_on_page_writeback+0x22/0x70 [ +0.000001] truncate_inode_pages_range+0x26f/0x6d0 [ +0.000006] evict+0x15f/0x180 [ +0.000003] __dentry_kill+0xde/0x170 [ +0.000001] dput+0x15b/0x330 [ +0.000002] do_renameat2+0x34e/0x5b0 [ +0.000003] __x64_sys_rename+0x3f/0x50 [ +0.000002] do_syscall_64+0x3a/0x90 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x62/0xcc [ +0.000003] RIP: 0033:0x7fdd1885275b [ +0.000002] RSP: 002b:00007ffde643ad18 EFLAGS: 00000246 ORIG_RAX: = 0000000000000052 [ +0.000002] RAX: ffffffffffffffda RBX: 00007ffde643adb0 RCX: = 00007fdd1885275b [ +0.000001] RDX: 0000000000000000 RSI: 00007fdd09a3d3d0 RDI: = 00007fdd098549d0 [ +0.000001] RBP: 00007ffde643ad60 R08: 00000000ffffffff R09: = 0000000000000000 [ +0.000001] R10: 00007ffde643af90 R11: 0000000000000246 R12: = 00000000ffffff9c [ +0.000000] R13: 00000000ffffff9c R14: 000000000183cab0 R15: = 00007fdd0b128810 [ +0.000001] [ +0.000011] INFO: task kworker/u64:0:2380262 blocked for more than 122 = seconds. [ +0.008309] Not tainted 5.15.138 #1-NixOS [ +0.005190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables = this message. [ +0.008895] task:kworker/u64:0 state:D stack: 0 pid:2380262 ppid: 2 = flags:0x00004000 [ +0.000004] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt] [ +0.000006] Call Trace: [ +0.000001] [ +0.000001] __schedule+0x373/0x1580 [ +0.000003] schedule+0x5b/0xe0 [ +0.000001] md_bitmap_startwrite+0x177/0x1e0 [ +0.000004] ? finish_wait+0x90/0x90 [ +0.000004] add_stripe_bio+0x449/0x770 [raid456] [ +0.000005] raid5_make_request+0x1cf/0xbd0 [raid456] [ +0.000003] ? kmem_cache_alloc_node_trace+0x391/0x3e0 [ +0.000004] ? linear_map+0x44/0x90 [dm_mod] [ +0.000005] ? finish_wait+0x90/0x90 [ +0.000001] ? __blk_queue_split+0x516/0x580 [ +0.000003] md_handle_request+0x122/0x1b0 [ +0.000003] md_submit_bio+0x6e/0xb0 [ +0.000001] __submit_bio+0x18f/0x220 [ +0.000002] ? crypt_page_alloc+0x46/0x60 [dm_crypt] [ +0.000002] submit_bio_noacct+0xbe/0x2d0 [ +0.000001] kcryptd_crypt+0x392/0x550 [dm_crypt] [ +0.000002] process_one_work+0x1d6/0x360 [ +0.000003] worker_thread+0x4d/0x3b0 [ +0.000002] ? process_one_work+0x360/0x360 [ +0.000001] kthread+0x118/0x140 [ +0.000001] ? set_kthread_struct+0x50/0x50 [ +0.000001] ret_from_fork+0x22/0x30 [ +0.000004] =E2=80=A6(more md kworker tasks pile up here) Christian --=20 Christian Theune =C2=B7 ct@flyingcircus.io =C2=B7 +49 345 219401 0 Flying Circus Internet Operations GmbH =C2=B7 https://flyingcircus.io Leipziger Str. 70/71 =C2=B7 06108 Halle (Saale) =C2=B7 Deutschland HR Stendal HRB 21169 =C2=B7 Gesch=C3=A4ftsf=C3=BChrer: Christian Theune, = Christian Zagrodnick