From: Jens Axboe
Date: Wed, 18 Sep 2024 00:37:02 -0600
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Matthew Wilcox, Chris Mason
Cc: Linus Torvalds, Dave Chinner, Christian Theune, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Message-ID: <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk>

On 9/17/24 7:25 AM, Matthew Wilcox wrote:
> On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
>> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
>>> On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
>>>> I've got a bunch of assertions around incorrect folio->mapping and I'm
>>>> trying to bash on the ENOMEM for readahead case. There's a GFP_NOWARN
>>>> on those, and our systems do run pretty short on ram, so it feels right
>>>> at least. We'll see.
>>>
>>> I've been running with some variant of this patch the whole way across
>>> the Atlantic, and not hit any problems. But maybe with the right
>>> workload ...?
>>>
>>> There are two things being tested here. One is whether we have a
>>> cross-linked node (ie a node that's in two trees at the same time).
>>> The other is whether the slab allocator is giving us a node that already
>>> contains non-NULL entries.
>>>
>>> If you could throw this on top of your kernel, we might stand a chance
>>> of catching the problem sooner. If it is one of these problems and not
>>> something weirder.
>>>
>>
>> This fires in roughly 10 seconds for me on top of v6.11. Since array seems
>> to always be 1, I'm not sure if the assertion is right, but hopefully you
>> can trigger yourself.
>
> Whoops.
>
> $ git grep XA_RCU_FREE
> lib/xarray.c:#define XA_RCU_FREE ((struct xarray *)1)
> lib/xarray.c:	node->array = XA_RCU_FREE;
>
> so you walked into a node which is currently being freed by RCU. Which
> isn't a problem, of course. I don't know why I do that; it doesn't seem
> like anyone tests it. The jetlag is seriously kicking in right now,
> so I'm going to refrain from saying anything more because it probably
> won't be coherent.

Based on a modified reproducer from Chris (N threads reading from a
file, M threads dropping pages), I can pretty quickly reproduce the
xas_descend() spin on 6.9 in a vm with 128 cpus.
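
The workload shape described here (N threads reading a file, M threads dropping its pages) is simple enough to sketch in userspace. The C below is a minimal, hypothetical reconstruction, not Chris's actual reproducer, which is not included in this thread. It assumes that "dropping pages" means posix_fadvise(POSIX_FADV_DONTNEED) over the whole file, which is consistent with the generic_fadvise() and mapping_try_invalidate() frames in the stall trace in this message. The names run_repro(), struct repro, REPRO_CHUNK and REPRO_MAX_THREADS, the 64 KiB read size and the thread cap are all illustrative choices. To be relevant to this bug, the directory would need to be on XFS (large folios) on an affected kernel such as 6.9.

```c
#define _POSIX_C_SOURCE 200809L /* pread, pwrite, mkstemp, posix_fadvise */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REPRO_CHUNK (64 * 1024) /* hypothetical read/write size */
#define REPRO_MAX_THREADS 1024  /* hypothetical cap on readers + droppers */

struct repro {
	int fd;
	size_t size;       /* file size, a multiple of REPRO_CHUNK */
	int passes;        /* full-file read passes per reader */
	atomic_int stop;   /* set once every reader has finished */
	atomic_int error;  /* first error number seen, 0 if none */
	atomic_long reads; /* successful pread() calls */
};

static void record_error(struct repro *r, int err)
{
	int none = 0; /* only the first error reported by any thread is kept */
	atomic_compare_exchange_strong(&r->error, &none, err);
}

/* Reader: repeatedly pull the whole file back into the page cache. */
static void *reader(void *arg)
{
	struct repro *r = arg;
	size_t chunks = r->size / REPRO_CHUNK, n = (size_t)r->passes * chunks;
	char *buf = malloc(REPRO_CHUNK);
	if (!buf)
		record_error(r, ENOMEM);
	for (size_t i = 0; buf && i < n; i++) {
		off_t off = (off_t)(i % chunks) * REPRO_CHUNK;
		if (pread(r->fd, buf, REPRO_CHUNK, off) < 0) {
			record_error(r, errno);
			break;
		}
		atomic_fetch_add(&r->reads, 1);
	}
	free(buf);
	return NULL;
}

/* Dropper: evict the file's cached pages until the readers are done.
 * Assumption: "dropping pages" means POSIX_FADV_DONTNEED on the file. */
static void *dropper(void *arg)
{
	struct repro *r = arg;
	do {
		/* posix_fadvise() returns an error number and leaves errno alone. */
		int rc = posix_fadvise(r->fd, 0, 0, POSIX_FADV_DONTNEED);
		if (rc != 0)
			record_error(r, rc);
	} while (!atomic_load(&r->stop));
	return NULL;
}

/* Hypothetical driver: returns the number of successful reads, or -1. */
long run_repro(const char *dir, int nreaders, int ndroppers, size_t size,
	       int passes)
{
	struct repro r = { .size = size, .passes = passes }; /* atomics start at 0 */
	pthread_t tids[REPRO_MAX_THREADS];
	char path[4096], fill[REPRO_CHUNK];
	int nr = 0, nd = 0;

	if (nreaders < 0 || ndroppers < 0 || nreaders + ndroppers > REPRO_MAX_THREADS)
		return -1;
	snprintf(path, sizeof(path), "%s/xa-repro-XXXXXX", dir);
	if ((r.fd = mkstemp(path)) < 0)
		return -1;
	unlink(path); /* the file lives on until r.fd is closed */
	memset(fill, 'x', sizeof(fill));
	for (size_t off = 0; off < size; off += sizeof(fill))
		if (pwrite(r.fd, fill, sizeof(fill), (off_t)off) != (ssize_t)sizeof(fill))
			record_error(&r, EIO); /* failed or short write */
	/* Start the droppers first so eviction overlaps with the reads. */
	for (; nd < ndroppers; nd++)
		if (pthread_create(&tids[nreaders + nd], NULL, dropper, &r))
			break;
	for (; nr < nreaders; nr++)
		if (pthread_create(&tids[nr], NULL, reader, &r))
			break;
	if (nr < nreaders || nd < ndroppers)
		record_error(&r, EAGAIN);
	for (int i = 0; i < nr; i++)
		pthread_join(tids[i], NULL);
	atomic_store(&r.stop, 1); /* readers are done: let the droppers exit */
	for (int i = 0; i < nd; i++)
		pthread_join(tids[nreaders + i], NULL);
	close(r.fd);
	return atomic_load(&r.error) ? -1 : atomic_load(&r.reads);
}
```

A short run against a tmpfs or ext4 directory only exercises the code. On an affected kernel one might instead call something like run_repro("/mnt/xfs", 64, 64, (size_t)256 << 20, 50) from main() and watch dmesg for rcu_preempt stall reports; the mount point and every number there are guesses, not the parameters Chris or Jens used.
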
Here's some debugging output with a modified version of your patch too,
that ignores XA_RCU_FREE:

node ffff8e838a01f788 max 59 parent 0000000000000000 shift 0 count 0 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0
WARNING: CPU: 106 PID: 1554 at lib/xarray.c:405 xas_alloc.cold+0x26/0x4b

which is:

XA_NODE_BUG_ON(node, memchr_inv(&node->slots, 0, sizeof(void *) * XA_CHUNK_SIZE));

and:

node ffff8e838a01f788 offset 59 parent ffff8e838b0419c8 shift 0 count 252 values 0 array ffff8e839dfa86a0 list ffff8e838a01f7a0 ffff8e838a01f7a0 marks 0 0 0

which is:

XA_NODE_BUG_ON(node, node->count > XA_CHUNK_SIZE);

and for this particular run, 2 threads spinning:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P1555
rcu: Tasks blocked on level-1 rcu_node (CPUs 64-79): P1556
rcu: (detected by 97, t=2102 jiffies, g=7821, q=293800 ncpus=128)
task:reader state:R running task stack:0 pid:1555 tgid:1551 ppid:1 flags:0x00004006
Call Trace:
 ? __schedule+0x37f/0xaa0
 ? sysvec_apic_timer_interrupt+0x96/0xb0
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? xas_load+0x74/0xe0
 ? xas_load+0x10/0xe0
 ? xas_find+0x162/0x1b0
 ? find_lock_entries+0x1ac/0x360
 ? find_lock_entries+0x76/0x360
 ? mapping_try_invalidate+0x5d/0x130
 ? generic_fadvise+0x110/0x240
 ? xfd_validate_state+0x1e/0x70
 ? ksys_fadvise64_64+0x50/0x90
 ? __x64_sys_fadvise64+0x18/0x20
 ? do_syscall_64+0x5d/0x180
 ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
task:reader state:R running task stack:0 pid:1556 tgid:1551 ppid:1 flags:0x00004006

The reproducer takes ~30 seconds, and will lead to anywhere from 1..N
threads spinning here.

Now for the kicker - this doesn't reproduce in 6.10 and onwards.
There are only a few changes here that are relevant, seemingly, and the
prime candidates are:

commit a4864671ca0bf51c8e78242951741df52c06766f
Author: Kairui Song
Date:   Tue Apr 16 01:18:55 2024 +0800

    lib/xarray: introduce a new helper xas_get_order

and the follow-up filemap change:

commit 6758c1128ceb45d1a35298912b974eb4895b7dd9
Author: Kairui Song
Date:   Tue Apr 16 01:18:56 2024 +0800

    mm/filemap: optimize filemap folio adding

and reverting those two on 6.10 hits it again almost immediately. I
didn't look into these commits, but it looks like they inadvertently
also fixed this corruption issue.

--
Jens Axboe