From: Chris Mason <clm@meta.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>,
Christian Theune <ct@flyingcircus.io>,
linux-mm@kvack.org,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Daniel Dao <dqminh@cloudflare.com>,
Dave Chinner <david@fromorbit.com>,
regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
Date: Fri, 13 Sep 2024 11:30:41 -0400
Message-ID: <d4a1cca4-96b8-4692-81f0-81c512f55ccf@meta.com>
In-Reply-To: <CAHk-=wh5LRp6Tb2oLKv1LrJWuXKOvxcucMfRMmYcT-npbo0=_A@mail.gmail.com>
On 9/12/24 6:25 PM, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> When I saw Christian's report, I seemed to recall that we ran into this
>> at Meta too. And we did, and hence have been reverting it since our 5.19
>> release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
>> things that are known broken.
>
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.
I've mentioned this in the past to both Willy and Dave Chinner, but so
far all of my attempts to reproduce it on purpose have failed. It's
awkward because I don't like to send bug reports that I haven't
reproduced on a non-facebook kernel, but I'm pretty confident this bug
isn't specific to us.
I'll double down on repros again during plumbers and hopefully come up
with a recipe for explosions. One other important datapoint is that we
also enable huge folios via tmpfs mount -o huge=within_size.
That hasn't hit problems, and we've been doing it for years, but of
course the tmpfs usage is pretty different from iomap/xfs.
We have two workloads that have reliably seen large folio bugs in prod.
This is all on bare metal systems, some are two socket, some single,
nothing really exotic.
1) On 5.19 kernels, knfsd reading and writing to XFS. We needed
O(hundreds) of knfsd servers running for about 8 hours to see one hit.
The issue looked similar to Christian Theune's rcu stalls, but since it
was just one CPU spinning away, I was able to perf probe and drgn my way
to some details. The xarray for the file had a series of large folios:
[ index 0: large folio from the correct file ]
[ index 1: large folio from the correct file ]
...
[ index N: large folio from a completely different file ]
[ index N+1: large folio from the correct file ]
I'm being sloppy with index numbers, but the important part is that
we've got a large folio from the wrong file in the middle of the bunch.
filemap_read() iterates over batches of folios from the xarray, but if
one of the folios in the batch has folio->offset out of order with the
rest, the whole thing turns into an infinite loop. It's not really a
filemap_read() bug, the batch coming back from the xarray is just incorrect.
2) On 6.9 kernels, we saw a BUG_ON() during inode eviction because
mapping->nrpages was non-zero. I'm assuming it's really just a
different window into the same bug. Crash dump analysis was less
conclusive because the xarray itself was always empty, but turning off
large folios made the problem go away.
This happened ~5-10 times a day, and the service had a few thousand
machines running 6.9. If I can't make an artificial repro, I'll try and
talk the service owners into setting up a production shadow to hammer on
it with additional debugging.
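To make the failure in workload 1 concrete, here is a rough userspace toy model of the invariant the batch violated. It is only a sketch under stated assumptions, not kernel code: ToyFolio, its fields, and check_batch() are hypothetical names made up for illustration, and "index" stands in for the per-folio file offset discussed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyFolio:
    """Hypothetical stand-in for a page cache folio (not the kernel struct)."""
    mapping: str   # which file the folio claims to belong to
    index: int     # first page offset covered by this folio
    nr_pages: int  # number of pages the folio spans


def check_batch(batch: List[ToyFolio], mapping: str) -> List[str]:
    """Return a description of every folio that breaks the batch invariant.

    A reader walking the batch expects every folio to belong to the file
    being read and the offsets to move strictly forward. If either
    assumption fails, a loop that derives its next position from the
    batch can stop making progress.
    """
    problems = []
    next_index = None  # lowest index the next folio may start at
    for pos, folio in enumerate(batch):
        if folio.mapping != mapping:
            problems.append(
                f"slot {pos}: folio belongs to {folio.mapping!r}, "
                f"expected {mapping!r}"
            )
        if next_index is not None and folio.index < next_index:
            problems.append(
                f"slot {pos}: index {folio.index} is below the expected "
                f"minimum {next_index}"
            )
        next_index = folio.index + folio.nr_pages
    return problems


# A batch shaped like the xarray dump above: one foreign folio in the middle.
good = [ToyFolio("A", 0, 4), ToyFolio("A", 4, 4), ToyFolio("A", 8, 4)]
bad = [ToyFolio("A", 0, 4), ToyFolio("B", 64, 4), ToyFolio("A", 8, 4)]

for line in check_batch(bad, "A"):
    print(line)
```

In this toy the foreign folio trips both checks: it fails the ownership test, and the folio after it appears to go backwards relative to it. That matches the observation that the reader itself is not at fault when the batch it is handed is already inconsistent.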
We also disabled large folios for our 6.4 kernel, but Stefan actually
tracked that bug down:
commit a48d5bdc877b85201e42cef9c2fdf5378164c23a
Author: Stefan Roesch <shr@devkernel.io>
Date: Mon Nov 6 10:19:18 2023 -0800
mm: fix for negative counter: nr_file_hugepages
We didn't have time to revalidate with large folios back on afterwards.
-chris