linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Chris Mason <clm@meta.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Christian Theune <ct@flyingcircus.io>,
	linux-mm@kvack.org,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Dao <dqminh@cloudflare.com>,
	Dave Chinner <david@fromorbit.com>,
	regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
Date: Fri, 13 Sep 2024 16:51:40 +0100	[thread overview]
Message-ID: <ZuRfjGhAtXizA7Hu@casper.infradead.org> (raw)
In-Reply-To: <d4a1cca4-96b8-4692-81f0-81c512f55ccf@meta.com>

On Fri, Sep 13, 2024 at 11:30:41AM -0400, Chris Mason wrote:
> I've mentioned this in the past to both Willy and Dave Chinner, but so
> far all of my attempts to reproduce it on purpose have failed.  It's
> awkward because I don't like to send bug reports that I haven't
> reproduced on a non-facebook kernel, but I'm pretty confident this bug
> isn't specific to us.

I don't think the bug is specific to you either.  It's been hit by
several people ... but it's really hard to hit ;-(  

> I'll double down on repros again during plumbers and hopefully come up
> with a recipe for explosions.  On other important datapoint is that we

I appreciate the effort!

> The issue looked similar to Christian Theune's rcu stalls, but since it
> was just one CPU spinning away, I was able to perf probe and drgn my way
> to some details.  The xarray for the file had a series of large folios:
> 
> [ index 0 large folio from the correct file ]
> [ index 1: large folio from the correct file ]
> ...
> [ index N: large folio from a completely different file ]
> [ index N+1: large folio from the correct file ]
> 
> I'm being sloppy with index numbers, but the important part is that
> we've got a large folio from the wrong file in the middle of the bunch.

If you could get the precise index numbers, that would be an important
clue.  It would be interesting to know the index number in the xarray
where the folio was found rather than folio->index (as I suspect that
folio->index is completely bogus because folio->mapping is wrong).
But gathering that info is going to be hard.

Maybe something like this?

+++ b/mm/filemap.c
@@ -2317,6 +2317,12 @@ static void filemap_get_read_batch(struct address_space *mapping,
                if (unlikely(folio != xas_reload(&xas)))
                        goto put_folio;

+{
+       struct address_space *fmapping = READ_ONCE(folio->mapping);
+       if (fmapping != NULL && fmapping != mapping)
+               printk("bad folio at %lx\n", xas.xa_index);
+}
+
                if (!folio_batch_add(fbatch, folio))
                        break;
                if (!folio_test_uptodate(folio))

(could use VM_BUG_ON_FOLIO() too, but i'm not sure that the identity of
the bad folio we've found is as interesting as where we found it)



  reply	other threads:[~2024-09-13 15:51 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-12 21:18 Christian Theune
2024-09-12 21:55 ` Matthew Wilcox
2024-09-12 22:11   ` Christian Theune
2024-09-12 22:12   ` Jens Axboe
2024-09-12 22:25     ` Linus Torvalds
2024-09-12 22:30       ` Jens Axboe
2024-09-12 22:56         ` Linus Torvalds
2024-09-13  3:44           ` Matthew Wilcox
2024-09-13 13:23             ` Christian Theune
2024-09-13 12:11       ` Christian Brauner
2024-09-16 13:29         ` Matthew Wilcox
2024-09-18  9:51           ` Christian Brauner
2024-09-13 15:30       ` Chris Mason
2024-09-13 15:51         ` Matthew Wilcox [this message]
2024-09-13 16:33           ` Chris Mason
2024-09-13 18:15             ` Matthew Wilcox
2024-09-13 21:24               ` Linus Torvalds
2024-09-13 21:30                 ` Matthew Wilcox
2024-09-13 16:04       ` David Howells
2024-09-13 16:37         ` Chris Mason
2024-09-16  0:00       ` Dave Chinner
2024-09-16  4:20         ` Linus Torvalds
2024-09-16  8:47           ` Chris Mason
2024-09-17  9:32             ` Matthew Wilcox
2024-09-17  9:36               ` Chris Mason
2024-09-17 10:11               ` Christian Theune
2024-09-17 11:13               ` Chris Mason
2024-09-17 13:25                 ` Matthew Wilcox
2024-09-18  6:37                   ` Jens Axboe
2024-09-18  9:28                     ` Chris Mason
2024-09-18 12:23                       ` Chris Mason
2024-09-18 13:34                       ` Matthew Wilcox
2024-09-18 13:51                         ` Linus Torvalds
2024-09-18 14:12                           ` Matthew Wilcox
2024-09-18 14:39                             ` Linus Torvalds
2024-09-18 17:12                               ` Matthew Wilcox
2024-09-18 16:37                             ` Chris Mason
2024-09-19  1:43                         ` Dave Chinner
2024-09-19  3:03                           ` Linus Torvalds
2024-09-19  3:12                             ` Linus Torvalds
2024-09-19  3:38                               ` Jens Axboe
2024-09-19  4:32                                 ` Linus Torvalds
2024-09-19  4:42                                   ` Jens Axboe
2024-09-19  4:36                                 ` Matthew Wilcox
2024-09-19  4:46                                   ` Jens Axboe
2024-09-19  5:20                                     ` Jens Axboe
2024-09-19  4:46                                   ` Linus Torvalds
2024-09-20 13:54                                   ` Chris Mason
2024-09-24 15:58                                     ` Matthew Wilcox
2024-09-24 17:16                                     ` Sam James
2024-09-25 16:06                                       ` Kairui Song
2024-09-25 16:42                                         ` Christian Theune
2024-09-27 14:51                                         ` Sam James
2024-09-27 14:58                                           ` Jens Axboe
2024-10-01 21:10                                             ` Kairui Song
2024-09-24 19:17                                     ` Chris Mason
2024-09-24 19:24                                       ` Linus Torvalds
2024-09-19  6:34                               ` Christian Theune
2024-09-19  6:57                                 ` Linus Torvalds
2024-09-19 10:19                                   ` Christian Theune
2024-09-30 17:34                                     ` Christian Theune
2024-09-30 18:46                                       ` Linus Torvalds
2024-09-30 19:25                                         ` Christian Theune
2024-09-30 20:12                                           ` Linus Torvalds
2024-09-30 20:56                                             ` Matthew Wilcox
2024-09-30 22:42                                               ` Davidlohr Bueso
2024-09-30 23:00                                                 ` Davidlohr Bueso
2024-09-30 23:53                                               ` Linus Torvalds
2024-10-01  0:56                                       ` Chris Mason
2024-10-01  7:54                                         ` Christian Theune
2024-10-10  6:29                                         ` Christian Theune
2024-10-11  7:27                                           ` Christian Theune
2024-10-11  9:08                                             ` Christian Theune
2024-10-11 13:06                                               ` Chris Mason
2024-10-11 13:50                                                 ` Christian Theune
2024-10-12 17:01                                                 ` Linus Torvalds
2024-12-02 10:44                                                   ` Christian Theune
2024-10-01  2:22                                       ` Dave Chinner
2024-09-16  7:14         ` Christian Theune
2024-09-16 12:16           ` Matthew Wilcox
2024-09-18  8:31           ` Christian Theune

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZuRfjGhAtXizA7Hu@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=axboe@kernel.dk \
    --cc=clm@meta.com \
    --cc=ct@flyingcircus.io \
    --cc=david@fromorbit.com \
    --cc=dqminh@cloudflare.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox