From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-bcachefs@vger.kernel.org
Subject: Re: bcachefs dropped writes with lockless buffered io path, COMPACTION/MIGRATION=y
Date: Tue, 27 Aug 2024 04:36:33 +0100
Message-ID: <Zs1JwTsgNQiKXkdE@casper.infradead.org>
In-Reply-To: <ieb2nptxxk2apxfijk3qcjoxlz5uitsl5jn6tigunjmuqmkrwm@le74h3edr6oy>

On Mon, Aug 26, 2024 at 11:29:52PM -0400, Kent Overstreet wrote:
> We had a report of corruption on nixos, on tests that build a system
> image, it bisected to the patch that enabled buffered writes without
> taking the inode lock:
>
> https://evilpiepirate.org/git/bcachefs.git/commit/?id=7e64c86cdc6c
>
> It appears that dirty folios are being dropped somehow; corrupt files,
> when checked against good copies, have ranges of 0s that are 4k aligned
> (modulo 2k, likely a misaligned partition).
>
> Interestingly, it only triggers for QEMU - the test fails pretty
> consistently and we have a lot of nixos users, we'd notice (via nix
> store verifies) if the corruption was more widespread. We believe it
> only triggers with QEMU's snapshots mode (but don't quote me on that).

Just to be crystal clear here, the corruption happens while running
bcachefs in the qemu guest, and it doesn't matter what the host
filesystem is?

Or did I misunderstand, and it occurs while running anything inside qemu
on top of a bcachefs host?

> Further digging implicates CONFIG_COMPACTION or CONFIG_MIGRATION.
>
> Testing with COMPACTION, MIGRATION=n and TRANSPARENT_HUGEPAGE=y passes
> reliably.
>
> On the bcachefs side, I've been testing with that patch reduced to just
> "don't take inode lock if not extending"; i.e. killing the fancy stuff
> to preserve write atomicity. It really does appear to be "don't take
> inode lock -> dirty folios get dropped".
>
> It's not a race with truncate, or anything silly like that; bcachefs has
> the pagecache add lock, which serves here for locking vs. truncate.
>
> So - this is a real head scratcher. The inode lock really doesn't do
> much in IO paths, it's there for synchronization with truncate and write
> vs. write atomicity - the mm paths know nothing about it. Page
> fault/mkwrite paths don't take it at all; a buffered non-extending write
> should be able to work similarly: the folio lock should be entirely
> sufficient here.
>
> Anyone got any bright ideas?

No, but I'm going to sleep on it.