From: Kent Overstreet <kent.overstreet@linux.dev>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
linux-bcachefs@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Sun, 20 Oct 2024 17:40:55 -0400
Message-ID: <yp6bt4ux2nemboyytvv2vyd32em53gvcipty6xsqm7yfu7inoe@x4vrz4es2z25>
In-Reply-To: <CAHk-=wjQDCtzBUiRPuKfyjGFiR9JZi82ENyzdvKei4W3pxt=tA@mail.gmail.com>
On Sun, Oct 20, 2024 at 02:21:50PM -0700, Linus Torvalds wrote:
> On Sun, 20 Oct 2024 at 13:54, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, 20 Oct 2024 at 13:30, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> > >
> > > Latency for journal replay?
> >
> > No, latency for the journaling itself.
>
> Side note: latency of the journal replay can actually be quite
> critical indeed for any "five nines" operation, and big journals are
> not necessarily a good idea for that reason.
>
> There's a very real reason many places don't use filesystems that do
> fsck any more.
I need to ask one of the guys with a huge filesystem (if you're
listening and have numbers, please chime in), but I don't think journal
replay is bad compared to system boot time.
At this point it would be completely trivial to do journal replay in the
background, after the filesystem is mounted. All we need to do prior to
mount is read the journal and sort+dedup the keys; replaying all the
updates is the expensive part. But like I mentioned, the btree API
transparently overlays the journal keys until journal replay is
finished, which was necessary for solving various bootstrap issues.
So if someone complains, I'll flip that on and we'll start testing it.
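To make the sort+dedup and overlay idea concrete, here is a rough userspace
sketch. It is not the bcachefs code: struct jkey, the function names, and the
sorted array standing in for the on-disk btree are all made up for
illustration. The point is only that once the journal keys are sorted and
reduced to the newest entry per position, a lookup can consult them first and
fall back to the btree, so reads are correct before any key has been replayed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified key: not the real bcachefs structures. */
struct jkey {
	uint64_t pos;		/* position of the key in the btree */
	uint64_t seq;		/* journal sequence number, higher is newer */
	uint64_t val;		/* payload */
	bool	 deleted;	/* journal entry is a deletion (whiteout) */
};

static int jkey_cmp(const void *l, const void *r)
{
	const struct jkey *a = l, *b = r;

	if (a->pos != b->pos)
		return a->pos < b->pos ? -1 : 1;
	if (a->seq != b->seq)
		return a->seq < b->seq ? -1 : 1;
	return 0;
}

/* Sort by (pos, seq), keep only the newest entry per pos; returns new count. */
static size_t journal_keys_sort_dedup(struct jkey *keys, size_t nr)
{
	size_t dst = 0;

	qsort(keys, nr, sizeof(*keys), jkey_cmp);

	for (size_t i = 0; i < nr; i++) {
		if (i + 1 < nr && keys[i + 1].pos == keys[i].pos)
			continue;	/* a newer entry for this pos follows */
		keys[dst++] = keys[i];
	}
	return dst;
}

/* Binary search in an array sorted by pos with unique positions. */
static const struct jkey *keys_find(const struct jkey *keys, size_t nr,
				    uint64_t pos)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid].pos < pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < nr && keys[lo].pos == pos ? &keys[lo] : NULL;
}

/*
 * Lookup with the journal overlaid on the btree. "btree" here is just a
 * sorted array standing in for the on-disk btree (an assumption made to
 * keep the sketch self-contained).
 */
static bool overlay_lookup(const struct jkey *journal, size_t nr_journal,
			   const struct jkey *btree, size_t nr_btree,
			   uint64_t pos, uint64_t *val)
{
	const struct jkey *k = keys_find(journal, nr_journal, pos);

	if (k && k->deleted)
		return false;	/* journal deleted it; hide the btree copy */
	if (!k)
		k = keys_find(btree, nr_btree, pos);
	if (!k)
		return false;

	*val = k->val;
	return true;
}
```

Background replay would then amount to walking the deduped array and applying
each key to the btree, with lookups going through overlay_lookup() until the
walk is done.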
Fsck is the real concern, yes, and there's lots to be done there. I have
the majority of the work completed for online fsck, but that isn't
enough - because if fsck takes a week to complete and it takes most of
system capacity while it's running, that's not acceptable either (and
that would be the case today if you tried bcachefs on a petabyte
filesystem).
So for that, we need to move as many as possible of the consistency checks
and repairs that fsck does into things we can do whenever other
operations are touching that metadata (this is mainly what I mean
when I say self healing), and we need to either reduce our dependency
on passes that go "walk everything and check references", or add ways to
shard them (and only check the parts of the filesystem that are suspected to
have damage). Checking extent backpointers is the big offender, and
fortunately that's the easiest one to fix.
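As a rough illustration of what sharding a "walk everything" pass could look
like (purely a sketch, with made-up names and a fixed 64-shard layout, not
anything that exists in bcachefs): the key space is split into shards, runtime
checks flag the shard containing anything that looked wrong, and the pass then
visits only the flagged ranges.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_SHARDS 64	/* hypothetical: one bit per shard in a u64 */

struct shard_map {
	uint64_t shard_size;	/* number of positions covered per shard */
	uint64_t suspect;	/* bit i set: shard i needs checking */
};

static unsigned pos_to_shard(const struct shard_map *m, uint64_t pos)
{
	uint64_t s = pos / m->shard_size;

	/* Everything past the last boundary lands in the final shard. */
	return s < NR_SHARDS ? (unsigned) s : NR_SHARDS - 1;
}

/* Called when a runtime check sees something inconsistent at pos. */
static void mark_suspect(struct shard_map *m, uint64_t pos)
{
	m->suspect |= UINT64_C(1) << pos_to_shard(m, pos);
}

/*
 * Hand out the next range that needs checking and clear its flag.
 * The range is [*start, *end); the last shard is treated as open-ended.
 * Returns false when no suspect shards remain.
 */
static bool next_suspect_range(struct shard_map *m,
			       uint64_t *start, uint64_t *end)
{
	for (unsigned i = 0; i < NR_SHARDS; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(m->suspect & bit))
			continue;

		m->suspect &= ~bit;
		*start = i * m->shard_size;
		*end = i == NR_SHARDS - 1
			? UINT64_MAX
			: (i + 1) * m->shard_size;
		return true;
	}
	return false;
}
```

With nothing flagged the pass does no work at all, which is the property that
matters on a very large filesystem.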