From: Kent Overstreet <kent.overstreet@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	linux-bcachefs@vger.kernel.org, linux-mm@kvack.org,
	 Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	 Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Sun, 20 Oct 2024 16:08:54 -0400
Message-ID: <ocptjlg2m7djlgqymlcoo5d6s5smrjncy4xjxanhr4kk6fg55q@4255xpx24dgf>
In-Reply-To: <dccb8ec3-2563-4df7-a52b-0829b9391e43@suse.cz>

On Sun, Oct 20, 2024 at 09:53:06PM +0200, Vlastimil Babka wrote:
> On 10/20/24 20:53, Kent Overstreet wrote:
> > On Sun, Oct 20, 2024 at 11:46:11AM -0700, Linus Torvalds wrote:
> >> On Sun, 20 Oct 2024 at 10:04, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >> >
> >> > But given that vmalloc() already supports > INT_MAX requests, and memory
> >> > sizes keep growing so 2GB is getting pretty small - I think it's time,
> >> > this is going to come up in other places sooner or later.
> >> 
> >> No.
> >> 
> >> If you need 2GB+ memory for filesystem operations, you fix your code.
> > 
> > This is for journal replay, where we've got a big array of keys and we
> > need to sort them.
> > 
> > The keys have to fit in memory (and had to fit in memory previously, for
> > them to be dirty in the journal);
> 
> What if the disk is moved to a smaller system, should the fs still mount
> there? (I don't mean such a small system that it can't vmalloc() 2GB
> specifically, but in principle...)

You'll have to do journal replay on the bigger system. Once you've done
that, it'll work just fine on the smaller system.

(Now, trying to work with a 75TB filesystem on a small machine is going
to be really painful if you ever need to fsck. That's just an inherently
hard problem, but we've got fsck scalability/performance improvements in
the works.)

But journal replay does inherently require the whole contents of the
journal to fit in memory. We have to sort + dedup the keys so that we
can overlay the contents of the journal over the btree until journal
replay is finished. That overlay is what gives us a consistent view of
the filesystem, which we need so that we can run the allocator and go
read-write, which in turn we need in order to do journal replay.

Fun bootstrap problems.
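
To make the sort + dedup + overlay step concrete, here's a rough
userspace sketch. It is not the bcachefs code: struct jkey, its fields,
and both function names are made up for illustration, and it assumes
that for any given btree position the entry with the highest journal
sequence number wins. The point is only that the whole key array has to
be resident at once for the sort and for lookups afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified journal entry; not the bcachefs on-disk format. */
struct jkey {
	uint64_t pos;	/* position of the key in the btree */
	uint64_t seq;	/* journal sequence number, higher is newer */
	int	 val;	/* stand-in for the key's payload */
};

/* Order by position, then newest first, so the winner leads each run. */
static int jkey_cmp(const void *l, const void *r)
{
	const struct jkey *a = l, *b = r;

	if (a->pos != b->pos)
		return a->pos < b->pos ? -1 : 1;
	if (a->seq != b->seq)
		return a->seq > b->seq ? -1 : 1;
	return 0;
}

/* Sort in place and keep only the newest entry per position. */
static size_t journal_sort_dedup(struct jkey *keys, size_t nr)
{
	size_t dst = 0;

	assert(keys || !nr);
	if (!nr)
		return 0;

	qsort(keys, nr, sizeof(*keys), jkey_cmp);

	for (size_t src = 0; src < nr; src++)
		if (!dst || keys[dst - 1].pos != keys[src].pos)
			keys[dst++] = keys[src];

	return dst;	/* number of keys left after dedup */
}

/*
 * Overlay lookup: binary search the deduped keys. NULL means the journal
 * has nothing for this position and the caller would read the btree.
 */
static const struct jkey *journal_overlay_lookup(const struct jkey *keys,
						 size_t nr, uint64_t pos)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid].pos < pos)
			lo = mid + 1;
		else
			hi = mid;
	}

	return lo < nr && keys[lo].pos == pos ? &keys[lo] : NULL;
}
```

A caller would run journal_sort_dedup() once over everything read from
the journal, then consult journal_overlay_lookup() before falling back
to the btree until replay has finished.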

