Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()

From: Janpieter Sollie <janpieter.sollie@kabelmail.de>
To: Kent Overstreet <kent.overstreet@linux.dev>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	linux-bcachefs@vger.kernel.org, linux-mm@kvack.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Mon, 21 Oct 2024 10:46:47 +0200
Message-ID: <be353d22-8a87-4f69-93bd-bb0f3aae6f51@kabelmail.de>
In-Reply-To: <ikaf72w2oap3crjrybbd5jp267slnb7dygz4m62dfw3edu2ppj@f7dv2qdx3yga>

On 20/10/2024 at 22:29, Kent Overstreet wrote:
> On Sun, Oct 20, 2024 at 01:19:42PM -0700, Linus Torvalds wrote:
>
>> Enough said, and you're just making shit up to make excuses.
>>
>> Also, you might want to start looking at latency numbers in addition to
>> throughput. If your journal replay needs an *index* that is 2G in
>> size, you may have other issues.
> Latency for journal replay?
>
> No, journal replay is only something that happens at mount after an unclean
> shutdown. We can afford to take some time there, and journal replay
> performance hasn't been a concern.
>
>> Your journal size is insane, and your "artificial cap on performance"
>> had better come with numbers.
> I'm not going to run custom benchmarks just for a silly argument, sorry.
>
> But on a fileserver with 128 GB of ram and a 75 TB filesystem (yes,
> that's likely a dedicated fileserver), we can quite easily justify a
> btree node cache of perhaps 10GB, and on random update workloads the
> journal does need to be that big - otherwise our btree node write size
> goes down and throughput suffers.
>
Is this idea based on the assumption that the user has only one FS per device?
Suppose I have this setup (which is plausible, as it looks like mine),
with 3 bcachefs filesystems each taking 10% of RAM.
I then end up with 30% of memory dedicated to bcachefs caching.
If I read your argument correctly, you are saying "I want a large btree node
cache, because that makes the fs more efficient". No doubt about that.

VFS buffering may already save you many of the lookups you are
building the btree node cache for.
In theory, the two work very differently,
but in practice, which files will be looked up most often?
Probably the few that are already in your VFS buffer.
So the added value of keeping a large "metadata" cache seems doubtful.

I have my doubts about trading 15G of buffer for 15G of btree node cache:
you lose the opportunity to share those 15G of RAM between all filesystems.
On the other hand, when you perform many different file lookups,
the btree node cache will really shine.

Maybe a tuning parameter could help here?
It would at least limit the "insane" required journal size.

Janpieter Sollie
