From: Janpieter Sollie <janpieter.sollie@kabelmail.de>
To: Kent Overstreet <kent.overstreet@linux.dev>,
Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
linux-bcachefs@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Uladzislau Rezki <urezki@gmail.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Mon, 21 Oct 2024 11:22:51 +0200
Message-ID: <be58ca67-bc25-494a-b0a4-93e026c88f10@kabelmail.de>
In-Reply-To: <be353d22-8a87-4f69-93bd-bb0f3aae6f51@kabelmail.de>
On 21/10/2024 at 10:46, Janpieter Sollie wrote:
> On 20/10/2024 at 22:29, Kent Overstreet wrote:
>>>
>>>
>>
>> I'm not going to run custom benchmarks just for a silly argument, sorry.
>> But on a fileserver with 128 GB of ram and a 75 TB filesystem
>> (yes, that's likely a dedicated fileserver),
>> we can quite easily justify a btree node cache of perhaps 10GB,
>> and on random update workloads the journal does need to be that big -
>> otherwise our btree node write size goes down and throughput suffers.
>>
> Is this idea based on "the user only has 1 FS per device"?
> Suppose I have this setup (plausible, since it looks like mine).
> I have 3 bcachefs filesystems, each taking 10% of RAM.
> So I end up with 30% of memory dedicated to bcachefs caching.
> If I read your argument correctly, you say "I want a large btree node cache,
> because that makes the fs more efficient". No doubt about that.
>
> VFS buffering may already save you many of the lookups you're
> building the btree node cache for.
> In theory, there's a large difference in how they work,
> but in practice, which files will it look up most?
> Probably the few you already have in your VFS buffer.
> The added value of keeping a large "metadata" cache seems doubtful.
>
> I have my doubts about trading 15G of buffer for 15G of btree node cache:
> you lose the opportunity to share those 15G of RAM between all filesystems.
> On the other hand, when you perform many different file lookups,
> it will shine with everything it has.
>
> Maybe some tuning parameter could help here?
> It would at least limit the "insane" required journal size.
>
> Janpieter Sollie
And things quickly get out of hand here.
From a bcachefs report on fs usage (other disks omitted),
here is a device dedicated to metadata:
SSDM (device 6):          sdg1       rw
                          data  buckets  fragmented
  free:            39254491136   149744
  sb:                  3149824       13      258048
  journal:           312475648     1192
  btree:             428605440     1635
  user:                      0        0
  cached:                    0        0
  parity:                    0        0
  stripe:                    0        0
  need_gc_gens:              0        0
  need_discard:              0        0
  unstriped:                 0        0
  capacity:        39998980096   152584
Oops ... the journal size is more than 70% of the fs data (the btree) on this device!
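For anyone who wants to check that figure, here is a minimal sketch of the arithmetic, using only the numbers from the report above. The variable names are made up for illustration and are not bcachefs identifiers, and the bucket size is inferred from the capacity row under the assumption that all buckets on the device have the same size.

```python
# Figures copied from the usage report for device 6 (sdg1).
# Names are illustrative only, not bcachefs identifiers.
journal_bytes, journal_buckets = 312475648, 1192
btree_bytes, btree_buckets = 428605440, 1635
capacity_bytes, capacity_buckets = 39998980096, 152584

# Bucket size implied by the capacity row (assumes uniform buckets).
bucket_bytes = capacity_bytes // capacity_buckets

# Journal size relative to the btree, the only fs data on this device
# (the "user" row is 0).
journal_to_btree = journal_bytes / btree_bytes

# Journal size relative to the whole device, for context.
journal_to_capacity = journal_bytes / capacity_bytes

print(f"bucket size:        {bucket_bytes} bytes")
print(f"journal / btree:    {journal_to_btree:.1%}")
print(f"journal / capacity: {journal_to_capacity:.2%}")
```

With these inputs the journal comes out at roughly 73% of the btree data, while being under 1% of the device capacity, so the "more than 70%" is relative to the metadata actually stored, not to the disk.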
Janpieter Sollie