linux-mm.kvack.org archive mirror
From: Janpieter Sollie <janpieter.sollie@kabelmail.de>
To: Kent Overstreet <kent.overstreet@linux.dev>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	linux-bcachefs@vger.kernel.org, linux-mm@kvack.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Mon, 21 Oct 2024 11:22:51 +0200
Message-ID: <be58ca67-bc25-494a-b0a4-93e026c88f10@kabelmail.de>
In-Reply-To: <be353d22-8a87-4f69-93bd-bb0f3aae6f51@kabelmail.de>



On 21/10/2024 at 10:46, Janpieter Sollie wrote:
> On 20/10/2024 at 22:29, Kent Overstreet wrote:
>>>
>>>
>>
>> I'm not going to run custom benchmarks just for a silly argument, sorry.
>> But on a fileserver with 128 GB of ram and a 75 TB filesystem
>> (yes, that's likely a dedicated fileserver),
>> we can quite easily justify a btree node cache of perhaps 10GB,
>> and on random update workloads the journal does need to be that big -
>> otherwise our btree node write size goes down and throughput suffers.
>>
> Is this idea based on the assumption "the user only has 1 FS per device"?
> Suppose I have this setup (it probably exists; it looks like mine):
> I have 3 bcachefs filesystems each taking 10% of RAM.
> So, I end up with a memory load of 30% dedicated to bcachefs caching.
> If I read your argument, you say "I want a large btree node cache,
> because that's making the fs more efficient".  No doubts about that.
>
> VFS buffering may already save you a lot of lookups you're actually
> building the btree node cache for.
> Theoretically, there's a large difference in how they work,
> but in practice, which files will it mostly look up?
> Probably the few you already have in your VFS buffer.
> The added value of keeping a large "metadata" cache seems doubtful.
>
> I have my doubts about trading 15G of buffer for 15G of btree node cache:
> You lose the opportunity to share those 15G ram between all filesystems.
> On the other hand, when you perform many different file lookups,
> it will shine with everything it has.
>
> Maybe some tuning parameter could help here?
> It would at least limit the "insane" required journal size.
>
> Janpieter Sollie

And things quickly get out of hand here:

A bcachefs report on fs usage:

(output for the other disks omitted)

A device dedicated to metadata:

SSDM (device 6):                sdg1              rw
                                data         buckets    fragmented
  free:                  39254491136          149744
  sb:                        3149824              13        258048
  journal:                 312475648            1192
  btree:                   428605440            1635
  user:                            0               0
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  unstriped:                       0               0
  capacity:              39998980096          152584

Oops ... the journal (312475648 bytes) is more than 70% of the size of
the btree data (428605440 bytes) on this device!
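
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python that recomputes the ratio from the byte counts in the report above.
It assumes that "fs data" on this metadata-only device means the btree row;
the variable and function names are just illustrative, not anything from
bcachefs-tools.

```python
# Byte counts copied from the "SSDM (device 6)" usage report above.
usage_bytes = {
    "free": 39254491136,
    "sb": 3149824,
    "journal": 312475648,
    "btree": 428605440,
}
capacity_bytes = 39998980096


def ratio(part: int, whole: int) -> float:
    """Return part / whole as a plain fraction."""
    return part / whole


# Assumption: "fs data" is read as the btree row, since this device
# holds no user data.
journal_vs_btree = ratio(usage_bytes["journal"], usage_bytes["btree"])
journal_vs_capacity = ratio(usage_bytes["journal"], capacity_bytes)

print(f"journal / btree    = {journal_vs_btree:.1%}")
print(f"journal / capacity = {journal_vs_capacity:.2%}")
```

The first ratio is the one behind the "more than 70%" remark; the second
shows that the journal is still under 1% of the device capacity, so the
concern is the journal size relative to the metadata it protects, not
relative to the disk.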

Janpieter Sollie


Thread overview: 28+ messages
2024-10-19 21:00 Kent Overstreet
2024-10-20 11:45 ` Lorenzo Stoakes
2024-10-20 13:00   ` Kent Overstreet
2024-10-20 16:44     ` Lorenzo Stoakes
2024-10-20 17:03       ` Kent Overstreet
2024-10-20 18:46         ` Linus Torvalds
2024-10-20 18:53           ` Kent Overstreet
2024-10-20 19:09             ` Linus Torvalds
2024-10-20 19:09               ` Linus Torvalds
2024-10-20 19:16                 ` Kent Overstreet
2024-10-21 16:15                   ` Uladzislau Rezki
2024-10-20 20:10                 ` Kent Overstreet
2024-10-20 20:19                   ` Linus Torvalds
2024-10-20 20:29                     ` Kent Overstreet
2024-10-20 20:54                       ` Linus Torvalds
2024-10-20 21:21                         ` Linus Torvalds
2024-10-20 21:40                           ` Kent Overstreet
2024-10-27 19:58                           ` Kent Overstreet
2024-10-20 21:29                         ` Kent Overstreet
2024-10-20 21:30                           ` Linus Torvalds
2024-10-20 21:42                             ` Kent Overstreet
2024-10-20 21:51                       ` Joshua Ashton
2024-10-20 21:57                         ` Kent Overstreet
2024-10-21  8:46                       ` Janpieter Sollie
2024-10-21  9:22                         ` Janpieter Sollie [this message]
2024-10-20 19:10               ` Kent Overstreet
2024-10-20 19:53             ` Vlastimil Babka
2024-10-20 20:08               ` Kent Overstreet
