On Sun, Oct 20, 2024 at 01:19:42PM -0700, Linus Torvalds wrote:
> Enough said, and you're just making shit up to make excuses.
> Also, you might want to start looking at latency numbers in addition to
> throughput. If your journal replay needs an *index* that is 2G in
> size, you may have other issues.
Latency for journal replay?
No, journal replay is only something that happens at mount after an unclean
shutdown. We can afford to take some time there, and journal replay
performance hasn't been a concern.
> Your journal size is insane, and your "artificial cap on performance"
> had better come with numbers.
I'm not going to run custom benchmarks just for a silly argument, sorry.
But on a fileserver with 128 GB of RAM and a 75 TB filesystem (yes,
that's likely a dedicated fileserver), we can quite easily justify a
btree node cache of perhaps 10 GB, and on random update workloads the
journal does need to be that big - otherwise our btree node write size
goes down and throughput suffers.