On 21/10/2024 at 10:46, Janpieter Sollie wrote:
> On 20/10/2024 at 22:29, Kent Overstreet wrote:
>>
>> I'm not going to run custom benchmarks just for a silly argument, sorry.
>> But on a fileserver with 128 GB of RAM and a 75 TB filesystem
>> (yes, that's likely a dedicated fileserver),
>> we can quite easily justify a btree node cache of perhaps 10 GB,
>> and on random update workloads the journal does need to be that big -
>> otherwise our btree node write size goes down and throughput suffers.
>>
> Is this idea based on the assumption that the user has only one FS per
> device? I assume that's the setup you have in mind (and it probably is -
> it looks like mine). I have 3 bcachefs filesystems, each taking 10% of
> RAM, so I end up with a memory load of 30% dedicated to bcachefs caching.
>
> If I read your argument correctly, you are saying "I want a large btree
> node cache, because that makes the fs more efficient". No doubt about
> that.
>
> But VFS buffering may already save you a lot of the lookups you are
> building that btree node cache for. Theoretically, there's a large
> difference in how they work, but in practice, which files will it mostly
> look up? Probably the few you already have in your VFS buffer. The added
> value of keeping a large "metadata" cache seems doubtful.
>
> I have my doubts about trading 15 GB of buffer for 15 GB of btree node
> cache: you lose the opportunity to share those 15 GB of RAM between all
> filesystems. On the other hand, when you perform many different file
> lookups, it will shine with everything it has.
>
> Maybe some tuning parameter could help here? It would at least limit the
> "insane" required journal size.
>
> Janpieter Sollie

And things quickly grow out of hand here. A bcachefs report on fs usage:

blablabla (other disks)

A device dedicated to metadata:

SSDM (device 6):                sdg1              rw
                                data         buckets    fragmented
 free:                  39254491136          149744
 sb:                        3149824              13        258048
 journal:                 312475648            1192
 btree:                   428605440            1635
 user:                            0               0
 cached:                          0               0
 parity:                          0               0
 stripe:                          0               0
 need_gc_gens:                    0               0
 need_discard:                    0               0
 unstriped:                       0               0
 capacity:              39998980096          152584

Oops ... the journal takes up more than 70% as much space as the actual
btree data on this device (see the quick check below)!

Janpieter Sollie
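
To make that "70%" concrete, here is a quick back-of-the-envelope check,
reading "fs data" as the btree allocation on this metadata-only device
(a minimal Python sketch; the byte counts are copied from the usage
report above):

    # journal vs. btree allocation on sdg1, using the byte counts from
    # the "data" column of the bcachefs fs usage report above
    journal = 312475648   # journal, bytes
    btree   = 428605440   # btree, bytes
    print(f"journal / btree = {journal / btree:.1%}")  # -> 72.9%

So on this device the journal alone occupies nearly three quarters as
much space as the btree metadata it exists to protect.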