Hi guys,

So, LinuxCon is approaching. To help make our discussions more productive, I sketched a basic prototype of a kmem cgroup that can control the size of the dentry cache. I am sending the code here so you guys can have an idea, but keep in mind this is a *sketch*. This is my view of how our controller *could be*, not necessarily what it *should be*. All your input is more than welcome.

Let me first explain a bit of my approach (there are some comments inline as well):

* So far it only works with the slab (you will see that something similar can be done at least for the slub). Since most of us are concerned mostly with memory abuse (I think), I neglected for simplicity the initial memory allocated for the arrays. We only bill the pages when cache_grow() is called to allocate more of them (see the sketch of such a charge path at the end of this mail).

* I avoid resorting to the shrinkers, trying to free the slab pages themselves whenever possible.

* We don't limit the size of all caches: they have to register themselves explicitly (in this PoC, I am using the dentry cache as an example).

* The object is billed to whoever touched it first. Other policies are of course possible.

What I am *not* concerned about in this PoC (left for future work, if needed):

- unified user/kernel memory reclaim
- changes to the shrinkers
- changes to the limit once it is already in place
- per-cgroup display in /proc/slabinfo
- task movement
- a whole lot of other stuff

* Hey glommer, do you have numbers?

Yes, I have 8 numbers. And since 8 is also a number, then I have 9 numbers.

What I did was to run "find /" on a freshly booted system (my laptop). I ran each iteration only once, so nothing scientific. I halved the limits until the allocations started to fail, which happened more or less around a 256K hard limit. Note also that find is not a workload that pins the dentries in memory for very long; other kinds of workloads will display different results here...

Configuration                 real        user       sys
base (non-patched kernel)     0m16.091s   0m0.567s   0m6.649s
root cgroup, unlimited (*)    0m15.853s   0m0.511s   0m6.417s
16Mb/4Mb (hard/soft limit)    0m16.596s   0m0.560s   0m6.947s
8Mb/4Mb                       0m16.975s   0m0.568s   0m7.047s
4Mb/2Mb                       0m16.713s   0m0.554s   0m7.022s
2Mb/1Mb                       0m17.001s   0m0.544s   0m7.118s
1Mb/512K                      0m16.671s   0m0.530s   0m7.067s
512K/256K                     0m17.395s   0m0.567s   0m7.179s

(*) max memory used during the run: 22Mb

So, what these initial numbers do tell us is that the performance penalty for the unlimited root cgroup should not be that bad: it is within the noise of the non-patched kernel. When the limits start to be hit, a penalty is incurred, but it stays within expectations.
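
To make the billing point above more concrete, here is a rough sketch of what the charge path could look like. Fair warning: everything named kmem_cgroup, task_kmem_cgroup(), cache_is_accounted() or kmem_cgroup_shrink_pages() below is made up for the purpose of this mail, and the actual patch may well structure it differently; only the res_counter calls are the stock ones.

struct kmem_cgroup {
	struct cgroup_subsys_state css;
	struct res_counter kmem;	/* hard and soft limits live here */
};

/*
 * Would be called from cache_grow(), before new pages are allocated
 * for the cache.  The object is billed to whoever touched it first,
 * so we charge current's cgroup at allocation time.
 */
static int memcg_charge_slab_pages(struct kmem_cache *cachep, gfp_t flags,
				   int nr_pages)
{
	struct kmem_cgroup *kcg = task_kmem_cgroup(current);
	struct res_counter *fail;

	/* caches that did not register themselves are never billed */
	if (!kcg || !cache_is_accounted(cachep))
		return 0;

	/* over the soft limit: try to free slab pages directly first */
	if (res_counter_soft_limit_excess(&kcg->kmem))
		kmem_cgroup_shrink_pages(kcg, nr_pages);

	/* hard limit: a failed charge makes the allocation itself fail */
	return res_counter_charge(&kcg->kmem, nr_pages * PAGE_SIZE, &fail);
}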
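
The counterpart, equally hypothetical, would run when a slab page goes back to the page allocator (slab_destroy() time), uncharging whichever cgroup was billed for it; page_to_kmem_cgroup() stands in for however the charge-time association ends up being stored:

static void memcg_uncharge_slab_pages(struct kmem_cache *cachep,
				      struct page *page, int nr_pages)
{
	struct kmem_cgroup *kcg = page_to_kmem_cgroup(page);

	if (!kcg)
		return;
	res_counter_uncharge(&kcg->kmem, nr_pages * PAGE_SIZE);
}

Charging and uncharging whole pages instead of individual objects is what lets us skip the shrinkers in the common case: when a cgroup is over its soft limit, we can reap entire slabs back to the page allocator directly.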