On Fri, Dec 15, 2023 at 12:22 AM Chris Li wrote:
>
> Hi Fabian,
>
> On Thu, Dec 14, 2023 at 10:00 AM Fabian Deutsch wrote:
>
> > Yep - for container use-cases.
> >
> > Now a few thoughts in this direction:
> > - With swap per cgroup you lose the big "statistical" benefit of having swap on a node level. Well, it depends on the size of the cgroup (e.g. system.slice is quite large).
>
> Just to clarify, by "node" you mean the node in the Kubernetes sense,
> which is the whole machine. In the Linux kernel MM context, the node
> often refers to the NUMA memory node; that is not what you mean here,
> right?

Correct - I was referring to Kubernetes nodes, not NUMA nodes.

> > - With today's node-level swap, setting memory.swap.max=0 for all cgroups lets you achieve a similar behavior (only opt-in cgroups will get swap).
> > - The above approach, however, will still have a shared swap backend for all cgroups.
>
> Yes, the "memory.swap.tiers" idea is trying to allow cgroups to select
> a subset of the swap backends in a specific order. It is still in the
> early stage of discussion. If you have any suggestions or feedback in
> that direction, I am looking forward to hearing them.

Interesting. There have been concerns about accidentally leaking confidential data when it is written to a swap device. The other, less discussed item was QoS for swap I/O traffic. At first glance it seems like tiers could help with the second use case.

- fabian

>
> Chris
>
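
For reference, the opt-in approach mentioned above (memory.swap.max=0 everywhere except selected cgroups) could be sketched roughly as below. This is only an illustration, assuming cgroup v2; the function name apply_swap_opt_in and the choice to touch only the direct children of the given root are hypothetical, not something proposed in this thread. On a real system the root would typically be /sys/fs/cgroup and the script would need root privileges.

```python
from pathlib import Path


def apply_swap_opt_in(cgroup_root, opt_in):
    """Write memory.swap.max for each direct child cgroup of cgroup_root.

    Children named in opt_in get "max" (no swap limit); all others get "0"
    (no swap). Returns a dict mapping cgroup name to the value written.
    Hypothetical helper, for illustration only.
    """
    root = Path(cgroup_root)
    written = {}
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        knob = child / "memory.swap.max"
        if not knob.exists():
            # Memory controller not enabled for this cgroup; skip it.
            continue
        value = "max" if child.name in opt_in else "0"
        knob.write_text(value + "\n")
        written[child.name] = value
    return written
```

Note that this does nothing about the shared swap backend: every opted-in cgroup still swaps to the same devices.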