On Fri, Dec 15, 2023 at 12:22 AM Chris Li <chrisl@kernel.org> wrote:
>
> Hi Fabian,
>
> On Thu, Dec 14, 2023 at 10:00 AM Fabian Deutsch <fdeutsch@redhat.com> wrote:
>
> > Yep - for container use-cases.
> >
> > Now a few thoughts in this direction:
> > - With swap per cgroup you lose the big "statistical" benefit of
> > having swap on a node level. Well, it depends on the size of the cgroup
> > (e.g. system.slice is quite large).
>
> Just to clarify, by "node" you mean the "node" in the Kubernetes sense,
> which is the whole machine. In the Linux kernel MM context, "node"
> often refers to a NUMA memory node; that is not what you mean here,
> right?

Correct - I was referring to Kubernetes nodes, not NUMA nodes.

>
> > - With today's node-level swap, setting memory.swap.max=0 for all
> > cgroups lets you achieve similar behavior (only opt-in cgroups will
> > get swap).
> > - The above approach, however, will still have a shared swap backend for
> > all cgroups.
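
A minimal sketch of the opt-in setup described above, assuming cgroup v2
(typically mounted at /sys/fs/cgroup) with the memory controller enabled
for the child cgroups. The helper name apply_swap_opt_in and the opt-in
set are made up for illustration. Limits are hierarchical, so an opted-in
cgroup only gets swap if its ancestors allow it as well.

```python
from __future__ import annotations

from pathlib import Path


def apply_swap_opt_in(parent: Path, opt_in: set[str]) -> dict[str, str]:
    """Write memory.swap.max for each direct child cgroup of `parent`.

    Children named in `opt_in` get "max" (no swap limit); all others get
    "0" (no swap). Hypothetical helper, not part of any kernel tooling.
    """
    written: dict[str, str] = {}
    for child in sorted(p for p in parent.iterdir() if p.is_dir()):
        knob = child / "memory.swap.max"
        if not knob.exists():
            # Memory controller not enabled for this child; skip it.
            continue
        value = "max" if child.name in opt_in else "0"
        knob.write_text(value)
        written[child.name] = value
    return written
```

On a real system this would be called with something like
apply_swap_opt_in(Path("/sys/fs/cgroup"), {"system.slice"}) as root. All
cgroups still share the same swap backend; only the opt-in changes.
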
>
> Yes, the "memory.swap.tiers" idea is trying to allow cgroups to select
> a subset of the swap backends in a specific order. It is still in the
> early stage of discussion. If you have any suggestions or feedback in
> that direction, I am looking forward to hearing them.

Interesting. There have been concerns about accidentally leaking
confidential data when it gets written to a swap device.

The other, less discussed item was QoS for swap I/O traffic.

At first glance it seems like tiers could help with the second use case.

- fabian

>
> Chris
>