linux-mm.kvack.org archive mirror
From: Yafang Shao <laoar.shao@gmail.com>
To: Casey Schaufler <casey@schaufler-ca.com>
Cc: akpm@linux-foundation.org, paul@paul-moore.com,
	jmorris@namei.org,  serge@hallyn.com, linux-mm@kvack.org,
	linux-security-module@vger.kernel.org,  bpf@vger.kernel.org,
	ligang.bdlg@bytedance.com, mhocko@suse.com
Subject: Re: [RFC PATCH -mm 0/4] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
Date: Mon, 13 Nov 2023 11:15:06 +0800
Message-ID: <CALOAHbD+_0tHcm72Q6TM=EXDoZFrVWAsi4AC8_xGqK3wGkEy3g@mail.gmail.com>
In-Reply-To: <188dc90e-864f-4681-88a5-87401c655878@schaufler-ca.com>

On Mon, Nov 13, 2023 at 12:45 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/11/2023 11:34 PM, Yafang Shao wrote:
> > Background
> > ==========
> >
> > In our containerized environment, we've identified unexpected OOM events
> > where the OOM-killer terminates tasks despite having ample free memory.
> > This anomaly is traced back to tasks within a container using mbind(2) to
> > bind memory to a specific NUMA node. When the allocated memory on this node
> > is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
> > indiscriminately kills tasks. This becomes more critical with guaranteed
> > tasks (oom_score_adj: -998) aggravating the issue.
>
> Is there some reason why you can't fix the callers of mbind(2)?
> This looks like an user space configuration error rather than a
> system security issue.

It appears my initial description may have caused confusion. In this
scenario, the caller is an unprivileged user lacking any capabilities.
While a privileged user, such as root, experiencing this issue might
indicate a user space configuration error, the concerning aspect is
the potential for an unprivileged user to disrupt the system easily.
If this is perceived as a misconfiguration, the question arises: What
is the correct configuration to prevent an unprivileged user from
utilizing mbind(2)?
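
For illustration only (this is not code from the patch series), a minimal
sketch of how a process might bind an anonymous mapping to a single NUMA
node with MPOL_BIND through the raw mbind(2) syscall. The names
sketch_single_node_mask(), sketch_alloc_bound() and SKETCH_MPOL_BIND are
hypothetical; the constant mirrors MPOL_BIND from the kernel's
uapi/linux/mempolicy.h so that libnuma's <numaif.h> is not required. The
sketch assumes node numbers that fit in one unsigned long.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: same value as MPOL_BIND in uapi/linux/mempolicy.h. */
#define SKETCH_MPOL_BIND 2

/* Build a one-word nodemask with only `node` set; 0 if out of range. */
unsigned long sketch_single_node_mask(unsigned int node)
{
	if (node >= 8 * sizeof(unsigned long))
		return 0;
	return 1UL << node;
}

/*
 * Map `len` bytes of anonymous memory and bind the range to `node`.
 * Returns the mapping, or NULL if mmap(2) or mbind(2) fails (for
 * example when the node does not exist or a filter blocks the call).
 */
void *sketch_alloc_bound(size_t len, unsigned int node)
{
	unsigned long mask = sketch_single_node_mask(node);
	void *p;

	assert(len > 0);
	if (mask == 0)
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* maxnode is passed as the number of mask bits plus one. */
	if (syscall(SYS_mbind, (unsigned long)p, (unsigned long)len,
		    (unsigned long)SKETCH_MPOL_BIND, (unsigned long)&mask,
		    (unsigned long)(8 * sizeof(mask) + 1), 0UL) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

A caller would use something like sketch_alloc_bound(1 << 20, 0) and then
touch the pages; with MPOL_BIND those pages can only be allocated from the
nodes in the mask, which is the situation described above once that node
runs out of memory.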

>
> >
> > The selected victim might not have allocated memory on the same NUMA node,
> > rendering the killing ineffective. This patch aims to address this by
> > disabling MPOL_BIND in container environments.
> >
> > In the container environment, our aim is to consolidate memory resource
> > control under the management of kubelet. If users express a preference for
> > binding their memory to a specific NUMA node, we encourage the adoption of
> > a standardized approach. Specifically, we recommend configuring this memory
> > policy through kubelet using cpuset.mems in the cpuset controller, rather
> > than individual users setting it autonomously. This centralized approach
> > ensures that NUMA nodes are globally managed through kubelet, promoting
> > consistency and facilitating streamlined administration of memory resources
> > across the entire containerized environment.
>
> Changing system behavior for a single use case doesn't seem prudent.
> You're introducing a bunch of kernel code to avoid fixing a broken
> user space configuration.

Currently, there is no mechanism to proactively prevent an
unprivileged user from calling mbind(2). Our current approach is to
monitor mbind(2) with a BPF program and raise an alert when its use
is detected. Beyond this monitoring, the only recourse is to contact
the user directly and advise against using mbind(2). Users then
reasonably ask why mbind(2) isn't simply prohibited in the first
place.
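
For illustration only, a minimal sketch of the kind of check such a
monitor might apply to the mode argument of mbind(2). The BPF program
itself is not shown in this thread; this is plain user-space C that models
the decision only. The function name sketch_mbind_should_alert() and the
SKETCH_* macros are hypothetical; the values mirror MPOL_BIND and the
MPOL_F_* mode flags from the kernel's uapi/linux/mempolicy.h. It assumes
that only MPOL_BIND is of interest, as in the cover letter.

```c
#include <stdbool.h>

/* Assumption: values copied from uapi/linux/mempolicy.h. */
#define SKETCH_MPOL_BIND		2
#define SKETCH_MPOL_F_NUMA_BALANCING	(1UL << 13)
#define SKETCH_MPOL_F_RELATIVE_NODES	(1UL << 14)
#define SKETCH_MPOL_F_STATIC_NODES	(1UL << 15)
#define SKETCH_MPOL_MODE_FLAGS \
	(SKETCH_MPOL_F_NUMA_BALANCING | SKETCH_MPOL_F_RELATIVE_NODES | \
	 SKETCH_MPOL_F_STATIC_NODES)

/*
 * Return true when the mbind(2) mode argument requests MPOL_BIND.
 * Optional mode flags are OR-ed into the same argument, so they are
 * masked off before the comparison.
 */
bool sketch_mbind_should_alert(unsigned long mode)
{
	unsigned long base = mode & ~SKETCH_MPOL_MODE_FLAGS;

	return base == SKETCH_MPOL_BIND;
}
```

Masking the flags first matters because a caller can pass, for example,
MPOL_BIND | MPOL_F_STATIC_NODES, which a plain equality test against
MPOL_BIND would miss.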

-- 
Regards
Yafang

