[RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org, paul@paul-moore.com,
	jmorris@namei.org, serge@hallyn.com, omosnace@redhat.com,
	mhocko@suse.com
Cc: linux-mm@kvack.org, linux-security-module@vger.kernel.org,
	bpf@vger.kernel.org, ligang.bdlg@bytedance.com,
	Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH v2 0/6] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf
Date: Wed, 22 Nov 2023 14:15:53 +0000	[thread overview]
Message-ID: <20231122141559.4228-1-laoar.shao@gmail.com> (raw)

Background
==========

In our containerized environment, we've identified unexpected OOM events
where the OOM-killer terminates tasks despite having ample free memory.
This anomaly is traced back to tasks within a container using mbind(2) to
bind memory to a specific NUMA node. When the allocated memory on this node
is exhausted, the OOM-killer, prioritizing tasks based on oom_score,
indiscriminately kills tasks. 

The Challenge 
============
In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.

Currently, users possess the ability to autonomously bind their memory to
specific nodes without explicit agreement or authorization from our end.
It's imperative that we establish a method to prevent this behavior.

Proposed Solutions
=================

- Introduce Capability to Disable MPOL_BIND
  Currently, any task can perform MPOL_BIND without specific capabilities.
  Enforcing CAP_SYS_RESOURCE or CAP_SYS_NICE could be an option, but this
  may have unintended consequences. Capabilities, being broad, might grant
  unnecessary privileges. We should explore alternatives to prevent
  unexpected side effects.

- Use LSM BPF to Disable MPOL_BIND
  Introduce LSM hooks for syscalls such as mbind(2), set_mempolicy(2), and
  set_mempolicy_home_node(2) to disable MPOL_BIND. This approach is more
  flexibility and allows for fine-grained control without unintended
  consequences. A sample LSM BPF program is included, demonstrating
  practical implementation in a production environment.

- seccomp
  seccomp is relatively heavyweight, making it less suitable for
  enabling in our production environment:
  - Both kubelet and containers need adaptation to support it.
  - Dynamically altering security policies for individual containers
    without interrupting their operations isn't straightforward.

Future Considerations
=====================

In addition, there's room for enhancement in the OOM-killer for cases
involving CONSTRAINT_MEMORY_POLICY. It would be more beneficial to
prioritize selecting a victim that has allocated memory on the same NUMA
node. My exploration on the lore led me to a proposal[0] related to this
matter, although consensus seems elusive at this point. Nevertheless,
delving into this specific topic is beyond the scope of the current
patchset.

[0] https://lore.kernel.org/lkml/20220512044634.63586-1-ligang.bdlg@bytedance.com/


Changes:
- RFC v1 -> RFC v2:
  - Refine the commit log to avoid misleading
  - Use one common lsm hook instead and add comment for it
  - Add selinux implementation
  - Other improments in mempolicy
- RFC v1: https://lwn.net/Articles/951188/

Yafang Shao (6):
  mm, doc: Add doc for MPOL_F_NUMA_BALANCING
  mm: mempolicy: Revise comment regarding mempolicy mode flags
  mm, security: Fix missed security_task_movememory() in mbind(2)
  mm, security: Add lsm hook for memory policy adjustment
  security: selinux: Implement set_mempolicy hook
  selftests/bpf: Add selftests for set_mempolicy with a lsm prog

 .../admin-guide/mm/numa_memory_policy.rst     | 27 +++++++
 include/linux/lsm_hook_defs.h                 |  3 +
 include/linux/security.h                      |  9 +++
 include/uapi/linux/mempolicy.h                |  2 +-
 mm/mempolicy.c                                | 17 +++-
 security/security.c                           | 13 +++
 security/selinux/hooks.c                      |  8 ++
 security/selinux/include/classmap.h           |  2 +-
 tools/testing/selftests/bpf/Makefile          |  2 +-
 .../selftests/bpf/prog_tests/set_mempolicy.c  | 79 +++++++++++++++++++
 .../selftests/bpf/progs/test_set_mempolicy.c  | 29 +++++++
 11 files changed, 187 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c

-- 
2.30.1 (Apple Git-130)

next             reply	other threads:[~2023-11-22 14:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-22 14:15 Yafang Shao [this message]
2023-11-22 14:15 ` [RFC PATCH v2 1/6] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
2023-11-23  6:37   ` Huang, Ying
2023-11-22 14:15 ` [RFC PATCH v2 2/6] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
2023-11-23  6:30   ` Huang, Ying
2023-11-23 12:21     ` Yafang Shao
2023-11-22 14:15 ` [RFC PATCH v2 3/6] mm, security: Fix missed security_task_movememory() in mbind(2) Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231122141559.4228-1-laoar.shao@gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bpf@vger.kernel.org \
    --cc=jmorris@namei.org \
    --cc=ligang.bdlg@bytedance.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=omosnace@redhat.com \
    --cc=paul@paul-moore.com \
    --cc=serge@hallyn.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox