* [PATCH v3 1/7] mm, doc: Add doc for MPOL_F_NUMA_BALANCING
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 9:46 ` [PATCH v3 2/7] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao
The documentation for MPOL_F_NUMA_BALANCING was inadvertently omitted from
the initial commit bda420b98505 ("numa balancing: migrate on fault among
multiple bound nodes").
Let's ensure its inclusion.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
.../admin-guide/mm/numa_memory_policy.rst | 27 +++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index eca38fa81e0f..19071b71979c 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -332,6 +332,33 @@ MPOL_F_RELATIVE_NODES
MPOL_PREFERRED policies that were created with an empty nodemask
(local allocation).
+MPOL_F_NUMA_BALANCING (since Linux 5.12)
+ When operating in MPOL_BIND mode, enables NUMA balancing for tasks,
+ contingent upon kernel support. This feature optimizes page
+ placement within the confines of the specified memory binding
+ policy. The addition of the MPOL_F_NUMA_BALANCING flag augments the
+ control mechanism for NUMA balancing:
+
+ - The sysctl knob numa_balancing governs global activation or
+ deactivation of NUMA balancing.
+
+ - Even if sysctl numa_balancing is enabled, NUMA balancing remains
+ disabled by default for memory areas or applications utilizing
+ explicit memory policies.
+
+ - The MPOL_F_NUMA_BALANCING flag facilitates NUMA balancing
+ activation for applications employing explicit memory policies
+ (MPOL_BIND).
+
+ This flag enables various optimizations for page placement through
+ NUMA balancing. For instance, when an application's memory is bound
+ to multiple nodes (MPOL_BIND), the hint page fault handler attempts
+ to migrate accessed pages to reduce cross-node access if the
+ accessing node aligns with the policy nodemask.
+
+ If the flag isn't supported by the kernel, or is used with a mode
+ other than MPOL_BIND, -1 is returned and errno is set to EINVAL.
+
Memory Policy Reference Counting
================================
--
2.30.1 (Apple Git-130)
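For readers who want to try the flag documented above, here is a minimal
userspace sketch. It assumes the uapi header <linux/mempolicy.h> is
installed and calls set_mempolicy(2) through syscall(2) to avoid a
libnuma dependency. The helper names ex_bind_balanced_mode() and
ex_set_bind_balanced() are hypothetical and not part of this series.

```c
#include <assert.h>
#include <errno.h>
#include <linux/mempolicy.h>	/* MPOL_BIND; newer headers also define the flag */
#include <sys/syscall.h>
#include <unistd.h>

/* Older uapi headers predate the flag; value as in include/uapi/linux/mempolicy.h. */
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING (1 << 13)
#endif

/* Hypothetical helper: mode word for "bind, but allow NUMA balancing". */
int ex_bind_balanced_mode(void)
{
	return MPOL_BIND | MPOL_F_NUMA_BALANCING;
}

/*
 * Hypothetical helper: apply the policy to the calling thread.
 * nodemask carries one bit per NUMA node. Returns 0 on success or
 * -errno, e.g. -EINVAL when the kernel does not know the flag.
 */
int ex_set_bind_balanced(unsigned long nodemask)
{
	long rc;

	assert(nodemask != 0);	/* MPOL_BIND needs at least one node */
	rc = syscall(__NR_set_mempolicy, ex_bind_balanced_mode(),
		     &nodemask, 8 * sizeof(nodemask));
	return rc == 0 ? 0 : -errno;
}
```

Calling ex_set_bind_balanced(0x3) would request binding to nodes 0 and 1
with balancing between them. The call can also fail for reasons unrelated
to the flag, for example when a container's seccomp profile blocks
set_mempolicy(2).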
* [PATCH v3 2/7] mm: mempolicy: Revise comment regarding mempolicy mode flags
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
2023-12-01 9:46 ` [PATCH v3 1/7] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 9:46 ` [PATCH v3 3/7] mm, security: Fix missed security_task_movememory() Yafang Shao
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao,
Eric Dumazet
MPOL_F_STATIC_NODES, MPOL_F_RELATIVE_NODES, and MPOL_F_NUMA_BALANCING are
mode flags applicable to both set_mempolicy(2) and mbind(2) system calls.
It's worth noting that MPOL_F_NUMA_BALANCING was initially introduced in
commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
nodes") exclusively for set_mempolicy(2). However, it was later made a
shared flag for both set_mempolicy(2) and mbind(2) following
commit 6d2aec9e123b ("mm/mempolicy: do not allow illegal
MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind()").
This revised version aims to clarify the details regarding the mode flags.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
---
include/uapi/linux/mempolicy.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index a8963f7ef4c2..afed4a45f5b9 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -26,7 +26,7 @@ enum {
MPOL_MAX, /* always last member of enum */
};
-/* Flags for set_mempolicy */
+/* Flags for set_mempolicy() or mbind() */
#define MPOL_F_STATIC_NODES (1 << 15)
#define MPOL_F_RELATIVE_NODES (1 << 14)
#define MPOL_F_NUMA_BALANCING (1 << 13) /* Optimize with NUMA balancing if possible */
--
2.30.1 (Apple Git-130)
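Since the mode flags share one integer with the mode for both system
calls, the kernel has to split them apart before use. The following is a
userspace model of that split, based only on the lines of
sanitize_mpol_flags() that are visible in this series. The EX_ names are
local to the sketch, and the real function performs further checks that
are not modelled here.

```c
#include <assert.h>
#include <errno.h>

/* Values as in include/uapi/linux/mempolicy.h; EX_ names are local to this sketch. */
#define EX_MPOL_BIND			2
#define EX_MPOL_F_STATIC_NODES		(1 << 15)
#define EX_MPOL_F_RELATIVE_NODES	(1 << 14)
#define EX_MPOL_F_NUMA_BALANCING	(1 << 13)
#define EX_MPOL_MODE_FLAGS \
	(EX_MPOL_F_STATIC_NODES | EX_MPOL_F_RELATIVE_NODES | \
	 EX_MPOL_F_NUMA_BALANCING)

/*
 * Split a raw mode argument into the bare mode and its mode flags.
 * Returns 0 on success, or -EINVAL when MPOL_F_NUMA_BALANCING is
 * combined with a mode other than MPOL_BIND.
 */
int ex_split_mode(int *mode, unsigned short *flags)
{
	assert(mode && flags);

	*flags = *mode & EX_MPOL_MODE_FLAGS;
	*mode &= ~EX_MPOL_MODE_FLAGS;

	if ((*flags & EX_MPOL_F_NUMA_BALANCING) && *mode != EX_MPOL_BIND)
		return -EINVAL;
	return 0;
}
```

Passing EX_MPOL_BIND | EX_MPOL_F_NUMA_BALANCING yields mode 2 and the
flag bit alone in *flags, which is the case both system calls accept.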
* [PATCH v3 3/7] mm, security: Fix missed security_task_movememory()
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
2023-12-01 9:46 ` [PATCH v3 1/7] mm, doc: Add doc for MPOL_F_NUMA_BALANCING Yafang Shao
2023-12-01 9:46 ` [PATCH v3 2/7] mm: mempolicy: Revise comment regarding mempolicy mode flags Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 20:50 ` Serge E. Hallyn
2023-12-01 9:46 ` [PATCH v3 4/7] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
` (3 subsequent siblings)
6 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao
Considering that MPOL_F_NUMA_BALANCING or mbind(2) using either
MPOL_MF_MOVE or MPOL_MF_MOVE_ALL are capable of memory movement, it's
essential to include security_task_movememory() to cover this
functionality as well. It was identified during a code review.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
mm/mempolicy.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..1eafe81d782e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1259,8 +1259,15 @@ static long do_mbind(unsigned long start, unsigned long len,
if (!new)
flags |= MPOL_MF_DISCONTIG_OK;
- if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+ if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+ err = security_task_movememory(current);
+ if (err) {
+ mpol_put(new);
+ return err;
+ }
lru_cache_disable();
+ }
+
{
NODEMASK_SCRATCH(scratch);
if (scratch) {
@@ -1450,6 +1457,8 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
/* Basic parameter sanity check used by both mbind() and set_mempolicy() */
static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
{
+ int err;
+
*flags = *mode & MPOL_MODE_FLAGS;
*mode &= ~MPOL_MODE_FLAGS;
@@ -1460,6 +1469,9 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
if (*flags & MPOL_F_NUMA_BALANCING) {
if (*mode != MPOL_BIND)
return -EINVAL;
+ err = security_task_movememory(current);
+ if (err)
+ return err;
*flags |= (MPOL_F_MOF | MPOL_F_MORON);
}
return 0;
--
2.30.1 (Apple Git-130)
* Re: [PATCH v3 3/7] mm, security: Fix missed security_task_movememory()
2023-12-01 9:46 ` [PATCH v3 3/7] mm, security: Fix missed security_task_movememory() Yafang Shao
@ 2023-12-01 20:50 ` Serge E. Hallyn
2023-12-03 2:57 ` Yafang Shao
0 siblings, 1 reply; 10+ messages in thread
From: Serge E. Hallyn @ 2023-12-01 20:50 UTC
To: Yafang Shao
Cc: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang,
linux-mm, linux-security-module, bpf, ligang.bdlg
On Fri, Dec 01, 2023 at 09:46:32AM +0000, Yafang Shao wrote:
> Considering that MPOL_F_NUMA_BALANCING or mbind(2) using either
> MPOL_MF_MOVE or MPOL_MF_MOVE_ALL are capable of memory movement, it's
> essential to include security_task_movememory() to cover this
> functionality as well. It was identified during a code review.
Hm - this doesn't have any bad side effects for you when using selinux?
The selinux_task_movememory() hook checks for PROCESS__SETSCHED privs.
The two existing security_task_movememory() calls are in cases where we
expect the caller to be affecting another task identified by pid, so
that makes sense. Is an MPOL_MF_MOVE to move your own pages actually
analogous to that?
Much like the concern you mentioned in your intro about requiring
CAP_SYS_NICE and thereby expanding its use, it seems that here you
will be regressing some mbind users unless the granting of PROCESS__SETSCHED
is widened.
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
> mm/mempolicy.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 10a590ee1c89..1eafe81d782e 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1259,8 +1259,15 @@ static long do_mbind(unsigned long start, unsigned long len,
> if (!new)
> flags |= MPOL_MF_DISCONTIG_OK;
>
> - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
MPOL_MF_MOVE_ALL already has a CAP_SYS_NICE check. Does that
suffice for that one?
> + err = security_task_movememory(current);
> + if (err) {
> + mpol_put(new);
> + return err;
> + }
> lru_cache_disable();
> + }
> +
> {
> NODEMASK_SCRATCH(scratch);
> if (scratch) {
> @@ -1450,6 +1457,8 @@ static int copy_nodes_to_user(unsigned long __user *mask, unsigned long maxnode,
> /* Basic parameter sanity check used by both mbind() and set_mempolicy() */
> static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
> {
> + int err;
> +
> *flags = *mode & MPOL_MODE_FLAGS;
> *mode &= ~MPOL_MODE_FLAGS;
>
> @@ -1460,6 +1469,9 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
> if (*flags & MPOL_F_NUMA_BALANCING) {
> if (*mode != MPOL_BIND)
> return -EINVAL;
> + err = security_task_movememory(current);
> + if (err)
> + return err;
> *flags |= (MPOL_F_MOF | MPOL_F_MORON);
> }
> return 0;
> --
> 2.30.1 (Apple Git-130)
* Re: [PATCH v3 3/7] mm, security: Fix missed security_task_movememory()
2023-12-01 20:50 ` Serge E. Hallyn
@ 2023-12-03 2:57 ` Yafang Shao
0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-03 2:57 UTC
To: Serge E. Hallyn
Cc: akpm, paul, jmorris, omosnace, mhocko, ying.huang, linux-mm,
linux-security-module, bpf, ligang.bdlg
On Sat, Dec 2, 2023 at 4:50 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Fri, Dec 01, 2023 at 09:46:32AM +0000, Yafang Shao wrote:
> > Considering that MPOL_F_NUMA_BALANCING or mbind(2) using either
> > MPOL_MF_MOVE or MPOL_MF_MOVE_ALL are capable of memory movement, it's
> > essential to include security_task_movememory() to cover this
> > functionality as well. It was identified during a code review.
>
> Hm - this doesn't have any bad side effects for you when using selinux?
> The selinux_task_movememory() hook checks for PROCESS__SETSCHED privs.
> The two existing security_task_movememory() calls are in cases where we
> expect the caller to be affecting another task identified by pid, so
> that makes sense. Is an MPOL_MF_MOVE to move your own pages actually
> analogous to that?
>
> Much like the concern you mentioned in your intro about requiring
> CAP_SYS_NICE and thereby expanding its use, it seems that here you
> will be regressing some mbind users unless the granting of PROCESS__SETSCHED
> is widened.
Ah, it appears that this change might lead to regression. I overlooked
its association with the PROCESS__SETSCHED privilege. I'll exclude
this patch from the upcoming version.
Thanks for your review.
--
Regards
Yafang
* [PATCH v3 4/7] mm, security: Add lsm hook for memory policy adjustment
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
` (2 preceding siblings ...)
2023-12-01 9:46 ` [PATCH v3 3/7] mm, security: Fix missed security_task_movememory() Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 9:46 ` [PATCH v3 5/7] security: selinux: Implement set_mempolicy hook Yafang Shao
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao
In a containerized environment, independent memory binding by a user can
lead to unexpected system issues or disrupt tasks being run by other users
on the same server. If a user genuinely requires memory binding, we will
allocate dedicated servers to them by leveraging kubelet deployment.
At present, users have the capability to bind their memory to a specific
node without explicit agreement or authorization from us. Consequently, a
new LSM hook is introduced to mitigate this. This implementation allows us
to exercise fine-grained control over memory policy adjustments within our
container environment.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/lsm_hook_defs.h | 3 +++
include/linux/security.h | 9 +++++++++
mm/mempolicy.c | 8 ++++++++
security/security.c | 13 +++++++++++++
4 files changed, 33 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index ff217a5ce552..558012719f98 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -419,3 +419,6 @@ LSM_HOOK(int, 0, uring_override_creds, const struct cred *new)
LSM_HOOK(int, 0, uring_sqpoll, void)
LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd)
#endif /* CONFIG_IO_URING */
+
+LSM_HOOK(int, 0, set_mempolicy, unsigned long mode, unsigned short mode_flags,
+ nodemask_t *nmask, unsigned int flags)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1d1df326c881..cc4a19a0888c 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -484,6 +484,8 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
int security_locked_down(enum lockdown_reason what);
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+ nodemask_t *nmask, unsigned int flags);
#else /* CONFIG_SECURITY */
static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1395,6 +1397,13 @@ static inline int security_locked_down(enum lockdown_reason what)
{
return 0;
}
+
+static inline int
+security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+ nodemask_t *nmask, unsigned int flags)
+{
+ return 0;
+}
#endif /* CONFIG_SECURITY */
#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1eafe81d782e..9a260dd24a4b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1495,6 +1495,10 @@ static long kernel_mbind(unsigned long start, unsigned long len,
if (err)
return err;
+ err = security_set_mempolicy(lmode, mode_flags, &nodes, flags);
+ if (err)
+ return err;
+
return do_mbind(start, len, lmode, mode_flags, &nodes, flags);
}
@@ -1589,6 +1593,10 @@ static long kernel_set_mempolicy(int mode, const unsigned long __user *nmask,
if (err)
return err;
+ err = security_set_mempolicy(lmode, mode_flags, &nodes, 0);
+ if (err)
+ return err;
+
return do_set_mempolicy(lmode, mode_flags, &nodes);
}
diff --git a/security/security.c b/security/security.c
index dcb3e7014f9b..685ad7993753 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5337,3 +5337,16 @@ int security_uring_cmd(struct io_uring_cmd *ioucmd)
return call_int_hook(uring_cmd, 0, ioucmd);
}
#endif /* CONFIG_IO_URING */
+
+/**
+ * security_set_mempolicy() - Check if memory policy can be adjusted
+ * @mode: The memory policy mode to be set
+ * @mode_flags: optional mode flags
+ * @nmask: nodemask to which the mode applies
+ * @flags: flags argument of mbind(2) (MPOL_MF_*); 0 for set_mempolicy(2)
+ */
+int security_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+ nodemask_t *nmask, unsigned int flags)
+{
+ return call_int_hook(set_mempolicy, 0, mode, mode_flags, nmask, flags);
+}
--
2.30.1 (Apple Git-130)
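As I understand call_int_hook(), it walks the registered hooks in order
and stops at the first non-zero return value, falling back to the given
default (0 here) when no hook objects. Below is a simplified userspace
model of that behaviour for the new hook. All ex_ names are hypothetical,
and the nodemask argument is left out to keep the sketch short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered set_mempolicy hook. */
typedef int (*ex_mempolicy_hook)(unsigned long mode,
				 unsigned short mode_flags,
				 unsigned int flags);

/* Call each hook in order; the first non-zero result denies the request. */
int ex_call_hooks(const ex_mempolicy_hook *hooks, size_t n,
		  unsigned long mode, unsigned short mode_flags,
		  unsigned int flags)
{
	assert(hooks || n == 0);

	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](mode, mode_flags, flags);

		if (rc)
			return rc;
	}
	return 0;	/* default result when every hook allows the call */
}

/* Example hook that never objects. */
int ex_allow_all(unsigned long mode, unsigned short mode_flags,
		 unsigned int flags)
{
	(void)mode; (void)mode_flags; (void)flags;
	return 0;
}

/* Example hook that rejects MPOL_BIND (mode value 2) and allows the rest. */
int ex_deny_bind(unsigned long mode, unsigned short mode_flags,
		 unsigned int flags)
{
	(void)mode_flags; (void)flags;
	return mode == 2 ? -EPERM : 0;
}
```

With both example hooks registered, a request for MPOL_BIND is rejected
with -EPERM while MPOL_DEFAULT passes, which is the kind of per-mode
decision the hook is meant to enable.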
* [PATCH v3 5/7] security: selinux: Implement set_mempolicy hook
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
` (3 preceding siblings ...)
2023-12-01 9:46 ` [PATCH v3 4/7] mm, security: Add lsm hook for memory policy adjustment Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 9:46 ` [PATCH v3 6/7] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
2023-12-01 9:46 ` [PATCH v3 7/7] NOT kernel/man2/mbind.2: Add mode flag MPOL_F_NUMA_BALANCING Yafang Shao
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao
Add a SELinux access control for the newly introduced set_mempolicy lsm
hook. A new permission "setmempolicy" is defined under the "process" class
for it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
security/selinux/hooks.c | 8 ++++++++
security/selinux/include/classmap.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index feda711c6b7b..1528d4dcfa03 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4238,6 +4238,13 @@ static int selinux_userns_create(const struct cred *cred)
USER_NAMESPACE__CREATE, NULL);
}
+static int selinux_set_mempolicy(unsigned long mode, unsigned short mode_flags,
+ nodemask_t *nmask, unsigned int flags)
+{
+ return avc_has_perm(current_sid(), task_sid_obj(current), SECCLASS_PROCESS,
+ PROCESS__SETMEMPOLICY, NULL);
+}
+
/* Returns error only if unable to parse addresses */
static int selinux_parse_skb_ipv4(struct sk_buff *skb,
struct common_audit_data *ad, u8 *proto)
@@ -7072,6 +7079,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
LSM_HOOK_INIT(task_kill, selinux_task_kill),
LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode),
LSM_HOOK_INIT(userns_create, selinux_userns_create),
+ LSM_HOOK_INIT(set_mempolicy, selinux_set_mempolicy),
LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission),
LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index a3c380775d41..c280d92a409f 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -51,7 +51,7 @@ const struct security_class_mapping secclass_map[] = {
"getattr", "setexec", "setfscreate", "noatsecure", "siginh",
"setrlimit", "rlimitinh", "dyntransition", "setcurrent",
"execmem", "execstack", "execheap", "setkeycreate",
- "setsockcreate", "getrlimit", NULL } },
+ "setsockcreate", "getrlimit", "setmempolicy", NULL } },
{ "process2",
{ "nnp_transition", "nosuid_transition", NULL } },
{ "system",
--
2.30.1 (Apple Git-130)
* [PATCH v3 6/7] selftests/bpf: Add selftests for set_mempolicy with a lsm prog
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
` (4 preceding siblings ...)
2023-12-01 9:46 ` [PATCH v3 5/7] security: selinux: Implement set_mempolicy hook Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
2023-12-01 9:46 ` [PATCH v3 7/7] NOT kernel/man2/mbind.2: Add mode flag MPOL_F_NUMA_BALANCING Yafang Shao
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao
The result is as follows:
#261/1 set_mempolicy/MPOL_BIND_with_lsm:OK
#261/2 set_mempolicy/MPOL_DEFAULT_with_lsm:OK
#261/3 set_mempolicy/MPOL_BIND_without_lsm:OK
#261/4 set_mempolicy/MPOL_DEFAULT_without_lsm:OK
#261 set_mempolicy:OK
Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
.../selftests/bpf/prog_tests/set_mempolicy.c | 81 +++++++++++++++++++
.../selftests/bpf/progs/test_set_mempolicy.c | 28 +++++++
2 files changed, 109 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
create mode 100644 tools/testing/selftests/bpf/progs/test_set_mempolicy.c
diff --git a/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
new file mode 100644
index 000000000000..6d115ecedb10
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/set_mempolicy.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <linux/mempolicy.h>
+#include <test_progs.h>
+#include "test_set_mempolicy.skel.h"
+
+#define SIZE 4096
+
+static void mempolicy_bind(bool success)
+{
+ unsigned long mask = 1;
+ char *addr;
+ int err;
+
+ addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+ if (!ASSERT_OK_PTR(addr, "mmap"))
+ return;
+
+ /* -lnuma is required by mbind(2), so use __NR_mbind to avoid the dependency. */
+ err = syscall(__NR_mbind, addr, SIZE, MPOL_BIND, &mask, sizeof(mask), 0);
+ if (success)
+ ASSERT_OK(err, "mbind_success");
+ else
+ ASSERT_ERR(err, "mbind_fail");
+
+ munmap(addr, SIZE);
+}
+
+static void mempolicy_default(void)
+{
+ char *addr;
+ int err;
+
+ addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
+ if (!ASSERT_OK_PTR(addr, "mmap"))
+ return;
+
+ err = syscall(__NR_mbind, addr, SIZE, MPOL_DEFAULT, NULL, 0, 0);
+ ASSERT_OK(err, "mbind_success");
+
+ munmap(addr, SIZE);
+}
+
+void test_set_mempolicy(void)
+{
+ struct test_set_mempolicy *skel;
+ int err;
+
+ skel = test_set_mempolicy__open();
+ if (!ASSERT_OK_PTR(skel, "open"))
+ return;
+
+ skel->bss->target_pid = getpid();
+
+ err = test_set_mempolicy__load(skel);
+ if (!ASSERT_OK(err, "load"))
+ goto destroy;
+
+ /* Attach LSM prog first */
+ err = test_set_mempolicy__attach(skel);
+ if (!ASSERT_OK(err, "attach"))
+ goto destroy;
+
+ /* syscall to adjust memory policy */
+ if (test__start_subtest("MPOL_BIND_with_lsm"))
+ mempolicy_bind(false);
+ if (test__start_subtest("MPOL_DEFAULT_with_lsm"))
+ mempolicy_default();
+
+destroy:
+ test_set_mempolicy__destroy(skel);
+
+ if (test__start_subtest("MPOL_BIND_without_lsm"))
+ mempolicy_bind(true);
+ if (test__start_subtest("MPOL_DEFAULT_without_lsm"))
+ mempolicy_default();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_set_mempolicy.c b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
new file mode 100644
index 000000000000..b5356d5fcb8b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_set_mempolicy.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Yafang Shao <laoar.shao@gmail.com> */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int target_pid;
+
+static int mem_policy_adjustment(u64 mode)
+{
+ struct task_struct *task = bpf_get_current_task_btf();
+
+ if (task->pid != target_pid)
+ return 0;
+
+ if (mode != MPOL_BIND)
+ return 0;
+ return -1;
+}
+
+SEC("lsm/set_mempolicy")
+int BPF_PROG(setmempolicy, u64 mode, u16 mode_flags, nodemask_t *nmask, u32 flags)
+{
+ return mem_policy_adjustment(mode);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.30.1 (Apple Git-130)
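The decision logic of the BPF program can also be checked in isolation.
The sketch below is a plain C model of mem_policy_adjustment() in which
the current pid is passed in explicitly instead of being read from the
task. The ex_ names are hypothetical, and the model does not replace the
selftest above.

```c
#include <assert.h>

/* Mode values as in include/uapi/linux/mempolicy.h; EX_ names are local. */
#define EX_MPOL_DEFAULT	0
#define EX_MPOL_BIND	2

/*
 * Model of the BPF helper: deny (-1) only when the caller is the
 * target pid and the requested mode is MPOL_BIND; allow (0) otherwise.
 */
int ex_mem_policy_adjustment(int current_pid, int target_pid,
			     unsigned long mode)
{
	if (current_pid != target_pid)
		return 0;

	if (mode != EX_MPOL_BIND)
		return 0;
	return -1;
}

/* Walk the combinations of pid match and mode. */
void ex_self_check(void)
{
	assert(ex_mem_policy_adjustment(10, 10, EX_MPOL_BIND) == -1);
	assert(ex_mem_policy_adjustment(10, 10, EX_MPOL_DEFAULT) == 0);
	assert(ex_mem_policy_adjustment(11, 10, EX_MPOL_BIND) == 0);
	assert(ex_mem_policy_adjustment(11, 10, EX_MPOL_DEFAULT) == 0);
}
```

The first two cases correspond to the MPOL_BIND_with_lsm and
MPOL_DEFAULT_with_lsm subtests, where the test process is the target.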
* [PATCH v3 7/7] NOT kernel/man2/mbind.2: Add mode flag MPOL_F_NUMA_BALANCING
2023-12-01 9:46 [PATCH v3 0/7] mm, security, bpf: Fine-grained control over memory policy adjustments with lsm bpf Yafang Shao
` (5 preceding siblings ...)
2023-12-01 9:46 ` [PATCH v3 6/7] selftests/bpf: Add selftests for set_mempolicy with a lsm prog Yafang Shao
@ 2023-12-01 9:46 ` Yafang Shao
6 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2023-12-01 9:46 UTC
To: akpm, paul, jmorris, serge, omosnace, mhocko, ying.huang
Cc: linux-mm, linux-security-module, bpf, ligang.bdlg, Yafang Shao,
Alejandro Colomar, Michael Kerrisk
In Linux Kernel 5.12, a new mode flag, MPOL_F_NUMA_BALANCING, was
added to set_mempolicy() to optimize the page placement among the
NUMA nodes with the NUMA balancing mechanism even if the memory of
the applications is bound with MPOL_BIND.
In Linux Kernel 5.15, this mode flag was extended to mbind(2). Let's
also document it in the mbind(2) man page. The text is copied from the
set_mempolicy(2) man page with minor modifications.
Related kernel commits:
bda420b985054a3badafef23807c4b4fa38a3dff
6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
man2/mbind.2 | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/man2/mbind.2 b/man2/mbind.2
index ba1b81ae9..dac784389 100644
--- a/man2/mbind.2
+++ b/man2/mbind.2
@@ -142,6 +142,23 @@ The supported
.I "mode flags"
are:
.TP
+.BR MPOL_F_NUMA_BALANCING " (since Linux 5.15)"
+.\" commit bda420b985054a3badafef23807c4b4fa38a3dff
+.\" commit 6d2aec9e123bb9c49cb5c7fc654f25f81e688e8c
+When
+.I mode
+is
+.BR MPOL_BIND ,
+enable the kernel NUMA balancing for the task if it is supported by the kernel.
+If the flag isn't supported by the kernel, or is used with
+.I mode
+other than
+.BR MPOL_BIND ,
+\-1 is returned and
+.I errno
+is set to
+.BR EINVAL .
+.TP
.BR MPOL_F_STATIC_NODES " (since Linux-2.6.26)"
A nonempty
.I nodemask
--
2.39.3