linux-mm.kvack.org archive mirror
* [PATCH mm-unstable] mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE)
@ 2022-08-01 21:09 Zach O'Keefe
  2022-08-02  9:09 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Zach O'Keefe @ 2022-08-01 21:09 UTC
  To: linux-mm
  Cc: Andrew Morton, linux-api, linux-kernel, Axel Rasmussen,
	James Houghton, Hugh Dickins, Yang Shi, Miaohe Lin,
	David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko,
	Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park, Song Liu,
	Vlastimil Babka, Zi Yan, Andrea Arcangeli, Arnd Bergmann,
	Chris Kennelly, Chris Zankel, Helge Deller, Ivan Kokshaysky,
	James E.J. Bottomley, Jens Axboe, Kirill A. Shutemov,
	Matt Turner, Max Filippov, Minchan Kim, Patrick Xia,
	Pavel Begunkov, Thomas Bogendoerfer, Zach O'Keefe

process_madvise(MADV_COLLAPSE) currently requires CAP_SYS_ADMIN when not
acting on the caller's own mm.  This is maximally restrictive, and
perpetuates existing issues with CAP_SYS_ADMIN.  Remove this requirement.

When acting on an external process's memory, the biggest concerns for
process_madvise(MADV_COLLAPSE) are (1) influencing process performance
by moving memory, possibly between nodes, that is mapped into the
address space of external process(es), (2) defeating
address-space-layout randomization, and (3) increasing process RSS and
memcg usage, possibly causing memcg OOM.

process_madvise(2) already enforces CAP_SYS_NICE and PTRACE_MODE_READ (in
PTRACE_MODE_FSCREDS mode).  A process with these credentials can already
accomplish (1) and (2) via move_pages(MPOL_MF_MOVE_ALL), and (3) via
process_madvise(MADV_WILLNEED).
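
For illustration only, a minimal userspace sketch of the call this patch
affects.  The helper name collapse_remote_range() is hypothetical and
not part of the patch.  The fallback values for MADV_COLLAPSE (25) and
SYS_process_madvise (440) are assumptions taken from the generic uapi
headers and may differ on some architectures; prefer the values from
your installed headers.  The caller is assumed to hold a pidfd for the
target and to satisfy the credential checks described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed fallbacks for older headers; values may be arch-specific. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/*
 * Hypothetical helper: ask the kernel to collapse [addr, addr + len) in
 * the process referred to by pidfd into transparent hugepages.
 * Returns the number of bytes advised on success, or -errno on failure.
 */
static long collapse_remote_range(int pidfd, void *addr, size_t len)
{
	struct iovec iov = { .iov_base = addr, .iov_len = len };
	long ret;

	/* vlen = 1 iovec, flags = 0 (no flags are defined for this call) */
	ret = syscall(SYS_process_madvise, pidfd, &iov, (size_t)1,
		      MADV_COLLAPSE, 0U);
	return ret < 0 ? -errno : ret;
}
```

A caller would typically obtain pidfd from pidfd_open(2) and pass a
hugepage-aligned range in the target's address space.  An invalid pidfd
makes the helper return a negative errno value.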

process_madvise(MADV_COLLAPSE) may also circumvent sysfs THP settings.
When acting on one's own memory (which is equivalent to
madvise(MADV_COLLAPSE)), this is deemed acceptable: aside from the
possibility of hoarding available hugepages (which is already possible),
no harm to the system can be done.  When acting on an external
process's memory, circumventing sysfs THP settings should pose no
additional threat beyond those listed above.  As such, requiring
additional capabilities (such as CAP_SETUID, as a way to ensure the
caller could have just altered the sysfs THP settings themselves)
provides no extra protection.
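
As a rough aid for seeing which sysfs THP setting is in effect, the
sketch below reads /sys/kernel/mm/transparent_hugepage/enabled and
extracts the bracketed (active) mode, e.g. "madvise" from
"always [madvise] never".  The helper names thp_active_mode() and
thp_read_enabled() are hypothetical and not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bracketed (active) mode out of a line
 * such as "always [madvise] never".  Returns 0 on success, -1 if no
 * bracketed token is found or it does not fit in out.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	size_t n;

	if (!close)
		return -1;
	n = (size_t)(close - open - 1);
	if (n + 1 > outlen)	/* need room for the terminating NUL */
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/* Hypothetical helper: read the system-wide THP mode from sysfs. */
static int thp_read_enabled(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = thp_active_mode(line, out, outlen);
	fclose(f);
	return ret;
}
```

thp_read_enabled() fails with -1 on systems where the sysfs file is
absent, for example kernels built without THP support.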

Fixes: 7ec952341312 ("mm/madvise: add MADV_COLLAPSE to process_madvise()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 mm/madvise.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index f9e11b6c9916..af97100a0727 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1170,16 +1170,14 @@ madvise_behavior_valid(int behavior)
 	}
 }
 
-static bool
-process_madvise_behavior_valid(int behavior, struct task_struct *task)
+static bool process_madvise_behavior_valid(int behavior)
 {
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_WILLNEED:
-		return true;
 	case MADV_COLLAPSE:
-		return task == current || capable(CAP_SYS_ADMIN);
+		return true;
 	default:
 		return false;
 	}
@@ -1457,7 +1455,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto free_iov;
 	}
 
-	if (!process_madvise_behavior_valid(behavior, task)) {
+	if (!process_madvise_behavior_valid(behavior)) {
 		ret = -EINVAL;
 		goto release_task;
 	}
-- 
2.37.1.455.g008518b4e5-goog




end of thread, other threads:[~2022-08-04 17:46 UTC | newest]

Thread overview: 6+ messages
2022-08-01 21:09 [PATCH mm-unstable] mm/madvise: remove CAP_SYS_ADMIN requirement for process_madvise(MADV_COLLAPSE) Zach O'Keefe
2022-08-02  9:09 ` Michal Hocko
2022-08-02  9:48   ` Zach O'Keefe
2022-08-02 12:04     ` Michal Hocko
2022-08-02 19:42       ` Zach O'Keefe
2022-08-04 17:46         ` Yang Shi
