References: <20250123214342.4145818-1-andrii@kernel.org>
 <20250124-zander-restaurant-7583fe1634b9@brauner>
In-Reply-To: <20250124-zander-restaurant-7583fe1634b9@brauner>
From: Andrii Nakryiko
Date: Fri, 24 Jan 2025 09:31:53 -0800
Subject: Re: [PATCH] mm,procfs: allow read-only remote mm access under CAP_PERFMON
To: Christian Brauner
Cc: Andrii Nakryiko, linux-mm@kvack.org, akpm@linux-foundation.org,
 linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kernel-team@meta.com,
 rostedt@goodmis.org, peterz@infradead.org, mingo@kernel.org,
 linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
 shakeel.butt@linux.dev, rppt@kernel.org, liam.howlett@oracle.com,
 surenb@google.com

On Fri, Jan 24, 2025 at 1:45 AM Christian Brauner wrote:
>
> On Thu, Jan 23, 2025 at 01:43:42PM -0800, Andrii Nakryiko wrote:
> > It's very common for various tracing
> > and profiling tools to need to
> > access /proc/PID/maps contents for stack symbolization needs to learn
> > which shared libraries are mapped in memory, at which file offset, etc.
> > Currently, access to /proc/PID/maps requires CAP_SYS_PTRACE (unless we
> > are looking at data for our own process, which is a trivial case not too
> > relevant for profilers use cases).
> >
> > Unfortunately, CAP_SYS_PTRACE implies way more than just ability to
> > discover memory layout of another process: it allows to fully control
> > arbitrary other processes. This is problematic from security POV for
> > applications that only need read-only /proc/PID/maps (and other similar
> > read-only data) access, and in large production settings CAP_SYS_PTRACE
> > is frowned upon even for the system-wide profilers.
> >
> > On the other hand, it's already possible to access similar kind of
> > information (and more) with just CAP_PERFMON capability. E.g., setting
> > up PERF_RECORD_MMAP collection through perf_event_open() would give one
> > similar information to what /proc/PID/maps provides.
> >
> > CAP_PERFMON, together with CAP_BPF, is already a very common combination
> > for system-wide profiling and observability applications. As such, it's
> > reasonable and convenient to be able to access /proc/PID/maps with
> > CAP_PERFMON capabilities instead of CAP_SYS_PTRACE.
> >
> > For procfs, these permissions are checked through common mm_access()
> > helper, and so we augment that with perfmon_capable() check *only* if
> > requested mode is PTRACE_MODE_READ. I.e., PTRACE_MODE_ATTACH wouldn't be
> > permitted by CAP_PERFMON.
> >
> > Besides procfs itself, mm_access() is used by process_madvise() and
> > process_vm_{readv,writev}() syscalls. The former one uses
> > PTRACE_MODE_READ to avoid leaking ASLR metadata, and as such CAP_PERFMON
> > seems like a meaningful allowable capability as well.
> >
> > process_vm_{readv,writev} currently assume PTRACE_MODE_ATTACH level of
> > permissions (though for readv PTRACE_MODE_READ seems more reasonable,
> > but that's outside the scope of this change), and as such won't be
> > affected by this patch.
> >
> > Signed-off-by: Andrii Nakryiko
> > ---
> >  kernel/fork.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index ded49f18cd95..c57cb3ad9931 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1547,6 +1547,15 @@ struct mm_struct *get_task_mm(struct task_struct *task)
> >  }
> >  EXPORT_SYMBOL_GPL(get_task_mm);
> >
> > +static bool can_access_mm(struct mm_struct *mm, struct task_struct *task, unsigned int mode)
> > +{
> > +	if (mm == current->mm)
> > +		return true;
> > +	if ((mode & PTRACE_MODE_READ) && perfmon_capable())
> > +		return true;
>
> Just fyi, I suspect that this will trigger new audit denials if the task
> doesn't have CAP_SYS_ADMIN or CAP_PERFMON in the initial user namespace
> but where it would still have access through ptrace_may_access(). Such
> changes have led to complaints before.
>
> I'm not sure how likely that is but it might be noticeable. If that's the
> case ns_capable_noaudit(&init_user_ns, ...) would help.

Yep, thanks. Not sure if this is the problem, but I'm open to changing
this. I can also switch the order and do the perfmon_capable() check
after ptrace_may_access() to mitigate this problem? I guess that's what
I'm going to do in v2.
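
For illustration only, the reordering mentioned above could be modeled
in userspace roughly as follows. This is a sketch under assumptions, not
kernel code, and the actual v2 may differ: struct access_ctx,
model_perfmon_capable(), can_access_mm_reordered() and the MODEL_MODE_*
constants are hypothetical stand-ins for mm == current->mm,
ptrace_may_access(), perfmon_capable() and the PTRACE_MODE_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only constants standing in for PTRACE_MODE_READ / PTRACE_MODE_ATTACH. */
#define MODEL_MODE_READ   0x01u
#define MODEL_MODE_ATTACH 0x02u

/* Hypothetical fixture: each field replaces the result of a real kernel check. */
struct access_ctx {
	bool same_mm;        /* models mm == current->mm */
	bool ptrace_allows;  /* models ptrace_may_access(task, mode) */
	bool has_perfmon;    /* models the outcome of perfmon_capable() */
	int cap_checks;      /* how many times the capability was consulted */
};

/* Stand-in for perfmon_capable(); counts calls so the ordering is observable. */
static bool model_perfmon_capable(struct access_ctx *ctx)
{
	ctx->cap_checks++;
	return ctx->has_perfmon;
}

/* Capability check placed after the ptrace check, as floated for v2. */
static bool can_access_mm_reordered(struct access_ctx *ctx, unsigned int mode)
{
	assert(ctx != NULL);

	if (ctx->same_mm)
		return true;
	if (ctx->ptrace_allows)
		return true;
	/* Only reached when the ptrace check denied access; read modes only. */
	return (mode & MODEL_MODE_READ) && model_perfmon_capable(ctx);
}
```

In this model, a caller that already passes the ptrace check never
reaches the capability check (cap_checks stays at 0), which is the
property the reordering is meant to provide. An attach-mode request
never consults the capability either.
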
>
> > +	return ptrace_may_access(task, mode);
> > +}
> > +
> >  struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
> >  {
> >  	struct mm_struct *mm;
> > @@ -1559,7 +1568,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
> >  	mm = get_task_mm(task);
> >  	if (!mm) {
> >  		mm = ERR_PTR(-ESRCH);
> > -	} else if (mm != current->mm && !ptrace_may_access(task, mode)) {
> > +	} else if (!can_access_mm(mm, task, mode)) {
> >  		mmput(mm);
> >  		mm = ERR_PTR(-EACCES);
> >  	}
> > --
> > 2.43.5
> >