From: Andrii Nakryiko
Date: Thu, 23 Jan 2025 17:02:40 -0800
Subject: Re: [PATCH] mm,procfs: allow read-only remote mm access under CAP_PERFMON
To: Jann Horn
Cc: Kees Cook, Suren Baghdasaryan, Andrii Nakryiko, linux-mm@kvack.org,
    akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
    brauner@kernel.org, viro@zeniv.linux.org.uk,
    linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kernel-team@meta.com,
    rostedt@goodmis.org, peterz@infradead.org, mingo@kernel.org,
    linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
    shakeel.butt@linux.dev, rppt@kernel.org, liam.howlett@oracle.com,
    linux-security-module@vger.kernel.org
References: <20250123214342.4145818-1-andrii@kernel.org> <202501231526.A3C13EC5@keescook>

On Thu, Jan 23, 2025 at 3:55 PM Jann Horn wrote:
>
> On Fri, Jan 24, 2025 at 12:47 AM Kees Cook wrote:
> > On Thu, Jan 23, 2025 at 01:52:52PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Jan 23, 2025 at 1:44 PM Andrii Nakryiko wrote:
> > > >
> > > > It's very common for various tracing and profiling tools to need to
> > > > access /proc/PID/maps contents for stack symbolization needs to learn
> > > > which shared libraries are mapped in memory, at which file offset, etc.
> > > > Currently, access to /proc/PID/maps requires CAP_SYS_PTRACE (unless we
> > > > are looking at data for our own process, which is a trivial case not too
> > > > relevant for profilers use cases).
> > > >
> > > > Unfortunately, CAP_SYS_PTRACE implies way more than just ability to
> > > > discover memory layout of another process: it allows to fully control
> > > > arbitrary other processes. This is problematic from security POV for
> > > > applications that only need read-only /proc/PID/maps (and other similar
> > > > read-only data) access, and in large production settings CAP_SYS_PTRACE
> > > > is frowned upon even for the system-wide profilers.
> > > >
> > > > On the other hand, it's already possible to access similar kind of
> > > > information (and more) with just CAP_PERFMON capability. E.g., setting
> > > > up PERF_RECORD_MMAP collection through perf_event_open() would give one
> > > > similar information to what /proc/PID/maps provides.
> > > >
> > > > CAP_PERFMON, together with CAP_BPF, is already a very common combination
> > > > for system-wide profiling and observability application. As such, it's
> > > > reasonable and convenient to be able to access /proc/PID/maps with
> > > > CAP_PERFMON capabilities instead of CAP_SYS_PTRACE.
> > > >
> > > > For procfs, these permissions are checked through common mm_access()
> > > > helper, and so we augment that with cap_perfmon() check *only* if
> > > > requested mode is PTRACE_MODE_READ. I.e., PTRACE_MODE_ATTACH wouldn't be
> > > > permitted by CAP_PERFMON.
> > > >
> > > > Besides procfs itself, mm_access() is used by process_madvise() and
> > > > process_vm_{readv,writev}() syscalls. The former one uses
> > > > PTRACE_MODE_READ to avoid leaking ASLR metadata, and as such CAP_PERFMON
> > > > seems like a meaningful allowable capability as well.
> > > >
> > > > process_vm_{readv,writev} currently assume PTRACE_MODE_ATTACH level of
> > > > permissions (though for readv PTRACE_MODE_READ seems more reasonable,
> > > > but that's outside the scope of this change), and as such won't be
> > > > affected by this patch.
> > >
> > > CC'ing Jann and Kees.
> > >
> > > >
> > > > Signed-off-by: Andrii Nakryiko
> > > > ---
> > > >  kernel/fork.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > index ded49f18cd95..c57cb3ad9931 100644
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -1547,6 +1547,15 @@ struct mm_struct *get_task_mm(struct task_struct *task)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(get_task_mm);
> > > >
> > > > +static bool can_access_mm(struct mm_struct *mm, struct task_struct *task, unsigned int mode)
> > > > +{
> > > > +	if (mm == current->mm)
> > > > +		return true;
> > > > +	if ((mode & PTRACE_MODE_READ) && perfmon_capable())
> > > > +		return true;
> > > > +	return ptrace_may_access(task, mode);
> > > > +}
> >
> > nit: "may" tends to be used more than "can" for access check function naming.
> >
> > So, this will bypass security_ptrace_access_check() within
> > ptrace_may_access(). CAP_PERFMON may be something LSMs want visibility
> > into.
> >
> > It also bypasses the dumpability check in __ptrace_may_access(). (Should
> > non-dumpability block visibility into "maps" under CAP_PERFMON?)
> >
> > This change provides read access for CAP_PERFMON to:
> >
> > /proc/$pid/maps
> > /proc/$pid/smaps
> > /proc/$pid/mem
> > /proc/$pid/environ
> > /proc/$pid/auxv
> > /proc/$pid/attr/*
> > /proc/$pid/smaps_rollup
> > /proc/$pid/pagemap
> >
> > /proc/$pid/mem access seems way out of bounds for CAP_PERFMON. environ
> > and auxv maybe too much also. The "attr" files seem iffy. pagemap may be
> > reasonable.
>
> FWIW, my understanding is that if you can use perf_event_open() on a
> process, you can also grab large amounts of stack memory contents from
> that process via PERF_SAMPLE_STACK_USER/sample_stack_user. (The idea
> there is that stack unwinding for userspace stacks is complicated, so
> it's the profiler's job to turn a pile of raw stack contents and a
> register snapshot into a stack trace.) So _to some extent_ I think it
> is already possible to read memory of another process via CAP_PERFMON.
> Whether that is desirable or not I don't know, though I guess it's
> hard to argue that there's a qualitative security difference between
> reading register contents and reading stack memory...

If I'm allowed to bring in BPF capabilities coupled with CAP_PERFMON,
then you can read not just the stack, but pretty much anything, both in
kernel memory (e.g., through bpf_probe_read_kernel()) and in user space
(bpf_probe_read_user() for the current user task, and more generally
bpf_copy_from_user_task() for an arbitrary task for which we have a
struct task_struct).

But we don't really allow access to /proc/PID/mem here, because it
requires PTRACE_MODE_ATTACH (which is sort of like read/write vs
read-only). Similarly, this would be relevant for process_vm_readv(),
but that one (currently) also requires PTRACE_MODE_ATTACH.
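
To make the READ vs ATTACH point concrete, here is a minimal userspace
sketch of the gating logic in the quoted can_access_mm() hunk. It is a
model only, not kernel code: struct caller and model_can_access_mm() are
hypothetical names, the three booleans stand in for the results of
mm == current->mm, perfmon_capable() and ptrace_may_access(), and the
mode bit values are assumed to match include/linux/ptrace.h.

```c
#include <stdbool.h>

/* Mode bits; values assumed to match include/linux/ptrace.h. */
#define PTRACE_MODE_READ    0x01
#define PTRACE_MODE_ATTACH  0x02
#define PTRACE_MODE_FSCREDS 0x08

/* Hypothetical stand-in for the state the kernel checks would consult. */
struct caller {
	bool same_mm;        /* models: mm == current->mm */
	bool cap_perfmon;    /* models: perfmon_capable() */
	bool ptrace_allowed; /* models: ptrace_may_access(task, mode) */
};

/* Mirrors the control flow of the quoted can_access_mm() hunk. */
static bool model_can_access_mm(const struct caller *c, unsigned int mode)
{
	if (c->same_mm)
		return true;
	/* The CAP_PERFMON shortcut applies only when the READ bit is set. */
	if ((mode & PTRACE_MODE_READ) && c->cap_perfmon)
		return true;
	/* Everything else, including ATTACH, needs the full ptrace check. */
	return c->ptrace_allowed;
}
```

For a caller that holds only CAP_PERFMON, a READ | FSCREDS request (the
/proc/PID/maps style of access) is granted in this model, while an
ATTACH | FSCREDS request (the /proc/PID/mem style of access) falls
through to the ptrace check and is denied.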