From: Jann Horn
Date: Tue, 19 Nov 2024 01:39:19 +0100
Subject: Re: [RFCv1 0/6] Page Detective
To: Pasha Tatashin
Cc: Lorenzo Stoakes, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org,
 akpm@linux-foundation.org, corbet@lwn.net, derek.kiernan@amd.com,
 dragan.cvetic@amd.com, arnd@arndb.de, gregkh@linuxfoundation.org,
 viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, tj@kernel.org,
 hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, Liam.Howlett@oracle.com,
 vbabka@suse.cz, shuah@kernel.org, vegard.nossum@oracle.com,
 vattunuru@marvell.com, schalla@marvell.com, david@redhat.com,
 willy@infradead.org, osalvador@suse.de, usama.anjum@collabora.com,
 andrii@kernel.org, ryan.roberts@arm.com, peterx@redhat.com,
 oleg@redhat.com, tandersen@netflix.com, rientjes@google.com,
 gthelen@google.com, linux-hardening@vger.kernel.org, Kernel Hardening
References: <20241116175922.3265872-1-pasha.tatashin@soleen.com>

On Mon, Nov 18, 2024 at 11:24 PM
Pasha Tatashin wrote:
> On Mon, Nov 18, 2024 at 7:54 AM Jann Horn wrote:
> >
> > On Mon, Nov 18, 2024 at 12:17 PM Lorenzo Stoakes
> > wrote:
> > > On Sat, Nov 16, 2024 at 05:59:16PM +0000, Pasha Tatashin wrote:
> > > > It operates through the Linux debugfs interface, with two files: "virt"
> > > > and "phys".
> > > >
> > > > The "virt" file takes a virtual address and PID and outputs information
> > > > about the corresponding page.
> > > >
> > > > The "phys" file takes a physical address and outputs information about
> > > > that page.
> > > >
> > > > The output is presented via kernel log messages (can be accessed with
> > > > dmesg), and includes information such as the page's reference count,
> > > > mapping, flags, and memory cgroup. It also shows whether the page is
> > > > mapped in the kernel page table, and if so, how many times.
> > >
> > > I mean, even though I'm not a huge fan of kernel pointer hashing etc. this
> > > is obviously leaking as much information as you might want about kernel
> > > internal state to the point of maybe making the whole kernel pointer
> > > hashing thing moot.
> > >
> > > I know this requires CAP_SYS_ADMIN, but there are things that also require
> > > that which _still_ obscure kernel pointers.
> > >
> > > And you're outputting it all to dmesg.
> > >
> > > So yeah, a security person (Jann?) would be better placed to comment on
> > > this than me, but are we sure we want to do this when not in a
> > > CONFIG_DEBUG_VM* kernel?
> >
> > I guess there are two parts to this - what root is allowed to do, and
> > what information we're fine with exposing to dmesg.
> >
> > If the lockdown LSM is not set to LOCKDOWN_CONFIDENTIALITY_MAX, the
> > kernel allows root to read kernel memory through some interfaces - in
> > particular, BPF allows reading arbitrary kernel memory, and perf
> > allows reading at least some stuff (like kernel register states).
> > With lockdown in the most restrictive mode, the kernel tries to
> > prevent root from reading arbitrary kernel memory, but we don't
> > really change how much information goes into dmesg. (And I imagine
> > you could probably still get kernel pointers out of BPF somehow even
> > in the most restrictive lockdown mode, but that's probably not
> > relevant.)
> >
> > The main issue with dmesg is that some systems make its contents
> > available to code that is not running with root privileges; and I
> > think it is also sometimes stored persistently in unencrypted form
> > (like in EFI pstore) even when everything else on the system is
> > encrypted.
> > So on one hand, we definitely shouldn't print the contents of random
> > chunks of memory into dmesg without a good reason; on the other hand,
> > for example we do already print kernel register state on WARN() (which
> > often includes kernel pointers and could theoretically include more
> > sensitive data too).
> >
> > So I think showing page metadata to root when requested is probably
> > okay as a tradeoff? And dumping that data into dmesg is maybe not
> > great, but acceptable as long as only root can actually trigger this?
> >
> > I don't really have a strong opinion on this...
> >
> >
> > To me, a bigger issue is that dump_page() looks like it might be racy,
> > which is maybe not terrible in debugging code that only runs when
> > something has already gone wrong, but bad if it is in code that root
> > can trigger on demand?
>
> Hi Jann, thank you for reviewing this proposal.
>
> Presumably, the interface should be used only when something has gone
> wrong but has not been noticed by the kernel. That something is
> usually checksums failures that are outside of the kernel: i.e. during
> live migration, snapshotting, filesystem journaling, etc. We already
> have interfaces that provide data from the live kernel that could be
> racy, i.e. crash utility.

Ah, yes, I'm drawing a distinction here between "something has gone
wrong internally in the kernel and the kernel does some kinda-broken
best-effort self-diagnostics" and "userspace thinks something is
broken and asks the kernel".

> > __dump_page() copies the given page with
> > memcpy(), which I don't think guarantees enough atomicity with
> > concurrent updates of page->mapping or such, so dump_mapping() could
> > probably run on a bogus pointer. Even without torn pointers, I think
> > there could be a UAF if the page's mapping is destroyed while we're
> > going through dump_page(), since the page might not be locked. And in
> > dump_mapping(), the strncpy_from_kernel_nofault() also doesn't guard
> > against concurrent renaming of the dentry, which I think again would
> > probably result in UAF.
>
> Since we are holding a reference on the page at the time of
> dump_page(), the identity of the page should not really change, but
> dentry can be renamed.

Can you point me to where a refcounted reference to the page comes
from when page_detective_metadata() calls dump_page_lvl()?

> > So I think dump_page() in its current form is not something we should
> > expose to a userspace-reachable API.
>
> We use dump_page() all over WARN_ONs in MM code where pages might not
> be locked, but this is a good point, that while even the existing
> usage might be racy, providing a user-reachable API potentially makes
> it worse. I will see if I could add some locking before dump_page(),
> or make a dump_page variant that does not do dump_mapping().

To be clear, I am not that strongly opposed to racily reading data
such that the data may not be internally consistent or such; but this
is a case of racy use-after-free reads that might end up dumping
entirely unrelated memory contents into dmesg. I think we should
properly protect against that in an API that userspace can invoke.
Otherwise, if we race, we might end up writing random memory contents
into dmesg; and if we are particularly unlucky, those random memory
contents could be PII or authentication tokens or such.

I'm not entirely sure what the right approach is here; I guess it
makes sense that when the kernel internally detects corruption,
dump_page doesn't take references on pages it accesses to avoid
corrupting things further. If you are looking at a page based on a
userspace request, I guess you could access the page with the
necessary locking to access its properties under the normal locking
rules?

(If anyone else has opinions either way on this line I'm trying to
draw between kernel-internal debug paths and userspace-triggerable
debugging, feel free to share; I hope my mental model makes sense but
I could imagine other folks having a different model of this?)
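
As a rough illustration of the "take a reference before inspecting the
object" idea, here is a minimal userspace sketch in C11. It is not kernel
code and not taken from the patch series: struct fake_page and the
fake_page_*() helpers are hypothetical names invented for this example.
The try-get loop is loosely modelled on what helpers such as
get_page_unless_zero() do in the kernel. It only covers the refcount
part; holding a page reference does not by itself pin page->mapping or
the dentry, so a real implementation would likely need additional locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted object; NOT the real struct page. */
struct fake_page {
	atomic_int refcount;      /* 0 means "being freed, do not touch" */
	const char *mapping_name; /* only safe to read while a reference is held */
};

/* Take a reference only if the object is still live (refcount != 0). */
static bool fake_page_tryget(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with fake_page_tryget(). */
static void fake_page_put(struct fake_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
}

/*
 * Userspace-triggered dump path: refuse to look at an object that is
 * already on its way out, instead of racily reading freed memory.
 * Returns the number of characters written, or -1 if no reference
 * could be taken.
 */
static int fake_page_dump(struct fake_page *p, char *buf, size_t len)
{
	int n;

	if (!fake_page_tryget(p))
		return -1;
	n = snprintf(buf, len, "mapping=%s refs=%d",
		     p->mapping_name, atomic_load(&p->refcount));
	fake_page_put(p);
	return n;
}
```

A caller would treat a -1 return as "page went away, nothing to report"
rather than falling back to an unreferenced read. A kernel-internal
corruption path might reasonably skip the try-get step to avoid touching
possibly corrupted state any further.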