References: <20241125202021.3684919-1-jeffxu@google.com>
 <20241125202021.3684919-2-jeffxu@google.com>
 <202412171248.409B10D@keescook>
 <202501061647.6C8F34CB1A@keescook>
 <5cf1601b-70c3-45bb-81ef-416d89c415c2@lucifer.local>
 <7071878c-7857-4acd-ac27-f049cbc84de2@lucifer.local>
 <2e5de601da34342d8eb0d8319dcf81ff213c7ef0.camel@sipsolutions.net>
From: Jeff Xu
Date: Fri, 17 Jan 2025 12:15:12 -0800
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
To: enh
Cc: Pedro Falcato, Benjamin Berg, Lorenzo Stoakes, Kees Cook,
 akpm@linux-foundation.org, jannh@google.com,
 torvalds@linux-foundation.org, adhemerval.zanella@linaro.org,
 oleg@redhat.com, linux-kernel@vger.kernel.org,
 linux-hardening@vger.kernel.org, linux-mm@kvack.org,
 jorgelo@chromium.org, sroettger@google.com, ojeda@kernel.org,
 adobriyan@gmail.com, anna-maria@linutronix.de, mark.rutland@arm.com,
 linus.walleij@linaro.org, Jason@zx2c4.com, deller@gmx.de,
 rdunlap@infradead.org, davem@davemloft.net, hch@lst.de,
 peterx@redhat.com, hca@linux.ibm.com, f.fainelli@gmail.com,
 gerg@kernel.org, dave.hansen@linux.intel.com, mingo@kernel.org,
 ardb@kernel.org, Liam.Howlett@oracle.com, mhocko@suse.com,
 42.hyeyoo@gmail.com, peterz@infradead.org, ardb@google.com,
 rientjes@google.com, groeck@chromium.org, mpe@ellerman.id.au,
 Vlastimil Babka, Andrei Vagin, Dmitry Safonov <0x7f454c46@gmail.com>,
 Mike Rapoport, Alexander Mikhalitsyn

On Fri, Jan 17, 2025 at 11:35 AM enh wrote:
>
> On Fri, Jan 17, 2025 at 1:20 PM Jeff Xu wrote:
> >
> > On Thu, Jan 16, 2025 at 9:18 AM Pedro Falcato wrote:
> > >
> > > On Thu, Jan 16, 2025 at 5:02 PM Benjamin Berg wrote:
> > > >
> > > > Hi Lorenzo,
> > > >
> > > > On Thu, 2025-01-16 at 15:48 +0000, Lorenzo Stoakes wrote:
> > > > > On Wed, Jan 15, 2025 at 12:20:59PM -0800, Jeff Xu wrote:
> > > > > > On Wed, Jan 15, 2025 at 11:46 AM Lorenzo Stoakes
> > > > > > wrote:
> > > > >
> > > > > [SNIP]
> > > > >
> > > > > > > I've made it abundantly clear that this (NACKed) series cannot allow the
> > > > > > > kernel to be in a broken state even if a user sets flags to do so.
> > > > > > >
> > > > > > > This is because users might lack context to make this decision and
> > > > > > > incorrectly do so, and now we ship a known-broken kernel.
> > > > > > >
> > > > > > > You are now suggesting disabling the !CRIU requirement. Which violates my
> > > > > > > _requirements_ (not optional features).
> > > > > > >
> > > > > > Sure, I can add CRIU back.
> > > > > >
> > > > > > Are you fine with UML and gViso not working under this CONFIG ?
> > > > > > UML/gViso doesn't use any KCONFIG like CRIU does.
> > > > >
> > > > > Yeah this is a concern, wouldn't we be able to catch UML with a flag?
> > > > >
> > > > > Apologies my fault for maybe not being totally up to date with this, but what
> > > > > exactly was the gViso (is it gVisor actually?)
> > > >
> > > > UML is a separate architecture. It is a Linux kernel running as a
> > > > userspace application on top of an unmodified host kernel.
> > > >
> > > > So really, UML is a mostly weird userspace program for the purpose of
> > > > this discussion. And a pretty buggy one too--it got broken by rseq
> > > > already.
> > > >
> > > > What UML now does is:
> > > >  * Execute a tiny static binary
> > > >  * map special "stub" code/data pages at the topmost userspace address
> > > >    (replacing its stack)
> > > >  * continue execution inside the "stub" pages
> > > >  * unmap everything below the "stub" pages
> > > >  * use the unmap'ed area for userspace application mappings
> > > >
> > > > I believe that the "unmap everything" step will fail with this feature.
> > > >
> > > >
> > > > Now, I am sure one can come up with solutions, e.g.:
> > > >  1. Simply print an explanation if the unmap() fails
> > > >  2. Find an address that is guaranteed to be below the VDSO and use a
> > > >     smaller address space for the UML userspace.
> > > >  3. Somehow tell the host kernel to not install the VDSO mappings
> > > >  4. Add the host VDSO pages as a sealed VMA within UML to guard them
> > > >
> > > > UML is a bit of a niche and I am not sure it is worth worrying about it
> > > > too much.
> > >
> > > I've been absent from this patch series in general, but this gave me
> > > an idea: what if we let userspace seal these mappings itself? Since
> > > glibc is already sealing things, it might as well seal these?
> > > And then systems that _do_ care about this would set the glibc tunable
> > > and deal with the breakage.
> > >
> > > Is there something seriously wrong with this approach?
> > > Besides maybe not having a super easy way to discover these mappings
> > > atm, I feel like it would solve all of the policy issues people have
> > > been talking about in these threads.
> > >
> > There are technical difficulties to seal vdso/vvar from the glibc
> > side. The dynamic linker lacks vdso/vvar mapping size information, and
> > architectural variations for vdso/vvar also means sealing from the
> > kernel side is a simpler solution. Adhemerval has more details in case
> > clarification is needed from the glibc side.
>
> as a maintainer of a different linux libc, i've long wanted a "tell me
> everything there is to know about this vma" syscall rather than having
> to parse /proc/maps...
>
That would be an interesting mm feature, i.e. querying the VMA
information for a given address. ASLR is something to consider, though:
there are sandbox solutions, such as Landlock, that block reads of
/proc/pid/maps.

glibc's dynamic linker gets the mapping size info from the ELF header of
the .so during the execve() call. In a previous attempt at sealing the
vdso from glibc, the size of vdso.so (in PT_LOAD) was found to be
inaccurate. To make things more difficult, the vvar size might not be
present at all, iiuc.

> ...but in this special case, is the vdso/vvar size ever anything other
> than "one page" in practice?
>
Yes. On x86, the vdso can be two pages long.

> > Additionally, uprobe mapping can't be sealed by the dynamic linker,
> > dynamic linker can only apply sealing during execve() and dlopen(),
> > uprobe mapping isn't created during those two calls.
> >
> > -Jeff
> >
> >
> > > --
> > > Pedro
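
For reference, the "parse /proc/maps" fallback mentioned above can be
sketched roughly as follows. This is only a userspace illustration of
looking up the VMA that contains an address, not the syscall being wished
for. The names Vma, parse_maps and find_vma are made up for this example,
and it assumes the usual "start-end perms offset dev inode [name]" line
layout of /proc/<pid>/maps. It stops working as soon as a sandbox blocks
reads of that file, which is the limitation raised in the reply.

```python
from typing import List, NamedTuple, Optional


class Vma(NamedTuple):
    """One line of /proc/<pid>/maps (hypothetical helper type)."""
    start: int
    end: int
    perms: str
    name: str

    @property
    def size(self) -> int:
        # Mapping size in bytes; end is exclusive.
        return self.end - self.start


def parse_maps(text: str) -> List[Vma]:
    """Parse the text of /proc/<pid>/maps into a list of Vma entries."""
    vmas = []
    for line in text.splitlines():
        # Fields: range, perms, offset, dev, inode, optional name.
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        vmas.append(Vma(int(start_s, 16), int(end_s, 16), fields[1], name))
    return vmas


def find_vma(vmas: List[Vma], addr: int) -> Optional[Vma]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for vma in vmas:
        if vma.start <= addr < vma.end:
            return vma
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            current = parse_maps(f.read())
    except OSError:
        current = []  # not Linux, or /proc is blocked by a sandbox
    for vma in current:
        if vma.name in ("[vdso]", "[vvar]"):
            print(f"{vma.name}: {vma.size} bytes")
```

Run directly on a Linux host, this prints the sizes of whatever [vdso] and
[vvar] mappings the process has, which is one way to check the "one page"
question for a given architecture and kernel.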