From: enh
Date: Thu, 6 Feb 2025 10:51:54 -0500
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
To: Thomas Weißschuh

On Thu, Feb 6, 2025 at 10:28 AM Thomas Weißschuh wrote:
>
> On Thu, Feb 06, 2025 at 09:38:59AM -0500, enh wrote:
> > On Thu, Feb 6, 2025 at 8:20 AM Thomas Weißschuh
> > wrote:
> > >
> > > On Fri, Jan 17, 2025 at 02:35:18PM -0500, enh wrote:
> > > > On Fri, Jan 17, 2025 at 1:20 PM Jeff Xu wrote:
> > > >
> > > > >
> > > > > There are technical difficulties in sealing vdso/vvar from the glibc
> > > > > side. The dynamic linker lacks vdso/vvar mapping size information, and
> > > > > architectural variations for vdso/vvar also mean sealing from the
> > > > > kernel side is a simpler solution. Adhemerval has more details in case
> > > > > clarification is needed from the glibc side.
> > > >
> > > > as a maintainer of a different linux libc, i've long wanted a "tell me
> > > > everything there is to know about this vma" syscall rather than having
> > > > to parse /proc/maps...
> > > >
> > > > ...but in this special case, is the vdso/vvar size ever anything other
> > > > than "one page" in practice?
> > >
> > > x86 has two additional vvar pages for virtual clocks.
> > > (Since v6.13 even split into their own mapping)
> > > Loongarch has per-cpu vvar data which is larger than one page.
> > > The vdso mapping is however many pages the code ends up being compiled as,
> > > for example on my current x86_64 distro kernel it's two pages.
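The following is a minimal sketch of what a libc can learn about the vDSO without /proc. The kernel passes the address of the vDSO's ELF header in the auxiliary vector (`AT_SYSINFO_EHDR`), so the span of the vDSO's loadable segments can be computed from its program headers. The helper name `vdso_image_extent` is hypothetical, and the result is only an estimate: the real `[vdso]` mapping may be larger if the image's trailing data spills onto another page. The sketch also assumes that the vvar pages are not described by any `PT_LOAD` segment, which matches the point above that the dynamic linker lacks vvar size information.

```c
#include <elf.h>
#include <link.h>      /* ElfW() */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval(), AT_SYSINFO_EHDR */
#include <unistd.h>    /* sysconf() */

/* Hypothetical helper: page-rounded span of the vDSO's PT_LOAD segments,
 * or 0 if the kernel did not map a vDSO. Assumed to say nothing about
 * the vvar pages, which live outside the vDSO's ELF image. */
static size_t vdso_image_extent(void)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return 0;

    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *ph = (const ElfW(Phdr) *)(base + eh->e_phoff);
    uintptr_t lo = UINTPTR_MAX, hi = 0;

    /* Take the lowest start and highest end over all loadable segments. */
    for (size_t i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        uintptr_t seg_lo = ph[i].p_vaddr;
        uintptr_t seg_hi = ph[i].p_vaddr + ph[i].p_memsz;
        if (seg_lo < lo)
            lo = seg_lo;
        if (seg_hi > hi)
            hi = seg_hi;
    }
    if (hi <= lo)
        return 0;

    /* Mappings are made in whole pages, so round the span up. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (hi - lo + page - 1) / page * page;
}
```

Dividing the result by the page size gives a page count that can be compared with the `[vdso]` line in /proc/self/maps. Nothing comparable exists for the `[vvar]` mappings.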
> > > In the near future, probably v6.14, vvars will be split over multiple
> > > pages in general [0].
> >
> > /me checks the nearest arm64 phone ... yeah, vdso is still only one
> > page there but vvars is already more than one.
>
> Probably due to CONFIG_TIME_NS, see below.
>
> > is there a TL;DR (or RTFM link) for why this is so big? a quick look
> > at the x86 suggests there should only be 640 bytes of various things
> > plus a handful of bytes for the rng, and while arm64 looks very
> > different, that looks like it's explicitly asking for a page (with the
> > vdso_data_store stuff)? (i've never had any reason to look at vvars
> > before, only vdso.)
>
> I don't think there is any real manual.
>
> The vvar data is *shared* between the kernel and userspace.
> This is done by mapping the *same* physical memory into the kernel
> ("vdso_data_store") and (read-only) into all userspace processes.
> As PTEs always cover a full page and the kernel cannot expose random
> other internal kernel data into userspace, the vvars need to be in their
> own dedicated page.
> (The same is true for the vDSO code, uprobe trampoline, etc... mappings)
>
> The vDSO functions also need to be aware of time namespaces. This is
> implemented by allocating one page per namespace and mapping this
> in place of the regular vvar page. But the vDSO still needs to access
> the regular vvar page for some information, so both are mapped.

ah, i see. yeah, that makes sense. (amusingly, i almost quipped "it's
not like there are _that_ many clocks to go in there" in my previous
mail, forgetting that there are effectively an unbounded number of
clocks thanks to this feature!)

> Then on top come the rng state and some architecture-specific data.
> These are currently part of the time page. So they also have to dance
> around the time namespace mapping shenanigans.
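The shared-page arrangement described above can be mimicked in userspace, as a rough analogy only. One page of backing memory is mapped twice: read-write for the producer, standing in for the kernel's `vdso_data_store`, and read-only for the consumer, standing in for a process's vvar view. The helper `map_shared_page` is hypothetical. It uses an anonymous temporary file, which is not what the kernel does. The point is that both views see the same bytes and that the unit of sharing is a whole page.

```c
#include <stdio.h>     /* tmpfile(), fileno(), fclose() */
#include <string.h>    /* strcpy()/strcmp() in the usage example */
#include <sys/mman.h>  /* mmap(), munmap() */
#include <unistd.h>    /* ftruncate(), sysconf() */

/* Hypothetical helper (userspace analogy, not kernel code): map one page
 * of the same backing memory twice, once writable and once read-only.
 * Returns 0 on success and sets *len to the page size; -1 on failure. */
static int map_shared_page(char **rw, const char **ro, size_t *len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), (off_t)page) != 0) {
        fclose(f);
        return -1;
    }

    /* Both mappings refer to the same page-cache page. mmap works in whole
     * pages, so anything else stored in that page would be visible too. */
    void *w = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    void *r = mmap(NULL, page, PROT_READ, MAP_SHARED, fileno(f), 0);
    fclose(f); /* the mappings keep the memory alive */

    if (w == MAP_FAILED || r == MAP_FAILED) {
        if (w != MAP_FAILED)
            munmap(w, page);
        if (r != MAP_FAILED)
            munmap(r, page);
        return -1;
    }
    *rw = w;
    *ro = r;
    *len = page;
    return 0;
}
```

After `strcpy(rw, "tick")`, reading through `ro` yields `"tick"`. A store through the read-only view would fault instead.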
In addition they have to > coexist with the actual time data, which is currently done by manually > calculating byte offsets for them in the time page and hardcoding those. > > The linked series cleans this up by moving things into dedicated pages. > To make the code easier to understand and to make it possible to > add new data to the time page without running out of space or > introducing conflicts which need to be detected manually. > While this needs to allocate more pages, these are shared between the > whole system, so effectively it's cheap. It also requires more virtual > memory space in each process, but that shouldn't matter. > > > As for arm64 looking very different from x86: Hopefully not for long :-) (even as someone who doesn't work on the kernel, things like this are always helpful --- just having one thing to understand/your first grep being relevant is much nicer than "oh, wait ... which architecture was that?".) > > > Figuring out the start and size from /proc/maps, or the new > > > PROCMAP_QUERY ioctl, is not trivial, due to architectural variations. > > > > (obviously it's unsatisfying as a general interface, but in practice > > the VMAs i see asked about about directly -- rather than just rounded > > up in a diagnostic dump -- are either stacks ["what are the bounds of > > this stack, and does it have guard pages already?"] or code ["what > > file was the code at this pc mapped in from?"]. so while the vdso > > would come up, we'd never notice if vvars didn't work. if your sp/pc > > point there, we were already just going to bail anyway :-) ) > > Fair enough. > > This information was also a response to Jeff's parent mail, > as it would be relevant when sealing the mappings from ld.so. > > > > > > [0] https://lore.kernel.org/lkml/20250204-vdso-store-rng-v3-0-13a4669= dfc8c@linutronix.de/