From: Thomas Weißschuh
To: enh
Cc: Jeff Xu, Pedro Falcato, Benjamin Berg, Lorenzo Stoakes, Kees Cook,
    akpm@linux-foundation.org, jannh@google.com,
    torvalds@linux-foundation.org, adhemerval.zanella@linaro.org,
    oleg@redhat.com, linux-kernel@vger.kernel.org,
    linux-hardening@vger.kernel.org, linux-mm@kvack.org,
    jorgelo@chromium.org, sroettger@google.com, ojeda@kernel.org,
    adobriyan@gmail.com, anna-maria@linutronix.de, mark.rutland@arm.com,
    linus.walleij@linaro.org, Jason@zx2c4.com, deller@gmx.de,
    rdunlap@infradead.org, davem@davemloft.net, hch@lst.de,
    peterx@redhat.com, hca@linux.ibm.com, f.fainelli@gmail.com,
    gerg@kernel.org, dave.hansen@linux.intel.com, mingo@kernel.org,
    ardb@kernel.org, Liam.Howlett@oracle.com, mhocko@suse.com,
    42.hyeyoo@gmail.com, peterz@infradead.org, ardb@google.com,
    rientjes@google.com, groeck@chromium.org, mpe@ellerman.id.au,
    Vlastimil Babka, Andrei Vagin, Dmitry Safonov <0x7f454c46@gmail.com>,
    Mike Rapoport, Alexander Mikhalitsyn
Subject: Re: [PATCH v4 1/1] exec: seal system mappings
Date: Thu, 6 Feb 2025 16:28:40 +0100
Message-ID: <20250206154810-8b7bf2b4-435c-4930-a787-f9238cd0045d@linutronix.de>
References: <5cf1601b-70c3-45bb-81ef-416d89c415c2@lucifer.local>
    <7071878c-7857-4acd-ac27-f049cbc84de2@lucifer.local>
    <2e5de601da34342d8eb0d8319dcf81ff213c7ef0.camel@sipsolutions.net>
    <20250206135150-6c770e7d-9af8-4924-b760-82cff5092586@linutronix.de>

On Thu, Feb 06, 2025 at 09:38:59AM -0500, enh wrote:
> On Thu, Feb 6, 2025 at 8:20 AM Thomas Weißschuh wrote:
> >
> > On Fri, Jan 17, 2025 at 02:35:18PM -0500, enh wrote:
> > > On Fri, Jan 17, 2025 at 1:20 PM Jeff Xu wrote:
> > > >
> > > > There are technical difficulties to seal vdso/vvar from the glibc
> > > > side. The dynamic linker lacks vdso/vvar mapping size information, and
> > > > architectural variations for vdso/vvar also means sealing from the
> > > > kernel side is a simpler solution. Adhemerval has more details in case
> > > > clarification is needed from the glibc side.
> > >
> > > as a maintainer of a different linux libc, i've long wanted a "tell me
> > > everything there is to know about this vma" syscall rather than having
> > > to parse /proc/maps...
> > >
> > > ...but in this special case, is the vdso/vvar size ever anything other
> > > than "one page" in practice?
> >
> > x86 has two additional vvar pages for virtual clocks.
> > (Since v6.13 even split into their own mapping)
> > LoongArch has per-cpu vvar data which is larger than one page.
> > The vdso mapping is however many pages the code ends up being compiled as,
> > for example on my current x86_64 distro kernel it's two pages.
> > In the near future, probably v6.14, vvars will be split over multiple
> > pages in general [0].
>
> /me checks the nearest arm64 phone ... yeah, vdso is still only one
> page there but vvars is already more than one.

Probably due to CONFIG_TIME_NS, see below.

> is there a TL;DR (or RTFM link) for why this is so big? a quick look
> at the x86 suggests there should only be 640 bytes of various things
> plus a handful of bytes for the rng, and while arm64 looks very
> different, that looks like it's explicitly asking for a page (with the
> vdso_data_store stuff)? (i've never had any reason to look at vvars
> before, only vdso.)

I don't think there is any real manual.

The vvar data is *shared* between the kernel and userspace.
This is done by mapping the *same* physical memory into the kernel
("vdso_data_store") and (read-only) into all userspace processes.
As PTEs always cover a full page and the kernel cannot expose unrelated
internal kernel data to userspace, the vvars need to be in their own
dedicated page.
(The same is true for the vDSO code, the uprobe trampoline, and similar
mappings.)

The vDSO functions also need to be aware of time namespaces. This is
implemented by allocating one page per namespace and mapping it in place
of the regular vvar page. But the vDSO still needs to access the regular
vvar page for some information, so both are mapped.

On top of that come the rng state and some architecture-specific data.
These are currently part of the time page, so they also have to dance
around the time namespace mapping shenanigans. In addition, they have to
coexist with the actual time data, which is currently done by manually
calculating byte offsets for them in the time page and hardcoding those.

The linked series cleans this up by moving things into dedicated pages.
This makes the code easier to understand and makes it possible to add new
data to the time page without running out of space or introducing
conflicts which need to be detected manually.

While this needs to allocate more pages, these are shared across the whole
system, so effectively it's cheap. It also requires more virtual memory
space in each process, but that shouldn't matter.

As for arm64 looking very different from x86: Hopefully not for long :-)

> > Figuring out the start and size from /proc/maps, or the new
> > PROCMAP_QUERY ioctl, is not trivial, due to architectural variations.
>
> (obviously it's unsatisfying as a general interface, but in practice
> the VMAs i see asked about directly -- rather than just rounded
> up in a diagnostic dump -- are either stacks ["what are the bounds of
> this stack, and does it have guard pages already?"] or code ["what
> file was the code at this pc mapped in from?"]. so while the vdso
> would come up, we'd never notice if vvars didn't work.
> if your sp/pc point there, we were already just going to bail
> anyway :-) )

Fair enough. This information was also a response to Jeff's parent mail,
as it would be relevant when sealing the mappings from ld.so.

> > [0] https://lore.kernel.org/lkml/20250204-vdso-store-rng-v3-0-13a4669dfc8c@linutronix.de/
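
As a rough illustration of the /proc/<pid>/maps approach mentioned in the
thread, the sketch below parses maps-format text and reports the size of
each vDSO/vvar-like mapping in bytes and pages. It is a sketch under
stated assumptions, not a robust interface: the function name
special_mappings, the SAMPLE text (made up, not taken from a real system)
and the rule "name starts with [vdso or [vvar" are all assumptions, and
the actual names and layout vary by architecture and kernel version.

```python
import mmap

# Made-up sample in /proc/<pid>/maps format, for illustration only.
SAMPLE = """\
55d0c8a00000-55d0c8a01000 r--p 00000000 08:01 1234                       /usr/bin/cat
7ffd3a5f1000-7ffd3a5f5000 r--p 00000000 00:00 0                          [vvar]
7ffd3a5f5000-7ffd3a5f7000 r-xp 00000000 00:00 0                          [vdso]
"""


def special_mappings(maps_text, page_size=mmap.PAGESIZE):
    """Return (name, start, size_bytes, pages, perms) per vDSO/vvar-like mapping.

    Assumption: the mappings of interest are those whose name starts with
    "[vdso" or "[vvar"; the exact set of names differs between kernels.
    """
    found = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional name.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # unnamed mapping or malformed line
        addr_range, perms, name = fields[0], fields[1], fields[5].strip()
        if not name.startswith(("[vdso", "[vvar")):
            continue
        start_hex, end_hex = addr_range.split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        size = end - start
        found.append((name, start, size, size // page_size, perms))
    return found
```

On a Linux system, passing the contents of /proc/self/maps instead of
SAMPLE reports the calling process's own mappings. This only shows what
one particular kernel happens to expose, so it is useful for diagnostics
rather than as a basis for sealing the mappings from userspace.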