From: Kalesh Singh
Date: Wed, 19 Feb 2025 12:56:31 -0800
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings
To: Lorenzo Stoakes
Cc: David Hildenbrand, Andrew Morton, Suren Baghdasaryan,
 "Liam R . Howlett", Matthew Wilcox, Vlastimil Babka,
 "Paul E . McKenney", Jann Horn, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Shuah Khan,
 linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org,
 John Hubbard, Juan Yescas
References: <45297010-a0a4-4a42-84e8-6f4764eab3b3@lucifer.local>
 <41af4ffb-0383-4d00-9639-0bf16e1f5f37@redhat.com>

On Wed, Feb 19, 2025 at 11:20 AM Lorenzo Stoakes wrote:
>
> On Wed, Feb 19, 2025 at 10:52:04AM -0800, Kalesh Singh wrote:
> > On Wed, Feb 19, 2025 at 1:17 AM Lorenzo Stoakes
> > wrote:
> > >
> > > On Wed, Feb 19, 2025 at 10:15:47AM +0100, David Hildenbrand wrote:
> > > > On 19.02.25 10:03, Lorenzo Stoakes wrote:
> > > > > On Wed, Feb 19, 2025 at 12:25:51AM -0800, Kalesh Singh wrote:
> > > > > > On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
> > > > > > wrote:
> > > > > > >
> > > > > > > The guard regions feature was initially implemented to support anonymous
> > > > > > > mappings only, excluding shmem.
> > > > > > >
> > > > > > > This was done such as to introduce the feature carefully and incrementally
> > > > > > > and to be conservative when considering the various caveats and corner
> > > > > > > cases that are applicable to file-backed mappings but not to anonymous
> > > > > > > ones.
> > > > > > >
> > > > > > > Now this feature has landed in 6.13, it is time to revisit this and to
> > > > > > > extend this functionality to file-backed and shmem mappings.
> > > > > > >
> > > > > > > In order to make this maximally useful, and since one may map file-backed
> > > > > > > mappings read-only (for instance ELF images), we also remove the
> > > > > > > restriction on read-only mappings and permit the establishment of guard
> > > > > > > regions in any non-hugetlb, non-mlock()'d mapping.
> > > > > >
> > > > > > Hi Lorenzo,
> > > > > >
> > > > > > Thank you for your work on this.
> > > > >
> > > > > You're welcome.
> > > > >
> > > > > >
> > > > > > Have we thought about how guard regions are represented in /proc/*/[s]maps?
> > > > >
> > > > > This is off-topic here but... Yes, extensively. No they do not appear
> > > > > there.
> > > > >
> > > > > I thought you had attended LPC and my talk where I mentioned this
> > > > > purposefully as a drawback?
> > > > >
> > > > > I went out of my way to advertise this limitation at the LPC talk, in the
> > > > > original series, etc. so it's a little disappointing that this is being
> > > > > brought up so late, but nobody else has raised objections to this issue so
> > > > > I think in general it's not a limitation that matters in practice.
> > > > >
> >
> > Sorry for raising this now, yes at the time I believe we discussed
> > reducing the vma slab memory usage for the PROT_NONE mappings. I
> > didn't imagine that apps could have dependencies on the mapped ELF
> > ranges in /proc/self/[s]maps until recent breakages from a similar
> > feature. Android itself doesn't depend on this but what I've seen is
> > banking apps and apps that have obfuscation to prevent reverse
> > engineering (the particulars of such obfuscation are a black box).
>
> Ack ok fair enough, sorry, but obviously you can understand it's
> frustrating when I went to great lengths to advertise this not only at the
> talk but in the original series.
>
> Really important to have these discussions early. Not that really we can do
> much about this, as inherently this feature cannot give you what you need.
>
> Is it _only_ banking apps that do this? And do they exclusively read
> /proc/$pid/maps? I mean there's nothing we can do about that, sorry.

Not only banking apps but that's a common category.

> If that's immutable, then unless you do your own very, very, very slow custom
> android maps implementation (that will absolutely break the /proc/$pid/maps
> scalability efforts atm) this is just a no-go.
>

Yeah unfortunately that's immutable as app versions are mostly
independent from the OS version. We do have something that handles
this by encoding the guard regions in the vm_flags, but as you can
imagine it's not generic enough for upstream.

>
> >
> > > > >
> > > > > > In the field, I've found that many applications read the ranges from
> > > > > > /proc/self/[s]maps to determine what they can access (usually related
> > > > > > to obfuscation techniques). If they don't know of the guard regions it
> > > > > > would cause them to crash; I think that we'll need similar entries to
> > > > > > PROT_NONE (---p) for these, and generally to maintain consistency
> > > > > > between the behavior and what is being said from /proc/*/[s]maps.
> > > > >
> > > > > No, we cannot have these, sorry.
> > > > >
> > > > > Firstly /proc/$pid/[s]maps describes VMAs. The entire purpose of this
> > > > > feature is to avoid having to accumulate VMAs for regions which are not
> > > > > intended to be accessible.
> > > > >
> > > > > Secondly, there is no practical means for this to be accomplished in
> > > > > /proc/$pid/maps in _any_ way - as no metadata relating to a VMA indicates
> > > > > they have guard regions.
> > > > >
> > > > > This is intentional, because setting such metadata is simply not practical
> > > > > - why? Because when you try to split the VMA, how do you know which bit
> > > > > gets the metadata and which doesn't? You can't without _reading page
> > > > > tables_.
> >
> > Yeah the splitting becomes complicated with any vm flags for this...
> > meaning any attempt to expose this in /proc/*/maps have to
> > unconditionally walk the page tables :(
>
> It's not really complicated, it's _impossible_ unless you made literally
> all VMA code walk page tables for every single operation. Which we are
> emphatically not going to do :)
>
> And no, /proc/$pid/maps is _never_ going to walk page tables. For obvious
> performance reasons.
>
> >
> > > > >
> > > > > /proc/$pid/smaps _does_ read page tables, but we can't start pretending
> > > > > VMAs exist when they don't, this would be completely inaccurate, would
> > > > > break assumptions for things like mremap (which require a single VMA) and
> > > > > would be unworkable.
> > > > >
> > > > > The best that _could_ be achieved is to have a marker in /proc/$pid/smaps
> > > > > saying 'hey this region has guard regions somewhere'.
> > > >
> > > > And then simply expose it in /proc/$pid/pagemap, which is a better interface
> > > > for this pte-level information inside of VMAs. We should still have a spare
> > > > bit for that purpose in the pagemap entries.
> > >
> > > Ah yeah thanks David forgot about that!
> > >
> > > This is also a possibility if that'd solve your problems Kalesh?
> >
> > I'm not sure what is the correct interface to advertise these. Maybe
> > smaps as you suggested since we already walk the page tables there?
> > and pagemap bit for the exact pages as well? It won't solve this
> > particular issue, as 1000s of in field apps do look at this through
> > /proc/*/maps. But maybe we have to live with that...
>
> I mean why are we even considering this if you can't change this anywhere?
> Confused by that.
>
> I'm afraid upstream can't radically change interfaces to suit this
> scenario.
>
> We also can't change smaps in the way you want, it _has_ to still give
> output per VMA information.

Sorry I wasn't suggesting to change the entries in smaps, rather
agreeing to your marker suggestion. Maybe a set of ranges for each
smaps entry that has guards? It doesn't solve the use case, but does
make these regions visible to userspace.

>
> The proposed change that would be there would be a flag or something
> indicating that the VMA has guard regions _SOMEWHERE_ in it.
>
> Since this doesn't solve your problem, adds complexity, and nobody else
> seems to need it, I would suggest this is not worthwhile and I'd rather not
> do this.
>
> Therefore for your needs there are literally only two choices here:
>
> 1. Add a bit to /proc/$pid/pagemap OR
> 2. a new interface.
>
> I am not in favour of a new interface here, if we can just extend pagemap.
>
> What you'd have to do is:
>
> 1. Find virtual ranges via /proc/$pid/maps
> 2. iterate through /proc/$pid/pagemaps to retrieve state for all ranges.
>

Could we also consider an smaps field like:

VmGuards: [AAA, BBB), [CCC, DDD), ...

or something of that sort?

> Since anything that would retrieve guard region state would need to walk
> page tables, any approach would be slow and I don't think this would be any
> less slow than any other interface.
>
> This way you'd be able to find all guard regions all the time.
>
> This is just the trade-off for this feature unfortunately - its whole
> design ethos is to allow modification of -faulting- behaviour without
> having to modify -VMA- behaviour.
>
> But if it's banking apps whose code you can't control (surprised you don't
> lock down these interfaces), I mean is this even useful to you?
>
> If your requirement is 'you have to change /proc/$pid/maps to show guard
> regions' I mean the answer is that we can't.
> >
> > We can argue that such apps are broken since they may trip on the
> > SIGBUS off the end of the file -- usually this isn't the case for the
> > ELF segment mappings.
>
> Or tearing of the maps interface, or things getting unmapped or or
> or... It's really not a sane thing to do.
> >
> > This is still useful for other cases, I just wanted to get some ideas
> > if this can be extended to further use cases.
>
> Well I'm glad that you guys find it useful for _something_ ;)
>
> Again this wasn't written only for you (it is broadly a good feature for
> upstream), but I did have your use case in mind, so I'm a little
> disappointed that it doesn't help, as I like to solve problems.
>
> But I'm glad it solves at least some for you...

I recall Liam had a proposal to store the guard ranges in the maple
tree? I wonder if that can be used in combination with this approach
to have a better representation of this?

>
> >
> > Thanks,
> > Kalesh
> >
> > > > >
> > > This bit will be fought over haha
> > >
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David / dhildenb
> > > >
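
For illustration, the two-step approach quoted above (find the virtual ranges via /proc/$pid/maps, then iterate /proc/$pid/pagemap for each range) could be sketched from userspace roughly as follows. This is a minimal sketch under stated assumptions, not code from the series: the helper names are hypothetical, and GUARD_BIT is a placeholder, since the thread only proposes using a spare pagemap bit and does not say which one.

```python
import os
import struct

PAGEMAP_ENTRY_SIZE = 8  # each pagemap entry is one 64-bit word
GUARD_BIT = 58          # ASSUMPTION: placeholder bit index, not given in the thread


def parse_maps_line(line):
    """Return (start, end) from one /proc/$pid/maps line."""
    addr_range = line.split(maxsplit=1)[0]
    start_hex, end_hex = addr_range.split("-")
    return int(start_hex, 16), int(end_hex, 16)


def pagemap_offset(vaddr, page_size):
    """Byte offset in /proc/$pid/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def entry_has_bit(entry, bit=GUARD_BIT):
    """True if the given flag bit is set in a 64-bit pagemap entry."""
    return bool((entry >> bit) & 1)


def guard_pages(pid="self", bit=GUARD_BIT):
    """Return page addresses whose pagemap entry has `bit` set.

    Step 1 reads the ranges from maps; step 2 walks pagemap per range.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/maps") as maps:
        ranges = [parse_maps_line(line) for line in maps if line.strip()]
    found = []
    with open(f"/proc/{pid}/pagemap", "rb", buffering=0) as pagemap:
        for start, end in ranges:
            try:
                pagemap.seek(pagemap_offset(start, page_size))
                addr = start
                while addr < end:
                    # Read at most 8192 entries per call to bound memory use.
                    pages = min((end - addr) // page_size, 8192)
                    data = pagemap.read(pages * PAGEMAP_ENTRY_SIZE)
                    data = data[: len(data) - len(data) % PAGEMAP_ENTRY_SIZE]
                    if not data:
                        break
                    for (entry,) in struct.iter_unpack("=Q", data):
                        if entry_has_bit(entry, bit):
                            found.append(addr)
                        addr += page_size
            except OSError:
                continue  # some ranges may not be readable via pagemap
    return found
```

Calling guard_pages() inspects the current process. Every page of every mapping is visited, which is the page-table-walk cost the thread describes for any interface that reports guard region state.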