From: Kalesh Singh
Date: Wed, 19 Feb 2025 00:25:51 -0800
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings
To: Lorenzo Stoakes
Cc: Andrew Morton, Suren Baghdasaryan, "Liam R . Howlett", Matthew Wilcox,
 Vlastimil Babka, "Paul E . McKenney", Jann Horn, David Hildenbrand,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shuah Khan,
 linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org, John Hubbard,
 Juan Yescas

On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes wrote:
>
> The guard regions feature was initially implemented to support anonymous
> mappings only, excluding shmem.
>
> This was done so as to introduce the feature carefully and incrementally
> and to be conservative when considering the various caveats and corner
> cases that are applicable to file-backed mappings but not to anonymous
> ones.
>
> Now this feature has landed in 6.13, it is time to revisit this and to
> extend this functionality to file-backed and shmem mappings.
>
> In order to make this maximally useful, and since one may map file-backed
> mappings read-only (for instance ELF images), we also remove the
> restriction on read-only mappings and permit the establishment of guard
> regions in any non-hugetlb, non-mlock()'d mapping.

Hi Lorenzo,

Thank you for your work on this.

Have we thought about how guard regions are represented in
/proc/*/[s]maps?

In the field, I've found that many applications read the ranges from
/proc/self/[s]maps to determine what they can access (usually related
to obfuscation techniques). If they are unaware of the guard regions,
they will crash when they touch one. I think we'll need entries similar
to PROT_NONE (---p) for these, and more generally we need to keep the
actual behavior consistent with what /proc/*/[s]maps reports.

-- Kalesh

>
> It is permissible to permit the establishment of guard regions in read-only
> mappings because the guard regions only reduce access to the mapping, and
> when removed simply reinstate the existing attributes of the underlying
> VMA, meaning no access violations can occur.
>
> While the change in kernel code introduced in this series is small, the
> majority of the effort here is spent in extending the testing to assert
> that the feature works correctly across numerous file-backed mapping
> scenarios.
>
> Every single guard region self-test performed against anonymous memory
> (which is relevant and not anon-only) has now been updated to also be
> performed against shmem and a mapping of a file in the working directory.
>
> This confirms that all cases also function correctly for file-backed guard
> regions.
>
> In addition a number of other tests are added for specific file-backed
> mapping scenarios.
>
> There are a number of other concerns that one might have with regard to
> guard regions, addressed below:
>
> Readahead
> ~~~~~~~~~
>
> Readahead is a process through which the page cache is populated on the
> assumption that sequential reads will occur, thus amortising I/O and,
> through a clever use of the PG_readahead folio flag established during
> major fault and checked upon minor fault, provides for asynchronous I/O to
> occur as data is processed, reducing I/O stalls as data is faulted in.
>
> Guard regions do not alter this mechanism, which operates at the folio and
> fault level, but do of course prevent the faulting of folios that would
> otherwise be mapped.
>
> In the instance of a major fault prior to a guard region, synchronous
> readahead will occur including populating folios in the page cache which
> the guard regions will, in the case of the mapping in question, prevent
> access to.
>
> In addition, if PG_readahead is placed in a folio that is now inaccessible,
> this will prevent asynchronous readahead from occurring as it would
> otherwise do.
>
> However, there are mechanisms for heuristically resetting this within
> readahead regardless, which will 'recover' correct readahead behaviour.
>
> Readahead presumes sequential data access; the presence of a guard region
> clearly indicates that, at least in the guard region, no such sequential
> access will occur, as it cannot occur there.
>
> So this should have very little impact on any real workload. The far more
> important point is as to whether readahead causes incorrect or
> inappropriate mapping of ranges disallowed by the presence of guard
> regions - this is not the case, as readahead does not 'pre-fault' memory in
> this fashion.
>
> At any rate, any mechanism which would attempt to do so would hit the usual
> page fault paths, which correctly handle PTE markers as with anonymous
> mappings.
>
> Fault-Around
> ~~~~~~~~~~~~
>
> The fault-around logic, in a similar vein to readahead, attempts to improve
> efficiency with regard to file-backed memory mappings, however it differs
> in that it does not try to fetch folios into the page cache that are about
> to be accessed, but rather pre-maps a range of folios around the faulting
> address.
>
> Guard regions making use of PTE markers makes this relatively trivial, as
> this case is already handled - see filemap_map_folio_range() and
> filemap_map_order0_folio() - in both instances, the solution is to simply
> keep the established page table mappings and let the fault handler take
> care of PTE markers, as per the comment:
>
>         /*
>          * NOTE: If there're PTE markers, we'll leave them to be
>          * handled in the specific fault path, and it'll prohibit
>          * the fault-around logic.
>          */
>
> This works, as establishing guard regions results in page table mappings
> with PTE markers, and clearing them removes them.
>
> Truncation
> ~~~~~~~~~~
>
> File truncation will not eliminate existing guard regions, as the
> truncation operation will ultimately zap the range via
> unmap_mapping_range(), which specifically excludes PTE markers.
>
> Zapping
> ~~~~~~~
>
> Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
> which specifically deals with guard entries, leaving them intact except in
> instances such as process teardown or munmap() where they need to be
> removed.
>
> Reclaim
> ~~~~~~~
>
> When reclaim is performed on file-backed folios, it ultimately invokes
> try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte()
> will ultimately abort the operation for the guard region mapping.
> If large, then check_pte() will determine that this is a non-device private
> entry/device-exclusive entry 'swap' PTE and thus abort the operation in
> that instance.
>
> Therefore, no odd things happen in the instance of reclaim being attempted
> upon a file-backed guard region.
>
> Hole Punching
> ~~~~~~~~~~~~~
>
> This updates the page cache and ultimately invokes unmap_mapping_range(),
> which explicitly leaves PTE markers in place.
>
> Because the establishment of guard regions zapped any existing mappings to
> file-backed folios, once the guard regions are removed then the
> hole-punched region will be faulted in as usual and everything will behave
> as expected.
>
> Lorenzo Stoakes (4):
>   mm: allow guard regions in file-backed and read-only mappings
>   selftests/mm: rename guard-pages to guard-regions
>   tools/selftests: expand all guard region tests to file-backed
>   tools/selftests: add file/shmem-backed mapping guard region tests
>
>  mm/madvise.c                                  |   8 +-
>  tools/testing/selftests/mm/.gitignore         |   2 +-
>  tools/testing/selftests/mm/Makefile           |   2 +-
>  .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
>  4 files changed, 821 insertions(+), 112 deletions(-)
>  rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)
>
> --
> 2.48.1
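
To make the /proc/self/[s]maps concern raised above concrete, here is a
minimal userspace sketch of the kind of check such applications tend to
perform. It is an illustration only and is not taken from any real
application: struct map_range, parse_maps_line() and maps_says_readable()
are hypothetical names, and the sketch assumes the usual
"start-end perms ..." layout of a maps line on a 64-bit system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One parsed maps entry (hypothetical helper type for this sketch). */
struct map_range {
	unsigned long start;	/* first address in the range (inclusive) */
	unsigned long end;	/* first address past the range (exclusive) */
	char perms[5];		/* e.g. "r--p", NUL-terminated */
};

/*
 * Parse the leading "start-end perms" fields of a single maps line.
 * Returns true if all three fields were read, false otherwise.
 */
bool parse_maps_line(const char *line, struct map_range *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, "%lx-%lx %4s",
		      &out->start, &out->end, out->perms) == 3;
}

/*
 * The conclusion a maps-trusting application draws: an address is
 * readable if it lies inside a range whose permissions start with 'r'.
 * Nothing here can account for a guard region placed inside the range,
 * because the check only ever sees the range and its permission bits.
 */
bool maps_says_readable(const struct map_range *r, unsigned long addr)
{
	return addr >= r->start && addr < r->end && r->perms[0] == 'r';
}
```

An application would call parse_maps_line() on each line it reads from
/proc/self/maps and then treat every address for which
maps_says_readable() returns true as safe to dereference. If a guard
region sits inside an "r--p" range without being reported, that
assumption no longer holds, which is why a PROT_NONE-like (---p) entry
is suggested above.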