linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kalesh Singh <kaleshsingh@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Vlastimil Babka <vbabka@suse.cz>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Jann Horn <jannh@google.com>,
	 David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Shuah Khan <shuah@kernel.org>,
	linux-kselftest@vger.kernel.org,  linux-api@vger.kernel.org,
	John Hubbard <jhubbard@nvidia.com>,
	 Juan Yescas <jyescas@google.com>
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings
Date: Wed, 19 Feb 2025 00:35:12 -0800	[thread overview]
Message-ID: <CAC_TJvfPNkJDWnG81GnJcFeMLYzN8=uM-oTrK6FKT7tD=E4TQg@mail.gmail.com> (raw)
In-Reply-To: <CAC_TJveMB1_iAUt81D5-+z8gArbVcbfDM=djCZG_bRVaCEMRmg@mail.gmail.com>

On Wed, Feb 19, 2025 at 12:25 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > The guard regions feature was initially implemented to support anonymous
> > mappings only, excluding shmem.
> >
> > This was done such as to introduce the feature carefully and incrementally
> > and to be conservative when considering the various caveats and corner
> > cases that are applicable to file-backed mappings but not to anonymous
> > ones.
> >
> > Now this feature has landed in 6.13, it is time to revisit this and to
> > extend this functionality to file-backed and shmem mappings.
> >
> > In order to make this maximally useful, and since one may map file-backed
> > mappings read-only (for instance ELF images), we also remove the
> > restriction on read-only mappings and permit the establishment of guard
> > regions in any non-hugetlb, non-mlock()'d mapping.
>
> Hi Lorenzo,
>
> Thank you for your work on this.
>
> Have we thought about how guard regions are represented in /proc/*/[s]maps?
>
> In the field, I've found that many applications read the ranges from
> /proc/self/[s]maps to determine what they can access (usually related
> to obfuscation techniques). If they don't know of the guard regions it
> would cause them to crash; I think that we'll need similar entries to
> PROT_NONE (---p) for these, and generally to maintain consistency
> between the behavior and what is being said from /proc/*/[s]maps.

To clarify why the applications may not be aware of their guard
regions -- in the case of the ELF mappings these PROT_NONE (guard
regions) would be installed by the dynamic loader; or may be inherited
from the parent (zygote in Android's case).

>
> -- Kalesh
>
> >
> > It is permissible to permit the establishment of guard regions in read-only
> > mappings because the guard regions only reduce access to the mapping, and
> > when removed simply reinstate the existing attributes of the underlying
> > VMA, meaning no access violations can occur.
> >
> > While the change in kernel code introduced in this series is small, the
> > majority of the effort here is spent in extending the testing to assert
> > that the feature works correctly across numerous file-backed mapping
> > scenarios.
> >
> > Every single guard region self-test performed against anonymous memory
> > (which is relevant and not anon-only) has now been updated to also be
> > performed against shmem and a mapping of a file in the working directory.
> >
> > This confirms that all cases also function correctly for file-backed guard
> > regions.
> >
> > In addition a number of other tests are added for specific file-backed
> > mapping scenarios.
> >
> > There are a number of other concerns that one might have with regard to
> > guard regions, addressed below:
> >
> > Readahead
> > ~~~~~~~~~
> >
> > Readahead is a process through which the page cache is populated on the
> > assumption that sequential reads will occur, thus amortising I/O and,
> > through a clever use of the PG_readahead folio flag establishing during
> > major fault and checked upon minor fault, provides for asynchronous I/O to
> > occur as dat is processed, reducing I/O stalls as data is faulted in.
> >
> > Guard regions do not alter this mechanism which operations at the folio and
> > fault level, but do of course prevent the faulting of folios that would
> > otherwise be mapped.
> >
> > In the instance of a major fault prior to a guard region, synchronous
> > readahead will occur including populating folios in the page cache which
> > the guard regions will, in the case of the mapping in question, prevent
> > access to.
> >
> > In addition, if PG_readahead is placed in a folio that is now inaccessible,
> > this will prevent asynchronous readahead from occurring as it would
> > otherwise do.
> >
> > However, there are mechanisms for heuristically resetting this within
> > readahead regardless, which will 'recover' correct readahead behaviour.
> >
> > Readahead presumes sequential data access, the presence of a guard region
> > clearly indicates that, at least in the guard region, no such sequential
> > access will occur, as it cannot occur there.
> >
> > So this should have very little impact on any real workload. The far more
> > important point is as to whether readahead causes incorrect or
> > inappropriate mapping of ranges disallowed by the presence of guard
> > regions - this is not the case, as readahead does not 'pre-fault' memory in
> > this fashion.
> >
> > At any rate, any mechanism which would attempt to do so would hit the usual
> > page fault paths, which correctly handle PTE markers as with anonymous
> > mappings.
> >
> > Fault-Around
> > ~~~~~~~~~~~~
> >
> > The fault-around logic, in a similar vein to readahead, attempts to improve
> > efficiency with regard to file-backed memory mappings, however it differs
> > in that it does not try to fetch folios into the page cache that are about
> > to be accessed, but rather pre-maps a range of folios around the faulting
> > address.
> >
> > Guard regions making use of PTE markers makes this relatively trivial, as
> > this case is already handled - see filemap_map_folio_range() and
> > filemap_map_order0_folio() - in both instances, the solution is to simply
> > keep the established page table mappings and let the fault handler take
> > care of PTE markers, as per the comment:
> >
> >         /*
> >          * NOTE: If there're PTE markers, we'll leave them to be
> >          * handled in the specific fault path, and it'll prohibit
> >          * the fault-around logic.
> >          */
> >
> > This works, as establishing guard regions results in page table mappings
> > with PTE markers, and clearing them removes them.
> >
> > Truncation
> > ~~~~~~~~~~
> >
> > File truncation will not eliminate existing guard regions, as the
> > truncation operation will ultimately zap the range via
> > unmap_mapping_range(), which specifically excludes PTE markers.
> >
> > Zapping
> > ~~~~~~~
> >
> > Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
> > which specifically deals with guard entries, leaving them intact except in
> > instances such as process teardown or munmap() where they need to be
> > removed.
> >
> > Reclaim
> > ~~~~~~~
> >
> > When reclaim is performed on file-backed folios, it ultimately invokes
> > try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte()
> > will ultimately abort the operation for the guard region mapping. If large,
> > then check_pte() will determine that this is a non-device private
> > entry/device-exclusive entry 'swap' PTE and thus abort the operation in
> > that instance.
> >
> > Therefore, no odd things happen in the instance of reclaim being attempted
> > upon a file-backed guard region.
> >
> > Hole Punching
> > ~~~~~~~~~~~~~
> >
> > This updates the page cache and ultimately invokes unmap_mapping_range(),
> > which explicitly leaves PTE markers in place.
> >
> > Because the establishment of guard regions zapped any existing mappings to
> > file-backed folios, once the guard regions are removed then the
> > hole-punched region will be faulted in as usual and everything will behave
> > as expected.
> >
> > Lorenzo Stoakes (4):
> >   mm: allow guard regions in file-backed and read-only mappings
> >   selftests/mm: rename guard-pages to guard-regions
> >   tools/selftests: expand all guard region tests to file-backed
> >   tools/selftests: add file/shmem-backed mapping guard region tests
> >
> >  mm/madvise.c                                  |   8 +-
> >  tools/testing/selftests/mm/.gitignore         |   2 +-
> >  tools/testing/selftests/mm/Makefile           |   2 +-
> >  .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
> >  4 files changed, 821 insertions(+), 112 deletions(-)
> >  rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)
> >
> > --
> > 2.48.1


  reply	other threads:[~2025-02-19  8:35 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-13 18:16 Lorenzo Stoakes
2025-02-13 18:17 ` [PATCH 1/4] mm: allow guard regions in file-backed and read-only mappings Lorenzo Stoakes
2025-02-18 14:15   ` Vlastimil Babka
2025-02-18 16:01   ` David Hildenbrand
2025-02-18 16:12     ` Lorenzo Stoakes
2025-02-18 16:17       ` David Hildenbrand
2025-02-18 16:21         ` Lorenzo Stoakes
2025-02-18 16:27           ` David Hildenbrand
2025-02-18 16:49             ` Lorenzo Stoakes
2025-02-18 17:00               ` David Hildenbrand
2025-02-18 17:04                 ` Lorenzo Stoakes
2025-02-24 14:02   ` Lorenzo Stoakes
2025-02-13 18:17 ` [PATCH 2/4] selftests/mm: rename guard-pages to guard-regions Lorenzo Stoakes
2025-02-18 14:15   ` Vlastimil Babka
2025-03-02  8:35   ` Lorenzo Stoakes
2025-02-13 18:17 ` [PATCH 3/4] tools/selftests: expand all guard region tests to file-backed Lorenzo Stoakes
2025-02-18 14:17   ` Vlastimil Babka
2025-04-22 10:37   ` Ryan Roberts
2025-04-22 10:47     ` Lorenzo Stoakes
2025-04-22 11:03       ` Ryan Roberts
2025-04-22 11:07         ` Lorenzo Stoakes
2025-04-22 11:11           ` Ryan Roberts
2025-02-13 18:17 ` [PATCH 4/4] tools/selftests: add file/shmem-backed mapping guard region tests Lorenzo Stoakes
2025-02-18 14:18   ` Vlastimil Babka
2025-02-18 12:12 ` [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings Vlastimil Babka
2025-02-18 13:05   ` Lorenzo Stoakes
2025-02-18 14:35     ` David Hildenbrand
2025-02-18 14:53       ` Lorenzo Stoakes
2025-02-18 15:20         ` David Hildenbrand
2025-02-18 16:43           ` Lorenzo Stoakes
2025-02-18 17:14             ` David Hildenbrand
2025-02-18 17:20               ` Lorenzo Stoakes
2025-02-18 17:25                 ` David Hildenbrand
2025-02-18 17:28                   ` Lorenzo Stoakes
2025-02-18 17:31                     ` David Hildenbrand
2025-02-25 15:54                     ` Vlastimil Babka
2025-02-25 16:31                       ` David Hildenbrand
2025-02-25 16:37                       ` Lorenzo Stoakes
2025-02-25 16:48                         ` David Hildenbrand
2025-02-19  8:25 ` Kalesh Singh
2025-02-19  8:35   ` Kalesh Singh [this message]
2025-02-19  9:15     ` Lorenzo Stoakes
2025-02-19 17:32     ` Liam R. Howlett
2025-02-19  9:03   ` Lorenzo Stoakes
2025-02-19  9:15     ` David Hildenbrand
2025-02-19  9:17       ` Lorenzo Stoakes
2025-02-19 18:52         ` Kalesh Singh
2025-02-19 19:20           ` Lorenzo Stoakes
2025-02-19 20:56             ` Kalesh Singh
2025-02-20  8:51               ` Lorenzo Stoakes
2025-02-20  8:57                 ` David Hildenbrand
2025-02-20  9:04                   ` Lorenzo Stoakes
2025-02-20  9:23                     ` David Hildenbrand
2025-02-20  9:47                       ` Lorenzo Stoakes
2025-02-20 10:03                         ` David Hildenbrand
2025-02-20 10:15                           ` Lorenzo Stoakes
2025-02-20 12:44                             ` David Hildenbrand
2025-02-20 13:18                               ` Lorenzo Stoakes
2025-02-20 16:21                                 ` Suren Baghdasaryan
2025-02-20 18:08                                   ` Kalesh Singh
2025-02-21 11:04                                     ` Lorenzo Stoakes
2025-02-21 17:24                                       ` Kalesh Singh
2025-02-20  9:22                 ` Vlastimil Babka
2025-02-20  9:53                   ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAC_TJvfPNkJDWnG81GnJcFeMLYzN8=uM-oTrK6FKT7tD=E4TQg@mail.gmail.com' \
    --to=kaleshsingh@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=jyescas@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=paulmck@kernel.org \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox