From: Kalesh Singh
Date: Wed, 19 Feb 2025 00:35:12 -0800
Subject: Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings
To: Lorenzo Stoakes
Cc: Andrew Morton, Suren Baghdasaryan, "Liam R . Howlett", Matthew Wilcox,
 Vlastimil Babka, "Paul E . McKenney", Jann Horn, David Hildenbrand,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shuah Khan,
 linux-kselftest@vger.kernel.org, linux-api@vger.kernel.org, John Hubbard,
 Juan Yescas

On Wed, Feb 19, 2025 at 12:25 AM Kalesh Singh wrote:
>
> On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes wrote:
> >
> > The guard regions feature was initially implemented to support anonymous
> > mappings only, excluding shmem.
> >
> > This was done so as to introduce the feature carefully and incrementally
> > and to be conservative when considering the various caveats and corner
> > cases that are applicable to file-backed mappings but not to anonymous
> > ones.
> >
> > Now this feature has landed in 6.13, it is time to revisit this and to
> > extend this functionality to file-backed and shmem mappings.
> >
> > In order to make this maximally useful, and since one may map file-backed
> > mappings read-only (for instance ELF images), we also remove the
> > restriction on read-only mappings and permit the establishment of guard
> > regions in any non-hugetlb, non-mlock()'d mapping.
>
> Hi Lorenzo,
>
> Thank you for your work on this.
>
> Have we thought about how guard regions are represented in /proc/*/[s]maps?
>
> In the field, I've found that many applications read the ranges from
> /proc/self/[s]maps to determine what they can access (usually related
> to obfuscation techniques). If they don't know of the guard regions it
> would cause them to crash; I think that we'll need similar entries to
> PROT_NONE (---p) for these, and generally to maintain consistency
> between the behavior and what is being said from /proc/*/[s]maps.

To clarify why the applications may not be aware of their guard
regions -- in the case of the ELF mappings these PROT_NONE (guard
regions) would be installed by the dynamic loader; or may be inherited
from the parent (zygote in Android's case).

>
> -- Kalesh
>
> >
> > It is permissible to permit the establishment of guard regions in read-only
> > mappings because the guard regions only reduce access to the mapping, and
> > when removed simply reinstate the existing attributes of the underlying
> > VMA, meaning no access violations can occur.
> >
> > While the change in kernel code introduced in this series is small, the
> > majority of the effort here is spent in extending the testing to assert
> > that the feature works correctly across numerous file-backed mapping
> > scenarios.
> >
> > Every single guard region self-test performed against anonymous memory
> > (which is relevant and not anon-only) has now been updated to also be
> > performed against shmem and a mapping of a file in the working directory.
> >
> > This confirms that all cases also function correctly for file-backed guard
> > regions.
> >
> > In addition a number of other tests are added for specific file-backed
> > mapping scenarios.
> >
> > There are a number of other concerns that one might have with regard to
> > guard regions, addressed below:
> >
> > Readahead
> > ~~~~~~~~~
> >
> > Readahead is a process through which the page cache is populated on the
> > assumption that sequential reads will occur, thus amortising I/O and,
> > through a clever use of the PG_readahead folio flag established during
> > major fault and checked upon minor fault, provides for asynchronous I/O to
> > occur as data is processed, reducing I/O stalls as data is faulted in.
> >
> > Guard regions do not alter this mechanism, which operates at the folio and
> > fault level, but do of course prevent the faulting of folios that would
> > otherwise be mapped.
> >
> > In the instance of a major fault prior to a guard region, synchronous
> > readahead will occur including populating folios in the page cache which
> > the guard regions will, in the case of the mapping in question, prevent
> > access to.
> >
> > In addition, if PG_readahead is placed in a folio that is now inaccessible,
> > this will prevent asynchronous readahead from occurring as it would
> > otherwise do.
> >
> > However, there are mechanisms for heuristically resetting this within
> > readahead regardless, which will 'recover' correct readahead behaviour.
> >
> > Readahead presumes sequential data access, the presence of a guard region
> > clearly indicates that, at least in the guard region, no such sequential
> > access will occur, as it cannot occur there.
> >
> > So this should have very little impact on any real workload. The far more
> > important point is as to whether readahead causes incorrect or
> > inappropriate mapping of ranges disallowed by the presence of guard
> > regions - this is not the case, as readahead does not 'pre-fault' memory in
> > this fashion.
> >
> > At any rate, any mechanism which would attempt to do so would hit the usual
> > page fault paths, which correctly handle PTE markers as with anonymous
> > mappings.
> >
> > Fault-Around
> > ~~~~~~~~~~~~
> >
> > The fault-around logic, in a similar vein to readahead, attempts to improve
> > efficiency with regard to file-backed memory mappings, however it differs
> > in that it does not try to fetch folios into the page cache that are about
> > to be accessed, but rather pre-maps a range of folios around the faulting
> > address.
> >
> > Guard regions making use of PTE markers makes this relatively trivial, as
> > this case is already handled - see filemap_map_folio_range() and
> > filemap_map_order0_folio() - in both instances, the solution is to simply
> > keep the established page table mappings and let the fault handler take
> > care of PTE markers, as per the comment:
> >
> >         /*
> >          * NOTE: If there're PTE markers, we'll leave them to be
> >          * handled in the specific fault path, and it'll prohibit
> >          * the fault-around logic.
> >          */
> >
> > This works, as establishing guard regions results in page table mappings
> > with PTE markers, and clearing them removes them.
> >
> > Truncation
> > ~~~~~~~~~~
> >
> > File truncation will not eliminate existing guard regions, as the
> > truncation operation will ultimately zap the range via
> > unmap_mapping_range(), which specifically excludes PTE markers.
> >
> > Zapping
> > ~~~~~~~
> >
> > Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
> > which specifically deals with guard entries, leaving them intact except in
> > instances such as process teardown or munmap() where they need to be
> > removed.
> >
> > Reclaim
> > ~~~~~~~
> >
> > When reclaim is performed on file-backed folios, it ultimately invokes
> > try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte()
> > will ultimately abort the operation for the guard region mapping. If large,
> > then check_pte() will determine that this is a non-device private
> > entry/device-exclusive entry 'swap' PTE and thus abort the operation in
> > that instance.
> >
> > Therefore, no odd things happen in the instance of reclaim being attempted
> > upon a file-backed guard region.
> >
> > Hole Punching
> > ~~~~~~~~~~~~~
> >
> > This updates the page cache and ultimately invokes unmap_mapping_range(),
> > which explicitly leaves PTE markers in place.
> >
> > Because the establishment of guard regions zapped any existing mappings to
> > file-backed folios, once the guard regions are removed then the
> > hole-punched region will be faulted in as usual and everything will behave
> > as expected.
> >
> > Lorenzo Stoakes (4):
> >   mm: allow guard regions in file-backed and read-only mappings
> >   selftests/mm: rename guard-pages to guard-regions
> >   tools/selftests: expand all guard region tests to file-backed
> >   tools/selftests: add file/shmem-backed mapping guard region tests
> >
> >  mm/madvise.c                                  |   8 +-
> >  tools/testing/selftests/mm/.gitignore         |   2 +-
> >  tools/testing/selftests/mm/Makefile           |   2 +-
> >  .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
> >  4 files changed, 821 insertions(+), 112 deletions(-)
> >  rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)
> >
> > --
> > 2.48.1
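
For illustration only, here is a rough sketch of the kind of check the
applications described above tend to perform: parse one line of
/proc/self/maps and decide from the range and permission bits whether an
address can be read. The names map_range, parse_maps_line() and
looks_readable() are hypothetical and not taken from any real application
or from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One parsed /proc/<pid>/maps entry (hypothetical helper type). */
struct map_range {
	unsigned long start;
	unsigned long end;	/* exclusive */
	char perms[5];		/* e.g. "r--p", NUL-terminated */
};

/* Parse the leading "start-end perms" fields of a single maps line. */
static bool parse_maps_line(const char *line, struct map_range *out)
{
	assert(line && out);
	return sscanf(line, "%lx-%lx %4s",
		      &out->start, &out->end, out->perms) == 3;
}

/*
 * What such an application concludes: the address is readable if the
 * range covers it and the 'r' permission bit is set.
 */
static bool looks_readable(const struct map_range *r, unsigned long addr)
{
	return addr >= r->start && addr < r->end && r->perms[0] == 'r';
}
```

A PROT_NONE (---p) range is rejected by this check. Assuming guard regions
are recorded only as PTE markers and leave the VMA untouched, the r--p line
covering them would stay the same, so looks_readable() would still return
true for an address that faults on access.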