From: Will Deacon <will@kernel.org>
To: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>,
	linux-hardening@vger.kernel.org,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Ira Weiny <ira.weiny@intel.com>, Jann Horn <jannh@google.com>,
	Jeff Xu <jeffxu@chromium.org>, Joey Gouly <joey.gouly@arm.com>,
	Kees Cook <kees@kernel.org>,
	Linus Walleij <linus.walleij@linaro.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Marc Zyngier <maz@kernel.org>, Mark Brown <broonie@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Maxwell Bland <mbland@motorola.com>,
	"Mike Rapoport (IBM)" <rppt@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Pierre Langlois <pierre.langlois@arm.com>,
	Quentin Perret <qperret@google.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	x86@kernel.org
Subject: Re: [RFC PATCH v5 00/18] pkeys-based page table hardening
Date: Thu, 18 Sep 2025 15:57:48 +0100
Message-ID: <aMwd7IJVECEy8mzf@willie-the-truck>
In-Reply-To: <8f7b3f4e-bf56-4030-952f-962291e53ccc@arm.com>

On Thu, Sep 18, 2025 at 04:15:52PM +0200, Kevin Brodsky wrote:
> On 25/08/2025 09:31, Kevin Brodsky wrote:
> >>> Note: the performance impact of set_memory_pkey() is likely to be
> >>> relatively low on arm64 because the linear mapping uses PTE-level
> >>> descriptors only. This means that set_memory_pkey() simply changes the
> >>> attributes of some PTE descriptors. However, some systems may be able to
> >>> use higher-level descriptors in the future [5], meaning that
> >>> set_memory_pkey() may have to split mappings. Allocating page tables
> >> I suppose the page table hardening feature will be opt-in due to
> >> its overhead? If so I think you can just keep the kernel linear mapping
> >> using PTEs, just like debug page alloc.
> > Indeed, I don't expect it to be turned on by default (in defconfig). If
> > the overhead proves too large when block mappings are used, it seems
> > reasonable to force PTE mappings when kpkeys_hardened_pgtables is enabled.
> 
> I had a closer look at what happens when the linear map uses block
> mappings, rebasing this series on top of [1]. Unfortunately, this is
> worse than I thought: it does not work at all as things stand.
> 
> The main issue is that calling set_memory_pkey() in pagetable_*_ctor()
> can cause the linear map to be split, which requires new PTP(s) to be
> allocated, which means more nested call(s) to set_memory_pkey(). This
> explodes as a non-recursive lock is taken on that path.
> 
> More fundamentally, this cannot work unless we can explicitly allocate
> PTPs from either:
> 1. A pool of PTE-mapped pages
> 2. A pool of memory that is already mapped with the right pkey (at any
> level)
> 
> This is where I have to apologise to Rick for not having studied his
> series more thoroughly, as patch 17 [2] covers this issue very well in
> the commit message.
> 
> It seems fair to say there is no ideal or simple solution, though.
> Rick's patch reserves enough (PTE-mapped) memory for fully splitting the
> linear map, which is relatively simple but not very pleasant. Chatting
> with Ryan Roberts, we came up with another approach, improving on solution 1
> mentioned in [2]. It would rely on allocating all PTPs from a special
> pool (without using set_memory_pkey() in pagetable_*_ctor), along these
> lines:
> 
> 1. 2 pages are reserved at all times (with the appropriate pkey)
> 2. Try to allocate a 2M block. If needed, use a reserved page as PMD to
> split a PUD. If successful, set its pkey - the entire block can now be
> used for PTPs. Replenish the reserve from the block if needed.
> 3. If no block is available, make an order-2 allocation (4 pages). If
> needed, use 1-2 reserved pages to split PUD/PMD. Set the pkey of the 4
> pages, take 1-2 pages to replenish the reserve if needed.
> 
> This ensures that we never run out of PTPs for splitting. We may get
> into an OOM situation more easily due to the order-2 requirement, but
> the risk remains low compared to requiring a 2M block. A bigger concern
> is concurrency - do we need a per-CPU cache? Reserving a 2M block per
> CPU could be very much overkill.
> 
> No matter which solution is used, this clearly increases the complexity
> of kpkeys_hardened_pgtables. Mike Rapoport has posted a number of RFCs
> [3][4] that aim at addressing this problem more generally, but no
> consensus seems to have emerged and I'm not sure they would completely
> solve this specific problem either.
> 
> For now, my plan is to stick to solution 3 from [2], i.e. force the
> linear map to be PTE-mapped. This is easily done on arm64 as it is the
> default, and is required for rodata=full, unless [1] is applied and the
> system supports BBML2_NOABORT. See [1] for the potential performance
> improvements we'd be missing out on (~5% ballpark). I'm not quite sure
> what the picture looks like on x86 - it may well be more significant as
> Rick suggested.
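
The reserve-based scheme quoted above (steps 1-3) can be modelled in a few
lines to check that the reserve never runs dry. This is only a toy
user-space sketch, not kernel code: PTPPool, alloc_ptp() and the
splits_needed parameter are hypothetical names, and the allocator outcome
(2M block available or not) is passed in by the caller rather than
simulated.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 512   # 2M block / 4k pages
ORDER2_PAGES = 4        # order-2 fallback allocation
RESERVE_TARGET = 2      # step 1: two pages are held back at all times


@dataclass
class PTPPool:
    """Toy model of the page-table-page (PTP) pool; all names are hypothetical."""
    reserve: int = RESERVE_TARGET   # pages kept for splitting the linear map
    free: int = 0                   # pkey-protected pages ready to use as PTPs

    def _refill(self, block_available: bool, splits_needed: int) -> None:
        # splits_needed: number of levels to split before the pkey can be set
        # (at most 1 for a 2M block: PUD; at most 2 for order-2: PUD and PMD).
        got = PAGES_PER_BLOCK if block_available else ORDER2_PAGES
        max_splits = 1 if block_available else 2
        if not 0 <= splits_needed <= max_splits:
            raise ValueError("unexpected number of splits")
        # Splitting consumes reserved pages only, so no nested allocation
        # happens while the split is in progress.
        self.reserve -= splits_needed
        assert self.reserve >= 0
        # The new pages now carry the pkey: top the reserve back up first,
        # then make the remainder available as PTPs.
        top_up = RESERVE_TARGET - self.reserve
        self.reserve += top_up
        self.free += got - top_up

    def alloc_ptp(self, block_available: bool = True, splits_needed: int = 0) -> None:
        """Hand out one PTP, refilling the pool first if it is empty."""
        if self.free == 0:
            self._refill(block_available, splits_needed)
        self.free -= 1


# Worst case for the fallback path: no 2M block, both PUD and PMD need splitting.
pool = PTPPool()
pool.alloc_ptp(block_available=False, splits_needed=2)
```

Even in that worst case the order-2 allocation yields enough pages to
restore the two-page reserve and still satisfy the request, which is the
property the scheme relies on. The model says nothing about the concurrency
question raised above.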

Just as a data point: forcing the linear map down to 4k would likely
prevent us from enabling this on Android. We've measured a considerable
(double-digit %) increase in CPU power consumption for some real-life
camera workloads when mapping the linear map at 4k granularity, thanks
to the additional memory traffic from the page-table walker (PTW).
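
For a rough sense of scale, here is a back-of-the-envelope sketch of how
many leaf descriptors are needed to map 1 GiB of linear map. It assumes a
4k translation granule with 8-byte descriptors and 2M block mappings; it is
not a measurement, and it says nothing about the power numbers themselves.

```python
PAGE_SIZE = 4 * 1024                          # 4k granule (assumption)
DESC_SIZE = 8                                 # bytes per descriptor
ENTRIES_PER_TABLE = PAGE_SIZE // DESC_SIZE    # 512 entries per table page
BLOCK_SIZE = PAGE_SIZE * ENTRIES_PER_TABLE    # 2M block mapping

GIB = 1 << 30


def leaf_descriptors(region_bytes: int, leaf_size: int) -> int:
    """Number of leaf descriptors needed to map region_bytes."""
    return region_bytes // leaf_size


ptes = leaf_descriptors(GIB, PAGE_SIZE)       # 262144 PTEs
blocks = leaf_descriptors(GIB, BLOCK_SIZE)    # 512 block descriptors

pte_table_bytes = ptes * DESC_SIZE            # 2 MiB of leaf tables
block_table_bytes = blocks * DESC_SIZE        # 4 KiB, i.e. a single table page
```

So a fully PTE-mapped GiB needs 512 times as many leaf descriptors as a
block-mapped one, plus one more level of lookup on each walk.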

At some point, KFENCE required 4k granularity for the linear map, but we
fixed it. rodata=full requires 4k granularity, but there are patches to
fix that too. So I think we should avoid making the same mistake from
the start for this series.

Will
