From: Kevin Brodsky <kevin.brodsky@arm.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Kevin Brodsky, Alexander Gordeev,
    Andreas Larsson, Andrew Morton, Boris Ostrovsky, Borislav Petkov,
    Catalin Marinas, Christophe Leroy, Dave Hansen, David Hildenbrand,
    "David S. Miller", David Woodhouse, "H. Peter Anvin", Ingo Molnar,
    Jann Horn, Juergen Gross, "Liam R. Howlett", Lorenzo Stoakes,
    Madhavan Srinivasan, Michael Ellerman, Michal Hocko, Mike Rapoport,
    Nicholas Piggin, Peter Zijlstra, "Ritesh Harjani (IBM)",
    Ryan Roberts, Suren Baghdasaryan, Thomas Gleixner,
    Venkat Rao Bagalkote, Vlastimil Babka, Will Deacon, Yeoreum Yun,
    linux-arm-kernel@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, x86@kernel.org
Subject: [PATCH v5 00/12] Nesting support for lazy MMU mode
Date: Mon, 24 Nov 2025 13:22:16 +0000
Message-ID: <20251124132228.622678-1-kevin.brodsky@arm.com>

When the lazy MMU mode was introduced eons ago, it wasn't made clear
whether such a sequence was legal:

        arch_enter_lazy_mmu_mode()
        ...
        arch_enter_lazy_mmu_mode()
        ...
        arch_leave_lazy_mmu_mode()
        ...
        arch_leave_lazy_mmu_mode()

It seems fair to say that nested calls to
arch_{enter,leave}_lazy_mmu_mode() were not expected, and most
architectures never explicitly supported them.

Nesting does in fact occur in certain configurations, and avoiding it
has proved difficult. This series therefore enables lazy_mmu sections
to nest, on all architectures.
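In practice, nesting typically arises because a function that runs
with the mode enabled calls another mm function that itself enters the
mode. A hypothetical illustration (both functions below are made up
for the sake of the example):

        static void helper(void)
        {
                arch_enter_lazy_mmu_mode();     /* nested enter */
                /* ... batched page table updates ... */
                arch_leave_lazy_mmu_mode();     /* leaves the mode entirely */
        }

        static void caller(void)
        {
                arch_enter_lazy_mmu_mode();
                helper();
                /*
                 * Without nesting support, the mode is already off at
                 * this point: further updates are no longer batched,
                 * and the leave() below is unbalanced.
                 */
                arch_leave_lazy_mmu_mode();
        }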
Nesting is handled using a counter in task_struct (patch 8), like
other stateless APIs such as pagefault_{disable,enable}(). This is
fully handled in a new generic layer in <linux/pgtable.h>; the arch_*
API remains unchanged.

A new pair of calls, lazy_mmu_mode_{pause,resume}(), is also
introduced to allow functions that are called with the lazy MMU mode
enabled to temporarily pause it, regardless of nesting.

An arch now opts into the lazy MMU mode by selecting
CONFIG_ARCH_HAS_LAZY_MMU_MODE; this is more appropriate now that we
have a generic API, especially with state conditionally added to
task_struct.
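As a rough sketch of how such a counter-based layer can work (the
enable/disable names and the {enable_count,pause_count} fields are
inferred from the description above and the changelog below; this is
illustrative, not the exact code from patch 8):

        struct lazy_mmu_state {
                u8 enable_count;        /* enable/disable nesting depth */
                u8 pause_count;         /* pause/resume nesting depth */
        };

        static inline void lazy_mmu_mode_enable(void)
        {
                struct lazy_mmu_state *state = &current->lazy_mmu_state;

                if (state->pause_count)         /* ignored while paused */
                        return;
                if (state->enable_count++ == 0) /* outermost enable only */
                        arch_enter_lazy_mmu_mode();
        }

        static inline void lazy_mmu_mode_disable(void)
        {
                struct lazy_mmu_state *state = &current->lazy_mmu_state;

                if (state->pause_count)         /* ignored while paused */
                        return;
                if (--state->enable_count == 0) /* outermost disable only */
                        arch_leave_lazy_mmu_mode();
                else                            /* nested: flush, stay lazy */
                        arch_flush_lazy_mmu_mode();
        }

        static inline void lazy_mmu_mode_pause(void)
        {
                struct lazy_mmu_state *state = &current->lazy_mmu_state;

                if (state->pause_count++ == 0 && state->enable_count)
                        arch_leave_lazy_mmu_mode();
        }

        static inline void lazy_mmu_mode_resume(void)
        {
                struct lazy_mmu_state *state = &current->lazy_mmu_state;

                if (--state->pause_count == 0 && state->enable_count)
                        arch_enter_lazy_mmu_mode();
        }

Having a nested disable fall back to arch_flush_lazy_mmu_mode() is
consistent with the patch overview below, where patches 2-4 prepare
that hook to be called from the generic layer.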
---

Background: Ryan Roberts' series from March [1] attempted to prevent
nesting from ever occurring, and mostly succeeded. Unfortunately, a
corner case (DEBUG_PAGEALLOC) may still cause nesting to occur on
arm64. Ryan proposed [2] to address that corner case at the generic
level, but this approach received pushback; [3] then attempted to
solve the issue on arm64 only, but it was deemed too fragile. It is
generally difficult to guarantee that lazy_mmu sections don't nest,
because callers of various standard mm functions do not know whether
the function uses lazy_mmu itself.

The overall approach in v3/v4 is very close to what David Hildenbrand
proposed on v2 [4]. Unlike in v1/v2, no special provision is made for
architectures to save/restore extra state when entering/leaving the
mode. Based on the discussions so far, this does not seem to be
required - an arch can store any relevant state in thread_struct
during arch_enter() and restore it in arch_leave(). Nesting is not a
concern as these functions are only called at the top level, not in
nested sections. The introduction of a generic layer, and the tracking
of the lazy MMU state in task_struct, also make it possible to
streamline the arch callbacks - this series removes 67 lines from
arch/.

Patch overview:

* Patch 1: cleanup - avoids having to deal with the powerpc
  context-switching code
* Patch 2-4: prepare arch_flush_lazy_mmu_mode() to be called from the
  generic layer (patch 8)
* Patch 5-6: new API + CONFIG_ARCH_HAS_LAZY_MMU_MODE
* Patch 7: ensure correctness in interrupt context (see the sketch
  below)
* Patch 8: nesting support
* Patch 9-12: replace arch-specific tracking of lazy MMU mode with the
  generic API
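To make the interaction concrete, here is a rough sketch (illustrative
only, not the exact code from the series) of how a generic
in_lazy_mmu_mode() helper can cover both points: it never reports the
mode as active in interrupt context (patch 7), and it reuses the
counters sketched earlier:

        static inline bool in_lazy_mmu_mode(void)
        {
                /* Patch 7: the mode never applies in interrupt context */
                if (in_interrupt())
                        return false;

                return current->lazy_mmu_state.enable_count &&
                       !current->lazy_mmu_state.pause_count;
        }

Arch code that used to test private state such as batch->active
(powerpc, sparc) can then simply test in_lazy_mmu_mode() instead,
which is what patches 9-11 do.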
This series has been tested by running the mm kselftests on arm64 with
DEBUG_VM, DEBUG_PAGEALLOC, KFENCE and KASAN. It was also build-tested
on other architectures (with and without XEN_PV on x86).

- Kevin

[1] https://lore.kernel.org/all/20250303141542.3371656-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/all/20250530140446.2387131-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/all/20250606135654.178300-1-ryan.roberts@arm.com/
[4] https://lore.kernel.org/all/ef343405-c394-4763-a79f-21381f217b6c@redhat.com/

---

Changelog

v4..v5:
- Rebased on mm-unstable
- Patch 3: added missing radix_enabled() check in arch_flush()
  [Ritesh Harjani]
- Patch 6: declare arch_flush_lazy_mmu_mode() as static inline on x86
  [Ryan Roberts]
- Patch 7 (formerly 12): moved before patch 8 to ensure correctness in
  interrupt context [Ryan]. The diffs in in_lazy_mmu_mode() and
  queue_pte_barriers() are moved to patches 8 and 9 respectively.
- Patch 8:
  * Removed all restrictions regarding lazy_mmu_mode_{pause,resume}().
    They may now be called even when lazy MMU isn't enabled, and any
    call to lazy_mmu_mode_* may be made while paused (such calls will
    be ignored). [David, Ryan]
  * lazy_mmu_state.{nesting_level,active} are replaced with
    {enable_count,pause_count} to track arbitrary nesting of both
    enable/disable and pause/resume [Ryan]
  * Added __task_lazy_mmu_mode_active() for use in patch 12 [David]
  * Added documentation for all the functions [Ryan]
- Patch 9: keep existing test + set TIF_LAZY_MMU_PENDING instead of
  atomic RMW [David, Ryan] (see the sketch after this changelog)
- Patch 12: use __task_lazy_mmu_mode_active() instead of accessing
  lazy_mmu_state directly [David]
- Collected R-b/A-b tags

v4: https://lore.kernel.org/all/20251029100909.3381140-1-kevin.brodsky@arm.com/

v3..v4:
- Patch 2: restored ordering of preempt_{disable,enable}()
  [Dave Hansen]
- Patch 5 onwards: s/ARCH_LAZY_MMU/ARCH_HAS_LAZY_MMU_MODE/
  [Mike Rapoport]
- Patch 7: renamed lazy_mmu_state members, removed VM_BUG_ON(),
  reordered writes to lazy_mmu_state members [David Hildenbrand]
- Dropped patch 13 as it doesn't seem justified [David H]
- Various improvements to commit messages [David H]

v3: https://lore.kernel.org/all/20251015082727.2395128-1-kevin.brodsky@arm.com/

v2..v3:
- Full rewrite; dropped all Acked-by/Reviewed-by.
- Rebased on v6.18-rc1.

v2: https://lore.kernel.org/all/20250908073931.4159362-1-kevin.brodsky@arm.com/

v1..v2:
- Rebased on mm-unstable.
- Patch 2: handled new calls to enter()/leave(), clarified how the
  "flush" pattern (leave() followed by enter()) is handled.
- Patch 5,6: removed unnecessary local variable [Alexander Gordeev's
  suggestion].
- Added Mike Rapoport's Acked-by.

v1: https://lore.kernel.org/all/20250904125736.3918646-1-kevin.brodsky@arm.com/
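For reference, the "existing test + set" pattern mentioned for patch 9
avoids the atomic RMW when the flag is already set. A minimal sketch
(the flag name is taken from the changelog above; the surrounding code
is illustrative):

        /* Only perform the atomic set if the flag isn't already set */
        if (!test_thread_flag(TIF_LAZY_MMU_PENDING))
                set_thread_flag(TIF_LAZY_MMU_PENDING);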
---

Cc: Alexander Gordeev
Cc: Andreas Larsson
Cc: Andrew Morton
Cc: Boris Ostrovsky
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Christophe Leroy
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: "David S. Miller"
Cc: David Woodhouse
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jann Horn
Cc: Juergen Gross
Cc: "Liam R. Howlett"
Cc: Lorenzo Stoakes
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Ritesh Harjani (IBM)
Cc: Ryan Roberts
Cc: Suren Baghdasaryan
Cc: Thomas Gleixner
Cc: Venkat Rao Bagalkote
Cc: Vlastimil Babka
Cc: Will Deacon
Cc: Yeoreum Yun
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
---

Alexander Gordeev (1):
  powerpc/64s: Do not re-activate batched TLB flush

Kevin Brodsky (11):
  x86/xen: simplify flush_lazy_mmu()
  powerpc/mm: implement arch_flush_lazy_mmu_mode()
  sparc/mm: implement arch_flush_lazy_mmu_mode()
  mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
  mm: introduce generic lazy_mmu helpers
  mm: bail out of lazy_mmu_mode_* in interrupt context
  mm: enable lazy_mmu sections to nest
  arm64: mm: replace TIF_LAZY_MMU with in_lazy_mmu_mode()
  powerpc/mm: replace batch->active with in_lazy_mmu_mode()
  sparc/mm: replace batch->active with in_lazy_mmu_mode()
  x86/xen: use lazy_mmu_state when context-switching

 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/pgtable.h              |  41 +----
 arch/arm64/include/asm/thread_info.h          |   3 +-
 arch/arm64/mm/mmu.c                           |   4 +-
 arch/arm64/mm/pageattr.c                      |   4 +-
 .../include/asm/book3s/64/tlbflush-hash.h     |  20 ++-
 arch/powerpc/include/asm/thread_info.h        |   2 -
 arch/powerpc/kernel/process.c                 |  25 ---
 arch/powerpc/mm/book3s64/hash_tlb.c           |  10 +-
 arch/powerpc/mm/book3s64/subpage_prot.c       |   4 +-
 arch/powerpc/platforms/Kconfig.cputype        |   1 +
 arch/sparc/Kconfig                            |   1 +
 arch/sparc/include/asm/tlbflush_64.h          |   5 +-
 arch/sparc/mm/tlb.c                           |  14 +-
 arch/x86/Kconfig                              |   1 +
 arch/x86/boot/compressed/misc.h               |   1 +
 arch/x86/boot/startup/sme.c                   |   1 +
 arch/x86/include/asm/paravirt.h               |   1 -
 arch/x86/include/asm/pgtable.h                |   1 +
 arch/x86/include/asm/thread_info.h            |   4 +-
 arch/x86/xen/enlighten_pv.c                   |   3 +-
 arch/x86/xen/mmu_pv.c                         |   6 +-
 fs/proc/task_mmu.c                            |   4 +-
 include/linux/mm_types_task.h                 |   5 +
 include/linux/pgtable.h                       | 147 +++++++++++++++++-
 include/linux/sched.h                         |  45 ++++++
 mm/Kconfig                                    |   3 +
 mm/kasan/shadow.c                             |   8 +-
 mm/madvise.c                                  |  18 +--
 mm/memory.c                                   |  16 +-
 mm/migrate_device.c                           |   8 +-
 mm/mprotect.c                                 |   4 +-
 mm/mremap.c                                   |   4 +-
 mm/userfaultfd.c                              |   4 +-
 mm/vmalloc.c                                  |  12 +-
 mm/vmscan.c                                   |  12 +-
 36 files changed, 282 insertions(+), 161 deletions(-)

base-commit: 1f1edd95f9231ba58a1e535b10200cb1eeaf1f67
--
2.51.2