From: Brendan Jackman
Date: Thu, 13 Mar 2025 18:11:19 +0000
Message-ID: <20250313-asi-page-alloc-v1-0-04972e046cea@google.com>
Subject: [PATCH RFC 00/11] mm: ASI integration for the page allocator
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, Andrew Morton, David Rientjes, Vlastimil Babka,
    David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport,
    Junaid Shahid, Reiji Watanabe, Patrick Bellasi, Brendan Jackman,
    Yosry Ahmed

.:: Intro

This code illustrates the idea
I'm proposing at LSF/MM/BPF [0]. Sorry it's so close to the conference.
I was initially quite ambitious in what I wanted to show here and tried
to implement a more complete patch series. Now I've run out of time, so
I've had to reduce the scope and just hack some minimal stuff together.

This series is _only_ supposed to be about page_alloc.c; everything
else is just there as scaffolding so that allocator code can be
discussed. I've marked the most incomplete patches with [RFC HACKS] in
the title to show which aspects are less worthy of attention.

See [0] and also [1] for broader context on the ASI/page_alloc topic.
See [2] for context about ASI itself. For this RFC the most important
fact is: ASI requires creating another kernel address space (the
"restricted address space") that is a subset of the normal one (i.e.
the "unrestricted address space"). That is, an address space just like
the normal one, but with holes in it. Pages that are unmapped from the
restricted address space are called "sensitive".

.:: The Idea

What is sensitive (i.e. where the holes are) is decided at allocation
time. This series illustrates an initial implementation of that
capability for the direct map.

The basic idea of this implementation is to operate at pageblock
granularity, and use migratetypes to track sensitivity. The key
advantages of this approach are:

- Migratetypes exist to avoid fragmentation. Using them to index pages
  by sensitivity takes advantage of this, so that the physmap doesn't
  get fragmented with respect to sensitivity. This means we can use
  large TLB entries for the restricted physmap.
- Since pageblocks are never smaller than a PMD mapping, if the
  restricted physmap is always made of PMDs, we never have to break
  down mappings while changing sensitivity. This means we don't have
  difficulties with needing to allocate pagetables in the middle of the
  allocator.
- Migratetypes already offer indexing capability - that is, there are
  separate freelists for each migratetype.
  This means when the user allocates a page with a given sensitivity,
  all the infrastructure is already in place to look up a page that is
  already mapped/unmapped as needed (if it exists). This minimizes
  unnecessary TLB flushes.

This differs from Mike Rapoport's work on __GFP_UNMAPPED [3] in that,
instead of having a totally separate free area for the pages that are
unmapped, it aims to pervade the allocator. If it turns out that for
all nonsensitive (or all sensitive, which seems highly unlikely) pages,
access to the full feature set of the page allocator is not needed for
a performant system, we could certainly do something like Mike's
patchset. But we don't have any reason to expect a correlation between
sensitivity and performance needs.

.:: Patchset overview

- Patch 1 adds a minimal subset of the base ASI framework that was
  introduced by the RFCv2 [2].
- Patches 2-5 add the necessary framework for creating and manipulating
  the ASI physmap. This is the area where I have had to reduce the
  scope of this series. I had hoped to present a proper integration
  here, but instead I've had to just hack something together that kinda
  works. You can probably skip over this section.
- Patches 6-8 are preparatory hacks and changes to the generic mm code.
- Patches 9-11 are the important bit. The new migratetypes are created.
  Then logic is added to create nonsensitive pageblocks when needed.
  Then logic is added to change them back to sensitive pageblocks when
  needed.

.:: TODOs

- This doesn't let you allocate from MIGRATE_HIGHATOMIC pageblocks
  unless you have __GFP_SENSITIVE. We probably need to make the
  pageblock type and per-freelist logic more advanced to be able to
  account for this.
- When pages transition from sensitive to nonsensitive, they need to be
  zeroed to prevent any leftover data being leaked. This series doesn't
  address that requirement at all.
- Although I think the abstract design is OK, the actual implementation
  of calling asi_map()/asi_unmap() from page_alloc.c is pretty
  confusing: asi_map() is implicit when calling
  set_pageblock_migratetype() but asi_unmap() is up to the caller. This
  requires some refactoring.
- Changes to the unrestricted physmap (page protection changes, memory
  hotplug) are not properly mirrored into the restricted physmap.
- There's no integration with CMA. The branch at [4] has some minimal
  integration into alloc_contig_range().

.:: References

[0] https://lore.kernel.org/linux-mm/CA+i-1C169s8pyqZDx+iSnFmftmGfssdQA29+pYm-gqySAYWgpg@mail.gmail.com/
[1] Some slides I presented in an earlier discussion of this topic:
    https://docs.google.com/presentation/d/1Ozuan7E4z2YWm4V6uE_fe7YoF2BdS3m5jXjDKO7DVy0/edit#slide=id.g32d28ea451a_0_43
[2] https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@google.com/
[3] https://lore.kernel.org/all/20230308094106.227365-1-rppt@kernel.org/
[5] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.com/

This series is available as a branch with some additional testing here:

[4] https://github.com/bjackman/linux/tree/asi/page-alloc-lsfmmbpf25

This applies to mm-unstable.
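
As a rough illustration of the point made under "The Idea" (freelists
indexed by sensitivity avoid needless mapping changes), here is a
minimal userspace model. It is not code from this series: every name in
it (sketch_zone, sketch_alloc_page(), mapping_changes, ...) is
hypothetical, the counter merely stands in for asi_map()/asi_unmap()
calls, and it assumes that all blocks start out sensitive and that only
a completely free block may change type, which is simpler than what a
real allocator would need.

```c
/*
 * Userspace model of "migratetypes index pages by sensitivity".
 * Every name here is hypothetical; none of this is kernel code.
 */
#include <assert.h>
#include <stdbool.h>

enum sketch_type {
	SKETCH_SENSITIVE,	/* block is unmapped from the restricted physmap */
	SKETCH_NONSENSITIVE,	/* block is mapped in the restricted physmap */
};

#define SKETCH_NR_BLOCKS 4

struct sketch_zone {
	enum sketch_type block_type[SKETCH_NR_BLOCKS];	/* one type per pageblock */
	int free_pages[SKETCH_NR_BLOCKS];		/* free pages per pageblock */
	int pages_per_block;
	int mapping_changes;	/* stand-in for asi_map()/asi_unmap() calls */
};

static void sketch_zone_init(struct sketch_zone *z, int pages_per_block)
{
	assert(pages_per_block > 0);
	z->pages_per_block = pages_per_block;
	z->mapping_changes = 0;
	for (int i = 0; i < SKETCH_NR_BLOCKS; i++) {
		/* Assumption of this sketch: everything starts out sensitive. */
		z->block_type[i] = SKETCH_SENSITIVE;
		z->free_pages[i] = pages_per_block;
	}
}

/* Returns the index of the pageblock the page came from, or -1 on failure. */
static int sketch_alloc_page(struct sketch_zone *z, bool sensitive)
{
	enum sketch_type want = sensitive ? SKETCH_SENSITIVE : SKETCH_NONSENSITIVE;

	/* Fast path: a block that is already in the wanted mapping state. */
	for (int i = 0; i < SKETCH_NR_BLOCKS; i++) {
		if (z->block_type[i] == want && z->free_pages[i] > 0) {
			z->free_pages[i]--;
			return i;
		}
	}

	/*
	 * Slow path: change the sensitivity of a whole block at once.
	 * Simplification: only a completely free block may change type.
	 */
	for (int i = 0; i < SKETCH_NR_BLOCKS; i++) {
		if (z->free_pages[i] == z->pages_per_block) {
			z->block_type[i] = want;
			z->mapping_changes++;
			z->free_pages[i]--;
			return i;
		}
	}
	return -1;
}
```

On a freshly initialised zone, two consecutive nonsensitive allocations
convert one block and then reuse it, so mapping_changes stays at 1. The
second allocation finds an already-mapped page through the per-type
lookup, which is the behaviour the third bullet under "The Idea"
describes.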
Signed-off-by: Brendan Jackman
---
Brendan Jackman (11):
      x86/mm: Bare minimum ASI API for page_alloc integration
      x86/mm: Factor out phys_pgd_init()
      x86/mm: Add lookup_pgtable_in_pgd()
      x86/mm/asi: Sync physmap into ASI_GLOBAL_NONSENSITIVE
      [RFC HACKS] Add asi_map() and asi_unmap()
      mm/page_alloc: Add __GFP_SENSITIVE and always set it
      [RFC HACKS] mm/slub: Set __GFP_SENSITIVE for reclaimable slabs
      [RFC HACKS] mm/page_alloc: Simplify gfp_migratetype()
      mm/page_alloc: Split MIGRATE_UNMOVABLE by sensitivity
      mm/page_alloc: Add support for nonsensitive allocations
      mm/page_alloc: Add support for ASI-unmapping pages

 arch/Kconfig                         |  14 ++++
 arch/x86/Kconfig                     |   1 +
 arch/x86/include/asm/asi.h           |  36 ++++++++
 arch/x86/include/asm/pgtable_types.h |   2 +
 arch/x86/mm/Makefile                 |   1 +
 arch/x86/mm/asi.c                    |  85 +++++++++++++++++++
 arch/x86/mm/init.c                   |   3 +-
 arch/x86/mm/init_64.c                |  53 ++++++++++--
 arch/x86/mm/pat/set_memory.c         |  34 ++++++++
 include/linux/asi.h                  |  20 +++++
 include/linux/gfp.h                  |  30 ++++---
 include/linux/gfp_types.h            |  15 +++-
 include/linux/mmzone.h               |  19 ++++-
 include/linux/vmalloc.h              |   4 +
 mm/internal.h                        |   5 ++
 mm/memory_hotplug.c                  |   2 +-
 mm/page_alloc.c                      | 158 +++++++++++++++++++++++++++++++----
 mm/show_mem.c                        |  13 +--
 mm/slub.c                            |   6 +-
 mm/vmalloc.c                         |  32 ++++---
 20 files changed, 475 insertions(+), 58 deletions(-)
---
base-commit: 5ee93e1a769230377c3d44edd4917e8df77be566
change-id: 20250310-asi-page-alloc-80ea1f8307d0

Best regards,
--
Brendan Jackman