From: Kiryl Shutsemau <kas@kernel.org>
To: Andrew Morton, Muchun Song, David Hildenbrand, Matthew Wilcox,
	Usama Arif, Frank van der Linden
Cc: Oscar Salvador, Mike Rapoport, Vlastimil Babka, Lorenzo Stoakes,
	Zi Yan, Baoquan He, Michal Hocko, Johannes Weiner, Jonathan Corbet,
	kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, Kiryl Shutsemau
Subject: [PATCHv4 08/14] mm/hugetlb: Refactor code around vmemmap_walk
Date: Wed, 21 Jan 2026 16:22:45 +0000
Message-ID: <20260121162253.2216580-9-kas@kernel.org>
X-Mailer: git-send-email 2.51.2
In-Reply-To: <20260121162253.2216580-1-kas@kernel.org>
References: <20260121162253.2216580-1-kas@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

To prepare for removing fake head pages, rework the vmemmap_walk code and
eliminate the reuse_page and reuse_addr fields. The walk no longer makes any
assumption about where the reuse address sits relative to the range being
operated on. Instead, the caller provides the head and tail vmemmap pages
explicitly, along with the vmemmap_start address at which the head page is
installed.

For now, vmemmap_head and vmemmap_tail are set to the same page; this will
change in a future patch.

The only functional change is that __hugetlb_vmemmap_optimize_folio() now
abandons the optimization if memory allocation fails, instead of falling back
to the currently mapped head page.

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
 mm/hugetlb_vmemmap.c | 200 ++++++++++++++++++-------------------
 1 file changed, 84 insertions(+), 116 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index ba0fb1b6a5a8..a51c0e293175 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -24,8 +24,9 @@
  *
  * @remap_pte: called for each lowest-level entry (PTE).
  * @nr_walked: the number of walked pte.
- * @reuse_page: the page which is reused for the tail vmemmap pages.
- * @reuse_addr: the virtual address of the @reuse_page page.
+ * @vmemmap_start: the start of vmemmap range, where head page is located
+ * @vmemmap_head: the page to be installed as first in the vmemmap range
+ * @vmemmap_tail: the page to be installed as non-first in the vmemmap range
  * @vmemmap_pages: the list head of the vmemmap pages that can be freed
  *                 or is mapped from.
 * @flags: used to modify behavior in vmemmap page table walking
@@ -34,11 +35,14 @@ struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
 			  struct vmemmap_remap_walk *walk);
+
 	unsigned long nr_walked;
-	struct page *reuse_page;
-	unsigned long reuse_addr;
+	unsigned long vmemmap_start;
+	struct page *vmemmap_head;
+	struct page *vmemmap_tail;
 	struct list_head *vmemmap_pages;
+
 /* Skip the TLB flush when we split the PMD */
 #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
 /* Skip the TLB flush when we remap the PTE */
@@ -140,14 +144,7 @@ static int vmemmap_pte_entry(pte_t *pte, unsigned long addr,
 {
 	struct vmemmap_remap_walk *vmemmap_walk = walk->private;
 
-	/*
-	 * The reuse_page is found 'first' in page table walking before
-	 * starting remapping.
-	 */
-	if (!vmemmap_walk->reuse_page)
-		vmemmap_walk->reuse_page = pte_page(ptep_get(pte));
-	else
-		vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
+	vmemmap_walk->remap_pte(pte, addr, vmemmap_walk);
 	vmemmap_walk->nr_walked++;
 
 	return 0;
@@ -207,18 +204,12 @@ static void free_vmemmap_page_list(struct list_head *list)
 static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 			      struct vmemmap_remap_walk *walk)
 {
-	/*
-	 * Remap the tail pages as read-only to catch illegal write operation
-	 * to the tail pages.
-	 */
-	pgprot_t pgprot = PAGE_KERNEL_RO;
 	struct page *page = pte_page(ptep_get(pte));
 	pte_t entry;
 
 	/* Remapping the head page requires r/w */
-	if (unlikely(addr == walk->reuse_addr)) {
-		pgprot = PAGE_KERNEL;
-		list_del(&walk->reuse_page->lru);
+	if (unlikely(addr == walk->vmemmap_start)) {
+		list_del(&walk->vmemmap_head->lru);
 
 		/*
 		 * Makes sure that preceding stores to the page contents from
@@ -226,9 +217,16 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 		 * write.
 		 */
 		smp_wmb();
+
+		entry = mk_pte(walk->vmemmap_head, PAGE_KERNEL);
+	} else {
+		/*
+		 * Remap the tail pages as read-only to catch illegal write
+		 * operation to the tail pages.
+		 */
+		entry = mk_pte(walk->vmemmap_tail, PAGE_KERNEL_RO);
 	}
 
-	entry = mk_pte(walk->reuse_page, pgprot);
 	list_add(&page->lru, walk->vmemmap_pages);
 	set_pte_at(&init_mm, addr, pte, entry);
 }
@@ -255,16 +253,13 @@ static inline void reset_struct_pages(struct page *start)
 static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 				struct vmemmap_remap_walk *walk)
 {
-	pgprot_t pgprot = PAGE_KERNEL;
 	struct page *page;
 	void *to;
 
-	BUG_ON(pte_page(ptep_get(pte)) != walk->reuse_page);
-
 	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
 	list_del(&page->lru);
 	to = page_to_virt(page);
-	copy_page(to, (void *)walk->reuse_addr);
+	copy_page(to, (void *)walk->vmemmap_start);
 	reset_struct_pages(to);
 
 	/*
@@ -272,7 +267,7 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 	 * before the set_pte_at() write.
 	 */
 	smp_wmb();
-	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, PAGE_KERNEL));
 }
 
 /**
@@ -282,33 +277,29 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
  *        to remap.
  * @end: end address of the vmemmap virtual address range that we want to
  *       remap.
- * @reuse: reuse address.
- *
  * Return: %0 on success, negative error code otherwise.
  */
-static int vmemmap_remap_split(unsigned long start, unsigned long end,
-			       unsigned long reuse)
+static int vmemmap_remap_split(unsigned long start, unsigned long end)
 {
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= NULL,
+		.vmemmap_start	= start,
 		.flags		= VMEMMAP_SPLIT_NO_TLB_FLUSH,
 	};
 
-	/* See the comment in the vmemmap_remap_free(). */
-	BUG_ON(start - reuse != PAGE_SIZE);
-
-	return vmemmap_remap_range(reuse, end, &walk);
+	return vmemmap_remap_range(start, end, &walk);
 }
 
 /**
  * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
- *                      to the page which @reuse is mapped to, then free vmemmap
- *                      which the range are mapped to.
+ *                      to use @vmemmap_head/tail, then free vmemmap which
+ *                      the range are mapped to.
  * @start: start address of the vmemmap virtual address range that we want
  *         to remap.
  * @end: end address of the vmemmap virtual address range that we want to
  *       remap.
- * @reuse: reuse address.
+ * @vmemmap_head: the page to be installed as first in the vmemmap range
+ * @vmemmap_tail: the page to be installed as non-first in the vmemmap range
  * @vmemmap_pages: list to deposit vmemmap pages to be freed. It is callers
  *                 responsibility to free pages.
  * @flags: modifications to vmemmap_remap_walk flags
@@ -316,69 +307,40 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
  * Return: %0 on success, negative error code otherwise.
  */
 static int vmemmap_remap_free(unsigned long start, unsigned long end,
-			      unsigned long reuse,
+			      struct page *vmemmap_head,
+			      struct page *vmemmap_tail,
 			      struct list_head *vmemmap_pages,
 			      unsigned long flags)
 {
 	int ret;
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
-		.reuse_addr	= reuse,
+		.vmemmap_start	= start,
+		.vmemmap_head	= vmemmap_head,
+		.vmemmap_tail	= vmemmap_tail,
 		.vmemmap_pages	= vmemmap_pages,
 		.flags		= flags,
 	};
-	int nid = page_to_nid((struct page *)reuse);
-	gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+
+	ret = vmemmap_remap_range(start, end, &walk);
+	if (!ret || !walk.nr_walked)
+		return ret;
+
+	end = start + walk.nr_walked * PAGE_SIZE;
 
 	/*
-	 * Allocate a new head vmemmap page to avoid breaking a contiguous
-	 * block of struct page memory when freeing it back to page allocator
-	 * in free_vmemmap_page_list(). This will allow the likely contiguous
-	 * struct page backing memory to be kept contiguous and allowing for
-	 * more allocations of hugepages. Fallback to the currently
-	 * mapped head page in case should it fail to allocate.
+	 * vmemmap_pages contains pages from the previous vmemmap_remap_range()
+	 * call which failed. These are pages which were removed from
+	 * the vmemmap. They will be restored in the following call.
 	 */
-	walk.reuse_page = alloc_pages_node(nid, gfp_mask, 0);
-	if (walk.reuse_page) {
-		copy_page(page_to_virt(walk.reuse_page),
-			  (void *)walk.reuse_addr);
-		list_add(&walk.reuse_page->lru, vmemmap_pages);
-		memmap_pages_add(1);
-	}
+	walk = (struct vmemmap_remap_walk) {
+		.remap_pte	= vmemmap_restore_pte,
+		.vmemmap_start	= start,
+		.vmemmap_pages	= vmemmap_pages,
+		.flags		= 0,
+	};
 
-	/*
-	 * In order to make remapping routine most efficient for the huge pages,
-	 * the routine of vmemmap page table walking has the following rules
-	 * (see more details from the vmemmap_pte_range()):
-	 *
-	 * - The range [@start, @end) and the range [@reuse, @reuse + PAGE_SIZE)
-	 *   should be continuous.
-	 * - The @reuse address is part of the range [@reuse, @end) that we are
-	 *   walking which is passed to vmemmap_remap_range().
-	 * - The @reuse address is the first in the complete range.
-	 *
-	 * So we need to make sure that @start and @reuse meet the above rules.
-	 */
-	BUG_ON(start - reuse != PAGE_SIZE);
-
-	ret = vmemmap_remap_range(reuse, end, &walk);
-	if (ret && walk.nr_walked) {
-		end = reuse + walk.nr_walked * PAGE_SIZE;
-		/*
-		 * vmemmap_pages contains pages from the previous
-		 * vmemmap_remap_range call which failed. These
-		 * are pages which were removed from the vmemmap.
-		 * They will be restored in the following call.
-		 */
-		walk = (struct vmemmap_remap_walk) {
-			.remap_pte	= vmemmap_restore_pte,
-			.reuse_addr	= reuse,
-			.vmemmap_pages	= vmemmap_pages,
-			.flags		= 0,
-		};
-
-		vmemmap_remap_range(reuse, end, &walk);
-	}
+	vmemmap_remap_range(start, end, &walk);
 
 	return ret;
 }
@@ -415,29 +377,25 @@ static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
  *        to remap.
  * @end: end address of the vmemmap virtual address range that we want to
  *       remap.
- * @reuse: reuse address.
  * @flags: modifications to vmemmap_remap_walk flags
 *
 * Return: %0 on success, negative error code otherwise.
 */
 static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
-			       unsigned long reuse, unsigned long flags)
+			       unsigned long flags)
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_restore_pte,
-		.reuse_addr	= reuse,
+		.vmemmap_start	= start,
 		.vmemmap_pages	= &vmemmap_pages,
 		.flags		= flags,
 	};
 
-	/* See the comment in the vmemmap_remap_free(). */
-	BUG_ON(start - reuse != PAGE_SIZE);
-
 	if (alloc_vmemmap_page_list(start, end, &vmemmap_pages))
 		return -ENOMEM;
 
-	return vmemmap_remap_range(reuse, end, &walk);
+	return vmemmap_remap_range(start, end, &walk);
 }
 
 DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
@@ -454,8 +412,7 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
 					   struct folio *folio, unsigned long flags)
 {
 	int ret;
-	unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
-	unsigned long vmemmap_reuse;
+	unsigned long vmemmap_start, vmemmap_end;
 
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
 	VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio);
@@ -466,18 +423,18 @@ static int __hugetlb_vmemmap_restore_folio(const struct hstate *h,
 	if (flags & VMEMMAP_SYNCHRONIZE_RCU)
 		synchronize_rcu();
 
+	vmemmap_start = (unsigned long)&folio->page;
 	vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
-	vmemmap_reuse = vmemmap_start;
-	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
+
+	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
 
 	/*
 	 * The pages which the vmemmap virtual address range [@vmemmap_start,
-	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
-	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
+	 * @vmemmap_end) are mapped to are freed to the buddy allocator.
 	 * When a HugeTLB page is freed to the buddy allocator, previously
 	 * discarded vmemmap pages must be allocated and remapping.
 	 */
-	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, vmemmap_reuse, flags);
+	ret = vmemmap_remap_alloc(vmemmap_start, vmemmap_end, flags);
 	if (!ret) {
 		folio_clear_hugetlb_vmemmap_optimized(folio);
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
@@ -565,9 +522,9 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
 					    struct list_head *vmemmap_pages,
 					    unsigned long flags)
 {
-	int ret = 0;
-	unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
-	unsigned long vmemmap_reuse;
+	unsigned long vmemmap_start, vmemmap_end;
+	struct page *vmemmap_head, *vmemmap_tail;
+	int nid, ret = 0;
 
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_hugetlb(folio), folio);
 	VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio);
@@ -592,18 +549,31 @@ static int __hugetlb_vmemmap_optimize_folio(const struct hstate *h,
 	 */
 	folio_set_hugetlb_vmemmap_optimized(folio);
 
+	nid = folio_nid(folio);
+	vmemmap_head = alloc_pages_node(nid, GFP_KERNEL, 0);
+
+	if (!vmemmap_head) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	copy_page(page_to_virt(vmemmap_head), folio);
+	list_add(&vmemmap_head->lru, vmemmap_pages);
+	memmap_pages_add(1);
+
+	vmemmap_tail = vmemmap_head;
+	vmemmap_start = (unsigned long)&folio->page;
 	vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
-	vmemmap_reuse = vmemmap_start;
-	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
 
 	/*
-	 * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end)
-	 * to the page which @vmemmap_reuse is mapped to. Add pages previously
-	 * mapping the range to vmemmap_pages list so that they can be freed by
-	 * the caller.
+	 * Remap the vmemmap virtual address range [@vmemmap_start, @vmemmap_end).
+	 * Add pages previously mapping the range to vmemmap_pages list so that
+	 * they can be freed by the caller.
 	 */
-	ret = vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse,
+	ret = vmemmap_remap_free(vmemmap_start, vmemmap_end,
+				 vmemmap_head, vmemmap_tail,
 				 vmemmap_pages, flags);
+out:
 	if (ret) {
 		static_branch_dec(&hugetlb_optimize_vmemmap_key);
 		folio_clear_hugetlb_vmemmap_optimized(folio);
@@ -632,21 +602,19 @@ void hugetlb_vmemmap_optimize_folio(const struct hstate *h, struct folio *folio)
 
 static int hugetlb_vmemmap_split_folio(const struct hstate *h, struct folio *folio)
 {
-	unsigned long vmemmap_start = (unsigned long)&folio->page, vmemmap_end;
-	unsigned long vmemmap_reuse;
+	unsigned long vmemmap_start, vmemmap_end;
 
 	if (!vmemmap_should_optimize_folio(h, folio))
 		return 0;
 
+	vmemmap_start = (unsigned long)&folio->page;
 	vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
-	vmemmap_reuse = vmemmap_start;
-	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
 
 	/*
 	 * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
 	 * @vmemmap_end]
 	 */
-	return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
+	return vmemmap_remap_split(vmemmap_start, vmemmap_end);
 }
 
 static void __hugetlb_vmemmap_optimize_folios(struct hstate *h,
-- 
2.51.2
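
Not part of the patch: the following is a minimal userspace sketch of the new
head/tail install decision, included only as an illustration. fake_pte,
fake_walk, the integer page ids, and the x86-64/LP64 vmemmap base address are
my own stand-ins; only the if/else mirrors the new vmemmap_remap_pte() above.
With vmemmap_head == vmemmap_tail, as __hugetlb_vmemmap_optimize_folio() sets
things up in this patch, only the PTE at vmemmap_start stays writable, and
(assuming 4 KiB pages and a 64-byte struct page) the remaining seven PTEs of a
2 MB folio's 32 KB vmemmap all share one read-only page.

/*
 * Userspace model of the head/tail install decision that the patch adds to
 * vmemmap_remap_pte(). Everything here is a stand-in; only the if/else
 * mirrors the kernel logic.
 */
#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL

struct fake_pte {
	int page_id;	/* which backing page the address maps to */
	bool writable;	/* PAGE_KERNEL vs PAGE_KERNEL_RO in the patch */
};

struct fake_walk {
	unsigned long vmemmap_start;
	int vmemmap_head;	/* stand-in for struct page *vmemmap_head */
	int vmemmap_tail;	/* stand-in for struct page *vmemmap_tail */
};

/* Head page installed r/w at vmemmap_start, shared tail page r/o elsewhere. */
static void remap_pte(struct fake_pte *pte, unsigned long addr,
		      const struct fake_walk *walk)
{
	if (addr == walk->vmemmap_start) {
		pte->page_id = walk->vmemmap_head;
		pte->writable = true;
	} else {
		pte->page_id = walk->vmemmap_tail;
		pte->writable = false;
	}
}

int main(void)
{
	/* 2 MB hugetlb folio, 4 KiB pages, 64-byte struct page: 8 vmemmap pages */
	struct fake_pte ptes[8];
	unsigned long start = 0xffffea0000000000UL;	/* x86-64 vmemmap base, LP64 */
	struct fake_walk walk = {
		.vmemmap_start = start,
		.vmemmap_head = 100,	/* freshly allocated copy of page 0 */
		.vmemmap_tail = 100,	/* same page for now, per this patch */
	};

	for (int i = 0; i < 8; i++)
		remap_pte(&ptes[i], start + i * PAGE_SIZE, &walk);

	for (int i = 0; i < 8; i++)
		printf("pte %d -> page %d (%s)\n", i, ptes[i].page_id,
		       ptes[i].writable ? "rw" : "ro");

	return 0;
}

The point of keying on walk->vmemmap_start instead of a separate reuse address
is that the walked range no longer has to start one page below the remapped
range, which is what the removed BUG_ON(start - reuse != PAGE_SIZE) checks used
to enforce.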