From: Pankaj Raghav
To: Suren Baghdasaryan, Ryan Roberts, Vlastimil Babka, Baolin Wang, Borislav Petkov, Ingo Molnar, "H. Peter Anvin", Zi Yan, Mike Rapoport, Dave Hansen, Michal Hocko, David Hildenbrand, Lorenzo Stoakes, Andrew Morton, Thomas Gleixner, Nico Pache, Dev Jain, "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, willy@infradead.org, x86@kernel.org, linux-fsdevel@vger.kernel.org, "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com, kernel@pankajraghav.com, hch@lst.de, Pankaj Raghav
Subject: [RFC 2/3] mm: add STATIC_PMD_ZERO_PAGE config option
Date: Tue, 27 May 2025 07:04:51 +0200
Message-ID: <20250527050452.817674-3-p.raghav@samsung.com>
In-Reply-To: <20250527050452.817674-1-p.raghav@samsung.com>
References: <20250527050452.817674-1-p.raghav@samsung.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
There are many places in the kernel where we need to zero out larger chunks, but the maximum segment we can zero out at a time with ZERO_PAGE is limited by PAGE_SIZE.

This is especially annoying in block devices and filesystems, where we attach multiple ZERO_PAGEs to the bio in different bvecs. With multipage bvec support in the block layer, it is much more efficient to send out larger zero pages as part of a single bvec.

This concern was raised during the review of adding LBS support to XFS[1][2].

Usually huge_zero_folio is allocated on demand, and it will be deallocated by the shrinker if there are no users of it left. Add a config option STATIC_PMD_ZERO_PAGE that will always allocate the huge_zero_folio, and it will never be freed. This makes it possible to use the huge_zero_folio without having to pass any mm struct and without calling put_folio in the destructor.

We can enable it by default for x86_64, where the PMD size is 2M. It is a good compromise between memory use and efficiency. As a THP zero page might be wasteful for architectures with bigger page sizes, let's not enable it for them.
[1] https://lore.kernel.org/linux-xfs/20231027051847.GA7885@lst.de/
[2] https://lore.kernel.org/linux-xfs/ZitIK5OnR7ZNY0IG@infradead.org/

Suggested-by: David Hildenbrand
Signed-off-by: Pankaj Raghav
---
 arch/x86/Kconfig |  1 +
 mm/Kconfig       | 12 ++++++++++++
 mm/memory.c      | 30 ++++++++++++++++++++++++++----
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 055204dc211d..96f99b4f96ea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -152,6 +152,7 @@ config X86
 	select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
 	select ARCH_WANT_HUGETLB_VMEMMAP_PREINIT if X86_64
 	select ARCH_WANTS_THP_SWAP if X86_64
+	select ARCH_WANTS_STATIC_PMD_ZERO_PAGE if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
diff --git a/mm/Kconfig b/mm/Kconfig
index bd08e151fa1b..8f50f5c3f7a7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -826,6 +826,18 @@ config ARCH_WANTS_THP_SWAP
 config MM_ID
 	def_bool n
 
+config ARCH_WANTS_STATIC_PMD_ZERO_PAGE
+	bool
+
+config STATIC_PMD_ZERO_PAGE
+	def_bool y
+	depends on ARCH_WANTS_STATIC_PMD_ZERO_PAGE
+	help
+	  Typically huge_zero_folio, which is a PMD page of zeroes, is allocated
+	  on demand and deallocated when not in use. This option will always
+	  allocate huge_zero_folio for zeroing and it is never deallocated.
+	  Not suitable for memory constrained systems.
+
 menuconfig TRANSPARENT_HUGEPAGE
 	bool "Transparent Hugepage Support"
 	depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE && !PREEMPT_RT
diff --git a/mm/memory.c b/mm/memory.c
index 11edc4d66e74..ab8c16d04307 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -203,9 +203,17 @@ static void put_huge_zero_page(void)
 	BUG_ON(atomic_dec_and_test(&huge_zero_refcount));
 }
 
+/*
+ * If STATIC_PMD_ZERO_PAGE is enabled, @mm can be NULL, i.e., the
+ * huge_zero_folio is not associated with any mm_struct.
+ */
 struct folio *mm_get_huge_zero_folio(struct mm_struct *mm)
 {
-	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
+	if (!IS_ENABLED(CONFIG_STATIC_PMD_ZERO_PAGE) && !mm)
+		return NULL;
+
+	if (IS_ENABLED(CONFIG_STATIC_PMD_ZERO_PAGE) ||
+	    test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
 		return READ_ONCE(huge_zero_folio);
 
 	if (!get_huge_zero_page())
@@ -219,6 +227,9 @@ struct folio *mm_get_huge_zero_folio(struct mm_struct *mm)
 
 void mm_put_huge_zero_folio(struct mm_struct *mm)
 {
+	if (IS_ENABLED(CONFIG_STATIC_PMD_ZERO_PAGE))
+		return;
+
 	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
 		put_huge_zero_page();
 }
@@ -246,15 +257,26 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
 
 static int __init init_huge_zero_page(void)
 {
+	int ret = 0;
+
+	if (IS_ENABLED(CONFIG_STATIC_PMD_ZERO_PAGE)) {
+		if (!get_huge_zero_page())
+			ret = -ENOMEM;
+		goto out;
+	}
+
 	huge_zero_page_shrinker = shrinker_alloc(0, "thp-zero");
-	if (!huge_zero_page_shrinker)
-		return -ENOMEM;
+	if (!huge_zero_page_shrinker) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	huge_zero_page_shrinker->count_objects = shrink_huge_zero_page_count;
 	huge_zero_page_shrinker->scan_objects = shrink_huge_zero_page_scan;
 	shrinker_register(huge_zero_page_shrinker);
 
-	return 0;
+out:
+	return ret;
 }
 early_initcall(init_huge_zero_page);
-- 
2.47.2