From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pankaj Raghav
To: "Darrick J. Wong", hch@lst.de, willy@infradead.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Hildenbrand, linux-fsdevel@vger.kernel.org, mcgrof@kernel.org, gost.dev@samsung.com, Andrew Morton, kernel@pankajraghav.com, Pankaj Raghav
Subject: [RFC 1/3] mm: add large zero page for efficient zeroing of larger segments
Date: Fri, 16 May 2025 12:10:52 +0200
Message-ID: <20250516101054.676046-2-p.raghav@samsung.com>
In-Reply-To: <20250516101054.676046-1-p.raghav@samsung.com>
References: <20250516101054.676046-1-p.raghav@samsung.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Introduce LARGE_ZERO_PAGE of size 2M as an alternative to ZERO_PAGE of size PAGE_SIZE.

There are many places in the kernel where we need to zero out larger chunks, but the maximum segment we can zero out at a time is limited by PAGE_SIZE. This is especially annoying in block devices and filesystems, where we attach multiple ZERO_PAGEs to the bio in different bvecs. With multipage bvec support in the block layer, it is much more efficient to send out larger zero pages as part of a single bvec.

While there are other options such as huge_zero_page, it can fail under system memory pressure, requiring a fallback to ZERO_PAGE[3].

This idea (but not the implementation) was suggested during the review of adding LBS support to XFS[1][2].

LARGE_ZERO_PAGE is added behind a config option so that memory-constrained systems are not forced to use it.
[1] https://lore.kernel.org/linux-xfs/20231027051847.GA7885@lst.de/
[2] https://lore.kernel.org/linux-xfs/ZitIK5OnR7ZNY0IG@infradead.org/
[3] https://lore.kernel.org/linux-xfs/3pqmgrlewo6ctcwakdvbvjqixac5en6irlipe5aiz6vkylfyni@2luhrs36ke5r/

Suggested-by: Christoph Hellwig
Signed-off-by: Pankaj Raghav
---
 arch/Kconfig                   |  8 ++++++++
 arch/x86/include/asm/pgtable.h | 20 +++++++++++++++++++-
 arch/x86/kernel/head_64.S      |  9 ++++++++-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index b0adb665041f..aefa519cb211 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -218,6 +218,14 @@ config USER_RETURN_NOTIFIER
 	  Provide a kernel-internal notification when a cpu is about to
 	  switch to user mode.
 
+config LARGE_ZERO_PAGE
+	bool "Large zero pages"
+	def_bool n
+	help
+	  2M sized zero pages for zeroing. This will reserve 2M sized
+	  physical pages for zeroing. Not suitable for memory constrained
+	  systems.
+
 config HAVE_IOREMAP_PROT
 	bool
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 3f59d7a16010..78eb83f2da34 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -17,6 +17,7 @@
 #ifndef __ASSEMBLER__
 #include
+#include
 #include
 #include
 #include
@@ -47,14 +48,31 @@ void ptdump_walk_user_pgd_level_checkwx(void);
 #define debug_checkwx_user()	do { } while (0)
 #endif
 
+#ifdef CONFIG_LARGE_ZERO_PAGE
+/*
+ * LARGE_ZERO_PAGE is a global shared page that is always zero: used
+ * for zero-mapped memory areas etc..
+ */
+extern unsigned long empty_large_zero_page[(SZ_2M) / sizeof(unsigned long)]
+	__visible;
+#define ZERO_LARGE_PAGE(vaddr) ((void)(vaddr),virt_to_page(empty_large_zero_page))
+
+#define ZERO_PAGE(vaddr) ZERO_LARGE_PAGE(vaddr)
+#define ZERO_LARGE_PAGE_SIZE SZ_2M
+#else
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
+extern unsigned long empty_zero_page[(PAGE_SIZE) / sizeof(unsigned long)]
 	__visible;
 #define ZERO_PAGE(vaddr) ((void)(vaddr),virt_to_page(empty_zero_page))
+#define ZERO_LARGE_PAGE(vaddr) ZERO_PAGE(vaddr)
+
+#define ZERO_LARGE_PAGE_SIZE PAGE_SIZE
+#endif
+
 extern spinlock_t pgd_lock;
 extern struct list_head pgd_list;
 
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index fefe2a25cf02..ebcd12f72966 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -708,8 +709,14 @@ EXPORT_SYMBOL(phys_base)
 #include "../xen/xen-head.S"
 
 	__PAGE_ALIGNED_BSS
+#ifdef CONFIG_LARGE_ZERO_PAGE
+SYM_DATA_START_PAGE_ALIGNED(empty_large_zero_page)
+	.skip SZ_2M
+SYM_DATA_END(empty_large_zero_page)
+EXPORT_SYMBOL(empty_large_zero_page)
+#else
 SYM_DATA_START_PAGE_ALIGNED(empty_zero_page)
 	.skip PAGE_SIZE
 SYM_DATA_END(empty_zero_page)
 EXPORT_SYMBOL(empty_zero_page)
-
+#endif
-- 
2.47.2