Date: Fri, 5 Jul 2024 14:14:14 +0000
From: "Pankaj Raghav (Samsung)"
To: Ryan Roberts
Cc: Dave Chinner, Matthew Wilcox, chandan.babu@oracle.com,
	djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
	linux-mm@kvack.org, john.g.garry@oracle.com,
	linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com,
	mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com,
	linux-xfs@vger.kernel.org, hch@lst.de, Zi Yan
Subject: Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes
Message-ID: <20240705141414.72yy6m75aajmlhvt@quentin>
References: <20240625114420.719014-1-kernel@pankajraghav.com>
	<20240625114420.719014-2-kernel@pankajraghav.com>
	<20240705132418.gk7oeucdisat3sq5@quentin>
	<1e0e89ea-3130-42b0-810d-f52da2affe51@arm.com>
In-Reply-To: <1e0e89ea-3130-42b0-810d-f52da2affe51@arm.com>

> >>
> >>> If the device is
> >>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
> >>> not set, you should also decline to mount the filesystem.
> >>
> >> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
> >> being able to use large folios?
> >>
> >> If that's an actual dependency of using large folios, then we're at
> >> the point where the mm side of large folios needs to be divorced
> >> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
> >> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the
> >> block layer and also every filesystem that wants to support
> >> sector/blocks sizes larger than PAGE_SIZE. IOWs, large folio
> >> support needs to *always* be enabled on systems that say
> >> CONFIG_BLOCK=y.
> >
> > Why CONFIG_BLOCK? I think it is enough if it comes from the FS side
> > right? And for now, the only FS that needs that sort of bs > ps
> > guarantee is XFS with this series. Other filesystems such as bcachefs
> > that call mapping_set_large_folios() only enable it as an optimization
> > and it is not needed for the filesystem to function.
> >
> > So this is my conclusion from the conversation:
> > - Add a dependency in Kconfig on THP for XFS until we fix the dependency
> >   of large folios on THP
>
> THP isn't supported on some arches, so isn't this effectively saying XFS can no
> longer be used with those arches, even if the bs <= ps? I think while pagecache
> large folios depend on THP, you need to make this a mount-time check in the FS?
>
> But ideally, MAX_PAGECACHE_ORDER would be set to 0 for
> !CONFIG_TRANSPARENT_HUGEPAGE so you can just check against that and don't have
> to worry about THP availability directly.

Yes, that would be better. We should have a way to probe it at mount
time without requiring an address_space mapping. We could add a helper
along these lines:

static inline unsigned int mapping_max_folio_order_supported(void)
{
	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
		return 0;
	return MAX_PAGECACHE_ORDER;
}

The FS could then use this to verify support at mount time.

>
> Willy; Why is MAX_PAGECACHE_ORDER set to 8 when THP is disabled currently?
>

This appeared in this patch with the following comment:
https://lore.kernel.org/linux-fsdevel/20230710130253.3484695-8-willy@infradead.org/

+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER. Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size. I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */

> > - Add a BUILD_BUG_ON(XFS_MAX_BLOCKSIZE > MAX_PAGECACHE_ORDER)
> > - Add a WARN_ON_ONCE() and clamp the min and max value in
> >   mapping_set_folio_order_range() ?
> >
> > Let me know what you all think @willy, @dave and @ryan.
> >
> > --
> > Pankaj
>
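
To illustrate the mount-time check discussed above, here is a minimal
user-space sketch. It is not kernel code: every MODEL_* macro and model_*
function is a hypothetical stand-in invented for this example, and the
values (4kB pages, THP disabled, maximum order 8) are assumptions taken
from the numbers mentioned in this thread. It only shows the shape of the
comparison, with both sides expressed as folio orders.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel configuration (assumed values). */
#define MODEL_PAGE_SHIFT          12 /* 4kB pages */
#define MODEL_THP_ENABLED         0  /* models !CONFIG_TRANSPARENT_HUGEPAGE */
#define MODEL_MAX_PAGECACHE_ORDER 8  /* non-THP value quoted in the thread */

/* Models the proposed helper: largest folio order the page cache can use. */
static inline unsigned int model_max_folio_order_supported(void)
{
	if (!MODEL_THP_ENABLED)
		return 0;
	return MODEL_MAX_PAGECACHE_ORDER;
}

/* Smallest folio order that holds one FS block of 2^blocklog bytes. */
static inline unsigned int model_min_order_for_blocklog(unsigned int blocklog)
{
	return blocklog > MODEL_PAGE_SHIFT ? blocklog - MODEL_PAGE_SHIFT : 0;
}

/* Hypothetical mount-time check: can this block size be supported? */
static inline bool model_can_mount(unsigned int blocklog)
{
	return model_min_order_for_blocklog(blocklog) <=
	       model_max_folio_order_supported();
}

/* Bytes covered by a folio of the given order. */
static inline unsigned long model_folio_bytes(unsigned int order)
{
	return 1UL << (MODEL_PAGE_SHIFT + order);
}
```

With these assumed values, a 4kB block size (blocklog 12) passes the
check while a 16kB block size (blocklog 14) is refused, so bs <= ps keeps
working without THP. The last function just restates the arithmetic in
the quoted comment: order 8 with 4kB pages is 1MB.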