Message-ID: <1e0e89ea-3130-42b0-810d-f52da2affe51@arm.com>
Date: Fri, 5 Jul 2024 14:31:08 +0100
Subject: Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes
From: Ryan Roberts
To: "Pankaj Raghav (Samsung)", Dave Chinner
Cc: Matthew Wilcox, chandan.babu@oracle.com, djwong@kernel.org,
 brauner@kernel.org, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, yang@os.amperecomputing.com,
 linux-mm@kvack.org, john.g.garry@oracle.com,
 linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com,
 mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com,
 linux-xfs@vger.kernel.org, hch@lst.de, Zi Yan
References: <20240625114420.719014-1-kernel@pankajraghav.com>
 <20240625114420.719014-2-kernel@pankajraghav.com>
 <20240705132418.gk7oeucdisat3sq5@quentin>
In-Reply-To: <20240705132418.gk7oeucdisat3sq5@quentin>

On 05/07/2024 14:24, Pankaj Raghav (Samsung) wrote:
>>> I suggest you handle it better than this. If the device is asking for a
>>> blocksize > PMD_SIZE, you should fail to mount it.
>>
>> That's my point: we already do that.
>>
>> The largest block size we support is 64kB and that's way smaller
>> than PMD_SIZE on all platforms and we always check for bs > ps
>> support at mount time when the filesystem bs > ps.
>>
>> Hence we're never going to set the min value to anything unsupported
>> unless someone makes a massive programming mistake. At which point,
>> we want a *hard, immediate fail* so the developer notices their
>> mistake immediately. All filesystems and block devices need to
>> behave this way so the limits should be encoded as asserts in the
>> function to trigger such behaviour.
>
> I agree, this kind of bug will be encountered only during development
> and not during actual production due to the limit we have on fs block
> size in XFS.
>
>>
>>> If the device is
>>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
>>> not set, you should also decline to mount the filesystem.
>>
>> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
>> being able to use large folios?
>>
>> If that's an actual dependency of using large folios, then we're at
>> the point where the mm side of large folios needs to be divorced
>> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
>> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to be selected by the
>> block layer and also every filesystem that wants to support
>> sector/block sizes larger than PAGE_SIZE. IOWs, large folio
>> support needs to *always* be enabled on systems that say
>> CONFIG_BLOCK=y.
>
> Why CONFIG_BLOCK? I think it is enough if it comes from the FS side
> right? And for now, the only FS that needs that sort of bs > ps
> guarantee is XFS with this series. Other filesystems such as bcachefs
> that call mapping_set_large_folios() only enable it as an optimization
> and it is not needed for the filesystem to function.
>
> So this is my conclusion from the conversation:
> - Add a dependency in Kconfig on THP for XFS until we fix the dependency
>   of large folios on THP

THP isn't supported on some arches, so isn't this effectively saying XFS can no
longer be used with those arches, even if the bs <= ps?
I think while pagecache large folios depend on THP, you need to make this a
mount-time check in the FS?

But ideally, MAX_PAGECACHE_ORDER would be set to 0 for
!CONFIG_TRANSPARENT_HUGEPAGE so you can just check against that and don't have
to worry about THP availability directly.

Willy: why is MAX_PAGECACHE_ORDER set to 8 when THP is disabled currently?

> - Add a BUILD_BUG_ON(XFS_MAX_BLOCKSIZE > MAX_PAGECACHE_ORDER)
> - Add a WARN_ON_ONCE() and clamp the min and max value in
>   mapping_set_folio_order_range() ?
>
> Let me know what you all think @willy, @dave and @ryan.
>
> --
> Pankaj
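
For illustration, the three checks discussed above (build-time guard, mount-time check, clamp-and-warn) can be modelled in userspace C. This is only a sketch under stated assumptions, not kernel code: every `model_`/`MODEL_` name is hypothetical, the 4 KiB page shift is assumed, `static_assert` stands in for BUILD_BUG_ON(), and the `clamped` flag marks the spot where the kernel might WARN_ON_ONCE(). The block size is converted to a folio order before comparing, so both sides of each check share a unit; that conversion is my reading of the intent, not something the thread spells out.

```c
#include <assert.h>   /* static_assert: userspace stand-in for BUILD_BUG_ON() */
#include <stdbool.h>

/* Assumed values for illustration only; not the kernel's definitions. */
#define MODEL_PAGE_SHIFT           12u  /* 4 KiB base pages (assumption) */
#define MODEL_FS_MAX_BLOCKSIZE_LOG 16u  /* 64 kB, largest block size named in the thread */
#define MODEL_MAX_PAGECACHE_ORDER   8u  /* the value quoted in the thread */

/* Build-time guard: compares an order against an order. */
static_assert(MODEL_FS_MAX_BLOCKSIZE_LOG - MODEL_PAGE_SHIFT
              <= MODEL_MAX_PAGECACHE_ORDER,
              "largest fs block needs a folio order above the page cache limit");

struct model_order_range {
    unsigned int min;
    unsigned int max;
};

/* Smallest folio order whose size covers one filesystem block. */
static unsigned int model_min_order(unsigned int blocksize_log,
                                    unsigned int page_shift)
{
    return blocksize_log > page_shift ? blocksize_log - page_shift : 0;
}

/*
 * Mount-time check: refuse the mount when a block needs larger folios
 * than the page cache can provide. With a limit of 0 (the proposal for
 * !CONFIG_TRANSPARENT_HUGEPAGE) only bs <= ps filesystems pass.
 */
static bool model_can_mount(unsigned int blocksize_log,
                            unsigned int page_shift,
                            unsigned int max_pagecache_order)
{
    return model_min_order(blocksize_log, page_shift) <= max_pagecache_order;
}

/*
 * Clamp a requested [min, max] order range to the limit. *clamped is set
 * when the request had to be adjusted, i.e. where a warning would fire.
 */
static struct model_order_range model_clamp_range(unsigned int min,
                                                  unsigned int max,
                                                  unsigned int limit,
                                                  bool *clamped)
{
    struct model_order_range r = { min, max };

    *clamped = false;
    if (r.min > limit) {
        r.min = limit;
        *clamped = true;
    }
    if (r.max > limit) {
        r.max = limit;
        *clamped = true;
    }
    if (r.max < r.min)
        r.max = r.min;  /* keep the range well-formed */
    return r;
}
```

With these assumptions, model_can_mount(16, 12, 0) is false while model_can_mount(12, 12, 0) is true, which is the behaviour argued for above: on a !THP system a bs > ps filesystem is rejected at mount time, but bs <= ps keeps working without any Kconfig dependency.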