Date: Thu, 13 Jul 2023 13:26:05 +0200
From: David Sterba
To: Qu Wenruo
Cc: Christoph Hellwig, linux-btrfs@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
Message-ID: <20230713112605.GO30916@twin.jikos.cz>
Reply-To: dsterba@suse.cz

On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
> On 2023/7/13 00:41, Christoph Hellwig wrote:
> > On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> >> One of the biggest problem for metadata folio conversion is, we still
> >> need the current page based solution (or folios with order 0) as a
> >> fallback solution when we can not get a high order folio.
> >
> > Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
> > a maximum of 64k (order 4).  IIRC we should be able to get them pretty
> > reliably.
>
> If it can be done as reliable as order 0 with NOFAIL, I'm totally fine
> with that.

I have mentioned my concerns about the allocation problems with orders
higher than 0 in the past. The allocator gives some guarantees about not
failing up to a certain order, currently 1 (mm/fail_page_alloc.c,
fail_page_alloc.min_order = 1).

Per the comment in mm/page_alloc.c:rmqueue():

	/*
	 * We most definitely don't want callers attempting to
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

For allocations with higher order, e.g. 2 to match the default 16K nodes
or 4 for the maximum 64K, this increases pressure and can trigger
compaction; see the logic around PAGE_ALLOC_COSTLY_ORDER, which is 3.

> > If not the best thing is to just a virtually contiguous allocation as
> > fallback, i.e. use vm_map_ram.

So we can allocate order-0 pages and then map them to virtual addresses,
which needs manipulation of PTEs (page table entries) and requires
additional memory. This is what xfs does in
fs/xfs/xfs_buf.c:_xfs_buf_map_pages(). It needs some care with aliasing
memory, so vm_unmap_aliases() is required and brings some overhead, and
at the end vm_unmap_ram() needs to be called, another overhead but
probably bearable.

With all that in place there would be a contiguous memory range
representing the metadata, so a simple memcpy() can be done. Sure, with
higher overhead and decreased reliability due to potentially failing
memory allocations - for metadata operations.

Compare that to what we have:

Pages are allocated as order 0, so there's a much higher chance to get
them under pressure, and without increasing the pressure otherwise. We
don't need any virtual mappings. The cost is that we have to iterate the
pages and do the partial copying ourselves, but this is hidden in
helpers.
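
To make the "iterate the pages and do the partial copying" part concrete,
here is a minimal userspace sketch. It is not the btrfs helper code: the
names sketch_read_pages() and SKETCH_PAGE_SIZE are made up for
illustration, and a 4K page size is assumed. It only shows the shape of
the loop: copy up to the end of the current page, then continue at
offset 0 of the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4K pages */

/*
 * Copy len bytes, starting at logical offset start, out of a buffer
 * backed by nr_pages separately allocated pages into the contiguous
 * destination dst. Hypothetical helper, for illustration only.
 */
static void sketch_read_pages(void *dst, unsigned char *const *pages,
			      size_t nr_pages, size_t start, size_t len)
{
	unsigned char *out = dst;
	size_t idx = start / SKETCH_PAGE_SIZE;	/* first page touched */
	size_t off = start % SKETCH_PAGE_SIZE;	/* offset inside that page */

	/* The requested range must fit into the backing pages. */
	assert(start + len <= nr_pages * SKETCH_PAGE_SIZE);

	while (len > 0) {
		/* Bytes available until the end of the current page. */
		size_t cur = SKETCH_PAGE_SIZE - off;

		if (cur > len)
			cur = len;
		memcpy(out, pages[idx] + off, cur);
		out += cur;
		len -= cur;
		off = 0;	/* following pages are read from their start */
		idx++;
	}
}
```

A read that crosses a page boundary ends up as two memcpy() calls, one
per page, and the caller does not have to know about the split.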

We have a different usage pattern of the metadata buffers than xfs, so
the fact that xfs does something with vmapped contiguous buffers may not
be easily transferable to btrfs and could bring us new problems.

The conversion to folios will happen eventually, though I don't want to
sacrifice reliability just for API use convenience. First the conversion
should be done 1:1, with pages and folios both order 0, before switching
to some higher order allocations hidden behind API calls.
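
For reference, the order arithmetic used in this thread can be sketched
as below. This is a standalone userspace illustration, not kernel code:
the sketch_* names are made up, 4K pages are assumed, and the two
thresholds simply mirror the values mentioned in this mail (the
order > 1 check for __GFP_NOFAIL and PAGE_ALLOC_COSTLY_ORDER = 3).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4K pages */
#define SKETCH_COSTLY_ORDER	3	/* value of PAGE_ALLOC_COSTLY_ORDER */

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int sketch_size_to_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (((size_t)1 << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Mirrors the WARN_ON_ONCE() condition: NOFAIL with order > 1. */
static int sketch_nofail_warns(unsigned int order)
{
	return order > 1;
}

/* Orders above the costly threshold are treated as expensive. */
static int sketch_is_costly(unsigned int order)
{
	return order > SKETCH_COSTLY_ORDER;
}
```

With these assumptions a 16K node is order 2, which already trips the
NOFAIL check, and a 64K node is order 4, which is also above the costly
threshold.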