Date: Thu, 13 Jul 2023 19:41:53 +0800
Subject: Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo
Cc: Christoph Hellwig, linux-btrfs@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org
Message-ID: <4d46cb42-0253-9182-3c61-5610c7f8ff89@gmx.com>
In-Reply-To: <20230713112605.GO30916@twin.jikos.cz>
References: <20230713112605.GO30916@twin.jikos.cz>

On 2023/7/13 19:26, David Sterba wrote:
> On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
>> On 2023/7/13 00:41, Christoph Hellwig wrote:
>>> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>>>> One of the biggest problems for the metadata folio conversion is
>>>> that we still need the current page-based solution (or folios with
>>>> order 0) as a fallback when we cannot get a high-order folio.
>>>
>>> Do we? btrfs by default uses a 16k nodesize (order 2 on x86), with
>>> a maximum of 64k (order 4). IIRC we should be able to get them
>>> pretty reliably.
>>
>> If it can be done as reliably as order 0 with NOFAIL, I'm totally
>> fine with that.
>
> I have mentioned my concerns about the allocation problems with orders
> higher than 0 in the past. The allocator gives some guarantees about
> not failing for certain levels; currently that's order 1
> (mm/fail_page_alloc.c, fail_page_alloc.min_order = 1).
>
> Per the comment in page_alloc.c:rmqueue():
>
>     /*
>      * We most definitely don't want callers attempting to
>      * allocate greater than order-1 page units with __GFP_NOFAIL.
>      */
>     WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>
> For allocations with higher order, e.g. 4 to match the default 16K
> nodesize, this increases pressure and can trigger compaction; see the
> logic around PAGE_ALLOC_COSTLY_ORDER, which is 3.
>
>>> If not, the best thing is to just use a virtually contiguous
>>> allocation as a fallback, i.e. use vm_map_ram.
>
> So we can allocate order-0 pages and then map them to virtual
> addresses, which needs manipulation of PTEs (page table entries) and
> requires additional memory. This is what xfs does in
> fs/xfs_buf.c:_xfs_buf_map_pages(). It needs some care with aliasing
> memory, so vm_unmap_aliases() is required and brings some overhead,
> and at the end vm_unmap_ram() needs to be called - another overhead,
> but probably bearable.
>
> With all that in place there would be a contiguous memory range
> representing the metadata, so a simple memcpy() can be done. Sure,
> with higher overhead and decreased reliability due to potentially
> failing memory allocations - for metadata operations.
>
> Compare that to what we have:
>
> Pages are allocated as order 0, so there's a much higher chance to get
> them under pressure, without increasing the pressure otherwise. We
> don't need any virtual mappings. The cost is that we have to iterate
> the pages and do the partial copying ourselves, but this is hidden in
> helpers.
>
> We have a different usage pattern for the metadata buffers than xfs,
> so the fact that it does something with vmapped contiguous buffers may
> not be easily transferable to btrfs and could bring us new problems.
>
> The conversion to folios will happen eventually, though I don't want
> to sacrifice reliability just for API convenience. First the
> conversion should be done 1:1, with pages and folios both order 0,
> before switching to some higher-order allocations hidden behind API
> calls.

In fact, I have another solution as a middle ground before bringing
folios into the situation:

Check if the pages are already physically contiguous. If so, everything
can go without any cross-page handling. If not, we can either keep the
current cross-page handling, or migrate to virtually contiguous mapped
pages.

Currently around 50~66% of eb pages are already allocated physically
contiguous. If we can just skip the cross-page handling for more than
half of the ebs, it's already a win.

For the vmapped pages, I'm not sure about the overhead, but I can try
to go that path and check the result.

Thanks,
Qu