Date: Mon, 18 Sep 2023 08:05:20 +1000
From: Dave Chinner
To: Pankaj Raghav
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    p.raghav@samsung.com, da.gomez@samsung.com,
    akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
    willy@infradead.org, djwong@kernel.org, linux-mm@kvack.org,
    chandan.babu@oracle.com, mcgrof@kernel.org, gost.dev@samsung.com
Subject: Re: [RFC 00/23] Enable block size > page size in XFS
References: <20230915183848.1018717-1-kernel@pankajraghav.com>
In-Reply-To: <20230915183848.1018717-1-kernel@pankajraghav.com>

On Fri, Sep 15, 2023 at 08:38:25PM +0200, Pankaj Raghav wrote:
> From: Pankaj Raghav
>
> There have been efforts over the last 16 years to enable Large
> Block Sizes (LBS), that is block sizes in filesystems where bs > page
> size [1] [2].
> Through these efforts we have learned that one of the
> main blockers to supporting bs > ps in filesystems has been a way to
> allocate pages that are at least the filesystem block size on the page
> cache where bs > ps [3]. Another blocker was changes in filesystems due
> to buffer-heads. Thanks to these previous efforts, the surgery by
> Matthew Wilcox in the page cache for adopting xarray's multi-index
> support, and iomap support, supporting bs > ps in XFS is possible with
> only a few lines changed in XFS. Most of the changes are to the page
> cache, to support a minimum folio order for the target block size of
> the filesystem.
>
> A new motivation for LBS today is to support high-capacity (many
> terabytes) QLC SSDs where the internal Indirection Unit (IU) is
> typically greater than 4k [4] to help reduce DRAM and so in turn cost
> and space. In practice this then allows different architectures to use a
> base page size of 4k while still enabling support for block sizes
> aligned to the larger IUs by relying on high order folios on the page
> cache when needed. It also makes it possible to take advantage of these
> same drives' support for atomics larger than 4k with buffered IO
> support in Linux. As described this year at LSFMM, supporting large
> atomics greater than 4k enables databases to remove the need to rely on
> their own journaling, so they can disable double buffered writes [5],
> which is a feature different cloud providers are already innovating on
> and enabling for customers through custom storage solutions.
>
> This series still needs some polishing and some crash fixes, but it is
> mainly meant to get initial feedback from the community and enable
> initial experimentation, hence the RFC. It's being posted now given that
> the results from our testing are much better than expected and we hope
> to polish this up together with the community.
> After all, this has been a 16-year-old effort and none of this could
> have been possible without that effort.
>
> Implementation:
>
> This series only adds the notion of a minimum order of a folio in the
> page cache that was initially proposed by Willy. The minimum folio order
> requirement is set during inode creation. The minimum order will
> typically correspond to the filesystem block size. The page cache will
> in turn respect the minimum folio order requirement while allocating a
> folio. This series mainly changes the page cache's filemap, readahead,
> and truncation code to allocate and align the folios to the minimum
> order set for the filesystem's inode's respective address space mapping.
>
> Only XFS was enabled and tested as a part of this series as it has
> supported block sizes up to 64k and sector sizes up to 32k for years.
> The only thing missing was the page cache magic to enable bs > ps.
> However any filesystem that doesn't depend on buffer-heads and already
> supports larger block sizes should be able to leverage this effort to
> also support LBS, bs > ps.
>
> This also paves the way for supporting block devices whose logical
> block size > page size in the future by leveraging iomap's address space
> operation added to the block device cache by Christoph Hellwig [6]. We
> have work to enable support for this, enabling LBAs > 4k on NVMe, and
> at the same time allow coexistence with buffer-heads on the same block
> device, so that a drive can switch between filesystems which may depend
> on buffer-heads or need the iomap address space operations for the
> block device cache. Patches for this will be posted shortly after this
> patch series.

Do you have a git tree branch that I can pull this from somewhere?
As it is, I'd really prefer stuff that adds significant XFS
functionality that we need to test to be based on a current Linus
TOT kernel so that we can test it without being impacted by all
the random unrelated breakages that regularly happen in linux-next
kernels....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
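
For anyone following the quoted "Implementation" section, the minimum
folio order idea can be illustrated with a small userspace sketch of the
arithmetic. This is not kernel code and is not taken from the series; it
assumes 4k base pages and power-of-two block sizes, and the names
min_folio_order(), align_index() and folio_range() are hypothetical,
chosen only for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4k base pages (1 << 12 bytes)


def min_folio_order(block_size: int, page_shift: int = PAGE_SHIFT) -> int:
    """Smallest order such that (page_size << order) >= block_size."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    block_shift = block_size.bit_length() - 1  # log2 of the block size
    # Blocks no larger than a page need no minimum beyond order 0.
    return max(0, block_shift - page_shift)


def align_index(index: int, order: int) -> int:
    """Round a page cache index down to a multiple of (1 << order) pages."""
    return index & ~((1 << order) - 1)


def folio_range(index: int, order: int) -> tuple[int, int]:
    """First and last page index of the order-aligned folio covering index."""
    start = align_index(index, order)
    return start, start + (1 << order) - 1


if __name__ == "__main__":
    order = min_folio_order(16 * 1024)   # 16k blocks on 4k pages
    print(order, folio_range(7, order))
```

With 16k blocks on 4k pages the minimum order is 2, so a lookup at page
index 7 would be served by a four-page folio covering indices 4 to 7,
which is the kind of allocation and alignment the cover letter says the
filemap, readahead and truncation paths need to respect.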