Date: Thu, 18 Jan 2024 17:05:35 -0800
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org
Subject: Re: [RFC] [PATCH 0/3] xfs: use large folios for buffers
Message-ID: <20240119010535.GP674499@frogsfrogsfrogs>
References: <20240118222216.4131379-1-david@fromorbit.com>
In-Reply-To: <20240118222216.4131379-1-david@fromorbit.com>

On Fri, Jan 19, 2024 at 09:19:38AM +1100, Dave Chinner wrote:
> The XFS buffer cache supports metadata buffers up to 64kB, and it does so by
> aggregating multiple pages into a single contiguous memory region using
> vmapping. This is expensive (both the setup and the runtime TLB mapping cost),
> and would be unnecessary if we could allocate large contiguous memory regions
> for the buffers in the first place.
>
> Enter multi-page folios.

LOL, hch and I just wrapped up making the xfbtree buffer cache work
with large folios coming from tmpfs. Though the use case there is
simpler because we require blocksize == PAGE_SIZE, forbid the use of
highmem, and don't need discontig buffers. Hence we sidestep
vm_map_ram. :)

> This patchset converts the buffer cache to use the folio API, then enhances it
> to optimistically use large folios where possible. It retains the old "vmap an
> array of single page folios" functionality as a fallback when large folio
> allocation fails. This means that, like page cache support for large folios, we
> aren't dependent on large folio allocation succeeding all the time.
>
> This relegates the single page array allocation mechanism to the "slow path",
> so we don't have to care so much about the performance of that path anymore.
> This might allow us to simplify it a bit in future.
>
> One of the issues with the folio conversion is that we use a couple of APIs
> that take struct page ** (i.e. pointers to page pointer arrays) and there
> aren't folio counterparts. These are the bulk page allocator and vm_map_ram().
> In the cases where they are used, we cast &bp->b_folios[] to (struct page **),
> knowing that this array will only contain single page folios and that a single
> page folio and a struct page are the same structure and so have the same
> address. This is a bit of a hack (hence the RFC) but I'm not sure that it's
> worth adding folio versions of these interfaces right now. We don't need to
> use the bulk page allocator so much any more, because that's now a slow path
> and we could probably just call folio_alloc() in a loop like we used to. What
> to do about vm_map_ram() is a little less clear....

Yeah, that's what I suspected.
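Just to check my mental model, the slow path presumably ends up looking
something like the sketch below? (Hand-waved, not your actual patch
code -- the helper name and the b_folio_count field are my inventions.)

static int xfs_buf_alloc_folios_fallback(struct xfs_buf *bp, gfp_t gfp_mask)
{
	int			i;

	/* Large folio allocation failed; fall back to order-0 folios. */
	for (i = 0; i < bp->b_folio_count; i++) {
		bp->b_folios[i] = folio_alloc(gfp_mask, 0);
		if (!bp->b_folios[i])
			goto out_free;
	}

	/*
	 * vm_map_ram() only speaks struct page. An order-0 folio overlays
	 * struct page exactly, so casting the array is safe as long as
	 * nothing multi-page ever lands in it.
	 */
	bp->b_addr = vm_map_ram((struct page **)bp->b_folios,
			bp->b_folio_count, NUMA_NO_NODE);
	if (!bp->b_addr)
		goto out_free;
	return 0;

out_free:
	while (--i >= 0)
		folio_put(bp->b_folios[i]);
	return -ENOMEM;
}

If that's the shape of it, then the folio_alloc() loop really is
trivial enough that I wouldn't miss the bulk page allocator either.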
> The other issue I tripped over in doing this conversion is that the
> discontiguous buffer straddling code in the buf log item dirty region
> tracking is broken. We don't actually exercise that code on existing
> configurations, and I tripped over it when tracking down a bug in the folio
> conversion. I fixed it and short-circuited the check for contiguous buffers,
> but that didn't fix the failure I was seeing (which was not handling
> bp->b_offset and large folios properly when building bios).

Yikes.

> Apart from those issues, the conversion and enhancement is relatively
> straightforward. It passes fstests on both 512 and 4096 byte sector size
> storage (512 byte sectors exercise the XBF_KMEM path, which has non-zero
> bp->b_offset values) and doesn't appear to cause any problems with large
> directory buffers, though I haven't done any real testing on those yet.
> Large folio allocations are definitely being exercised, though, as all the
> inode cluster buffers are 16kB on a 512 byte inode V5 filesystem.
>
> Thoughts, comments, etc?

Not yet.

> Note: this patchset is on top of the NOFS removal patchset I sent a
> few days ago. That can be pulled from this git branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-kmem-cleanup

Oooh a branch link, thank you. It's so much easier if I can pull a
branch while picking through commits over gitweb.

--D