From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org
Subject: Re: [RFC] [PATCH 0/3] xfs: use large folios for buffers
Date: Thu, 18 Jan 2024 17:05:35 -0800 [thread overview]
Message-ID: <20240119010535.GP674499@frogsfrogsfrogs> (raw)
In-Reply-To: <20240118222216.4131379-1-david@fromorbit.com>
On Fri, Jan 19, 2024 at 09:19:38AM +1100, Dave Chinner wrote:
> The XFS buffer cache supports metadata buffers up to 64kB, and it does so by
> aggregating multiple pages into a single contiguous memory region using
> vmapping. This is expensive (both the setup and the runtime TLB mapping cost),
> and would be unnecessary if we could allocate large contiguous memory regions
> for the buffers in the first place.
>
> Enter multi-page folios.
LOL, hch and I just wrapped up making the xfbtree buffer cache work with
large folios coming from tmpfs. Though the use case there is simpler
because we require blocksize==PAGE_SIZE, forbid the use of highmem, and
don't need discontig buffers. Hence we sidestep vm_map_ram. :)
> This patchset converts the buffer cache to use the folio API, then enhances it
> to optimisitically use large folios where possible. It retains the old "vmap an
> array of single page folios" functionality as a fallback when large folio
> allocation fails. This means that, like page cache support for large folios, we
> aren't dependent on large folio allocation succeeding all the time.
>
> This relegates the single page array allocation mechanism to the "slow path"
> that we don't have to care so much about performance of this path anymore. This
> might allow us to simplify it a bit in future.
>
> One of the issues with the folio conversion is that we use a couple of APIs that
> take struct page ** (i.e. pointers to page pointer arrays) and there aren't
> folio counterparts. These are the bulk page allocator and vm_map_ram(). In the
> cases where they are used, we cast &bp->b_folios[] to (struct page **) knowing
> that this array will only contain single page folios and that single page folios
> and struct page are the same structure and so have the same address. This is a
> bit of a hack (hence the RFC) but I'm not sure that it's worth adding folio
> versions of these interfaces right now. We don't need to use the bulk page
> allocator so much any more, because that's now a slow path and we could probably
> just call folio_alloc() in a loop like we used to. What to do about vm_map_ram()
> is a little less clear....
Yeah, that's what I suspected.
> The other issue I tripped over in doing this conversion is that the
> discontiguous buffer straddling code in the buf log item dirty region tracking
> is broken. We don't actually exercise that code on existing configurations, and
> I tripped over it when tracking down a bug in the folio conversion. I fixed it
> and short-circuted the check for contiguous buffers, but that didn't fix the
> failure I was seeing (which was not handling bp->b_offset and large folios
> properly when building bios).
Yikes.
> Apart from those issues, the conversion and enhancement is relatively straight
> forward. It passes fstests on both 512 and 4096 byte sector size storage (512
> byte sectors exercise the XBF_KMEM path which has non-zero bp->b_offset values)
> and doesn't appear to cause any problems with large directory buffers, though I
> haven't done any real testing on those yet. Large folio allocations are
> definitely being exercised, though, as all the inode cluster buffers are 16kB on
> a 512 byte inode V5 filesystem.
>
> Thoughts, comments, etc?
Not yet.
> Note: this patchset is on top of the NOFS removal patchset I sent a
> few days ago. That can be pulled from this git branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-kmem-cleanup
Oooh a branch link, thank you. It's so much easier if I can pull a
branch while picking through commits over gitweb.
--D
>
>
next prev parent reply other threads:[~2024-01-19 1:05 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-18 22:19 Dave Chinner
2024-01-18 22:19 ` [PATCH 1/3] xfs: unmapped buffer item size straddling mismatch Dave Chinner
2024-01-22 6:41 ` Christoph Hellwig
2024-01-18 22:19 ` [PATCH 2/3] xfs: use folios in the buffer cache Dave Chinner
2024-01-19 1:26 ` Darrick J. Wong
2024-01-19 7:03 ` Dave Chinner
2024-01-22 6:39 ` Christoph Hellwig
2024-01-22 12:05 ` Dave Chinner
2024-01-22 13:18 ` Christoph Hellwig
2024-01-22 21:10 ` Dave Chinner
2024-01-18 22:19 ` [PATCH 3/3] xfs: convert buffer cache to use high order folios Dave Chinner
2024-01-19 1:31 ` Darrick J. Wong
2024-01-19 7:12 ` Dave Chinner
2024-01-22 6:45 ` Christoph Hellwig
2024-01-22 11:57 ` Dave Chinner
2024-01-19 1:05 ` Darrick J. Wong [this message]
2024-01-22 10:13 ` Andi Kleen
2024-01-22 11:53 ` Dave Chinner
2024-01-22 13:34 ` Using Folios for XFS metadata Andi Kleen
2024-01-23 0:19 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240119010535.GP674499@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox