Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size

linux-mm.kvack.org archive mirror

* Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size
       [not found] <cover.1706068026.git.wqu@suse.com>
@ 2024-01-24  4:03 ` Qu Wenruo
  2024-01-24  4:48   ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2024-01-24  4:03 UTC (permalink / raw)
  To: linux-btrfs, Linux Memory Management List



Adding MM list to this cover letter.

On 2024/1/24 14:29, Qu Wenruo wrote:
> With the folio interface, it's much easier to support multi-page sector
> size (i.e. sector/block size > PAGE_SIZE, which is rare among major
> upstream filesystems).
> 
> The basic idea is that if we first convert to the full folio interface
> and allow an address space to allocate only folios that are exactly
> block/sector size, the support for multi-page would be mostly done.
> 
> But before that support, there are still quite a few conversions left
> for btrfs.
> 
> Furthermore, with both subpage and multipage sector size, we need to
> handle folios differently:
> 
> - For subpage
>    The folio would always be page sized.
> 
> - For multipage (and regular sectorsize == PAGE_SIZE)
>    The folio would be sector sized.
> 
> Furthermore, the filemap interface would make various shifts more
> complex: the filemap_*() interfaces use an index which is PAGE_SHIFT
> based, while with potentially larger folios the folio shift can be
> larger than PAGE_SHIFT.

I really want some feedback on this part.

I'm pretty sure some filesystems will use larger folios to implement
their multi-page block size support.

In that case, could we change the interface so that all folio versions
of filemap_*() accept a file offset instead of a page index?

Thanks,
Qu


* Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size
  2024-01-24  4:03 ` [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size Qu Wenruo
@ 2024-01-24  4:48   ` Matthew Wilcox
  2024-01-24  5:27     ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-01-24  4:48 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Linux Memory Management List

On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote:
> I'm pretty sure we would have some filesystems go utilizing larger folios to
> implement their multi-page block size support.
> 
> Thus in that case, can we have an interface change to make all folio
> versions of filemap_*() to accept a file offset instead of page index?

You're confused.  There's no change needed to the filemap API to support
large folios used by large block sizes.  Quite possibly more of btrfs
is confused, but it's really very simple.  index == pos / PAGE_SIZE.
That's all.  Even if you have a 64kB block size device on a 4kB PAGE_SIZE
machine.

That implies that folios must be at least order-4, but you can still
look up a folio at index 23 and get back the folio which was stored at
index 16 (range 16-31).
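
The arithmetic above can be sketched in userspace as follows. This is only
a model, assuming 4kB pages and naturally aligned folios. DEMO_PAGE_SHIFT,
pos_to_index() and folio_start_index() are hypothetical names chosen for
illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's PAGE_SHIFT/PAGE_SIZE on a 4kB-page machine. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Page cache index for a byte position: index == pos / PAGE_SIZE. */
static inline uint64_t pos_to_index(uint64_t pos)
{
	return pos >> DEMO_PAGE_SHIFT;
}

/*
 * Hypothetical helper: first page index of the naturally aligned folio
 * of the given order that covers @index. Natural alignment is an
 * assumption of this model.
 */
static inline uint64_t folio_start_index(uint64_t index, unsigned int order)
{
	assert(order < 64);
	return index & ~((1ULL << order) - 1);
}
```

With a 64kB block size the folio order is at least 4 (16 pages), so
folio_start_index(23, 4) yields 16 and the folio spans indices 16-31,
matching the example in the message.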

hugetlbfs made the mistake of 'hstate->order' and it's still not fixed.
It's a little better than it was (thanks to Sid), but more work is needed.
Just use the same approach as THPs or you're going to end up hurt.


* Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size
  2024-01-24  4:48   ` Matthew Wilcox
@ 2024-01-24  5:27     ` Qu Wenruo
  2024-01-24  5:43       ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2024-01-24  5:27 UTC (permalink / raw)
  To: Matthew Wilcox, Qu Wenruo; +Cc: linux-btrfs, Linux Memory Management List



On 2024/1/24 15:18, Matthew Wilcox wrote:
> On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote:
>> I'm pretty sure we would have some filesystems go utilizing larger folios to
>> implement their multi-page block size support.
>>
>> Thus in that case, can we have an interface change to make all folio
>> versions of filemap_*() to accept a file offset instead of page index?
>
> You're confused.  There's no change needed to the filemap API to support
> large folios used by large block sizes.  Quite possibly more of btrfs
> is confused, but it's really very simple.  index == pos / PAGE_SIZE.
> That's all.  Even if you have a 64kB block size device on a 4kB PAGE_SIZE
> machine.

Yes, I understand that the filemap API always works on a PAGE_SHIFT-based
index.

The concern is that, with (hopefully) more filesystems going to utilize
large folios, there would be two shifts: the folio shift
(ilog2(blocksize)) and PAGE_SHIFT for the filemap interfaces.

And I'm pretty sure that is going to cause confusion, e.g. someone doing
the conversion without much thought and using the folio shift everywhere,
even for filemap_get_folio().

Thus I'm wondering if it's possible to get a bytenr version of
filemap_get_folio().

(Or is it better to just create an inline wrapper inside the fs to avoid
confusion?)
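
A rough userspace sketch of what such an fs-local wrapper could look like.
The real filemap_get_folio() takes a struct address_space pointer and a
page index; here the page cache is replaced by a flat array so the example
can run standalone. All demo_* names are hypothetical, and returning NULL
on a miss is a simplification of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* stand-in for PAGE_SHIFT with 4kB pages */

struct demo_folio {
	uint64_t index;		/* first page index covered by the folio */
	unsigned int order;	/* folio spans 1 << order pages */
};

struct demo_mapping {
	const struct demo_folio *folios;
	size_t nr;
};

/* Model of an index-based lookup: any index inside a folio finds it. */
static const struct demo_folio *
demo_get_folio(const struct demo_mapping *m, uint64_t index)
{
	assert(m != NULL);
	for (size_t i = 0; i < m->nr; i++) {
		const struct demo_folio *f = &m->folios[i];

		if (index >= f->index && index < f->index + (1ULL << f->order))
			return f;
	}
	return NULL;
}

/* The wrapper: callers pass a byte offset, the shift lives in one place. */
static inline const struct demo_folio *
demo_get_folio_at_pos(const struct demo_mapping *m, uint64_t pos)
{
	return demo_get_folio(m, pos >> DEMO_PAGE_SHIFT);
}
```

Keeping the conversion inside a single helper means the rest of the
filesystem code never has to choose between the two shifts.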

>
> That implies that folios must be at least order-4, but you can still
> look up a folio at index 23 and get back the folio which was stored at
> index 16 (range 16-31).

Yep, that's also what I expect, and that is very handy.

Thanks,
Qu

>
> hugetlbfs made the mistake of 'hstate->order' and it's still not fixed.
> It's a little better than it was (thanks to Sid), but more work is needed.
> Just use the same approach as THPs or you're going to end up hurt.
>



* Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size
  2024-01-24  5:27     ` Qu Wenruo
@ 2024-01-24  5:43       ` Matthew Wilcox
  2024-01-24  5:50         ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-01-24  5:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs, Linux Memory Management List

On Wed, Jan 24, 2024 at 03:57:39PM +1030, Qu Wenruo wrote:
> 
> 
> On 2024/1/24 15:18, Matthew Wilcox wrote:
> > On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote:
> > > I'm pretty sure we would have some filesystems go utilizing larger folios to
> > > implement their multi-page block size support.
> > > 
> > > Thus in that case, can we have an interface change to make all folio
> > > versions of filemap_*() to accept a file offset instead of page index?
> > 
> > You're confused.  There's no change needed to the filemap API to support
> > large folios used by large block sizes.  Quite possibly more of btrfs
> > is confused, but it's really very simple.  index == pos / PAGE_SIZE.
> > That's all.  Even if you have a 64kB block size device on a 4kB PAGE_SIZE
> > machine.
> 
> Yes, I understand that filemap API is always working on PAGE_SHIFTed index.

OK, good.

> The concern is, (hopefully) with more fses going to utilized large
> folios, there would be two shifts.
> 
> One folio shift (ilog2(blocksize)), one PAGE_SHIFT for filemap interfaces.

Don't shift the file position by the folio_shift().  You want to support
large(r) folios _and_ large blocksizes at the same time.  ie 64kB might
be the block size, but all that would mean would be that folio_shift()
would be at least 16.  It might be 17, 18 or 21 (for a THP).
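
To make the pitfall concrete, here is a small userspace sketch, assuming
4kB pages. The names index_from_pos(), wrong_index_from_pos() and
min_folio_shift() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* stand-in for PAGE_SHIFT with 4kB pages */

/* Correct: the page cache index always uses the page shift. */
static inline uint64_t index_from_pos(uint64_t pos)
{
	return pos >> DEMO_PAGE_SHIFT;
}

/* The mistake being warned about: shifting by the folio shift instead. */
static inline uint64_t wrong_index_from_pos(uint64_t pos,
					    unsigned int folio_shift)
{
	return pos >> folio_shift;
}

/*
 * Smallest folio shift that can hold one block: ilog2(blocksize), but
 * never below the page shift. Actual folios may be larger than this.
 */
static inline unsigned int min_folio_shift(uint64_t blocksize)
{
	unsigned int shift = 0;

	/* blocksize must be a non-zero power of two */
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);
	while ((1ULL << shift) < blocksize)
		shift++;
	return shift < DEMO_PAGE_SHIFT ? DEMO_PAGE_SHIFT : shift;
}
```

For a position 23 pages into the file, index_from_pos() gives 23, while
shifting by a folio shift of 16 gives 1, which is not a valid page cache
index for that position. Since the folio shift is only a lower bound
(16 for 64kB blocks, possibly 21 for a 2MB folio), it cannot serve as a
fixed unit for the index.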

Filesystems already have to deal with different PAGE_SIZE, SECTOR_SIZE,
fsblock size and LBA size.  Folios aren't making things any worse here
(they're also not making anything better in this area, but I never
claimed they would).

btrfs is slightly unusual in that it defined PAGE_SIZE and fsblock size
to be the same (and then had to deal with the consequences of arm64/x86
interoperability later).  But most filesystems have pretty good separation
of the four concepts.


* Re: [PATCH RFC 0/2] btrfs: defrag: further preparation for multi-page sector size
  2024-01-24  5:43       ` Matthew Wilcox
@ 2024-01-24  5:50         ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2024-01-24  5:50 UTC (permalink / raw)
  To: Matthew Wilcox, Qu Wenruo; +Cc: linux-btrfs, Linux Memory Management List





On 2024/1/24 16:13, Matthew Wilcox wrote:
> On Wed, Jan 24, 2024 at 03:57:39PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/1/24 15:18, Matthew Wilcox wrote:
>>> On Wed, Jan 24, 2024 at 02:33:22PM +1030, Qu Wenruo wrote:
>>>> I'm pretty sure we would have some filesystems go utilizing larger folios to
>>>> implement their multi-page block size support.
>>>>
>>>> Thus in that case, can we have an interface change to make all folio
>>>> versions of filemap_*() to accept a file offset instead of page index?
>>>
>>> You're confused.  There's no change needed to the filemap API to support
>>> large folios used by large block sizes.  Quite possibly more of btrfs
>>> is confused, but it's really very simple.  index == pos / PAGE_SIZE.
>>> That's all.  Even if you have a 64kB block size device on a 4kB PAGE_SIZE
>>> machine.
>>
>> Yes, I understand that filemap API is always working on PAGE_SHIFTed index.
> 
> OK, good.
> 
>> The concern is, (hopefully) with more fses going to utilized large
>> folios, there would be two shifts.
>>
>> One folio shift (ilog2(blocksize)), one PAGE_SHIFT for filemap interfaces.
> 
> Don't shift the file position by the folio_shift().  You want to support
> large(r) folios _and_ large blocksizes at the same time.  ie 64kB might
> be the block size, but all that would mean would be that folio_shift()
> would be at least 16.  It might be 17, 18 or 21 (for a THP).

Indeed, I forgot we have THP.

> 
> Filesystems already have to deal with different PAGE_SIZE, SECTOR_SIZE,
> fsblock size and LBA size.  Folios aren't making things any worse here
> (they're also not making anything better in this area, but I never
> claimed they would).

OK, that makes sense.

So all the folio shifts would be an fs-internal matter, and we have to
handle them properly.

> 
> btrfs is slightly unusual in that it defined PAGE_SIZE and fsblock size
> to be the same (and then had to deal with the consequences of arm64/x86
> interoperability later).  But most filesystems have pretty good separation
> of the four concepts.

Indeed, although I also found that most major filesystems do not support
larger block sizes (> PAGE_SIZE) either.

I guess subpage was simpler in the past, and hopefully with larger folios
we will see more filesystems support multi-page block sizes soon.

Thanks,
Qu

