linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] breaking the 512 KiB IO boundary on x86_64
@ 2025-03-20 11:41 Luis Chamberlain
  2025-03-20 12:11 ` Matthew Wilcox
From: Luis Chamberlain @ 2025-03-20 11:41 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-block
  Cc: lsf-pc, david, leon, hch, kbusch, sagi, axboe, joro, brauner,
	hare, willy, djwong, john.g.garry, ritesh.list, p.raghav,
	gost.dev, da.gomez, Luis Chamberlain

We've been constrained to a maximum single IO of 512 KiB for a while now on
x86_64. The limit comes from the maximum number of DMA segments multiplied
by the maximum segment size. With LBS the segments can be much bigger
without using huge pages, and so on a 64 KiB block size filesystem you can
now see 2 MiB IOs when using buffered IO. Direct IO, however, is still
crippled: its buffers come from anonymous memory, and unless you are using
mTHP you won't get large folios. mTHP is also non-deterministic, so relying
on large folios leaves direct IO worse off, as you may get large folios
sometimes and base pages other times. IO patterns can therefore be erratic.

As I just posted in a simple RFC [0], I believe the two-step DMA API
helps resolve this. Provided we move the block integrity stuff to the
new DMA API as well, the only patches really needed to support larger
IOs for direct IO for NVMe are:

  iomap: use BLK_MAX_BLOCK_SIZE for the iomap zero page
  blkdev: lift BLK_MAX_BLOCK_SIZE to page cache limit

The other two nvme-pci patches in that series are just there to help with
experimentation for now and can be ignored.

This raises a few questions:

 - How are we computing the new max single IO anyway? Are we really
   bounded only by what devices support?
 - Do we believe this is the step in the right direction?
 - Is 2 MiB a sensible max block sector size limit for the next few years?
 - What other considerations should we have?
 - Do we want something more deterministic for large folios for direct IO?

[0] https://lkml.kernel.org/r/20250320111328.2841690-1-mcgrof@kernel.org

  Luis




Thread overview: 25+ messages
2025-03-20 11:41 [LSF/MM/BPF TOPIC] breaking the 512 KiB IO boundary on x86_64 Luis Chamberlain
2025-03-20 12:11 ` Matthew Wilcox
2025-03-20 13:29   ` Daniel Gomez
2025-03-20 14:31     ` Matthew Wilcox
2025-03-20 13:47 ` Daniel Gomez
2025-03-20 14:54   ` Christoph Hellwig
2025-03-21  9:14     ` Daniel Gomez
2025-03-20 14:18 ` Christoph Hellwig
2025-03-20 15:37   ` Bart Van Assche
2025-03-20 15:58     ` Keith Busch
2025-03-20 16:13       ` Kanchan Joshi
2025-03-20 16:38       ` Christoph Hellwig
2025-03-20 21:50         ` Luis Chamberlain
2025-03-20 21:46       ` Luis Chamberlain
2025-03-20 21:40   ` Luis Chamberlain
2025-03-20 18:46 ` Ritesh Harjani
2025-03-20 21:30   ` Darrick J. Wong
2025-03-21  2:13     ` Ritesh Harjani
2025-03-21  3:05       ` Darrick J. Wong
2025-03-21  4:56         ` Theodore Ts'o
2025-03-21  5:00           ` Christoph Hellwig
2025-03-21 18:39             ` Ritesh Harjani
2025-03-21 16:38       ` Keith Busch
2025-03-21 17:21         ` Ritesh Harjani
2025-03-21 18:55           ` Keith Busch
