* [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists
@ 2025-01-17 21:16 David Howells
2025-01-20 14:22 ` Leon Romanovsky
2025-01-31 16:08 ` Chuck Lever
0 siblings, 2 replies; 4+ messages in thread
From: David Howells @ 2025-01-17 21:16 UTC
To: lsf-pc, John Hubbard, Matthew Wilcox
Cc: dhowells, brauner, Herbert Xu, linux-fsdevel, linux-crypto,
linux-mm, linux-block
Hi,
I'd like to propose a discussion of two things: firstly, how might we improve
iov_iter and, secondly, would it be possible to replace scatterlists.
[*] First: Improvements to iov_iter.
I'm trying to get rid of ITER_XARRAY; xarrays are too unstable (in the sense
that their contents can shift under you).
I'm trying to replace that with ITER_FOLIOQ instead. This is a segmented list
of folios - so it can only hold folios, but has infinite capacity. How easy
would it be to extend this to be able to handle some other types of page, such
as anon pages or stuff that's been spliced out of network receive buffers?
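
For readers unfamiliar with the shape being described, a "segmented list" can be sketched in userspace C as a chain of fixed-size slot arrays that grows by one segment whenever the tail fills up. This is only an illustration under stated assumptions: the names (fq, fq_seg, fq_append, fq_get, fq_free, SLOTS_PER_SEG) are hypothetical and are not the kernel's folio queue API, and the slots hold opaque pointers rather than folios.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_SEG 4	/* arbitrary; kept small so chaining is visible */

/* One segment: a fixed array of slots plus a link to the next segment. */
struct fq_seg {
	const void *slot[SLOTS_PER_SEG];
	unsigned int used;
	struct fq_seg *next;
};

/* The list as a whole; capacity is bounded only by available memory. */
struct fq {
	struct fq_seg *head;
	struct fq_seg *tail;
	size_t total;
};

/* Append one item, adding a segment when the tail is full.
 * Returns 0 on success, -1 if allocation fails. */
static int fq_append(struct fq *q, const void *item)
{
	if (!q->tail || q->tail->used == SLOTS_PER_SEG) {
		struct fq_seg *seg = calloc(1, sizeof(*seg));

		if (!seg)
			return -1;
		if (q->tail)
			q->tail->next = seg;
		else
			q->head = seg;
		q->tail = seg;
	}
	q->tail->slot[q->tail->used++] = item;
	q->total++;
	return 0;
}

/* Return item number idx, or NULL if idx is out of range.
 * Every segment except the tail is full, so we can skip whole segments. */
static const void *fq_get(const struct fq *q, size_t idx)
{
	const struct fq_seg *seg = q->head;

	if (idx >= q->total)
		return NULL;
	while (idx >= SLOTS_PER_SEG) {
		seg = seg->next;
		assert(seg);	/* guaranteed by idx < total */
		idx -= SLOTS_PER_SEG;
	}
	return seg->slot[idx];
}

/* Release all segments (the items themselves are not owned by the list). */
static void fq_free(struct fq *q)
{
	struct fq_seg *seg = q->head;

	while (seg) {
		struct fq_seg *next = seg->next;

		free(seg);
		seg = next;
	}
	q->head = q->tail = NULL;
	q->total = 0;
}
```

Because existing segments are never moved or reallocated when the list grows, a position inside the list stays valid while more items are appended, which is the kind of stability the xarray case lacks.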
Would it make sense to be able to have a chain of disparate types of object?
Say a couple of kmalloc'd buffers, followed by a number of folios, followed by
another kmalloc'd buffer and mark them such that we know which ones can be DMA'd
and which ones must be copied.
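
One possible shape for such a chain, sketched in userspace C purely for illustration: each element carries its own type tag and a flag saying whether it may be DMA'd or must be copied. Everything here is an assumption made for the example (seg, seg_kind, dma_ok, chain_len, chain_copy_bytes, chain_bounce); none of it is an existing kernel interface, and plain pointers stand in for kmalloc'd buffers and folios.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-element type tag. */
enum seg_kind { SEG_KMALLOC, SEG_FOLIO };

/* One element of a mixed chain. */
struct seg {
	enum seg_kind kind;	/* what sort of memory this element describes */
	int dma_ok;		/* nonzero: may be DMA'd; zero: must be copied */
	const void *addr;
	size_t len;
	const struct seg *next;
};

/* Total number of bytes described by the chain. */
static size_t chain_len(const struct seg *s)
{
	size_t n = 0;

	for (; s; s = s->next)
		n += s->len;
	return n;
}

/* Number of bytes that would have to go through a bounce buffer. */
static size_t chain_copy_bytes(const struct seg *s)
{
	size_t n = 0;

	for (; s; s = s->next)
		if (!s->dma_ok)
			n += s->len;
	return n;
}

/* Copy every must-copy element into 'bounce', in chain order.
 * Returns the number of bytes copied, or (size_t)-1 if 'cap' is too small. */
static size_t chain_bounce(const struct seg *s, void *bounce, size_t cap)
{
	size_t done = 0;

	for (; s; s = s->next) {
		if (s->dma_ok)
			continue;	/* would be mapped for DMA instead */
		if (s->len > cap - done)
			return (size_t)-1;
		assert(s->addr || s->len == 0);
		memcpy((char *)bounce + done, s->addr, s->len);
		done += s->len;
	}
	return done;
}
```

The per-element branch on dma_ok (or on kind) is exactly the cost the performance question below is about: a single-typed iterator decides once per call, whereas a mixed chain decides once per element.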
Currently, the core iteration functions in linux/iov_iter.h each handle a
specific type of iterable. I wonder how much performance difference it would
make to have each item in a list have its own type. Now, I know, "try it and
see" is a valid suggestion here.
Rumour has it that John Hubbard may be working along similar lines, possibly
just in the area of bio_vecs and ITER_BVEC.
[*] Second: Can we replace the uses of scatterlist with iov_iter and reduce
the number of iterator classes we have?
One reason I'd like to do this is we have iov_iter at the user end of the I/O
stack, and it percolates down to various depths. For network filesystems, for
example, the socket API takes iov_iters, so we want to plumb iov_iters all the
way down if we can - and have the filesystem know as little as possible about
folios and pages if we can manage it.
However, one thing that particularly stands out for me is that network
filesystems often want to use the crypto API - and that means allocating and
constructing a scatterlist to talk to the crypto API. Having spent some time
looking at the crypto API, I found that most places use iteration functions,
which means that changing to use an iov_iter might not be so hard.
That said, one thing that is made use of occasionally with scatterlists is the
ability to chain something on the front. That's significantly harder to do
with iov_iter.
That that said, one reason it's hard to modify the list attached to an
iterator is that we allow iterators to be rewound, using state stored in the
list to go backwards. I wonder if it might be possible to get rid of
iov_iter_revert() and use iterator copying instead.
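
The copy-instead-of-revert idea can be illustrated with a toy forward-only iterator in userspace C. This is a sketch under assumptions, not kernel code: toy_seg, toy_iter and toy_copy_from are hypothetical names, and the list is deliberately singly linked so that there is no state in it to walk backwards with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A forward-only chain of buffers: no back pointers, no rewind state. */
struct toy_seg {
	const char *data;
	size_t len;
	const struct toy_seg *next;
};

/* The iterator is a small value: current segment, offset, bytes remaining. */
struct toy_iter {
	const struct toy_seg *seg;
	size_t off;
	size_t count;
};

/* Copy up to n bytes out of the iterator, advancing it.
 * Returns the number of bytes actually copied. */
static size_t toy_copy_from(struct toy_iter *it, char *dst, size_t n)
{
	size_t done = 0;

	if (n > it->count)
		n = it->count;
	while (done < n) {
		size_t avail = it->seg->len - it->off;
		size_t chunk = n - done < avail ? n - done : avail;

		memcpy(dst + done, it->seg->data + it->off, chunk);
		done += chunk;
		it->off += chunk;
		if (it->off == it->seg->len) {
			/* Segment exhausted: step forward, never backward. */
			it->seg = it->seg->next;
			it->off = 0;
		}
	}
	it->count -= done;
	return done;
}
```

A caller that might need to retry takes a snapshot with a plain struct assignment (`struct toy_iter saved = it;`) before consuming, and restores with `it = saved;` afterwards. Nothing in the list has to support going backwards, so the list itself is free to be forward-only or to be modified behind the iterator's starting point.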
David
* Re: [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists
2025-01-17 21:16 [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists David Howells
@ 2025-01-20 14:22 ` Leon Romanovsky
2025-01-20 19:03 ` John Hubbard
2025-01-31 16:08 ` Chuck Lever
1 sibling, 1 reply; 4+ messages in thread
From: Leon Romanovsky @ 2025-01-20 14:22 UTC
To: David Howells
Cc: lsf-pc, John Hubbard, Matthew Wilcox, brauner, Herbert Xu,
linux-fsdevel, linux-crypto, linux-mm, linux-block,
Christoph Hellwig, Jason Gunthorpe
On Fri, Jan 17, 2025 at 09:16:52PM +0000, David Howells wrote:
> Hi,
>
> I'd like to propose a discussion of two things: firstly, how might we improve
> iov_iter and, secondly, would it be possible to replace scatterlists.
<...>
> Rumour has it that John Hubbard may be working along similar lines, possibly
> just in the area of bio_vecs and ITER_BVEC.
>
>
> [*] Second: Can we replace the uses of scatterlist with iov_iter and reduce
> the number of iterator classes we have?
<...>
I would say yes to the questions.
Regarding rumors, I don't know, but Christoph, Jason and I are working towards
this goal. We proposed a new DMA API which doesn't need scatterlists and allows
callers to implement their own data-structures.
See this "[PATCH v6 00/17] Provide a new two step DMA mapping API" series
https://lore.kernel.org/all/cover.1737106761.git.leon@kernel.org
and its block layer followup "[RFC PATCH 0/7] Block and NMMe PCI use of
new DMA mapping API"
https://lore.kernel.org/all/cover.1730037261.git.leon@kernel.org
Thanks
* Re: [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists
2025-01-20 14:22 ` Leon Romanovsky
@ 2025-01-20 19:03 ` John Hubbard
0 siblings, 0 replies; 4+ messages in thread
From: John Hubbard @ 2025-01-20 19:03 UTC
To: Leon Romanovsky, David Howells
Cc: lsf-pc, Matthew Wilcox, brauner, Herbert Xu, linux-fsdevel,
linux-crypto, linux-mm, linux-block, Christoph Hellwig,
Jason Gunthorpe
On 1/20/25 6:22 AM, Leon Romanovsky wrote:
> On Fri, Jan 17, 2025 at 09:16:52PM +0000, David Howells wrote:
>> Hi,
>>
>> I'd like to propose a discussion of two things: firstly, how might we improve
>> iov_iter and, secondly, would it be possible to replace scatterlists.
>
> <...>
>
>> Rumour has it that John Hubbard may be working along similar lines, possibly
>> just in the area of bio_vecs and ITER_BVEC.
I do feel the need to apologize to Leon here, because I've been mostly MIA
ever since we talked about this at LPC. Perhaps I'll actually be of some use in
2025. :)
>>
>>
>> [*] Second: Can we replace the uses of scatterlist with iov_iter and reduce
>> the number of iterator classes we have?
>
> <...>
>
> I would say yes to the questions.
>
> Regarding rumors, I don't know, but Christoph, Jason and I are working towards
> this goal. We proposed new DMA API which doesn't need scatterlists and allows
> callers to implement their own data-structures.
>
> See this "[PATCH v6 00/17] Provide a new two step DMA mapping API" series
> https://lore.kernel.org/all/cover.1737106761.git.leon@kernel.org
> and its block layer followup "[RFC PATCH 0/7] Block and NMMe PCI use of
> new DMA mapping API"
> https://lore.kernel.org/all/cover.1730037261.git.leon@kernel.org
>
> Thanks
thanks,
--
John Hubbard
* Re: [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists
2025-01-17 21:16 [LSF/MM/BPF TOPIC] Improving iov_iter - and replacing scatterlists David Howells
2025-01-20 14:22 ` Leon Romanovsky
@ 2025-01-31 16:08 ` Chuck Lever
1 sibling, 0 replies; 4+ messages in thread
From: Chuck Lever @ 2025-01-31 16:08 UTC
To: David Howells, lsf-pc, John Hubbard, Matthew Wilcox
Cc: brauner, Herbert Xu, linux-fsdevel, linux-crypto, linux-mm, linux-block
On 1/17/25 4:16 PM, David Howells wrote:
> Hi,
>
> I'd like to propose a discussion of two things: firstly, how might we improve
> iov_iter and, secondly, would it be possible to replace scatterlists.
>
> [*] First: Improvements to iov_iter.
>
> I'm trying to get rid of ITER_XARRAY; xarrays are too unstable (in the sense
> that their contents can shift under you).
>
> I'm trying to replace that with ITER_FOLIOQ instead. This is a segmented list
> of folios - so it can only hold folios, but has infinite capacity. How easy
> would it be to extend this to be able to handle some other types of page, such
> as anon pages or stuff that's been spliced out of network receive buffers?
>
> Would it make sense to be able to have a chain of disparate types of object?
> Say a couple of kmalloc'd buffers, followed by a number of folios, followed by
> another kmalloc'd buffer and mark them such we know which ones can be DMA'd
> and which ones must be copied.
>
> Currently, the core iteration functions in linux/iov_iter.h each handle a
> specific type of iterable. I wonder how much performance difference it would
> make to have each item in a list have its own type. Now, I know, "try it and
> see" is a valid suggestion here.
>
> Rumour has it that John Hubbard may be working along similar lines, possibly
> just in the area of bio_vecs and ITER_BVEC.
>
>
> [*] Second: Can we replace the uses of scatterlist with iov_iter and reduce
> the number of iterator classes we have?
>
> One reason I'd like to do this is we have iov_iter at user end of the I/O
> stack, and it percolates down to various depths. For network filesystems, for
> example, the socket API takes iov_iters, so we want to plumb iov_iters all the
> way down if we can - and have the filesystem know at little as possible about
> folios and pages if we can manage it.
>
> However, one thing that particularly stands out for me is that network
> filesystems often want to use the crypto API - and that means allocating and
> constructing a scatterlist to talk to the crypto API. Having spent some time
> looking at crypto API, in most places iteration functions are used that mean
> that changing to use an iov_iter might not be so hard.
>
> That said, one thing that is made use of occasionally with scatterlists is the
> ability to chain something on the front. That's significantly harder to do
> with iov_iter.
>
> That that said, one reason it's hard to modify the list attached to an
> iterator is that we allow iterators to be rewound, using state stored in the
> list to go backwards. I wonder if it might be possible to get rid of
> iov_iter_revert() and use iterator copying instead.
I don't have much to add other than that I am keenly interested in this
area.
The RPC client and server implementations are using bvecs and page
arrays for now. It would be more efficient if these consumers could
move folios from network to file system and back rather than splitting
them into individual pages, and use proper iterators to keep things
straightforward.
It would help us grow the maximum NFS rsize and wsize if we weren't
nailed to arrays of page-sized chunks of data.
--
Chuck Lever