* [RFC]: Support for zero-copy TCP transmit of user space data
[not found] ` <4947FA1C.2090509@vlnb.net>
@ 2008-12-18 18:35 ` Vladislav Bolkhovitin
2008-12-18 18:43 ` David M. Lloyd
2008-12-19 11:27 ` Andi Kleen
0 siblings, 2 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-18 18:35 UTC (permalink / raw)
To: linux-mm
Cc: Christoph Hellwig, James Bottomley, linux-scsi, linux-kernel,
scst-devel, Bart Van Assche, netdev
Hello linux-mm,
Recently I submitted a new SCSI target framework (SCST) and 4 target
drivers for it for the first iteration of review and comments. See
http://lkml.org/lkml/2008/12/10/245 for details.
An iSCSI target driver, iSCSI-SCST, was part of the patchset
(http://lkml.org/lkml/2008/12/10/293). For it, a nice optimization was
implemented: TCP zero-copy transmit of user space data. The patch
implementing this optimization was also sent in the patchset, see
http://lkml.org/lkml/2008/12/10/296.
I would like to ask whether the approach used in this patch is
acceptable from your point of view. I understand that extending struct
page is very undesirable, but, on the other hand:
- This approach is very simple and straightforward. The patch is only
309 lines long, including comments. All other alternative
implementations would be at least an order of magnitude more complicated.
- The related kernel config option
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION should be disabled by
default in general distro kernels, so there would be no harm at all from
this patch. ISCSI-SCST can work without this patch or with the
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION option disabled, although
with user space device handlers it will work considerably worse. Only a
few distro kernel users need an iSCSI target, and only a few among those
users need user space device handlers. People who need both an iSCSI
target *and* fast user space device handlers would simply enable
that option and rebuild the kernel. Rejecting this patch leaves a much
worse alternative: those people would first have to *patch* the kernel,
only then enable that option, and then rebuild the kernel.
- Although using struct page to keep a network-related pointer might
look like a layering violation, it isn't. I explained why in
http://lkml.org/lkml/2008/12/15/190.
Thanks,
Vlad
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-18 18:35 ` [RFC]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
@ 2008-12-18 18:43 ` David M. Lloyd
2008-12-19 17:37 ` Vladislav Bolkhovitin
2008-12-19 11:27 ` Andi Kleen
1 sibling, 1 reply; 12+ messages in thread
From: David M. Lloyd @ 2008-12-18 18:43 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
linux-kernel, scst-devel, Bart Van Assche, netdev
On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> An iSCSI target driver iSCSI-SCST was a part of the patchset
> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
> have TCP zero-copy transmit of user space data was implemented. Patch,
> implementing this optimization was also sent in the patchset, see
> http://lkml.org/lkml/2008/12/10/296.
I'm probably ignorant of about 90% of the context here, but isn't this the
sort of problem that was supposed to have been solved by vmsplice(2)?
- DML
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-18 18:35 ` [RFC]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-18 18:43 ` David M. Lloyd
@ 2008-12-19 11:27 ` Andi Kleen
2008-12-19 17:38 ` Vladislav Bolkhovitin
1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2008-12-19 11:27 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
linux-kernel, scst-devel, Bart Van Assche, netdev
Vladislav Bolkhovitin <vst@vlnb.net> writes:
>
> - Although usage of struct page to keep network related pointer might
> look as a layering violation, it isn't. I wrote in
> http://lkml.org/lkml/2008/12/15/190 why.
Sorry, but extending struct page for this is really a bad idea because
of the extreme memory overhead even when it's not used (which is a
problem on distribution kernels). Find some other way to store this
information. Even for patches with more general value it was not
acceptable.
-Andi
--
ak@linux.intel.com
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-18 18:43 ` David M. Lloyd
@ 2008-12-19 17:37 ` Vladislav Bolkhovitin
2008-12-19 19:07 ` Jens Axboe
0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:37 UTC (permalink / raw)
To: David M. Lloyd
Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
linux-kernel, scst-devel, Bart Van Assche, netdev
David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
>> An iSCSI target driver iSCSI-SCST was a part of the patchset
>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
>> have TCP zero-copy transmit of user space data was implemented. Patch,
>> implementing this optimization was also sent in the patchset, see
>> http://lkml.org/lkml/2008/12/10/296.
>
> I'm probably ignorant of about 90% of the context here, but isn't this the
> sort of problem that was supposed to have been solved by vmsplice(2)?
No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But
even if it were a user space driver, vmsplice wouldn't change much.
It gives the user no way to know when transmission of the data has
finished. So it is intended to be used as:
vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
vmsplice() it. But at the mmap() stage the kernel has to zero all the
newly mapped pages, and zeroing memory isn't much faster than copying
it. Hence, there would be no considerable performance increase.
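The cycle described above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the helper name send_fresh_buffer() is hypothetical, the payload is assumed to fit in one page, and the pipe stands in for whatever eventually feeds the socket. Only mmap(), vmsplice() and munmap() are real calls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for vmsplice() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: one round of the
 * mmap() -> fill -> vmsplice() -> munmap() cycle.
 * Returns the number of bytes queued in the pipe, or -1 on error.
 */
static ssize_t send_fresh_buffer(int pipe_wr, const void *data, size_t len)
{
    size_t map_len = (size_t)sysconf(_SC_PAGESIZE);
    struct iovec iov;
    ssize_t n;
    char *buf;

    assert(len <= map_len);     /* sketch handles a single page only */

    /* Fresh anonymous mapping: the kernel hands back zeroed pages. */
    buf = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Stand-in for producing the data directly in the buffer. */
    memcpy(buf, data, len);

    iov.iov_base = buf;
    iov.iov_len = len;
    n = vmsplice(pipe_wr, &iov, 1, 0);

    /*
     * There is no notification that the pages are no longer in use,
     * so the buffer is dropped instead of being reused.
     */
    munmap(buf, map_len);
    return n;
}
```

Every call pays for a new zeroed mapping, which is the cost the paragraph above compares to a plain copy.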
Thanks,
Vlad
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 11:27 ` Andi Kleen
@ 2008-12-19 17:38 ` Vladislav Bolkhovitin
2008-12-19 18:00 ` Andi Kleen
0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:38 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
linux-kernel, scst-devel, Bart Van Assche, netdev
Andi Kleen, on 12/19/2008 02:27 PM wrote:
> Vladislav Bolkhovitin <vst@vlnb.net> writes:
>> - Although usage of struct page to keep network related pointer might
>> look as a layering violation, it isn't. I wrote in
>> http://lkml.org/lkml/2008/12/15/190 why.
>
> Sorry but extending struct page for this is really a bad idea because
> of the extreme memory overhead even when it's not used (which is a
> problem on distribution kernels) Find some other way to store this
> information. Even for patches with more general value it was not
> acceptable.
Sure, this is why I propose to disable that option by default in
distribution kernels, so it would do no harm. ISCSI-SCST can work
quite well in this configuration too. People who need both an iSCSI
target *and* fast user space device handlers would simply enable that
option and rebuild the kernel. Rejecting this patch leaves a much worse
alternative: those people would first have to *patch* the kernel, only
then enable that option, and then rebuild the kernel. (I'm repeating
this to make sure you didn't miss my point; it was in the part of my
original message that you cut out.)
Thanks,
Vlad
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 18:00 ` Andi Kleen
@ 2008-12-19 17:57 ` Vladislav Bolkhovitin
0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:57 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
linux-kernel, scst-devel, Bart Van Assche, netdev
Andi Kleen, on 12/19/2008 09:00 PM wrote:
>> Sure, this is why I propose to disable that option by default in
>> distribution kernels, so it would produce no harm.
>
> That would make the option useless for most users. You might as well
> not bother merging then.
I believe 99.(9)% of users would prefer not to patch the kernel, if possible.
>> first, only then enable that option, then rebuild the kernel. (I'm
>> repeating it to make sure you didn't miss this my point; it was in the
>> part of my original message, which you cut out.)
>
> That was such a ridiculous suggestion, I didn't take it seriously.
>
> Also it should be really not rocket science to use a separate
> table for this.
Sorry, what do you mean? If you mean using something like a hash table
to map pages to the corresponding iSCSI commands, this approach was
evaluated and rejected, because it wouldn't provide enough of a
performance increase to be worth the effort. See the details at the end
of the patch description in http://lkml.org/lkml/2008/12/10/296
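For readers unfamiliar with the idea, a side table of the kind under discussion could look roughly like the user-space sketch below. It is not code from the patch or from the kernel: all names (page_table, page_table_insert(), page_table_remove()) are hypothetical, pages are assumed to be 4 KiB aligned, and a real in-kernel version would also need locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_TABLE_BUCKETS 256          /* must be a power of two */

/* One entry: which command currently owns this page. */
struct page_entry {
    const void *page;
    void *cmd;
    struct page_entry *next;
};

struct page_table {
    struct page_entry *buckets[PAGE_TABLE_BUCKETS];
};

/* Hash on the page address, ignoring the low 12 bits (assumed 4 KiB pages). */
static size_t page_hash(const void *page)
{
    return ((uintptr_t)page >> 12) & (PAGE_TABLE_BUCKETS - 1);
}

/* Record that 'page' belongs to 'cmd'. Returns 0 on success, -1 on OOM. */
static int page_table_insert(struct page_table *t, const void *page, void *cmd)
{
    struct page_entry *e = malloc(sizeof *e);
    size_t b = page_hash(page);

    assert(cmd != NULL);                /* NULL is reserved for "not found" */
    if (!e)
        return -1;
    e->page = page;
    e->cmd = cmd;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Look up 'page', unlink its entry, and return the owning command (or NULL). */
static void *page_table_remove(struct page_table *t, const void *page)
{
    struct page_entry **pp = &t->buckets[page_hash(page)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->page == page) {
            struct page_entry *e = *pp;
            void *cmd = e->cmd;

            *pp = e->next;
            free(e);
            return cmd;
        }
    }
    return NULL;
}
```

The trade-off is the one argued in this thread: no extra field in struct page, at the price of a lookup and an allocation on every transmitted page.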
Thanks,
Vlad
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 17:38 ` Vladislav Bolkhovitin
@ 2008-12-19 18:00 ` Andi Kleen
2008-12-19 17:57 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2008-12-19 18:00 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: Andi Kleen, linux-mm, Christoph Hellwig, James Bottomley,
linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev
> Sure, this is why I propose to disable that option by default in
> distribution kernels, so it would produce no harm.
That would make the option useless for most users. You might as well
not bother merging then.
> first, only then enable that option, then rebuild the kernel. (I'm
> repeating it to make sure you didn't miss this my point; it was in the
> part of my original message, which you cut out.)
That was such a ridiculous suggestion, I didn't take it seriously.
Also, it really should not be rocket science to use a separate
table for this.
-Andi
--
ak@linux.intel.com
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 17:37 ` Vladislav Bolkhovitin
@ 2008-12-19 19:07 ` Jens Axboe
2008-12-19 19:17 ` Vladislav Bolkhovitin
0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2008-12-19 19:07 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev
On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> >On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> >>An iSCSI target driver iSCSI-SCST was a part of the patchset
> >>(http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
> >>have TCP zero-copy transmit of user space data was implemented. Patch,
> >>implementing this optimization was also sent in the patchset, see
> >>http://lkml.org/lkml/2008/12/10/296.
> >
> >I'm probably ignorant of about 90% of the context here, but isn't this the
> >sort of problem that was supposed to have been solved by vmsplice(2)?
>
> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But,
> even if it was a user space driver, vmsplice wouldn't change anything
> much. It doesn't have a possibility for a user to know, when
> transmission of the data finished. So, it is intended to be used as:
> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
> vmsplice() it. But on the mmap() stage kernel has to zero all the newly
> mapped pages and zeroing memory isn't much faster, than copying it.
> Hence, there would be no considerable performance increase.
vmsplice() isn't the right choice, but splice() very well could be. You
could easily use splice internally as well. The vmsplice() part sort-of
applies in the sense that you want to fill pages into a pipe, which is
essentially what vmsplice() does. You'd need some helper to do that. And
the ack-on-xmit-done bits are something that splice-to-socket needs
anyway, so I think it'd be quite a suitable choice for this.
--
Jens Axboe
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 19:07 ` Jens Axboe
@ 2008-12-19 19:17 ` Vladislav Bolkhovitin
2008-12-19 19:27 ` Jens Axboe
0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 19:17 UTC (permalink / raw)
To: Jens Axboe
Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev
Jens Axboe, on 12/19/2008 10:07 PM wrote:
> On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
>> David M. Lloyd, on 12/18/2008 09:43 PM wrote:
>>> On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
>>>> An iSCSI target driver iSCSI-SCST was a part of the patchset
>>>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
>>>> have TCP zero-copy transmit of user space data was implemented. Patch,
>>>> implementing this optimization was also sent in the patchset, see
>>>> http://lkml.org/lkml/2008/12/10/296.
>>> I'm probably ignorant of about 90% of the context here, but isn't this the
>>> sort of problem that was supposed to have been solved by vmsplice(2)?
>> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But,
>> even if it was a user space driver, vmsplice wouldn't change anything
>> much. It doesn't have a possibility for a user to know, when
>> transmission of the data finished. So, it is intended to be used as:
>> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
>> vmsplice() it. But on the mmap() stage kernel has to zero all the newly
>> mapped pages and zeroing memory isn't much faster, than copying it.
>> Hence, there would be no considerable performance increase.
>
> vmsplice() isn't the right choice, but splice() very well could be. You
> could easily use splice internally as well. The vmsplice() part sort-of
> applies in the sense that you want to fill pages into a pipe, which is
> essentially what vmsplice() does. You'd need some helper to do that.
Sorry, Jens, but splice() works only if there is a file handle on the
other side, so user space doesn't see the data buffers. But SCST needs
to serve a wider range of use cases, like reading data with
decompression from a virtual tape, where the decompression is done in
user space. For those, only the complete zero-copy network send that I
implemented can give the best performance.
> And
> the ack-on-xmit-done bits is something that splice-to-socket needs
> anyway, so I think it'd be quite a suitable choice for this.
So, are you saying that splice() could also benefit from the zero-copy
transmit feature, like the one I implemented?
Thanks,
Vlad
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 19:17 ` Vladislav Bolkhovitin
@ 2008-12-19 19:27 ` Jens Axboe
2008-12-19 21:58 ` Evgeniy Polyakov
2008-12-23 19:11 ` Vladislav Bolkhovitin
0 siblings, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2008-12-19 19:27 UTC (permalink / raw)
To: Vladislav Bolkhovitin
Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev
On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> Jens Axboe, on 12/19/2008 10:07 PM wrote:
> >On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> >>David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> >>>On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> >>>>An iSCSI target driver iSCSI-SCST was a part of the patchset
> >>>>(http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
> >>>>have TCP zero-copy transmit of user space data was implemented. Patch,
> >>>>implementing this optimization was also sent in the patchset, see
> >>>>http://lkml.org/lkml/2008/12/10/296.
> >>>I'm probably ignorant of about 90% of the context here, but isn't this
> >>>the sort of problem that was supposed to have been solved by vmsplice(2)?
> >>No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But,
> >>even if it was a user space driver, vmsplice wouldn't change anything
> >>much. It doesn't have a possibility for a user to know, when
> >>transmission of the data finished. So, it is intended to be used as:
> >>vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
> >>vmsplice() it. But on the mmap() stage kernel has to zero all the newly
> >>mapped pages and zeroing memory isn't much faster, than copying it.
> >>Hence, there would be no considerable performance increase.
> >
> >vmsplice() isn't the right choice, but splice() very well could be. You
> >could easily use splice internally as well. The vmsplice() part sort-of
> >applies in the sense that you want to fill pages into a pipe, which is
> >essentially what vmsplice() does. You'd need some helper to do that.
>
> Sorry, Jens, but splice() works only if there is a file handle on the
> another side, so user space doesn't see data buffers. But SCST needs to
> serve a wider usage cases, like reading data with decompression from a
> virtual tape, where decompression is done in user space. For those only
> complete zero-copy network send, which I implemented, can give the best
> performance.
__splice_from_pipe() takes a pipe, a descriptor and an actor. There's
absolutely ZERO reason you could not reuse most of that for this
implementation. The big bonus here is that getting the put correct from
networking would even make splice() better for everyone. Win for Linux,
win for you since it'll make it MUCH easier for you to get this stuff
in. Looking at your original patch, I almost think it's flame bait
to induce discussion (nothing wrong with that, that approach works quite
well and has been used before). There's no way in HELL that it'd ever be
a merge candidate. And I suspect you know that, at least I hope you do
or you are farther away from going forward with this than you think.
So don't look at splice() the system call, look at the infrastructure
and check if that could be useful for your case. To me it looks
absolutely like it could, if your goal is just zero-copy transmit. The
only missing piece is dropping the reference and signalling page
consumption at the right point, which is when the data is safe to be
reused. That very bit is missing, but that should be all as far as I can
tell.
> >And
> >the ack-on-xmit-done bits is something that splice-to-socket needs
> >anyway, so I think it'd be quite a suitable choice for this.
>
> So, are you writing that splice() could also benefit from the zero-copy
> transmit feature, like I implemented?
I like how you want to reinvent everything, perhaps you should spend a
little more time looking into various other approaches? splice() already
does zero-copy network transmit, there are no copies going on. Ideally,
you'd have zero copies moving data into your pipe, but migrate/move
isn't quite there yet. But that doesn't apply to your case at all.
What is missing, as I wrote, is the 'release on ack' and not on pipe
buffer release. This is similar to the get_page/put_page stuff you did
in your patch, but don't go claiming that zero-copy transmit is a
Vladislav original - the ->sendpage() does no copies.
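As a rough user-space illustration of the pipe-to-socket path referred to above, the sketch below moves data already queued in a pipe to a socket with splice(2). The helper name pipe_to_socket() is hypothetical, and the sketch shows only the system-call view, not the in-kernel infrastructure. Note that splice() returning says nothing about when the underlying pages are safe to reuse, which is the missing 'release on ack' part.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for splice() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to 'len' bytes that are queued in the
 * pipe to the socket without passing them through a user-space buffer.
 * Returns the number of bytes moved, or -1 on error.
 */
static ssize_t pipe_to_socket(int pipe_rd, int sock_fd, size_t len)
{
    size_t done = 0;

    assert(pipe_rd >= 0 && sock_fd >= 0);

    while (done < len) {
        ssize_t n = splice(pipe_rd, NULL, sock_fd, NULL,
                           len - done, SPLICE_F_MOVE);
        if (n < 0)
            return -1;
        if (n == 0)             /* write end closed, nothing left */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

In a quick test the socket can be one end of a socketpair(); a real sender would pass a connected TCP socket instead.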
--
Jens Axboe
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 19:27 ` Jens Axboe
@ 2008-12-19 21:58 ` Evgeniy Polyakov
2008-12-23 19:11 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2008-12-19 21:58 UTC (permalink / raw)
To: Jens Axboe
Cc: Vladislav Bolkhovitin, David M. Lloyd, linux-mm,
Christoph Hellwig, James Bottomley, linux-scsi, linux-kernel,
scst-devel, Bart Van Assche, netdev
On Fri, Dec 19, 2008 at 08:27:36PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> What is missing, as I wrote, is the 'release on ack' and not on pipe
> buffer release. This is similar to the get_page/put_page stuff you did
> in your patch, but don't go claiming that zero-copy transmit is a
> Vladislav original - the ->sendpage() does no copies.
Just my small rant: it does, when the underlying device does not support
hardware tx checksumming and scatter/gather, which is more the exception
than the rule for modern NICs.
As for having notifications of the received ack (or, from the user's
point of view, notification that the buffer has been freed), I have the
following idea in mind: extend the skb shared info with a copy of the
frag array and an additional destructor field, which will be invoked
when not only the skb but also all its clones are freed (that's when the
shared info is freed), so that the user could save some per-page context
in the fraglist and work with it when the data is no longer in use.
Extending the page or skb structure is a no-go for sure, and actually
even shared info is not rubber, but there we can at least add something...
If only a destructor field is allowed (a similar patch was not rejected),
scsi can save its pages in a tree (indexed by the page pointer) and
traverse it when the destructor is invoked, selecting the pages found in
the freed skb.
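The destructor idea can be modelled in plain user-space C as below. This is a toy model only: struct buf and struct shared_info are hypothetical stand-ins, not the kernel's sk_buff or skb_shared_info, and the reference count is not atomic. It shows the one property that matters here: the callback runs when the last clone is freed, not when the first one is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FRAGS 4

/* Toy stand-in for the shared info: one instance per group of clones. */
struct shared_info {
    int refs;                                   /* number of live clones */
    size_t nr_frags;
    void *frag_pages[MAX_FRAGS];                /* pages carried by the buffer */
    void (*destructor)(struct shared_info *si); /* runs on last free */
    void *ctx;                                  /* owner's private context */
};

/* Toy stand-in for a buffer head; clones share one shared_info. */
struct buf {
    struct shared_info *shinfo;
};

static struct buf *buf_alloc(void (*destructor)(struct shared_info *), void *ctx)
{
    struct buf *b = malloc(sizeof *b);
    struct shared_info *si = calloc(1, sizeof *si);

    if (!b || !si) {
        free(b);
        free(si);
        return NULL;
    }
    si->refs = 1;
    si->destructor = destructor;
    si->ctx = ctx;
    b->shinfo = si;
    return b;
}

static int buf_add_frag(struct buf *b, void *page)
{
    struct shared_info *si = b->shinfo;

    if (si->nr_frags == MAX_FRAGS)
        return -1;
    si->frag_pages[si->nr_frags++] = page;
    return 0;
}

/* A clone gets its own head but shares (and pins) the shared info. */
static struct buf *buf_clone(struct buf *b)
{
    struct buf *c = malloc(sizeof *c);

    if (!c)
        return NULL;
    c->shinfo = b->shinfo;
    c->shinfo->refs++;
    return c;
}

/* The destructor fires only when the last head referencing it goes away. */
static void buf_free(struct buf *b)
{
    struct shared_info *si = b->shinfo;

    assert(si->refs > 0);
    if (--si->refs == 0) {
        if (si->destructor)
            si->destructor(si);
        free(si);
    }
    free(b);
}

/* Example destructor: count how many pages became reusable. */
static void count_released(struct shared_info *si)
{
    *(size_t *)si->ctx += si->nr_frags;
}
```

In the scheme described above, the destructor would walk frag_pages and look each page up in the owner's tree instead of just counting.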
--
Evgeniy Polyakov
* Re: [RFC]: Support for zero-copy TCP transmit of user space data
2008-12-19 19:27 ` Jens Axboe
2008-12-19 21:58 ` Evgeniy Polyakov
@ 2008-12-23 19:11 ` Vladislav Bolkhovitin
1 sibling, 0 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-23 19:11 UTC (permalink / raw)
To: Jens Axboe
Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev
Jens Axboe, on 12/19/2008 10:27 PM wrote:
>>>>>> An iSCSI target driver iSCSI-SCST was a part of the patchset
>>>>>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
>>>>>> have TCP zero-copy transmit of user space data was implemented. Patch,
>>>>>> implementing this optimization was also sent in the patchset, see
>>>>>> http://lkml.org/lkml/2008/12/10/296.
>>>>> I'm probably ignorant of about 90% of the context here, but isn't this
>>>>> the sort of problem that was supposed to have been solved by vmsplice(2)?
>>>> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But,
>>>> even if it was a user space driver, vmsplice wouldn't change anything
>>>> much. It doesn't have a possibility for a user to know, when
>>>> transmission of the data finished. So, it is intended to be used as:
>>>> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
>>>> vmsplice() it. But on the mmap() stage kernel has to zero all the newly
>>>> mapped pages and zeroing memory isn't much faster, than copying it.
>>>> Hence, there would be no considerable performance increase.
>>> vmsplice() isn't the right choice, but splice() very well could be. You
>>> could easily use splice internally as well. The vmsplice() part sort-of
>>> applies in the sense that you want to fill pages into a pipe, which is
>>> essentially what vmsplice() does. You'd need some helper to do that.
>> Sorry, Jens, but splice() works only if there is a file handle on the
>> another side, so user space doesn't see data buffers. But SCST needs to
>> serve a wider usage cases, like reading data with decompression from a
>> virtual tape, where decompression is done in user space. For those only
>> complete zero-copy network send, which I implemented, can give the best
>> performance.
>
> __splice_from_pipe() takes a pipe, a descriptor and an actor. There's
> absolutely ZERO reason you could not reuse most of that for this
> implementation. The big bonus here is that getting the put correct from
> networking would even make splice() better for everyone. Win for Linux,
> win for you since it'll make it MUCH easier for you to get this stuff
> in. Looking at your original patch and I almost think it's a flame bait
> to induce discussion (nothing wrong with that, that approach works quite
> well and has been used before). There's no way in HELL that it'd ever be
> a merge candidate. And I suspect you know that, at least I hope you do
> or you are farther away from going forward with this than you think.
>
> So don't look at splice() the system call, look at the infrastructure
> and check if that could be useful for your case. To me it looks
> absolutely like it could, if your goal is just zero-copy transmit.
I looked at the splice code again to make sure I didn't miss anything.
__splice_from_pipe() leads to pipe_to_sendpage(), which leads to
sock_sendpage(), then to sock->sendpage(). Sorry, but I don't see any
point in going through all the complicated splice infrastructure instead
of calling sock->sendpage() directly, as I do.
> The
> only missing piece is dropping the reference and signalling page
> consumption at the right point, which is when the data is safe to be
> reused. That very bit is missing, but that should be all as far as I can
> tell.
This is exactly what I implemented in the patch we are discussing.
>>> And
>>> the ack-on-xmit-done bits is something that splice-to-socket needs
>>> anyway, so I think it'd be quite a suitable choice for this.
>> So, are you writing that splice() could also benefit from the zero-copy
>> transmit feature, like I implemented?
>
> I like how you want to reinvent everything, perhaps you should spend a
> little more time looking into various other approaches? splice() already
> does zero-copy network transmit, there are no copies going on. Ideally,
> you'd have zero copies moving data into your pipe, but migrate/move
> isn't quite there yet. But that doesn't apply to your case at all.
>
> What is missing, as I wrote, is the 'release on ack' and not on pipe
> buffer release. This is similar to the get_page/put_page stuff you did
> in your patch, but don't go claiming that zero-copy transmit is a
> Vladislav original - the ->sendpage() does no copies.
Jens, I have never claimed I reinvented ->sendpage(). Quite the opposite,
I use it. I only extended it with a missing feature. Although, it seems,
since you were misled, I should apologize for the not very good
description of the patch.
Thanks,
Vlad