linux-mm.kvack.org archive mirror
* [RFC]: Support for zero-copy TCP transmit of user space data
       [not found]                 ` <4947FA1C.2090509@vlnb.net>
@ 2008-12-18 18:35                   ` Vladislav Bolkhovitin
  2008-12-18 18:43                     ` David M. Lloyd
  2008-12-19 11:27                     ` Andi Kleen
  0 siblings, 2 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-18 18:35 UTC
  To: linux-mm
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, linux-kernel,
	scst-devel, Bart Van Assche, netdev

Hello linux-mm,

Recently I submitted a new SCSI target framework (SCST) and 4 target 
drivers for it for the first iteration of review and comments. See 
http://lkml.org/lkml/2008/12/10/245 for details.

An iSCSI target driver, iSCSI-SCST, was part of the patchset
(http://lkml.org/lkml/2008/12/10/293). For it, an optimization providing
zero-copy TCP transmit of user space data was implemented. The patch
implementing this optimization was also sent in the patchset; see
http://lkml.org/lkml/2008/12/10/296.

I would like to ask whether the approach used in this patch is
acceptable from your point of view. I understand that extending struct
page is very undesirable, but, on the other hand:

  - This approach is very simple and straightforward. The patch is only 
309 lines long, including comments. All other alternative 
implementations would be at least an order of magnitude more complicated.

  - The related kernel config option
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION should be disabled by
default in general distro kernels, so there would be no harm at all from
this patch. iSCSI-SCST can work without this patch or with the
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION option disabled, although
with user space device handlers it will perform considerably worse. Only
a few distro kernel users need an iSCSI target, and only a few of those
need user space device handlers. People who need both an iSCSI
target *and* fast user space device handlers would simply enable
that option and rebuild the kernel. Rejecting this patch leaves a much
worse alternative: those people would first have to *patch* the kernel,
then enable that option, then rebuild the kernel.

  - Although using struct page to keep a network-related pointer might
look like a layering violation, it isn't. I explained why in
http://lkml.org/lkml/2008/12/15/190.

Thanks,
Vlad

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-18 18:35                   ` [RFC]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
@ 2008-12-18 18:43                     ` David M. Lloyd
  2008-12-19 17:37                       ` Vladislav Bolkhovitin
  2008-12-19 11:27                     ` Andi Kleen
  1 sibling, 1 reply; 12+ messages in thread
From: David M. Lloyd @ 2008-12-18 18:43 UTC
  To: Vladislav Bolkhovitin
  Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
	linux-kernel, scst-devel, Bart Van Assche, netdev

On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> An iSCSI target driver iSCSI-SCST was a part of the patchset 
> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
> have TCP zero-copy transmit of user space data was implemented. Patch, 
> implementing this optimization was also sent in the patchset, see 
> http://lkml.org/lkml/2008/12/10/296.

I'm probably ignorant of about 90% of the context here, but isn't this the 
sort of problem that was supposed to have been solved by vmsplice(2)?

- DML


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-18 18:35                   ` [RFC]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
  2008-12-18 18:43                     ` David M. Lloyd
@ 2008-12-19 11:27                     ` Andi Kleen
  2008-12-19 17:38                       ` Vladislav Bolkhovitin
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2008-12-19 11:27 UTC
  To: Vladislav Bolkhovitin
  Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
	linux-kernel, scst-devel, Bart Van Assche, netdev

Vladislav Bolkhovitin <vst@vlnb.net> writes:
>
>  - Although usage of struct page to keep network related pointer might
> look as a layering violation, it isn't. I wrote in
> http://lkml.org/lkml/2008/12/15/190 why.

Sorry, but extending struct page for this is really a bad idea because
of the extreme memory overhead even when it's not used (which is a
problem on distribution kernels). Find some other way to store this
information.  Even for patches with more general value it was not
acceptable.
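
As a rough illustration of the overhead in question (a sketch that is
not part of the original message): every page frame has its own struct
page, so any added field is paid for once per page of RAM whether or
not the feature is used. The figures below assume 4 KiB pages and one
8-byte pointer; the thread does not state the size of the field the
patch adds, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes consumed across the whole machine by one extra field in
 * struct page, assuming one struct page per page frame.
 * page_size and field_size are inputs, not values from the patch.
 */
static uint64_t struct_page_overhead(uint64_t ram_bytes, uint64_t page_size,
                                     uint64_t field_size)
{
    return (ram_bytes / page_size) * field_size;
}
```

Under those assumptions, struct_page_overhead(4ULL << 30, 4096, 8)
gives 8 MiB for a 4 GiB machine, roughly 0.2% of RAM.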

-Andi


-- 
ak@linux.intel.com


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-18 18:43                     ` David M. Lloyd
@ 2008-12-19 17:37                       ` Vladislav Bolkhovitin
  2008-12-19 19:07                         ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:37 UTC
  To: David M. Lloyd
  Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
	linux-kernel, scst-devel, Bart Van Assche, netdev

David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
>> An iSCSI target driver iSCSI-SCST was a part of the patchset 
>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
>> have TCP zero-copy transmit of user space data was implemented. Patch, 
>> implementing this optimization was also sent in the patchset, see 
>> http://lkml.org/lkml/2008/12/10/296.
> 
> I'm probably ignorant of about 90% of the context here, but isn't this the 
> sort of problem that was supposed to have been solved by vmsplice(2)?

No, vmsplice can't help here. iSCSI-SCST is a kernel space driver. But
even if it were a user space driver, vmsplice wouldn't change
much. It gives the user no way to know when transmission of the data
has finished. So it is intended to be used as:
vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
vmsplice() it. But at the mmap() stage the kernel has to zero all the
newly mapped pages, and zeroing memory isn't much faster than copying it.
Hence, there would be no considerable performance increase.
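
The cycle described above can be sketched in user space roughly as
follows. This is an illustration only, not code from the patch or from
SCST: new_buffer() and send_and_replace() are hypothetical names, the
splice() step from the pipe to the socket is left out, and a short
vmsplice() (pipe nearly full) is not handled.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Map a fresh anonymous buffer; the kernel returns zero-filled pages. */
static void *new_buffer(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * One turn of the cycle: queue the pages of *bufp into the pipe, drop
 * our mapping of them, and map a replacement buffer.
 * Returns the number of bytes queued, or -1 on error.
 */
static ssize_t send_and_replace(int pipe_wr, void **bufp, size_t len)
{
    struct iovec iov = { .iov_base = *bufp, .iov_len = len };
    ssize_t n = vmsplice(pipe_wr, &iov, 1, 0);

    if (n < 0)
        return -1;
    /* There is no completion notification, so the buffer is never
     * reused: unmap it and pay for a newly zeroed mapping instead. */
    if (munmap(*bufp, len) != 0)
        return -1;
    *bufp = new_buffer(len);
    return *bufp ? n : -1;
}
```

The new_buffer() call on every turn is where the zeroing cost mentioned
above is paid.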

Thanks,
Vlad


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 11:27                     ` Andi Kleen
@ 2008-12-19 17:38                       ` Vladislav Bolkhovitin
  2008-12-19 18:00                         ` Andi Kleen
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:38 UTC
  To: Andi Kleen
  Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
	linux-kernel, scst-devel, Bart Van Assche, netdev

Andi Kleen, on 12/19/2008 02:27 PM wrote:
> Vladislav Bolkhovitin <vst@vlnb.net> writes:
>>  - Although usage of struct page to keep network related pointer might
>> look as a layering violation, it isn't. I wrote in
>> http://lkml.org/lkml/2008/12/15/190 why.
> 
> Sorry but extending struct page for this is really a bad idea because
> of the extreme memory overhead even when it's not used (which is a 
> problem on distribution kernels) Find some other way to store this
> information.  Even for patches with more general value it was not
> acceptable.

Sure, this is why I propose to disable that option by default in
distribution kernels, so it would do no harm. iSCSI-SCST can work
in this configuration quite well too. People who need both an iSCSI
target *and* fast user space device handlers would simply enable that
option and rebuild the kernel. Rejecting this patch leaves a much worse
alternative: those people would first have to *patch* the kernel,
then enable that option, then rebuild the kernel. (I'm
repeating this to make sure you didn't miss my point; it was in the
part of my original message that you cut out.)

Thanks,
Vlad


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 18:00                         ` Andi Kleen
@ 2008-12-19 17:57                           ` Vladislav Bolkhovitin
  0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 17:57 UTC
  To: Andi Kleen
  Cc: linux-mm, Christoph Hellwig, James Bottomley, linux-scsi,
	linux-kernel, scst-devel, Bart Van Assche, netdev

Andi Kleen, on 12/19/2008 09:00 PM wrote:
>> Sure, this is why I propose to disable that option by default in 
>> distribution kernels, so it would produce no harm.
> 
> That would make the option useless for most users. You might as well
> not bother merging then.

I believe 99.(9)% of users prefer not to patch the kernel, if possible.

>> first, only then enable that option, then rebuild the kernel. (I'm 
>> repeating it to make sure you didn't miss this my point; it was in the 
>> part of my original message, which you cut out.)
> 
> That was such a ridiculous suggestion, I didn't take it seriously.
> 
> Also it should be really not rocket science to use a separate 
> table for this.

Sorry, what do you mean? If you mean using something like a hash table
to map pages to the corresponding iSCSI commands, this approach was
evaluated and rejected, because it wouldn't provide enough of a
performance increase to be worth the effort. See details at the end of
the patch description in http://lkml.org/lkml/2008/12/10/296
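
For readers unfamiliar with the idea, a side table of this kind could
look roughly like the sketch below. It is a user-space illustration
under assumptions, not the design evaluated in the patch description:
all names (cmd_map, cmd_map_put, cmd_map_take) are hypothetical, the
bucket count is arbitrary, 4 KiB page alignment is assumed, and the
locking a kernel version would need is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CMD_MAP_BUCKETS 256   /* power of two, arbitrary for the sketch */

struct cmd_map_entry {
    const void *page;            /* key: address of the page */
    void *cmd;                   /* value: command owning the page */
    struct cmd_map_entry *next;  /* collision chain */
};

struct cmd_map {
    struct cmd_map_entry *bucket[CMD_MAP_BUCKETS];
};

static size_t cmd_map_hash(const void *page)
{
    /* Pages are aligned, so discard the low 12 bits before folding. */
    uintptr_t v = (uintptr_t)page >> 12;
    return (size_t)((v ^ (v >> 8)) & (CMD_MAP_BUCKETS - 1));
}

/* Remember that 'page' belongs to 'cmd'. Returns 0, or -1 if out of memory. */
static int cmd_map_put(struct cmd_map *m, const void *page, void *cmd)
{
    struct cmd_map_entry *e = malloc(sizeof(*e));
    size_t h = cmd_map_hash(page);

    if (!e)
        return -1;
    e->page = page;
    e->cmd = cmd;
    e->next = m->bucket[h];
    m->bucket[h] = e;
    return 0;
}

/* Look up and remove 'page'; returns its command, or NULL if unknown. */
static void *cmd_map_take(struct cmd_map *m, const void *page)
{
    struct cmd_map_entry **pp = &m->bucket[cmd_map_hash(page)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->page == page) {
            struct cmd_map_entry *e = *pp;
            void *cmd = e->cmd;
            *pp = e->next;
            free(e);
            return cmd;
        }
    }
    return NULL;
}
```

Every transmitted page then costs an insert and a lookup, which is the
kind of extra work the cost/benefit argument above is about.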

Thanks,
Vlad


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 17:38                       ` Vladislav Bolkhovitin
@ 2008-12-19 18:00                         ` Andi Kleen
  2008-12-19 17:57                           ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2008-12-19 18:00 UTC
  To: Vladislav Bolkhovitin
  Cc: Andi Kleen, linux-mm, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev

> Sure, this is why I propose to disable that option by default in 
> distribution kernels, so it would produce no harm.

That would make the option useless for most users. You might as well
not bother merging then.

> first, only then enable that option, then rebuild the kernel. (I'm 
> repeating it to make sure you didn't miss this my point; it was in the 
> part of my original message, which you cut out.)

That was such a ridiculous suggestion, I didn't take it seriously.

Also, it really should not be rocket science to use a separate
table for this.

-Andi

-- 
ak@linux.intel.com


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 17:37                       ` Vladislav Bolkhovitin
@ 2008-12-19 19:07                         ` Jens Axboe
  2008-12-19 19:17                           ` Vladislav Bolkhovitin
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2008-12-19 19:07 UTC
  To: Vladislav Bolkhovitin
  Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev

On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> >On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> >>An iSCSI target driver iSCSI-SCST was a part of the patchset 
> >>(http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
> >>have TCP zero-copy transmit of user space data was implemented. Patch, 
> >>implementing this optimization was also sent in the patchset, see 
> >>http://lkml.org/lkml/2008/12/10/296.
> >
> >I'm probably ignorant of about 90% of the context here, but isn't this the 
> >sort of problem that was supposed to have been solved by vmsplice(2)?
> 
> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, 
> even if it was a user space driver, vmsplice wouldn't change anything 
> much. It doesn't have a possibility for a user to know, when 
> transmission of the data finished. So, it is intended to be used as: 
> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> 
> vmsplice() it. But on the mmap() stage kernel has to zero all the newly 
> mapped pages and zeroing memory isn't much faster, than copying it. 
> Hence, there would be no considerable performance increase.

vmsplice() isn't the right choice, but splice() very well could be. You
could easily use splice internally as well. The vmsplice() part sort-of
applies in the sense that you want to fill pages into a pipe, which is
essentially what vmsplice() does. You'd need some helper to do that. And
the ack-on-xmit-done bits are something that splice-to-socket needs
anyway, so I think it'd be quite a suitable choice for this.

-- 
Jens Axboe


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 19:07                         ` Jens Axboe
@ 2008-12-19 19:17                           ` Vladislav Bolkhovitin
  2008-12-19 19:27                             ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-19 19:17 UTC
  To: Jens Axboe
  Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev

Jens Axboe, on 12/19/2008 10:07 PM wrote:
> On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
>> David M. Lloyd, on 12/18/2008 09:43 PM wrote:
>>> On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
>>>> An iSCSI target driver iSCSI-SCST was a part of the patchset 
>>>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
>>>> have TCP zero-copy transmit of user space data was implemented. Patch, 
>>>> implementing this optimization was also sent in the patchset, see 
>>>> http://lkml.org/lkml/2008/12/10/296.
>>> I'm probably ignorant of about 90% of the context here, but isn't this the 
>>> sort of problem that was supposed to have been solved by vmsplice(2)?
>> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, 
>> even if it was a user space driver, vmsplice wouldn't change anything 
>> much. It doesn't have a possibility for a user to know, when 
>> transmission of the data finished. So, it is intended to be used as: 
>> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> 
>> vmsplice() it. But on the mmap() stage kernel has to zero all the newly 
>> mapped pages and zeroing memory isn't much faster, than copying it. 
>> Hence, there would be no considerable performance increase.
> 
> vmsplice() isn't the right choice, but splice() very well could be. You
> could easily use splice internally as well. The vmsplice() part sort-of
> applies in the sense that you want to fill pages into a pipe, which is
> essentially what vmsplice() does. You'd need some helper to do that.

Sorry, Jens, but splice() works only if there is a file handle on the
other side, so user space doesn't see the data buffers. But SCST needs
to serve a wider range of use cases, like reading data with
decompression from a virtual tape, where decompression is done in user
space. For those, only a complete zero-copy network send, which I
implemented, can give the best performance.

> And
> the ack-on-xmit-done bits is something that splice-to-socket needs
> anyway, so I think it'd be quite a suitable choice for this.

So, are you saying that splice() could also benefit from a zero-copy
transmit feature like the one I implemented?

Thanks,
Vlad



* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 19:17                           ` Vladislav Bolkhovitin
@ 2008-12-19 19:27                             ` Jens Axboe
  2008-12-19 21:58                               ` Evgeniy Polyakov
  2008-12-23 19:11                               ` Vladislav Bolkhovitin
  0 siblings, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2008-12-19 19:27 UTC
  To: Vladislav Bolkhovitin
  Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev

On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> Jens Axboe, on 12/19/2008 10:07 PM wrote:
> >On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> >>David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> >>>On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> >>>>An iSCSI target driver iSCSI-SCST was a part of the patchset 
> >>>>(http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
> >>>>have TCP zero-copy transmit of user space data was implemented. Patch, 
> >>>>implementing this optimization was also sent in the patchset, see 
> >>>>http://lkml.org/lkml/2008/12/10/296.
> >>>I'm probably ignorant of about 90% of the context here, but isn't this 
> >>>the sort of problem that was supposed to have been solved by vmsplice(2)?
> >>No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, 
> >>even if it was a user space driver, vmsplice wouldn't change anything 
> >>much. It doesn't have a possibility for a user to know, when 
> >>transmission of the data finished. So, it is intended to be used as: 
> >>vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> 
> >>vmsplice() it. But on the mmap() stage kernel has to zero all the newly 
> >>mapped pages and zeroing memory isn't much faster, than copying it. 
> >>Hence, there would be no considerable performance increase.
> >
> >vmsplice() isn't the right choice, but splice() very well could be. You
> >could easily use splice internally as well. The vmsplice() part sort-of
> >applies in the sense that you want to fill pages into a pipe, which is
> >essentially what vmsplice() does. You'd need some helper to do that.
> 
> Sorry, Jens, but splice() works only if there is a file handle on the 
> another side, so user space doesn't see data buffers. But SCST needs to 
> serve a wider usage cases, like reading data with decompression from a 
> virtual tape, where decompression is done in user space. For those only 
> complete zero-copy network send, which I implemented, can give the best 
> performance.

__splice_from_pipe() takes a pipe, a descriptor and an actor. There's
absolutely ZERO reason you could not reuse most of that for this
implementation. The big bonus here is that getting the put correct from
networking would even make splice() better for everyone. Win for Linux,
win for you since it'll make it MUCH easier for you to get this stuff
in. Looking at your original patch, I almost think it's flame bait
to induce discussion (nothing wrong with that, that approach works quite
well and has been used before). There's no way in HELL that it'd ever be
a merge candidate. And I suspect you know that, at least I hope you do
or you are farther away from going forward with this than you think.

So don't look at splice() the system call, look at the infrastructure
and check if that could be useful for your case. To me it looks
absolutely like it could, if your goal is just zero-copy transmit. The
only missing piece is dropping the reference and signalling page
consumption at the right point, which is when the data is safe to be
reused. That very bit is missing, but that should be all as far as I can
tell.

> >And
> >the ack-on-xmit-done bits is something that splice-to-socket needs
> >anyway, so I think it'd be quite a suitable choice for this.
> 
> So, are you writing that splice() could also benefit from the zero-copy 
> transmit feature, like I implemented?

I like how you want to reinvent everything, perhaps you should spend a
little more time looking into various other approaches? splice() already
does zero-copy network transmit, there are no copies going on. Ideally,
you'd have zero copies moving data into your pipe, but migrate/move
isn't quite there yet. But that doesn't apply to your case at all.

What is missing, as I wrote, is the 'release on ack' and not on pipe
buffer release. This is similar to the get_page/put_page stuff you did
in your patch, but don't go claiming that zero-copy transmit is a
Vladislav original - the ->sendpage() does no copies.

-- 
Jens Axboe


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 19:27                             ` Jens Axboe
@ 2008-12-19 21:58                               ` Evgeniy Polyakov
  2008-12-23 19:11                               ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 12+ messages in thread
From: Evgeniy Polyakov @ 2008-12-19 21:58 UTC
  To: Jens Axboe
  Cc: Vladislav Bolkhovitin, David M. Lloyd, linux-mm,
	Christoph Hellwig, James Bottomley, linux-scsi, linux-kernel,
	scst-devel, Bart Van Assche, netdev

On Fri, Dec 19, 2008 at 08:27:36PM +0100, Jens Axboe (jens.axboe@oracle.com) wrote:
> What is missing, as I wrote, is the 'release on ack' and not on pipe
> buffer release. This is similar to the get_page/put_page stuff you did
> in your patch, but don't go claiming that zero-copy transmit is a
> Vladislav original - the ->sendpage() does no copies.

Just my small rant: it does, when the underlying device does not support
hardware tx checksumming and scatter/gather, which is more likely the
exception than the rule for modern NICs.

As for notification of the received ack (or, from the user's point of
view, notification that the buffer has been freed), I have the following
idea in mind: extend the skb shared info with a copy of the frag array
and an additional destructor field, which will be invoked when not only
the skb but also all its clones are freed (that's when the shared info
is freed), so that the user could save some per-page context in the
fraglist and work with it when the data is no longer in use.

Extending the page or skb structure is a no-go for sure, and actually
even shared info is not made of rubber, but there we can at least add
something...

If only the destructor field is allowed (a similar patch was not
rejected), scsi can save its pages in a tree (indexed by the page
pointer) and traverse it when the destructor is invoked, selecting the
pages found in the freed skb.
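
The "destructor runs when the last clone goes away" idea can be
modelled in plain C as below. This is a user-space sketch under
assumptions, not kernel code and not the real skb shared info layout:
struct shared_info, its fields and the helper names are hypothetical,
the frag limit is arbitrary, and the plain int counter stands in for
what would have to be an atomic reference count.

```c
#include <stddef.h>

#define MAX_FRAGS 4   /* arbitrary limit for the sketch */

struct shared_info {
    int users;                    /* the buffer itself plus its clones */
    size_t nr_frags;
    void *frag_page[MAX_FRAGS];   /* saved copy of the pages in use */
    void (*destructor)(struct shared_info *);
    void *ctx;                    /* owner's private context */
};

/* A clone shares the same shared_info, so it only adds a user. */
static void shared_info_clone(struct shared_info *si)
{
    si->users++;
}

/*
 * Drop one user. Only when the last one is gone is the data truly
 * unused, so only then is the destructor called.
 * Returns 1 if the destructor ran, 0 otherwise.
 */
static int shared_info_put(struct shared_info *si)
{
    if (--si->users > 0)
        return 0;
    if (si->destructor)
        si->destructor(si);
    return 1;
}

/* Example destructor: add the number of released pages to *ctx. */
static void count_released(struct shared_info *si)
{
    size_t *released = si->ctx;
    *released += si->nr_frags;
}
```

A real destructor would walk frag_page[] and look each page up in the
owner's tree, as described above, instead of just counting.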

-- 
	Evgeniy Polyakov


* Re: [RFC]: Support for zero-copy TCP transmit of user space data
  2008-12-19 19:27                             ` Jens Axboe
  2008-12-19 21:58                               ` Evgeniy Polyakov
@ 2008-12-23 19:11                               ` Vladislav Bolkhovitin
  1 sibling, 0 replies; 12+ messages in thread
From: Vladislav Bolkhovitin @ 2008-12-23 19:11 UTC
  To: Jens Axboe
  Cc: David M. Lloyd, linux-mm, Christoph Hellwig, James Bottomley,
	linux-scsi, linux-kernel, scst-devel, Bart Van Assche, netdev

Jens Axboe, on 12/19/2008 10:27 PM wrote:
>>>>>> An iSCSI target driver iSCSI-SCST was a part of the patchset 
>>>>>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
>>>>>> have TCP zero-copy transmit of user space data was implemented. Patch, 
>>>>>> implementing this optimization was also sent in the patchset, see 
>>>>>> http://lkml.org/lkml/2008/12/10/296.
>>>>> I'm probably ignorant of about 90% of the context here, but isn't this 
>>>>> the sort of problem that was supposed to have been solved by vmsplice(2)?
>>>> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, 
>>>> even if it was a user space driver, vmsplice wouldn't change anything 
>>>> much. It doesn't have a possibility for a user to know, when 
>>>> transmission of the data finished. So, it is intended to be used as: 
>>>> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> 
>>>> vmsplice() it. But on the mmap() stage kernel has to zero all the newly 
>>>> mapped pages and zeroing memory isn't much faster, than copying it. 
>>>> Hence, there would be no considerable performance increase.
>>> vmsplice() isn't the right choice, but splice() very well could be. You
>>> could easily use splice internally as well. The vmsplice() part sort-of
>>> applies in the sense that you want to fill pages into a pipe, which is
>>> essentially what vmsplice() does. You'd need some helper to do that.
>> Sorry, Jens, but splice() works only if there is a file handle on the 
>> another side, so user space doesn't see data buffers. But SCST needs to 
>> serve a wider usage cases, like reading data with decompression from a 
>> virtual tape, where decompression is done in user space. For those only 
>> complete zero-copy network send, which I implemented, can give the best 
>> performance.
> 
> __splice_from_pipe() takes a pipe, a descriptor and an actor. There's
> absolutely ZERO reason you could not reuse most of that for this
> implementation. The big bonus here is that getting the put correct from
> networking would even make splice() better for everyone. Win for Linux,
> win for you since it'll make it MUCH easier for you to get this stuff
> in. Looking at your original patch and I almost think it's a flame bait
> to induce discussion (nothing wrong with that, that approach works quite
> well and has been used before). There's no way in HELL that it'd ever be
> a merge candidate. And I suspect you know that, at least I hope you do
> or you are farther away from going forward with this than you think.
> 
> So don't look at splice() the system call, look at the infrastructure
> and check if that could be useful for your case. To me it looks
> absolutely like it could, if you goal is just zero-copy transmit.

I looked at the splice code again to make sure I didn't miss anything.
__splice_from_pipe() leads to pipe_to_sendpage(), which leads to
sock_sendpage(), then to sock->sendpage(). Sorry, but I don't see any
point in going through all the complicated splice infrastructure instead
of calling sock->sendpage() directly, as I do.

> The
> only missing piece is dropping the reference and signalling page
> consumption at the right point, which is when the data is safe to be
> reused. That very bit is missing, but that should be all as far as I can
> tell.

This is exactly what I implemented in the patch we are discussing.

>>> And
>>> the ack-on-xmit-done bits is something that splice-to-socket needs
>>> anyway, so I think it'd be quite a suitable choice for this.
>> So, are you writing that splice() could also benefit from the zero-copy 
>> transmit feature, like I implemented?
> 
> I like how you want to reinvent everything, perhaps you should spend a
> little more time looking into various other approaches? splice() already
> does zero-copy network transmit, there are no copies going on. Ideally,
> you'd have zero copies moving data into your pipe, but migrate/move
> isn't quite there yet. But that doesn't apply to your case at all.
> 
> What is missing, as I wrote, is the 'release on ack' and not on pipe
> buffer release. This is similar to the get_page/put_page stuff you did
> in your patch, but don't go claiming that zero-copy transmit is a
> Vladislav original - the ->sendpage() does no copies.

Jens, I have never claimed I reinvented ->sendpage(). Quite the
opposite, I use it. I only extended it with a missing feature. Although
it seems that, since you were misled, I should apologize for the
not-so-good description of the patch.

Thanks,
Vlad




Thread overview: 12+ messages
     [not found] <494009D7.4020602@vlnb.net>
     [not found] ` <494012C4.7090304@vlnb.net>
     [not found]   ` <20081210214500.GA24212@ioremap.net>
     [not found]     ` <4941590F.3070705@vlnb.net>
     [not found]       ` <1229022734.3266.67.camel@localhost.localdomain>
     [not found]         ` <4942BAB8.4050007@vlnb.net>
     [not found]           ` <1229110673.3262.94.camel@localhost.localdomain>
     [not found]             ` <49469ADB.6010709@vlnb.net>
     [not found]               ` <20081215231801.GA27168@infradead.org>
     [not found]                 ` <4947FA1C.2090509@vlnb.net>
2008-12-18 18:35                   ` [RFC]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-18 18:43                     ` David M. Lloyd
2008-12-19 17:37                       ` Vladislav Bolkhovitin
2008-12-19 19:07                         ` Jens Axboe
2008-12-19 19:17                           ` Vladislav Bolkhovitin
2008-12-19 19:27                             ` Jens Axboe
2008-12-19 21:58                               ` Evgeniy Polyakov
2008-12-23 19:11                               ` Vladislav Bolkhovitin
2008-12-19 11:27                     ` Andi Kleen
2008-12-19 17:38                       ` Vladislav Bolkhovitin
2008-12-19 18:00                         ` Andi Kleen
2008-12-19 17:57                           ` Vladislav Bolkhovitin
