linux-mm.kvack.org archive mirror
From: Paolo Abeni <pabeni@redhat.com>
To: David Howells <dhowells@redhat.com>, netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	David Ahern <dsahern@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>, Jeff Layton <jlayton@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	Chuck Lever III <chuck.lever@oracle.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH net-next v7 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES
Date: Thu, 18 May 2023 10:28:29 +0200	[thread overview]
Message-ID: <93aba6cc363e94a6efe433b3c77ec1b6b54f2919.camel@redhat.com> (raw)
In-Reply-To: <20230515093345.396978-4-dhowells@redhat.com>

On Mon, 2023-05-15 at 10:33 +0100, David Howells wrote:
> Add a function to handle MSG_SPLICE_PAGES being passed internally to
> sendmsg().  Pages are spliced into the given socket buffer if possible and
> copied in if not (e.g. they're slab pages or have a zero refcount).
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Dumazet <edumazet@google.com>
> cc: "David S. Miller" <davem@davemloft.net>
> cc: David Ahern <dsahern@kernel.org>
> cc: Jakub Kicinski <kuba@kernel.org>
> cc: Paolo Abeni <pabeni@redhat.com>
> cc: Al Viro <viro@zeniv.linux.org.uk>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: netdev@vger.kernel.org
> ---
> 
> Notes:
>     ver #7)
>      - Export function.
>      - Never copy data, return -EIO if sendpage_ok() returns false.
> 
>  include/linux/skbuff.h |  3 ++
>  net/core/skbuff.c      | 95 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4c0ad48e38ca..1c5f0ac6f8c3 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -5097,5 +5097,8 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
>  #endif
>  }
>  
> +ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> +			     ssize_t maxsize, gfp_t gfp);
> +
>  #endif	/* __KERNEL__ */
>  #endif	/* _LINUX_SKBUFF_H */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7f53dcb26ad3..56d629ea2f3d 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6892,3 +6892,98 @@ nodefer:	__kfree_skb(skb);
>  	if (unlikely(kick) && !cmpxchg(&sd->defer_ipi_scheduled, 0, 1))
>  		smp_call_function_single_async(cpu, &sd->defer_csd);
>  }
> +
> +static void skb_splice_csum_page(struct sk_buff *skb, struct page *page,
> +				 size_t offset, size_t len)
> +{
> +	const char *kaddr;
> +	__wsum csum;
> +
> +	kaddr = kmap_local_page(page);
> +	csum = csum_partial(kaddr + offset, len, 0);
> +	kunmap_local(kaddr);
> +	skb->csum = csum_block_add(skb->csum, csum, skb->len);
> +}
> +
> +/**
> + * skb_splice_from_iter - Splice (or copy) pages to skbuff
> + * @skb: The buffer to add pages to
> + * @iter: Iterator representing the pages to be added
> + * @maxsize: Maximum amount of pages to be added
> + * @gfp: Allocation flags
> + *
> + * This is a common helper function for supporting MSG_SPLICE_PAGES.  It
> + * extracts pages from an iterator and adds them to the socket buffer if
> + * possible, copying them to fragments if not possible (such as if they're slab
> + * pages).
> + *
> + * Returns the amount of data spliced/copied or -EMSGSIZE if there's
> + * insufficient space in the buffer to transfer anything.
> + */
> +ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> +			     ssize_t maxsize, gfp_t gfp)
> +{
> +	struct page *pages[8], **ppages = pages;
> +	unsigned int i;
> +	ssize_t spliced = 0, ret = 0;
> +	size_t frag_limit = READ_ONCE(sysctl_max_skb_frags);

Minor nit: please respect the reverse X-mas tree order for the local
variable declarations (there are a few other occurrences around).
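
Something along these lines (just a sketch of the ordering, longest
declaration first):

	size_t frag_limit = READ_ONCE(sysctl_max_skb_frags);
	struct page *pages[8], **ppages = pages;
	ssize_t spliced = 0, ret = 0;
	unsigned int i;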

> +
> +	while (iter->count > 0) {
> +		ssize_t space, nr;
> +		size_t off, len;
> +
> +		ret = -EMSGSIZE;
> +		space = frag_limit - skb_shinfo(skb)->nr_frags;
> +		if (space < 0)
> +			break;
> +
> +		/* We might be able to coalesce without increasing nr_frags */
> +		nr = clamp_t(size_t, space, 1, ARRAY_SIZE(pages));
> +
> +		len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off);
> +		if (len <= 0) {
> +			ret = len ?: -EIO;
> +			break;
> +		}
> +
> +		if (space == 0 &&
> +		    !skb_can_coalesce(skb, skb_shinfo(skb)->nr_frags,
> +				      pages[0], off)) {
> +			iov_iter_revert(iter, len);
> +			break;
> +		}

It looks like the above condition/checks duplicate what the later
skb_append_pagefrags() call will perform. I guess the above chunk
could be removed?
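
Just to illustrate the idea: with that chunk dropped, the no-space case
should already be caught by the existing error handling a few lines
below, roughly:

	ret = skb_append_pagefrags(skb, page, off, part, frag_limit);
	if (ret < 0) {
		/* also covers "no room left and cannot coalesce":
		 * skb_append_pagefrags() returns -EMSGSIZE there
		 */
		iov_iter_revert(iter, len);
		goto out;
	}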

> +
> +		i = 0;
> +		do {
> +			struct page *page = pages[i++];
> +			size_t part = min_t(size_t, PAGE_SIZE - off, len);
> +
> +			ret = -EIO;
> +			if (!sendpage_ok(page))
> +				goto out;

My (limited) understanding is that the current sendpage code assumes
that the caller provides/uses pages suitable for such use. The existing
sendpage_ok() check is in place as a way to try to catch possible code
bugs - via the WARN_ONCE().

I think the same could be done here?
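
Something along the lines of what kernel_sendpage() does, e.g. keeping
the -EIO but flagging the buggy caller (just a sketch, the message text
is illustrative):

	/* catch callers passing slab or zero-refcount pages */
	if (WARN_ONCE(!sendpage_ok(page),
		      "improper page for zero-copy send")) {
		ret = -EIO;
		goto out;
	}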

Thanks!

Paolo

> +
> +			ret = skb_append_pagefrags(skb, page, off, part,
> +						   frag_limit);
> +			if (ret < 0) {
> +				iov_iter_revert(iter, len);
> +				goto out;
> +			}
> +
> +			if (skb->ip_summed == CHECKSUM_NONE)
> +				skb_splice_csum_page(skb, page, off, part);
> +
> +			off = 0;
> +			spliced += part;
> +			maxsize -= part;
> +			len -= part;
> +		} while (len > 0);
> +
> +		if (maxsize <= 0)
> +			break;
> +	}
> +
> +out:
> +	skb_len_add(skb, spliced);
> +	return spliced ?: ret;
> +}
> +EXPORT_SYMBOL(skb_splice_from_iter);
> 



Thread overview: 24+ messages
2023-05-15  9:33 [PATCH net-next v7 00/16] splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1 David Howells
2023-05-15  9:33 ` [PATCH net-next v7 01/16] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag David Howells
2023-05-15  9:33 ` [PATCH net-next v7 02/16] net: Pass max frags into skb_append_pagefrags() David Howells
2023-05-15  9:33 ` [PATCH net-next v7 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
2023-05-18  8:28   ` Paolo Abeni [this message]
2023-05-18  9:53   ` David Howells
2023-05-18 10:25     ` Paolo Abeni
2023-05-18 10:32     ` David Howells
2023-05-18 11:10       ` Paolo Abeni
2023-05-18 11:34       ` David Howells
2023-05-15  9:33 ` [PATCH net-next v7 04/16] tcp: Support MSG_SPLICE_PAGES David Howells
2023-05-15  9:33 ` [PATCH net-next v7 05/16] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES David Howells
2023-05-15  9:33 ` [PATCH net-next v7 06/16] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg David Howells
2023-05-15  9:33 ` [PATCH net-next v7 07/16] espintcp: Inline do_tcp_sendpages() David Howells
2023-05-15  9:33 ` [PATCH net-next v7 08/16] tls: " David Howells
2023-05-15  9:33 ` [PATCH net-next v7 09/16] siw: " David Howells
2023-05-15  9:33 ` [PATCH net-next v7 10/16] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked() David Howells
2023-05-15  9:33 ` [PATCH net-next v7 11/16] ip, udp: Support MSG_SPLICE_PAGES David Howells
2023-05-15  9:33 ` [PATCH net-next v7 12/16] ip6, udp6: " David Howells
2023-05-15  9:33 ` [PATCH net-next v7 13/16] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-05-15  9:33 ` [PATCH net-next v7 14/16] ip: Remove ip_append_page() David Howells
2023-05-15  9:33 ` [PATCH net-next v7 15/16] af_unix: Support MSG_SPLICE_PAGES David Howells
2023-05-15  9:33 ` [PATCH net-next v7 16/16] unix: Convert udp_sendpage() to use MSG_SPLICE_PAGES David Howells
2023-05-18  9:56 ` [PATCH net-next v8 03/16] net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES David Howells
