Message-ID: <3a109738-f666-20e5-a135-b466c5546c29@kernel.org>
Date: Thu, 6 Apr 2023 20:13:48 -0600
Subject: Re: [PATCH net-next v5 05/19] tcp: Support MSG_SPLICE_PAGES
To: Willem de Bruijn, David Howells
Cc: netdev@vger.kernel.org, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Matthew Wilcox, Al Viro,
 Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner,
 Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230406094245.3633290-1-dhowells@redhat.com>
 <20230406094245.3633290-6-dhowells@redhat.com>
From: David Ahern

On 4/6/23 7:56 PM, Willem de Bruijn wrote:
> On Thu, Apr 6, 2023 at 5:43 AM David Howells wrote:
>>
>> Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
>> spliced from the source iterator.
>>
>> This allows ->sendpage() to be replaced by something that can handle
>> multiple multipage folios in a single transaction.
>>
>> Signed-off-by: David Howells
>> cc: Eric Dumazet
>> cc: "David S. Miller"
>> cc: David Ahern
>> cc: Jakub Kicinski
>> cc: Paolo Abeni
>> cc: Jens Axboe
>> cc: Matthew Wilcox
>> cc: netdev@vger.kernel.org
>> ---
>>  net/ipv4/tcp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 60 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index fd68d49490f2..510bacc7ce7b 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -1221,7 +1221,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  	int flags, err, copied = 0;
>>  	int mss_now = 0, size_goal, copied_syn = 0;
>>  	int process_backlog = 0;
>> -	bool zc = false;
>> +	int zc = 0;
>>  	long timeo;
>>
>>  	flags = msg->msg_flags;
>> @@ -1232,17 +1232,22 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  		if (msg->msg_ubuf) {
>>  			uarg = msg->msg_ubuf;
>>  			net_zcopy_get(uarg);
>> -			zc = sk->sk_route_caps & NETIF_F_SG;
>> +			if (sk->sk_route_caps & NETIF_F_SG)
>> +				zc = 1;
>>  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>>  			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
>>  			if (!uarg) {
>>  				err = -ENOBUFS;
>>  				goto out_err;
>>  			}
>> -			zc = sk->sk_route_caps & NETIF_F_SG;
>> -			if (!zc)
>> +			if (sk->sk_route_caps & NETIF_F_SG)
>> +				zc = 1;
>> +			else
>>  				uarg_to_msgzc(uarg)->zerocopy = 0;
>>  		}
>> +	} else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) {
>> +		if (sk->sk_route_caps & NETIF_F_SG)
>> +			zc = 2;
>>  	}
>>
>>  	if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
>> @@ -1305,7 +1310,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  		goto do_error;
>>
>>  	while (msg_data_left(msg)) {
>> -		int copy = 0;
>> +		ssize_t copy = 0;
>>
>>  		skb = tcp_write_queue_tail(sk);
>>  		if (skb)
>> @@ -1346,7 +1351,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  		if (copy > msg_data_left(msg))
>>  			copy = msg_data_left(msg);
>>
>> -		if (!zc) {
>> +		if (zc == 0) {
>>  			bool merge = true;
>>  			int i = skb_shinfo(skb)->nr_frags;
>>  			struct page_frag *pfrag = sk_page_frag(sk);
>> @@ -1391,7 +1396,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  				page_ref_inc(pfrag->page);
>>  			}
>>  			pfrag->offset += copy;
>> -		} else {
>> +		} else if (zc == 1) {
>
> Instead of 1 and 2, MSG_ZEROCOPY and MSG_SPLICE_PAGES make the code
> more self-documenting.
>
>>  			/* First append to a fragless skb builds initial
>>  			 * pure zerocopy skb
>>  			 */
>> @@ -1412,6 +1417,54 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>  			if (err < 0)
>>  				goto do_error;
>>  			copy = err;
>> +		} else if (zc == 2) {
>> +			/* Splice in data. */
>> +			struct page *page = NULL, **pages = &page;
>> +			size_t off = 0, part;
>> +			bool can_coalesce;
>> +			int i = skb_shinfo(skb)->nr_frags;
>> +
>> +			copy = iov_iter_extract_pages(&msg->msg_iter, &pages,
>> +						      copy, 1, 0, &off);
>> +			if (copy <= 0) {
>> +				err = copy ?: -EIO;
>> +				goto do_error;
>> +			}
>> +
>> +			can_coalesce = skb_can_coalesce(skb, i, page, off);
>> +			if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
>> +				tcp_mark_push(tp, skb);
>> +				iov_iter_revert(&msg->msg_iter, copy);
>> +				goto new_segment;
>> +			}
>> +			if (tcp_downgrade_zcopy_pure(sk, skb)) {
>> +				iov_iter_revert(&msg->msg_iter, copy);
>> +				goto wait_for_space;
>> +			}
>> +
>> +			part = tcp_wmem_schedule(sk, copy);
>> +			iov_iter_revert(&msg->msg_iter, copy - part);
>> +			if (!part)
>> +				goto wait_for_space;
>> +			copy = part;
>> +
>> +			if (can_coalesce) {
>> +				skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
>> +			} else {
>> +				get_page(page);
>> +				skb_fill_page_desc_noacc(skb, i, page, off, copy);
>> +			}
>> +			page = NULL;
>> +
>> +			if (!(flags & MSG_NO_SHARED_FRAGS))
>> +				skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
>> +
>> +			skb->len += copy;
>> +			skb->data_len += copy;
>> +			skb->truesize += copy;
>> +			sk_wmem_queued_add(sk, copy);
>> +			sk_mem_charge(sk, copy);
>> +
>
> Similar to udp, perhaps in a helper?

tcp_sendmsg_locked() is already more than 250 lines long, and these 47
lines add to it. I was staring at this code 2 weeks ago, wondering
whether it can be split or refactored to reduce the complexity.
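
To make the two suggestions above concrete, here is a rough userspace
model, not kernel code and not a proposed patch. It only illustrates
the shape: the bare 0/1/2 values of zc become named modes, and the
accounting at the tail of the splice branch moves into a helper. Every
name in it (tx_mode, toy_skb, toy_splice_account, toy_send_one) is
hypothetical, and the toy struct stands in for the real skb fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: none of these exist in the patch or the kernel. */
enum tx_mode {
	TX_COPY,	/* zc == 0 in the patch: copy into page frags */
	TX_ZEROCOPY,	/* zc == 1 in the patch: MSG_ZEROCOPY path */
	TX_SPLICE,	/* zc == 2 in the patch: MSG_SPLICE_PAGES path */
};

/* Toy stand-in for the skb fields the splice branch updates. */
struct toy_skb {
	size_t len;
	size_t data_len;
	size_t truesize;
	unsigned int nr_frags;
};

/*
 * Hypothetical helper modelling the tail of the splice branch: either
 * grow the last fragment (coalesce) or consume a new fragment slot,
 * then account for the spliced bytes. Returns the bytes accounted.
 */
static size_t toy_splice_account(struct toy_skb *skb, size_t copy,
				 int can_coalesce)
{
	assert(skb != NULL);

	if (!can_coalesce)
		skb->nr_frags++;	/* a new frag is filled in */

	skb->len += copy;
	skb->data_len += copy;
	skb->truesize += copy;
	return copy;
}

/* The loop body then dispatches on a named mode instead of 0/1/2. */
static size_t toy_send_one(struct toy_skb *skb, enum tx_mode mode,
			   size_t copy, int can_coalesce)
{
	switch (mode) {
	case TX_SPLICE:
		return toy_splice_account(skb, copy, can_coalesce);
	case TX_COPY:
	case TX_ZEROCOPY:
		/* Existing paths are left out of this model. */
		return 0;
	}
	return 0;
}
```

In this model a caller would pass TX_SPLICE together with the result
of the coalesce check, and the per-mode details stay out of the main
loop. Whether the real function can be cut along the same line depends
on the goto targets (new_segment, wait_for_space, do_error) that the
splice branch currently jumps to, which this sketch does not model.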