References: <20230406094245.3633290-1-dhowells@redhat.com> <20230406094245.3633290-6-dhowells@redhat.com>
In-Reply-To: <20230406094245.3633290-6-dhowells@redhat.com>
From: Willem de Bruijn
Date: Thu, 6 Apr 2023 21:56:19 -0400
Subject: Re: [PATCH net-next v5 05/19] tcp: Support MSG_SPLICE_PAGES
To: David Howells
Cc: netdev@vger.kernel.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Matthew Wilcox, Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton, Christian Brauner, Chuck Lever III, Linus Torvalds, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, David Ahern

On Thu, Apr 6, 2023 at 5:43 AM David Howells wrote:
>
> Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
> spliced from the source iterator.
>
> This allows ->sendpage() to be replaced by something that can handle
> multiple multipage folios in a single transaction.
>
> Signed-off-by: David Howells
> cc: Eric Dumazet
> cc: "David S. Miller"
> cc: David Ahern
> cc: Jakub Kicinski
> cc: Paolo Abeni
> cc: Jens Axboe
> cc: Matthew Wilcox
> cc: netdev@vger.kernel.org
> ---
>  net/ipv4/tcp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 60 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index fd68d49490f2..510bacc7ce7b 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1221,7 +1221,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         int flags, err, copied = 0;
>         int mss_now = 0, size_goal, copied_syn = 0;
>         int process_backlog = 0;
> -       bool zc = false;
> +       int zc = 0;
>         long timeo;
>
>         flags = msg->msg_flags;
> @@ -1232,17 +1232,22 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                 if (msg->msg_ubuf) {
>                         uarg = msg->msg_ubuf;
>                         net_zcopy_get(uarg);
> -                       zc = sk->sk_route_caps & NETIF_F_SG;
> +                       if (sk->sk_route_caps & NETIF_F_SG)
> +                               zc = 1;
>                 } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>                         uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
>                         if (!uarg) {
>                                 err = -ENOBUFS;
>                                 goto out_err;
>                         }
> -                       zc = sk->sk_route_caps & NETIF_F_SG;
> -                       if (!zc)
> +                       if (sk->sk_route_caps & NETIF_F_SG)
> +                               zc = 1;
> +                       else
>                                 uarg_to_msgzc(uarg)->zerocopy = 0;
>                 }
> +       } else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) {
> +               if (sk->sk_route_caps & NETIF_F_SG)
> +                       zc = 2;
>         }
>
>         if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
> @@ -1305,7 +1310,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                 goto do_error;
>
>         while (msg_data_left(msg)) {
> -               int copy = 0;
> +               ssize_t copy = 0;
>
>                 skb = tcp_write_queue_tail(sk);
>                 if (skb)
> @@ -1346,7 +1351,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                 if (copy > msg_data_left(msg))
>                         copy = msg_data_left(msg);
>
> -               if (!zc) {
> +               if (zc == 0) {
>                         bool merge = true;
>                         int i = skb_shinfo(skb)->nr_frags;
>                         struct page_frag *pfrag = sk_page_frag(sk);
> @@ -1391,7 +1396,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                                 page_ref_inc(pfrag->page);
>                         }
>                         pfrag->offset += copy;
> -               } else {
> +               } else if (zc == 1) {

Instead of 1 and 2, MSG_ZEROCOPY and MSG_SPLICE_PAGES make the code
more self-documenting.

>                         /* First append to a fragless skb builds initial
>                          * pure zerocopy skb
>                          */
> @@ -1412,6 +1417,54 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>                         if (err < 0)
>                                 goto do_error;
>                         copy = err;
> +               } else if (zc == 2) {
> +                       /* Splice in data. */
> +                       struct page *page = NULL, **pages = &page;
> +                       size_t off = 0, part;
> +                       bool can_coalesce;
> +                       int i = skb_shinfo(skb)->nr_frags;
> +
> +                       copy = iov_iter_extract_pages(&msg->msg_iter, &pages,
> +                                                     copy, 1, 0, &off);
> +                       if (copy <= 0) {
> +                               err = copy ?: -EIO;
> +                               goto do_error;
> +                       }
> +
> +                       can_coalesce = skb_can_coalesce(skb, i, page, off);
> +                       if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) {
> +                               tcp_mark_push(tp, skb);
> +                               iov_iter_revert(&msg->msg_iter, copy);
> +                               goto new_segment;
> +                       }
> +                       if (tcp_downgrade_zcopy_pure(sk, skb)) {
> +                               iov_iter_revert(&msg->msg_iter, copy);
> +                               goto wait_for_space;
> +                       }
> +
> +                       part = tcp_wmem_schedule(sk, copy);
> +                       iov_iter_revert(&msg->msg_iter, copy - part);
> +                       if (!part)
> +                               goto wait_for_space;
> +                       copy = part;
> +
> +                       if (can_coalesce) {
> +                               skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
> +                       } else {
> +                               get_page(page);
> +                               skb_fill_page_desc_noacc(skb, i, page, off, copy);
> +                       }
> +                       page = NULL;
> +
> +                       if (!(flags & MSG_NO_SHARED_FRAGS))
> +                               skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG;
> +
> +                       skb->len += copy;
> +                       skb->data_len += copy;
> +                       skb->truesize += copy;
> +                       sk_wmem_queued_add(sk, copy);
> +                       sk_mem_charge(sk, copy);
> +

Similar to udp, perhaps in a helper?
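
To illustrate the helper idea, here is a rough user-space sketch, not kernel code. It only models the five accounting updates at the end of the zc == 2 branch. The names sketch_skb, sketch_sock and sketch_account_spliced are hypothetical stand-ins invented for this example; a real helper would take struct sock and struct sk_buff and keep calling sk_wmem_queued_add() and sk_mem_charge().

```c
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields the branch updates.
 * This is NOT the kernel's struct sk_buff. */
struct sketch_skb {
	unsigned int len;
	unsigned int data_len;
	unsigned int truesize;
};

/* Hypothetical stand-in for the socket-side counters.
 * This is NOT the kernel's struct sock. */
struct sketch_sock {
	int wmem_queued;	/* models what sk_wmem_queued_add() updates */
	int mem_charged;	/* models what sk_mem_charge() accounts for */
};

/* Hypothetical helper: apply the accounting for 'copy' spliced bytes in
 * one place instead of open-coding it in each protocol's sendmsg path. */
static void sketch_account_spliced(struct sketch_sock *sk,
				   struct sketch_skb *skb, size_t copy)
{
	skb->len += (unsigned int)copy;
	skb->data_len += (unsigned int)copy;
	skb->truesize += (unsigned int)copy;
	sk->wmem_queued += (int)copy;
	sk->mem_charged += (int)copy;
}
```

With something along these lines, the splice branch would end in a single call such as sketch_account_spliced(sk, skb, copy) in place of the five open-coded lines.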