Date: Thu, 30 Mar 2023 10:20:58 -0400
From: Willem de Bruijn
To: David Howells, Matthew Wilcox, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni
Cc: David Howells, Al Viro, Christoph Hellwig, Jens Axboe, Jeff Layton,
 Christian Brauner, Chuck Lever III, Linus Torvalds,
 netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn
Message-ID: <64259aca22046_21883920890@willemb.c.googlers.com.notmuch>
In-Reply-To: <20230329141354.516864-17-dhowells@redhat.com>
References: <20230329141354.516864-1-dhowells@redhat.com>
 <20230329141354.516864-17-dhowells@redhat.com>
Subject: RE: [RFC PATCH v2 16/48] ip, udp: Support MSG_SPLICE_PAGES
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

David Howells wrote:
> Make IP/UDP sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
> spliced from the source iterator if possible (the iterator must be
> ITER_BVEC and the pages must be spliceable).
>
> This allows ->sendpage() to be replaced by something that can handle
> multiple multipage folios in a single transaction.
>
> Signed-off-by: David Howells
> cc: Willem de Bruijn
> cc: "David S. Miller"
> cc: Eric Dumazet
> cc: Jakub Kicinski
> cc: Paolo Abeni
> cc: Jens Axboe
> cc: Matthew Wilcox
> cc: netdev@vger.kernel.org
> ---
>  net/ipv4/ip_output.c | 85 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c

A non-RFC version would require the same for ipv6, of course.

> index 4e4e308c3230..07736da70eab 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -973,11 +973,11 @@ static int __ip_append_data(struct sock *sk,
>  	int hh_len;
>  	int exthdrlen;
>  	int mtu;
> -	int copy;
> +	ssize_t copy;
>  	int err;
>  	int offset = 0;
>  	bool zc = false;
> -	unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
> +	unsigned int maxfraglen, fragheaderlen, maxnonfragsize, xlength;

Does x here stand for anything?

>  	int csummode = CHECKSUM_NONE;
>  	struct rtable *rt = (struct rtable *)cork->dst;
>  	unsigned int wmem_alloc_delta = 0;
> @@ -1017,6 +1017,7 @@ static int __ip_append_data(struct sock *sk,
>  	    (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
>  		csummode = CHECKSUM_PARTIAL;
>
> +	xlength = length;
>  	if ((flags & MSG_ZEROCOPY) && length) {
>  		struct msghdr *msg = from;
>
> @@ -1047,6 +1048,14 @@ static int __ip_append_data(struct sock *sk,
>  				skb_zcopy_set(skb, uarg, &extra_uref);
>  			}
>  		}
> +	} else if ((flags & MSG_SPLICE_PAGES) && length) {
> +		struct msghdr *msg = from;
> +
> +		if (inet->hdrincl)
> +			return -EPERM;
> +		if (!(rt->dst.dev->features & NETIF_F_SG))
> +			return -EOPNOTSUPP;
> +		xlength = transhdrlen; /* We need an empty buffer to attach stuff to */
>  	}
>
>  	cork->length += length;
> @@ -1074,6 +1083,50 @@ static int __ip_append_data(struct sock *sk,
>  			unsigned int alloclen, alloc_extra;
>  			unsigned int pagedlen;
>  			struct sk_buff *skb_prev;
> +
> +			if (unlikely(flags & MSG_SPLICE_PAGES)) {
> +				skb_prev = skb;
> +				fraggap = skb_prev->len - maxfraglen;
> +
> +				alloclen = fragheaderlen + hh_len + fraggap + 15;
> +				skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation);
> +				if (unlikely(!skb)) {
> +					err = -ENOBUFS;
> +					goto error;
> +				}
> +
> +				/*
> +				 * Fill in the control structures
> +				 */
> +				skb->ip_summed = CHECKSUM_NONE;
> +				skb->csum = 0;
> +				skb_reserve(skb, hh_len);
> +
> +				/*
> +				 * Find where to start putting bytes.
> +				 */
> +				skb_put(skb, fragheaderlen + fraggap);
> +				skb_reset_network_header(skb);
> +				skb->transport_header = (skb->network_header +
> +							 fragheaderlen);
> +				if (fraggap) {
> +					skb->csum = skb_copy_and_csum_bits(
> +						skb_prev, maxfraglen,
> +						skb_transport_header(skb),
> +						fraggap);
> +					skb_prev->csum = csum_sub(skb_prev->csum,
> +								  skb->csum);
> +					pskb_trim_unique(skb_prev, maxfraglen);
> +				}
> +
> +				/*
> +				 * Put the packet on the pending queue.
> +				 */
> +				__skb_queue_tail(&sk->sk_write_queue, skb);
> +				continue;
> +			}
> +			xlength = length;
> +
>  alloc_new_skb:
>  			skb_prev = skb;
>  			if (skb_prev)
> @@ -1085,7 +1138,7 @@ static int __ip_append_data(struct sock *sk,
>  			 * If remaining data exceeds the mtu,
>  			 * we know we need more fragment(s).
>  			 */
> -			datalen = length + fraggap;
> +			datalen = xlength + fraggap;
>  			if (datalen > mtu - fragheaderlen)
>  				datalen = maxfraglen - fragheaderlen;
>  			fraglen = datalen + fragheaderlen;
> @@ -1099,7 +1152,7 @@ static int __ip_append_data(struct sock *sk,
>  			 * because we have no idea what fragment will be
>  			 * the last.
>  			 */
> -			if (datalen == length + fraggap)
> +			if (datalen == xlength + fraggap)
>  				alloc_extra += rt->dst.trailer_len;
>
>  			if ((flags & MSG_MORE) &&
> @@ -1206,6 +1259,30 @@ static int __ip_append_data(struct sock *sk,
>  				err = -EFAULT;
>  				goto error;
>  			}
> +		} else if (flags & MSG_SPLICE_PAGES) {
> +			struct msghdr *msg = from;
> +			struct page *page = NULL, **pages = &page;
> +			size_t off;
> +
> +			copy = iov_iter_extract_pages(&msg->msg_iter, &pages,
> +						      copy, 1, 0, &off);
> +			if (copy <= 0) {
> +				err = copy ?: -EIO;
> +				goto error;
> +			}
> +
> +			err = skb_append_pagefrags(skb, page, off, copy);
> +			if (err < 0)
> +				goto error;
> +
> +			if (skb->ip_summed == CHECKSUM_NONE) {
> +				__wsum csum;
> +				csum = csum_page(page, off, copy);
> +				skb->csum = csum_block_add(skb->csum, csum, skb->len);
> +			}
> +
> +			skb_len_add(skb, copy);
> +			refcount_add(copy, &sk->sk_wmem_alloc);
>  		} else if (!zc) {
>  			int i = skb_shinfo(skb)->nr_frags;
>
>

This does add a lot of code to two functions that are already unwieldy. It may
be unavoidable, but if it can use helpers, that would be preferable.