From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 436E6C6FD1D for ; Thu, 30 Mar 2023 17:55:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C4F25900002; Thu, 30 Mar 2023 13:55:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BFECB6B0078; Thu, 30 Mar 2023 13:55:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A9E8C900002; Thu, 30 Mar 2023 13:55:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 997ED6B0074 for ; Thu, 30 Mar 2023 13:55:23 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 33807401B6 for ; Thu, 30 Mar 2023 17:55:23 +0000 (UTC) X-FDA: 80626316526.19.CFB3963 Received: from mail-qv1-f42.google.com (mail-qv1-f42.google.com [209.85.219.42]) by imf15.hostedemail.com (Postfix) with ESMTP id 5A6E7A0006 for ; Thu, 30 Mar 2023 17:55:21 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=FGBMPfNm; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf15.hostedemail.com: domain of willemdebruijn.kernel@gmail.com designates 209.85.219.42 as permitted sender) smtp.mailfrom=willemdebruijn.kernel@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680198921; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=bT5Jd1+THvKq1SAoxJ0RmL0CWt0HENziWG2ORFrQswY=; b=vyUJVaj+v2HQhzBB10ucJ5GYrr55B3Ews89MLgER7vygV9vyX6KhDMQNraH2x6McCajh1o wnLjexyhZW8tqWPzKNmum265QxH3tiTFpynQOI3E8SGVYRhqTs7u2uTzKVH/ccf3MRJKlz NxFbixg1n9eHMEXNnSjy6TaQh6MK85Q= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=FGBMPfNm; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf15.hostedemail.com: domain of willemdebruijn.kernel@gmail.com designates 209.85.219.42 as permitted sender) smtp.mailfrom=willemdebruijn.kernel@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680198921; a=rsa-sha256; cv=none; b=mn7mQn81qc0Y0UkOpEb8klI2mYeZRYDGwFIll0ayByOLuaxZstWMnH2nhXahSLFjc0MCKt /Rw1TnmPRX2mDoaOXQ07OoUy4Z91n3dwYJKMGPmzNeT7k9uYTYylTuGiztEIqA8kTPyUEI PnzaAgspPLGykoh47a5mC/fEy+k2fcw= Received: by mail-qv1-f42.google.com with SMTP id g9so14644275qvt.8 for ; Thu, 30 Mar 2023 10:55:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680198920; x=1682790920; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=bT5Jd1+THvKq1SAoxJ0RmL0CWt0HENziWG2ORFrQswY=; b=FGBMPfNmbeDptza2nmg7r5EeE+xiRg1lkhuHyFbUOZmb7lpXSmxMTP6dNlG6TSsttf r3OmBG6hlvE5xcGcekTIO1cTscBzD1NDRzIhijEwyCctjiK2nVd0Sc+8T+9O8bxXZT29 l6L9MfC4LNS+n5QD+97vt94iXbipQcWMBpjxQdtlSXqZxlAE4x3kUJUgvghxTUsmbhPi 8vyzmPd6IPX9kqp5trftoJFHu3For+PGgz9HeXaP2vFg+jUFKPKsUKKdyaAUFoJo37be 1y4Xt9Q3cToQeC+U+IfAPZ7MXxZ+BJu1fVivUV85nqkfDP+WgG3iw5EDmwdgSZsA5gYg m6fg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680198920; x=1682790920; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=bT5Jd1+THvKq1SAoxJ0RmL0CWt0HENziWG2ORFrQswY=; b=K8qy5FQAs8H84ooKsfBwrjzDb9/x4Kj+IJvWWgqsjefXdub8RvPYpd3LktiFfm4bzn 3dVxom2WzX3AA3+DLRTeyeUpCs/QE4tF7vmZI56KaL2vNxdnoJCNYH9SVOO4/k8jQNRh pQYaa51NyAH9CxJWn6V/q5VQu27So1/8/sz3HkCUDhFuxThei+6QCZbaQEDy1rtvWt7A jFbR5XN45+TTsi3YEK2gx6iFiqOIlrXrbB4MXmoj/aJl0FG3m67ARVskvykybJX/LbBl 23yXzkj3gjl0AVx9LR1bVwpgPNH7H9REzP/w2jXQf6BG/FPXHWGb9VcuVZFTibLh2a33 sptQ== X-Gm-Message-State: AAQBX9fj/FtN6qRyIrMckAB8HeqJNmI6RNnuO9DDjgnj/MauEW7QxX63 vz2gLgLnWormPhbLUVsIdAk= X-Google-Smtp-Source: AKy350Znw4NVAYfu219upnHEOkf12n8Ur23PQbvyNOTAnH/i6twLTHs+Gkyqmm5G2VB+Nq2FlhgmFg== X-Received: by 2002:ad4:5962:0:b0:56f:5466:20dc with SMTP id eq2-20020ad45962000000b0056f546620dcmr34875995qvb.4.1680198920379; Thu, 30 Mar 2023 10:55:20 -0700 (PDT) Received: from localhost (240.157.150.34.bc.googleusercontent.com. [34.150.157.240]) by smtp.gmail.com with ESMTPSA id s184-20020ae9dec1000000b007426e664cdcsm15189qkf.133.2023.03.30.10.55.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Mar 2023 10:55:20 -0700 (PDT) Date: Thu, 30 Mar 2023 13:55:19 -0400 From: Willem de Bruijn To: David Howells , Willem de Bruijn Cc: dhowells@redhat.com, Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Message-ID: <6425cd07c25dc_21f56920810@willemb.c.googlers.com.notmuch> In-Reply-To: <854811.1680189069@warthog.procyon.org.uk> References: <64259aca22046_21883920890@willemb.c.googlers.com.notmuch> <20230329141354.516864-1-dhowells@redhat.com> <20230329141354.516864-17-dhowells@redhat.com> <854811.1680189069@warthog.procyon.org.uk> Subject: Re: [RFC PATCH v2 16/48] ip, udp: Support MSG_SPLICE_PAGES Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 5A6E7A0006 X-Stat-Signature: 5quh5gf6t5w84iwktsqkcqfiedtbxk8u X-HE-Tag: 1680198921-83457 X-HE-Meta: U2FsdGVkX19KN1E6o5XO04+5f/ksbc2C3yDisn0EfhX6MccvXPn7Tj54JhgUY7Z0tUIHNFu2USf72zB3iVHqL9yuQkWtXki3e1xJcwNREIpS4scA2OFxNryLw3DpYnacx0I2EzdJnVlECj7MN0ufeCGy+SeIEvJYN0scPB1E0hWBpGA2sRaqRv6dZJ/JM8Qlw9CBFfUmPt2pCpNDzIfl1uZg5phJAlv79bMJ2omQpyhXjj8A3cKIzPqv3+iTLfn4kjmkCspWem3uy0o3guRfLPuKDBZqCsEEmfQL/Y3WO7895QW+wF4HOKE524BUfJcuOR1V+LnrL9Wt0t0cFomAmJUlRDEEVcirzKf9wKJbL+3t23o8DAWKW+MibqrY54Gj2FTxgjh5pE+x3JjGhKHi80sdvBsBgBmBfIq0kKv3SgrKJwq73f8NZEy2WOEU2IbfjrZpl/SIfTSSa1GQTdSWDT+kQ9zPREsPHmBZ1pr2o7cL/qSXz5gch3y42+xC0Vde4uTdLiXul0FHZElxjLvao+ItxMonVpgJysyxK1/tFTErWe60241Z+JZcgpit6LH+gjo2OsBReK6v9OeW9Ce7bVaBK27ggNzrum0rsvp5Wdj294tJbU5AFP/cc73Woj2XjvKCLf3AzehjmBmmpuD30Ej8eOWh8GyOnKAilMuZ5bKysZVjakyzbsRiQpKLfCeCcg7qbiD9XHrB+J34GILnF8sqshOH8r96aAvw0UxTBz2prGVXU13Zkt/bGbi2di38blqT8jUKXFQz/vjc7zg75tB+zs4WTe2yBrTKQJreK1wwtbhimFUfKBAlKScuU4X0fqL0r95XuWC1SjgV/lb0kReAUp6EsSBme8Y6MDfC3yqsZCyNyiaVXePESwZ8J4lt3eLjLv4bDai6wXA02XRNxeIKX8T+7ciQrClUdb6SPt24vawoZO/yLyVD/kgTzXigr8BXBKwsImuiTOweUqq 02iQql/g +x3ui+8HGDK1iS/7gAIj7xt0v6dEwQ5iewiRQD+3LmW0wGUwoCZz3hSBbrrHPtEEyzOWlA/hqdxvhtgnhTdvjV3kbhYJ0O4mGlbnfBoSducFUmwtq4CiRGK2nFmhtAEmAUcEqfnHqWT3g/24Kk/xVjdPODwrJnWA+j4RCTACWxK+/D+8fMPyAjcasasCl+1yzj92Mcaldi4m5GxbcXF8+bs0FRjzYwXuI0BlIyBAWQdGbS5YuUu5DQCLaeOipWW2HSIJZQAD4pq/TfnxTbo8+u1zzQVwyiIxFtpxzmjiEEZHz4JbwFWkvHsvr8scA8ZiRXSThQ/q68P9YjEidiB+wO36DRvtIy84Is6cUYesgLR23wECFNYzEG+t9bGGwkIt8ILqg66bTbD/evlog9LWl5WdZJgGsGDDwnK1BgfhY6ekvFmCdTKFV9tr0FpGhgazQIR2efq2xR6+VPRGBHXLieBJ73s9YaTyxfLAxy2R0NmJtgxbDTHdad+a9P6SV2q8SnxSXYZ8Q1NGm4mmXQe4jOVXEpYFFbDMmaeQCT6xrKBRoJp24XgESyX8eOlMuyI+vIsYGeUsrwItb8FZjSqYE5I7dNBOt4fhMX7dT3Q4lMlIzNDM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: David Howells wrote: > Willem de Bruijn wrote: > > > > + unsigned int maxfraglen, fragheaderlen, maxnonfragsize, xlength; > > > > Does x here stand for anything? > > Yeah... "bad naming". How about if I call it initial_length? I'm trying to > avoid allocating bufferage for the data. That's more informative, thanks. Let me not bikeshed this further for now. > > This does add a lot of code to two functions that are already > > unwieldy. It may be unavoidable, but it if can use helpers, that would > > be preferable. > > Something like the attached? (This is on top of patches 16-17, but I would > need to fold it in) Yes exactly. I wasn't sure whether the inner loops required access to too many function scope variables to pull this off. But seems like it is doable. Great. > > David > --- > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index b38dbb2f9c3f..019ed9bb6745 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -956,6 +956,96 @@ csum_page(struct page *page, int offset, int copy) > return csum; > } > > +/* > + * Allocate a packet for MSG_SPLICE_PAGES. > + */ > +static int __ip_splice_alloc(struct sock *sk, struct sk_buff **pskb, > + unsigned int fragheaderlen, unsigned int maxfraglen, > + unsigned int hh_len) > +{ > + struct sk_buff *skb_prev = *pskb, *skb; > + unsigned int fraggap = skb_prev->len - maxfraglen; > + unsigned int alloclen = fragheaderlen + hh_len + fraggap + 15; > + > + skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation); > + if (unlikely(!skb)) > + return -ENOBUFS; > + > + /* Fill in the control structures */ > + skb->ip_summed = CHECKSUM_NONE; > + skb->csum = 0; > + skb_reserve(skb, hh_len); > + > + /* Find where to start putting bytes. */ > + skb_put(skb, fragheaderlen + fraggap); > + skb_reset_network_header(skb); > + skb->transport_header = skb->network_header + fragheaderlen; > + if (fraggap) { > + skb->csum = skb_copy_and_csum_bits(skb_prev, maxfraglen, > + skb_transport_header(skb), > + fraggap); > + skb_prev->csum = csum_sub(skb_prev->csum, skb->csum); > + pskb_trim_unique(skb_prev, maxfraglen); > + } > + > + /* Put the packet on the pending queue. */ > + __skb_queue_tail(&sk->sk_write_queue, skb); > + *pskb = skb; > + return 0; > +} > + > +/* > + * Add (or copy) data pages for MSG_SPLICE_PAGES. > + */ > +static int __ip_splice_pages(struct sock *sk, struct sk_buff *skb, > + void *from, size_t *pcopy) > +{ > + struct msghdr *msg = from; > + struct page *page = NULL, **pages = &page; > + ssize_t copy = *pcopy; > + size_t off; > + bool put = false; > + int err; > + > + copy = iov_iter_extract_pages(&msg->msg_iter, &pages, copy, 1, 0, &off); > + if (copy <= 0) > + return copy ?: -EIO; > + > + if (!sendpage_ok(page)) { > + const void *p = kmap_local_page(page); > + void *q; > + > + q = page_frag_memdup(NULL, p + off, copy, > + sk->sk_allocation, ULONG_MAX); > + kunmap_local(p); > + if (!q) > + return -ENOMEM; > + page = virt_to_page(q); > + off = offset_in_page(q); > + put = true; > + } > + > + err = skb_append_pagefrags(skb, page, off, copy); > + if (put) > + put_page(page); > + if (err < 0) { > + iov_iter_revert(&msg->msg_iter, copy); > + return err; > + } > + > + if (skb->ip_summed == CHECKSUM_NONE) { > + __wsum csum; > + > + csum = csum_page(page, off, copy); > + skb->csum = csum_block_add(skb->csum, csum, skb->len); > + } > + > + skb_len_add(skb, copy); > + refcount_add(copy, &sk->sk_wmem_alloc); > + *pcopy = copy; > + return 0; > +} > + > static int __ip_append_data(struct sock *sk, > struct flowi4 *fl4, > struct sk_buff_head *queue, > @@ -977,7 +1067,7 @@ static int __ip_append_data(struct sock *sk, > int err; > int offset = 0; > bool zc = false; > - unsigned int maxfraglen, fragheaderlen, maxnonfragsize, xlength; > + unsigned int maxfraglen, fragheaderlen, maxnonfragsize, initial_length; > int csummode = CHECKSUM_NONE; > struct rtable *rt = (struct rtable *)cork->dst; > unsigned int wmem_alloc_delta = 0; > @@ -1017,7 +1107,7 @@ static int __ip_append_data(struct sock *sk, > (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM))) > csummode = CHECKSUM_PARTIAL; > > - xlength = length; > + initial_length = length; > if ((flags & MSG_ZEROCOPY) && length) { > struct msghdr *msg = from; > > @@ -1053,7 +1143,7 @@ static int __ip_append_data(struct sock *sk, > return -EPERM; > if (!(rt->dst.dev->features & NETIF_F_SG)) > return -EOPNOTSUPP; > - xlength = transhdrlen; /* We need an empty buffer to attach stuff to */ > + initial_length = transhdrlen; /* We need an empty buffer to attach stuff to */ > } > > cork->length += length; > @@ -1083,47 +1173,13 @@ static int __ip_append_data(struct sock *sk, > struct sk_buff *skb_prev; > > if (unlikely(flags & MSG_SPLICE_PAGES)) { > - skb_prev = skb; > - fraggap = skb_prev->len - maxfraglen; > - > - alloclen = fragheaderlen + hh_len + fraggap + 15; > - skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation); > - if (unlikely(!skb)) { > - err = -ENOBUFS; > + err = __ip_splice_alloc(sk, &skb, fragheaderlen, > + maxfraglen, hh_len); > + if (err < 0) > goto error; > - } > - > - /* > - * Fill in the control structures > - */ > - skb->ip_summed = CHECKSUM_NONE; > - skb->csum = 0; > - skb_reserve(skb, hh_len); > - > - /* > - * Find where to start putting bytes. > - */ > - skb_put(skb, fragheaderlen + fraggap); > - skb_reset_network_header(skb); > - skb->transport_header = (skb->network_header + > - fragheaderlen); > - if (fraggap) { > - skb->csum = skb_copy_and_csum_bits( > - skb_prev, maxfraglen, > - skb_transport_header(skb), > - fraggap); > - skb_prev->csum = csum_sub(skb_prev->csum, > - skb->csum); > - pskb_trim_unique(skb_prev, maxfraglen); > - } > - > - /* > - * Put the packet on the pending queue. > - */ > - __skb_queue_tail(&sk->sk_write_queue, skb); > continue; > } > - xlength = length; > + initial_length = length; > > alloc_new_skb: > skb_prev = skb; > @@ -1136,7 +1192,7 @@ static int __ip_append_data(struct sock *sk, > * If remaining data exceeds the mtu, > * we know we need more fragment(s). > */ > - datalen = xlength + fraggap; > + datalen = initial_length + fraggap; > if (datalen > mtu - fragheaderlen) > datalen = maxfraglen - fragheaderlen; > fraglen = datalen + fragheaderlen; > @@ -1150,7 +1206,7 @@ static int __ip_append_data(struct sock *sk, > * because we have no idea what fragment will be > * the last. > */ > - if (datalen == xlength + fraggap) > + if (datalen == initial_length + fraggap) > alloc_extra += rt->dst.trailer_len; > > if ((flags & MSG_MORE) && > @@ -1258,48 +1314,9 @@ static int __ip_append_data(struct sock *sk, > goto error; > } > } else if (flags & MSG_SPLICE_PAGES) { > - struct msghdr *msg = from; > - struct page *page = NULL, **pages = &page; > - size_t off; > - bool put = false; > - > - copy = iov_iter_extract_pages(&msg->msg_iter, &pages, > - copy, 1, 0, &off); > - if (copy <= 0) { > - err = copy ?: -EIO; > - goto error; > - } > - > - if (!sendpage_ok(page)) { > - const void *p = kmap_local_page(page); > - void *q; > - > - q = page_frag_memdup(NULL, p + off, copy, > - sk->sk_allocation, ULONG_MAX); > - kunmap_local(p); > - if (!q) { > - err = copy ?: -ENOMEM; > - goto error; > - } > - page = virt_to_page(q); > - off = offset_in_page(q); > - put = true; > - } > - > - err = skb_append_pagefrags(skb, page, off, copy); > - if (put) > - put_page(page); > + err = __ip_splice_pages(sk, skb, from, ©); > if (err < 0) > goto error; > - > - if (skb->ip_summed == CHECKSUM_NONE) { > - __wsum csum; > - csum = csum_page(page, off, copy); > - skb->csum = csum_block_add(skb->csum, csum, skb->len); > - } > - > - skb_len_add(skb, copy); > - refcount_add(copy, &sk->sk_wmem_alloc); > } else if (!zc) { > int i = skb_shinfo(skb)->nr_frags; >