From: Arjun Roy <arjunroy@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjun Roy <arjunroy.kdev@gmail.com>,
David Miller <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>,
linux-mm@kvack.org, Eric Dumazet <edumazet@google.com>,
Soheil Hassas Yeganeh <soheil@google.com>
Subject: Re: [PATCH resend mm,net-next 3/3] net-zerocopy: Use vm_insert_pages() for tcp rcv zerocopy.
Date: Fri, 10 Apr 2020 12:13:53 -0700 [thread overview]
Message-ID: <CAOFY-A0w-RkDf9PXROOVow3RwWXBeOx9e2kh7unM1EVARg7YXA@mail.gmail.com> (raw)
In-Reply-To: <20200410120443.ad7856db13e158fbd441f3ae@linux-foundation.org>
On Fri, Apr 10, 2020 at 12:04 PM Andrew Morton <akpm@linux-foundation.org>
wrote:
> On Fri, 21 Feb 2020 13:21:41 -0800 Arjun Roy <arjunroy@google.com> wrote:
>
> > I remain a bit concerned regarding the merge process for this specific
> > patch (0003, the net/ipv4/tcp.c change) since I have other in-flight
> > changes for TCP receive zerocopy that I'd like to upstream for
> > net-next - and would like to avoid weird merge issues.
> >
> > So perhaps the following could work:
> >
> > 1. Andrew, perhaps we could remove this particular patch (0003, the
> > net/ipv4/tcp.c change) from mm-next; that way we merge
> > vm_insert_pages() but not the call-site within TCP, for now.
> > 2. net-next will eventually pick vm_insert_pages() up.
> > 3. I can modify the zerocopy code to use it at that point?
> >
> > Else I'm concerned a complicated merge situation may result.
>
> The merge situation is quite clean.
>
> I guess I'll hold off on
> net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy.patch (below) and
> shall send it to davem after Linus has merged the prerequisites.
>
>
>
Acknowledged, thank you!
-Arjun
> From: Arjun Roy <arjunroy@google.com>
> Subject: net-zerocopy: use vm_insert_pages() for tcp rcv zerocopy
>
> Use vm_insert_pages() for tcp receive zerocopy. Spin lock cycles (as
> reported by perf) drop from a couple of percentage points to a fraction of
> a percent. This results in a roughly 6% increase in efficiency, measured
> roughly as zerocopy receive count divided by CPU utilization.
>
> The intention of this patchset is to reduce atomic ops for tcp zerocopy
> receives, which normally hit the same spinlock multiple times
> consecutively.
>
> [akpm@linux-foundation.org: suppress gcc-7.2.0 warning]
> Link: http://lkml.kernel.org/r/20200128025958.43490-3-arjunroy.kdev@gmail.com
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> net/ipv4/tcp.c | 70 ++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 63 insertions(+), 7 deletions(-)
>
> --- a/net/ipv4/tcp.c~net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy
> +++ a/net/ipv4/tcp.c
> @@ -1734,14 +1734,48 @@ int tcp_mmap(struct file *file, struct s
> }
> EXPORT_SYMBOL(tcp_mmap);
>
> +static int tcp_zerocopy_vm_insert_batch(struct vm_area_struct *vma,
> + struct page **pages,
> + unsigned long pages_to_map,
> + unsigned long *insert_addr,
> + u32 *length_with_pending,
> + u32 *seq,
> + struct tcp_zerocopy_receive *zc)
> +{
> + unsigned long pages_remaining = pages_to_map;
> + int bytes_mapped;
> + int ret;
> +
> + ret = vm_insert_pages(vma, *insert_addr, pages, &pages_remaining);
> + bytes_mapped = PAGE_SIZE * (pages_to_map - pages_remaining);
> + /* Even if vm_insert_pages fails, it may have partially succeeded in
> +  * mapping (some but not all of the pages).
> +  */
> + *seq += bytes_mapped;
> + *insert_addr += bytes_mapped;
> + if (ret) {
> + /* But if vm_insert_pages did fail, we have to unroll some state
> +  * we speculatively touched before.
> +  */
> + const int bytes_not_mapped = PAGE_SIZE * pages_remaining;
> + *length_with_pending -= bytes_not_mapped;
> + zc->recv_skip_hint += bytes_not_mapped;
> + }
> + return ret;
> +}
> +
> static int tcp_zerocopy_receive(struct sock *sk,
> struct tcp_zerocopy_receive *zc)
> {
> unsigned long address = (unsigned long)zc->address;
> u32 length = 0, seq, offset, zap_len;
> + #define PAGE_BATCH_SIZE 8
> + struct page *pages[PAGE_BATCH_SIZE];
> const skb_frag_t *frags = NULL;
> struct vm_area_struct *vma;
> struct sk_buff *skb = NULL;
> + unsigned long pg_idx = 0;
> + unsigned long curr_addr;
> struct tcp_sock *tp;
> int inq;
> int ret;
> @@ -1754,6 +1788,8 @@ static int tcp_zerocopy_receive(struct s
>
> sock_rps_record_flow(sk);
>
> + tp = tcp_sk(sk);
> +
> down_read(&current->mm->mmap_sem);
>
> ret = -EINVAL;
> @@ -1762,7 +1798,6 @@ static int tcp_zerocopy_receive(struct s
> goto out;
> zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
>
> - tp = tcp_sk(sk);
> seq = tp->copied_seq;
> inq = tcp_inq(sk);
> zc->length = min_t(u32, zc->length, inq);
> @@ -1774,8 +1809,20 @@ static int tcp_zerocopy_receive(struct s
> zc->recv_skip_hint = zc->length;
> }
> ret = 0;
> + curr_addr = address;
> while (length + PAGE_SIZE <= zc->length) {
> if (zc->recv_skip_hint < PAGE_SIZE) {
> + /* If we're here, finish the current batch. */
> + if (pg_idx) {
> + ret = tcp_zerocopy_vm_insert_batch(vma, pages,
> +                                    pg_idx,
> +                                    &curr_addr,
> +                                    &length,
> +                                    &seq, zc);
> + if (ret)
> + goto out;
> + pg_idx = 0;
> + }
> if (skb) {
> if (zc->recv_skip_hint > 0)
> break;
> @@ -1784,7 +1831,6 @@ static int tcp_zerocopy_receive(struct s
> } else {
> skb = tcp_recv_skb(sk, seq, &offset);
> }
> -
> zc->recv_skip_hint = skb->len - offset;
> offset -= skb_headlen(skb);
> if ((int)offset < 0 || skb_has_frag_list(skb))
> @@ -1808,14 +1854,24 @@ static int tcp_zerocopy_receive(struct s
> zc->recv_skip_hint -= remaining;
> break;
> }
> - ret = vm_insert_page(vma, address + length,
> - skb_frag_page(frags));
> - if (ret)
> - break;
> + pages[pg_idx] = skb_frag_page(frags);
> + pg_idx++;
> length += PAGE_SIZE;
> - seq += PAGE_SIZE;
> zc->recv_skip_hint -= PAGE_SIZE;
> frags++;
> + if (pg_idx == PAGE_BATCH_SIZE) {
> + ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +                                    &curr_addr, &length,
> +                                    &seq, zc);
> + if (ret)
> + goto out;
> + pg_idx = 0;
> + }
> + }
> + if (pg_idx) {
> + ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
> +                                    &curr_addr, &length,
> +                                    &seq, zc);
> }
> out:
> up_read(&current->mm->mmap_sem);
> _
>
>