From: Arjun Roy <arjunroy@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjun Roy <arjunroy.kdev@gmail.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	linux-mm@kvack.org, Eric Dumazet <edumazet@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>
Subject: Re: [PATCH resend mm,net-next 3/3] net-zerocopy: Use vm_insert_pages() for tcp rcv zerocopy.
Date: Fri, 10 Apr 2020 12:13:53 -0700
Message-ID: <CAOFY-A0w-RkDf9PXROOVow3RwWXBeOx9e2kh7unM1EVARg7YXA@mail.gmail.com>
In-Reply-To: <20200410120443.ad7856db13e158fbd441f3ae@linux-foundation.org>


On Fri, Apr 10, 2020 at 12:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 21 Feb 2020 13:21:41 -0800 Arjun Roy <arjunroy@google.com> wrote:
>
> > I remain a bit concerned regarding the merge process for this specific
> > patch (0003, the net/ipv4/tcp.c change) since I have other in-flight
> > changes for TCP receive zerocopy that I'd like to upstream for
> > net-next - and would like to avoid weird merge issues.
> >
> > So perhaps the following could work:
> >
> > 1. Andrew, perhaps we could remove this particular patch (0003, the
> > net/ipv4/tcp.c change) from mm-next; that way we merge
> > vm_insert_pages() but not the call-site within TCP, for now.
> > 2. net-next will eventually pick vm_insert_pages() up.
> > 3. I can modify the zerocopy code to use it at that point?
> >
> > Else I'm concerned a complicated merge situation may result.
>
> The merge situation is quite clean.
>
> I guess I'll hold off on
> net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy.patch (below) and
> shall send it to davem after Linus has merged the prerequisites.
>
>
>
Acknowledged, thank you!

-Arjun



> From: Arjun Roy <arjunroy@google.com>
> Subject: net-zerocopy: use vm_insert_pages() for tcp rcv zerocopy
>
> Use vm_insert_pages() for tcp receive zerocopy.  Spin lock cycles (as
> reported by perf) drop from a couple of percentage points of CPU time to a
> fraction of a percent.  This results in a roughly 6% increase in efficiency,
> measured as zerocopy receive count divided by CPU utilization.
>
> The intention of this patchset is to reduce atomic ops for tcp zerocopy
> receives, which normally hit the same spinlock multiple times in
> succession.
>
> [akpm@linux-foundation.org: suppress gcc-7.2.0 warning]
> Link: http://lkml.kernel.org/r/20200128025958.43490-3-arjunroy.kdev@gmail.com
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  net/ipv4/tcp.c |   70 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 63 insertions(+), 7 deletions(-)
>
> --- a/net/ipv4/tcp.c~net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy
> +++ a/net/ipv4/tcp.c
> @@ -1734,14 +1734,48 @@ int tcp_mmap(struct file *file, struct s
>  }
>  EXPORT_SYMBOL(tcp_mmap);
>
> +static int tcp_zerocopy_vm_insert_batch(struct vm_area_struct *vma,
> +                                       struct page **pages,
> +                                       unsigned long pages_to_map,
> +                                       unsigned long *insert_addr,
> +                                       u32 *length_with_pending,
> +                                       u32 *seq,
> +                                       struct tcp_zerocopy_receive *zc)
> +{
> +       unsigned long pages_remaining = pages_to_map;
> +       int bytes_mapped;
> +       int ret;
> +
> +       ret = vm_insert_pages(vma, *insert_addr, pages, &pages_remaining);
> +       bytes_mapped = PAGE_SIZE * (pages_to_map - pages_remaining);
> +       /* Even if vm_insert_pages fails, it may have partially succeeded in
> +        * mapping (some but not all of the pages).
> +        */
> +       *seq += bytes_mapped;
> +       *insert_addr += bytes_mapped;
> +       if (ret) {
> +               /* But if vm_insert_pages did fail, we have to unroll some state
> +                * we speculatively touched before.
> +                */
> +               const int bytes_not_mapped = PAGE_SIZE * pages_remaining;
> +               *length_with_pending -= bytes_not_mapped;
> +               zc->recv_skip_hint += bytes_not_mapped;
> +       }
> +       return ret;
> +}
> +
>  static int tcp_zerocopy_receive(struct sock *sk,
>                                 struct tcp_zerocopy_receive *zc)
>  {
>         unsigned long address = (unsigned long)zc->address;
>         u32 length = 0, seq, offset, zap_len;
> +       #define PAGE_BATCH_SIZE 8
> +       struct page *pages[PAGE_BATCH_SIZE];
>         const skb_frag_t *frags = NULL;
>         struct vm_area_struct *vma;
>         struct sk_buff *skb = NULL;
> +       unsigned long pg_idx = 0;
> +       unsigned long curr_addr;
>         struct tcp_sock *tp;
>         int inq;
>         int ret;
> @@ -1754,6 +1788,8 @@ static int tcp_zerocopy_receive(struct s
>
>         sock_rps_record_flow(sk);
>
> +       tp = tcp_sk(sk);
> +
>         down_read(&current->mm->mmap_sem);
>
>         ret = -EINVAL;
> @@ -1762,7 +1798,6 @@ static int tcp_zerocopy_receive(struct s
>                 goto out;
>         zc->length = min_t(unsigned long, zc->length, vma->vm_end - address);
>
> -       tp = tcp_sk(sk);
>         seq = tp->copied_seq;
>         inq = tcp_inq(sk);
>         zc->length = min_t(u32, zc->length, inq);
> @@ -1774,8 +1809,20 @@ static int tcp_zerocopy_receive(struct s
>                 zc->recv_skip_hint = zc->length;
>         }
>         ret = 0;
> +       curr_addr = address;
>         while (length + PAGE_SIZE <= zc->length) {
>                 if (zc->recv_skip_hint < PAGE_SIZE) {
> +                       /* If we're here, finish the current batch. */
> +                       if (pg_idx) {
>                                 ret = tcp_zerocopy_vm_insert_batch(vma, pages,
>                                                                    pg_idx,
>                                                                    &curr_addr,
>                                                                    &length,
>                                                                    &seq, zc);
> +                               if (ret)
> +                                       goto out;
> +                               pg_idx = 0;
> +                       }
>                         if (skb) {
>                                 if (zc->recv_skip_hint > 0)
>                                         break;
> @@ -1784,7 +1831,6 @@ static int tcp_zerocopy_receive(struct s
>                         } else {
>                                 skb = tcp_recv_skb(sk, seq, &offset);
>                         }
> -
>                         zc->recv_skip_hint = skb->len - offset;
>                         offset -= skb_headlen(skb);
>                         if ((int)offset < 0 || skb_has_frag_list(skb))
> @@ -1808,14 +1854,24 @@ static int tcp_zerocopy_receive(struct s
>                         zc->recv_skip_hint -= remaining;
>                         break;
>                 }
> -               ret = vm_insert_page(vma, address + length,
> -                                    skb_frag_page(frags));
> -               if (ret)
> -                       break;
> +               pages[pg_idx] = skb_frag_page(frags);
> +               pg_idx++;
>                 length += PAGE_SIZE;
> -               seq += PAGE_SIZE;
>                 zc->recv_skip_hint -= PAGE_SIZE;
>                 frags++;
> +               if (pg_idx == PAGE_BATCH_SIZE) {
>                         ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
>                                                            &curr_addr, &length,
>                                                            &seq, zc);
> +                       if (ret)
> +                               goto out;
> +                       pg_idx = 0;
> +               }
> +       }
> +       if (pg_idx) {
> +               ret = tcp_zerocopy_vm_insert_batch(vma, pages, pg_idx,
>                                                    &curr_addr, &length, &seq,
>                                                    zc);
>         }
>  out:
>         up_read(&current->mm->mmap_sem);
> _
>
>
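The pattern worth highlighting in the patch above: accumulate up to
PAGE_BATCH_SIZE page pointers from the skb frags, then hand the whole array
to vm_insert_pages(), so the page-table lock can be taken once per batch
rather than once per page as with repeated vm_insert_page() calls. Below is a
minimal sketch of that caller-side pattern, including the partial-failure
accounting; the helper name insert_batch and its locals are illustrative
only, while vm_insert_pages() is the real API added in patch 2/3 of this
series:

/* Sketch only, not the kernel code as merged.
 * vm_insert_pages() takes @num as in/out: in = pages to map,
 * out = pages it did NOT map, so partial success is visible to
 * the caller even when an error is returned.
 */
static int insert_batch(struct vm_area_struct *vma, unsigned long *addr,
			struct page **pages, unsigned long count)
{
	unsigned long remaining = count;  /* in: batch size; out: unmapped */
	int err = vm_insert_pages(vma, *addr, pages, &remaining);
	unsigned long mapped = count - remaining;

	*addr += mapped * PAGE_SIZE;      /* advance only past mapped pages */
	return err;                       /* 0, or the first insertion error */
}

This is also why tcp_zerocopy_vm_insert_batch() above advances *seq and
*insert_addr by bytes_mapped unconditionally, and only unrolls
length_with_pending and recv_skip_hint by the unmapped remainder when
vm_insert_pages() reports an error.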

