linux-mm.kvack.org archive mirror
From: Mina Almasry <almasrymina@google.com>
To: David Howells <dhowells@redhat.com>
Cc: willy@infradead.org, hch@infradead.org,
	Jakub Kicinski <kuba@kernel.org>,
	 Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org
Subject: Re: Device mem changes vs pinning/zerocopy changes
Date: Thu, 5 Jun 2025 12:27:49 -0700
Message-ID: <CAHS8izNgJaj=S7HJ0Pjt2TaCA8_=vgmptzE2obmdLOuo8gby-w@mail.gmail.com>
In-Reply-To: <1098853.1749051265@warthog.procyon.org.uk>

On Wed, Jun 4, 2025 at 8:34 AM David Howells <dhowells@redhat.com> wrote:
> > FWIW, my initial gut feeling is that the work doesn't conflict that much.
> > The tcp devmem netmem/net_iov stuff is designed to follow the page stuff,
> > and as the usage of struct page changes we're happy moving net_iovs and
> > netmems to do the same thing. My read is that it will take a small amount of
> > extra work, but there are no in-principle design conflicts, at least AFAICT
> > so far.
>
> The problem is more the code you changed in the current merge window I'm also
> wanting to change, so merge conflicts will arise.
>
> However, I'm also looking to move the points at which refs are taken/dropped
> which will directly impinge on the design of the code that's currently
> upstream.
>
> Would it help if I created some diagrams to show what I'm thinking of?
>

I think I understand what you want to do, but I'm happy looking at
diagrams or jumping on a call if needed.

[snip]

> > I think to accomplish what you're describing we need to modify
> > skb_frag_ref to do something else other than taking a reference on the
> > page or net_iov. I think maybe taking a reference on the skb itself
> > may be acceptable, and the skb can 'guarantee' that the individual
> > frags underneath it don't disappear while these functions are
> > executing.
>
> Maybe.  There is an issue with that, though it may not be insurmountable: If a
> userspace process does, say, a MSG_ZEROCOPY send of a page worth of data over
> TCP, under a typicalish MTU, say, 1500, this will be split across at least
> three skbuffs.
>
> This would involve making a call into GUP to get a pin - but we'd need a
> separate pin for each skbuff and we might (in fact we currently do) end up
> calling into GUP thrice to do the address translation and page pinning.
>
> What I want to do is to put this outside of the skbuff so that GUP pin can be
> shared - but if, instead, we attach a pin to each skbuff, we need to get that
> extra pin in some way.  Now, it may be reasonable to add a "get me an extra
> pin for such-and-such a range" thing and store the {physaddr,len} in the
> skbuff fragment, but we also have to be careful not to overrun the pin count -
> if there's even a pin count per se.
>
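
To make the "at least three skbuffs" arithmetic concrete, here is a
rough sketch. It assumes a 4096-byte page and an MSS of 1460 bytes
(MTU 1500 minus 20 bytes each of IPv4 and TCP header, no options), and
it assumes one skb per MSS-sized segment, which ignores GSO. The helper
name segs_for_len is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of MSS-sized segments needed to carry
 * len bytes, i.e. a ceiling division.
 */
size_t segs_for_len(size_t len, size_t mss)
{
	assert(mss > 0);
	return (len + mss - 1) / mss;
}
```

Under those assumptions segs_for_len(4096, 1460) comes out to 3, so
pinning per skb would mean three trips into GUP for the same page.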

I think I understand. Currently the GUP is done in this call stack
(some helpers omitted), right?

tcp_sendmsg_locked
  skb_zerocopy_iter_stream
    zerocopy_fill_skb_from_iter
      iov_iter_get_pages2
        get_user_pages_fast

I think maybe the extra ref management you're referring to can be
tacked on to ubuf_info_msgzc? I still don't understand the need for a
completely new net_txbuf when the existing one seems to be almost what
you need, but I may be missing something.

I'm thinking of something like this, very roughly (I'm probably missing a lot of details):

1. Move the GUP call to msg_zerocopy_realloc, and save the pages array there.
2. Pass the ubuf_info_msgzc down to zerocopy_fill_skb_from_iter, and
have it fill the skb with pages from the GUP.
3. Modify skb_frag_ref such that if we want a reference on a frag that
belongs to a ubuf_info_msgzc, we grab a reference on the ubuf rather
than the frag.
4. Once the ubuf_info_msgzc refcount hits 0, you can un-GUP the memory?
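
The four steps above could be modelled in plain userspace C roughly as
follows. This is only a sketch of the refcounting idea, not kernel
code: struct zc_ubuf, struct zc_frag and the zc_* helpers are
hypothetical names standing in for ubuf_info_msgzc, an skb frag and
skb_frag_ref/unref, and locking, atomics and the real GUP calls are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ubuf_info_msgzc that owns the pinned pages. */
struct zc_ubuf {
	int refcnt;       /* one ref for the sender plus one per frag ref */
	size_t nr_pages;  /* pages covered by the single up-front pin */
	bool pinned;      /* true while the pin is held */
};

/* Hypothetical stand-in for an skb fragment. */
struct zc_frag {
	struct zc_ubuf *owner; /* NULL for an ordinary page-backed frag */
	int page_refs;         /* models the per-page refcount otherwise */
};

/* Step 1: pin once, up front (models doing GUP in msg_zerocopy_realloc). */
void zc_ubuf_pin(struct zc_ubuf *u, size_t nr_pages)
{
	u->refcnt = 1; /* the sender's own reference */
	u->nr_pages = nr_pages;
	u->pinned = true;
}

/* Step 4: drop a ref; the last put releases the pin (the "un-GUP"). */
void zc_ubuf_put(struct zc_ubuf *u)
{
	assert(u->refcnt > 0);
	if (--u->refcnt == 0)
		u->pinned = false;
}

/* Step 3: a ref on a ubuf-backed frag lands on the ubuf, not the page. */
void zc_frag_ref(struct zc_frag *f)
{
	if (f->owner)
		f->owner->refcnt++;
	else
		f->page_refs++;
}

void zc_frag_unref(struct zc_frag *f)
{
	if (f->owner) {
		zc_ubuf_put(f->owner);
	} else {
		assert(f->page_refs > 0);
		f->page_refs--;
	}
}

/* Step 2: fill a frag from the already-pinned ubuf; no new GUP call. */
void zc_frag_fill(struct zc_frag *f, struct zc_ubuf *u)
{
	f->owner = u;
	f->page_refs = 0;
	zc_frag_ref(f);
}
```

With one page split across three frags, the ubuf would hold four
references (sender plus three frags), and the pin would only be
released after the sender's put and all three zc_frag_unref calls.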

-- 
Thanks,
Mina

