From: Mina Almasry <almasrymina@google.com>
To: David Howells <dhowells@redhat.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
willy@infradead.org, hch@infradead.org,
Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Byungchul Park <byungchul@sk.com>,
netfs@lists.linux.dev, netdev@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Network filesystems and netmem
Date: Fri, 8 Aug 2025 16:28:23 -0700
Message-ID: <CAHS8izOj3OCD6+6GD9foc53J7u9o6-VK+YS-1RUWU4NR6d-9bQ@mail.gmail.com>
In-Reply-To: <2941083.1754684186@warthog.procyon.org.uk>
On Fri, Aug 8, 2025 at 1:16 PM David Howells <dhowells@redhat.com> wrote:
>
> Mina Almasry <almasrymina@google.com> wrote:
>
> > > (1) The socket. We might want to group allocations relating to the same
> > > socket or destined to route through the same NIC together.
> > >
> > > (2) The destination address. Again, we might need to group by NIC. For TCP
> > > sockets, this likely doesn't matter as a connected TCP socket already
> > > knows this, but for a UDP socket, you can set that in sendmsg() (and
> > > indeed AF_RXRPC does just that).
> > >
> >
> > The page_pool model groups memory by NIC (struct netdev), not by socket
> > or destination address. It may be feasible to extend it to be
> > per-socket, but I don't immediately understand what that entails
> > exactly. The page_pool uses the netdev for dma-mapping; I'm not sure
> > what it would use the socket or destination address for (unless it's
> > to grab the netdev :P).
>
> Yeah - but the network filesystem doesn't necessarily know anything about what
> NIC would be used... but a connected TCP socket surely does. Likewise, a UDP
> socket has to perform an address lookup to find the destination/route and thus
> the NIC.
>
> So, basically all three, the socket, the address and the flag would be hints,
> possibly unused for now.
>
> > Today the page_pool doesn't really care how long you hold onto the mem
> > allocated from it.
>
> It's not so much whether the page pool cares how long we hold on to the
> mem; it's that, for a fragment allocator, we want to group together things
> of similar lifetime, since we don't get to reuse the page until everything
> in it has been released.
>
> And if we're doing bulk DMA/IOMMU mapping, we also potentially have a second
> constraint: an IOMMU TLB entry may be keyed for a particular device.
>
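Right, and that lifetime/device coupling is already visible in the
existing frag API. Below is a rough sketch of how a driver-style user
allocates frags today (hypothetical code; exact header paths and
signatures may differ between trees): the pool is created against a
single struct device with PP_FLAG_DMA_MAP so the DMA mapping travels
with the page, and a page only returns to the pool once the last frag
carved out of it has been released.

#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>

/* Hypothetical example: one pool per DMA device; frags share pages and
 * therefore share one DMA mapping per page.
 */
static struct page_pool *example_frag_pool(struct device *dev)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,	/* pool owns the DMA mapping */
		.order		= 0,
		.pool_size	= 256,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,			/* mapping is keyed to this device */
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}

static struct page *example_alloc_frag(struct page_pool *pool,
				       unsigned int *offset, unsigned int len)
{
	/* The backing page is only recycled after every frag reference has
	 * been dropped, e.g. via page_pool_put_page() or skb recycling.
	 */
	return page_pool_dev_alloc_frag(pool, offset, len);
}
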
> > Honestly the subject of whether to extend the page_pool or implement a
> > new allocator kinda comes up every once in a while.
>
> Do we actually use the netmem page pools only for receiving? If that's the
> case, then do I need to be managing this myself? Providing my own fragment
> allocator that handles bulk DMA mapping, that is. I'd prefer to use an
> existing one if I can.
>
Yes, we only use page_pools for receiving at the moment. Some
discussion around using the page_pool for normal TX networking
happened in the past, but I can't find the thread.

Off the top of my head, I'm not sure what it would take to make it
compatible with a TX path. At the very least, the page_pool currently
has some dependency/logic on a napi-id that it may get from the
driver, and that would need to be factored out; see all the places we
touch pool->p.napi in page_pool.c and other files. Or, like you said,
you may want your own fragment allocator if wrestling the page_pool
into doing what you want is too cumbersome.
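
For context, this is roughly where that coupling lives today (a
hypothetical RX-side sketch, not netfs code, so double-check the field
names against your tree): the driver hands its NIC's DMA device, netdev
and napi to the pool at create time, and the recycling fast paths then
key off pool->p.napi. A TX-side user like a network filesystem has no
obvious napi to pass in, which is the part that would need factoring
out.

#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <net/page_pool/types.h>

/* Hypothetical driver-side RX setup, showing why the pool is per-NIC today. */
static struct page_pool *example_rx_pool(struct net_device *netdev,
					 struct napi_struct *napi,
					 struct device *dma_dev)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,
		.pool_size	= 1024,
		.nid		= NUMA_NO_NODE,
		.dev		= dma_dev,	/* DMA mappings keyed to the NIC's device */
		.dma_dir	= DMA_FROM_DEVICE,
		.napi		= napi,		/* recycling fast path looks at pool->p.napi */
		.netdev		= netdev,
	};

	return page_pool_create(&pp);
}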
--
Thanks,
Mina