linux-mm.kvack.org archive mirror
From: David Howells <dhowells@redhat.com>
To: Mina Almasry <almasrymina@google.com>
Cc: dhowells@redhat.com, willy@infradead.org, hch@infradead.org,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Byungchul Park <byungchul@sk.com>,
	netfs@lists.linux.dev, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Network filesystems and netmem
Date: Fri, 08 Aug 2025 14:16:39 +0100
Message-ID: <2869548.1754658999@warthog.procyon.org.uk>

Hi Mina,

Apologies for not keeping up with the stuff I proposed, but I had to go and do
a load of bugfixing.  Anyway, that gave me time to think about the netmem
allocator and how *that* may be something network filesystems can make use of.
I particularly like the way it can do DMA/IOMMU mapping in bulk (at least, if
I understand it aright).

So what I'm thinking of is changing the network filesystems - at least the
ones I can - from using kmalloc() to allocate memory for protocol fragments to
using the netmem allocator.  However, I think this might need to be
parameterisable by:

 (1) The socket.  We might want to group together allocations that relate to
     the same socket or that will be routed through the same NIC.

 (2) The destination address.  Again, we might need to group by NIC.  For TCP
     sockets this likely doesn't matter, as a connected TCP socket already
     knows its route, but a UDP socket can have its destination set per call
     in sendmsg() (and indeed AF_RXRPC does just that).

 (3) The lifetime.  At a crude level, I would provide a hint flag that
     indicates whether the buffer may be retained for some time (e.g. rxrpc
     DATA packets or TCP data) or whether the data is something we aren't
     going to retain (e.g. rxrpc ACK packets), since we might want to group
     these differently.
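
As a rough illustration of how those three parameters might combine, here is
a minimal userspace sketch of a pool key.  Everything in it is hypothetical:
struct netmem_pool_key, netmem_pool_key_for() and netmem_same_pool() are
invented for illustration, and the NIC is reduced to a plain integer index
derived from either the connected socket or the per-call destination.

```c
#include <assert.h>
#include <stdbool.h>

/* Hint flag as proposed in this email; the value is illustrative. */
#define NETMEM_HINT_UNRETAINED 0x1u

/* Hypothetical pool key.  'nic_index' stands in for the device that a
 * connected socket (or, for UDP, the per-sendmsg() destination) routes
 * through; 'unretained' separates short-lived buffers such as rxrpc ACKs
 * from long-lived ones such as rxrpc DATA or TCP data. */
struct netmem_pool_key {
	int nic_index;
	bool unretained;
};

/* Build a key from the routing decision and the caller's hints. */
static struct netmem_pool_key netmem_pool_key_for(int nic_index,
						   unsigned int hints)
{
	struct netmem_pool_key key = {
		.nic_index = nic_index,
		.unretained = (hints & NETMEM_HINT_UNRETAINED) != 0,
	};

	return key;
}

/* Two allocations would be served from the same pool only if their keys
 * match on every axis. */
static bool netmem_same_pool(struct netmem_pool_key a,
			     struct netmem_pool_key b)
{
	return a.nic_index == b.nic_index && a.unretained == b.unretained;
}
```

With this shape, an rxrpc DATA buffer and an rxrpc ACK buffer bound for the
same NIC would land in different pools, while two DATA buffers would share
one.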

What I have in mind is a net core API that looks something like:

	#define NETMEM_HINT_UNRETAINED 0x1
	void *netmem_alloc(struct socket *sock, size_t len, unsigned int hints);
	void netmem_free(void *mem);
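
A minimal userspace model of the pointer-based variant, with netmem_free()
taking the pointer and returning nothing, might look like the following.  It
is a sketch under loud assumptions: struct socket here is a one-field
stand-in (not the kernel's), a small header stored in front of each buffer
lets netmem_free() find the pool from the pointer alone, and per-pool
counters take the place of real DMA-mapped page pools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NETMEM_HINT_UNRETAINED 0x1u
#define NETMEM_MAX_NICS 4	/* hypothetical limit for this sketch */

/* Hypothetical stand-in for struct socket: only the routing result. */
struct socket {
	int nic_index;
};

/* Header kept in front of each buffer; the max_align_t member keeps the
 * returned pointer suitably aligned for any object type. */
union netmem_hdr {
	struct {
		int nic_index;
		unsigned int hints;
	} h;
	max_align_t align;
};

/* Live allocations per (NIC, retained/unretained) pool. */
static int netmem_live[NETMEM_MAX_NICS][2];

void *netmem_alloc(struct socket *sock, size_t len, unsigned int hints)
{
	union netmem_hdr *hdr;

	if (sock->nic_index < 0 || sock->nic_index >= NETMEM_MAX_NICS)
		return NULL;
	if (len > SIZE_MAX - sizeof(*hdr))
		return NULL;	/* size would overflow */
	hdr = malloc(sizeof(*hdr) + len);
	if (!hdr)
		return NULL;
	hdr->h.nic_index = sock->nic_index;
	hdr->h.hints = hints;
	netmem_live[sock->nic_index][hints & NETMEM_HINT_UNRETAINED]++;
	return hdr + 1;	/* caller sees only the payload */
}

void netmem_free(void *mem)
{
	union netmem_hdr *hdr;

	if (!mem)
		return;
	hdr = (union netmem_hdr *)mem - 1;	/* step back to the header */
	netmem_live[hdr->h.nic_index][hdr->h.hints & NETMEM_HINT_UNRETAINED]--;
	free(hdr);
}
```

Keeping the pool identity in a header is only one option; an allocator that
carves buffers out of pool-owned pages could instead derive the pool from the
page itself.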

though I'm tempted to make it:

	int netmem_alloc(struct socket *sock, size_t len, unsigned int hints,
			 struct bio_vec *bv);
	void netmem_free(struct bio_vec *bv);

to accommodate Christoph's plans for the future of bio_vec.
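
The bio_vec variant lends itself to carving fragments out of a shared page.
The sketch below assumes a per-socket "current page" from which fragments are
cut, with each page freed once its last fragment is released; in a real
allocator that page would be DMA-mapped once for all of its fragments, which
is roughly the bulk-mapping benefit mentioned earlier.  struct page and
struct bio_vec are userspace stand-ins (only bv_page, bv_len and bv_offset
are modelled), the cur/offset fields on struct socket are hypothetical, and
the hints argument is accepted but ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Stand-in for struct page: payload plus a count of users (one per live
 * fragment, plus one while the socket is still carving from it). */
struct page {
	unsigned char data[PAGE_SIZE];
	int refcount;
};

/* Stand-in with the same three fields as the kernel's struct bio_vec. */
struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical per-socket carving state. */
struct socket {
	struct page *cur;
	unsigned int offset;
};

static void page_put(struct page *page)
{
	if (--page->refcount == 0)
		free(page);
}

int netmem_alloc(struct socket *sock, size_t len, unsigned int hints,
		 struct bio_vec *bv)
{
	(void)hints;	/* pool selection by hint is not modelled here */

	if (len == 0 || len > PAGE_SIZE)
		return -EINVAL;
	if (!sock->cur || sock->offset + len > PAGE_SIZE) {
		/* Current page exhausted: start a new one.  A real allocator
		 * would DMA-map it here, once, for all its fragments. */
		struct page *page = calloc(1, sizeof(*page));

		if (!page)
			return -ENOMEM;
		page->refcount = 1;	/* the socket's carving reference */
		if (sock->cur)
			page_put(sock->cur);
		sock->cur = page;
		sock->offset = 0;
	}
	bv->bv_page = sock->cur;
	bv->bv_offset = sock->offset;
	bv->bv_len = (unsigned int)len;
	sock->cur->refcount++;	/* one reference per fragment */
	sock->offset += (unsigned int)len;
	return 0;
}

void netmem_free(struct bio_vec *bv)
{
	page_put(bv->bv_page);
	bv->bv_page = NULL;
}
```

For example, two 1000-byte allocations on the same socket would share one
page at offsets 0 and 1000, while a following 3000-byte allocation would not
fit and would start a fresh page.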

I'm going to leave the pin-vs-ref issues for direct I/O and splice, and the
zerocopy-completion issues, for later.

I'm using cifs as a testcase for this idea and now have it able to use
MSG_SPLICE_PAGES, though at the moment the transport layer just grabs pages
and copies data into them rather than using a fragment allocator or netmem.
See:

https://lore.kernel.org/linux-fsdevel/20250806203705.2560493-4-dhowells@redhat.com/T/#t
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=cifs-experimental
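
For comparison, the interim "grab pages and copy into them" approach can be
sketched as below.  This is not the cifs code: copy_into_pages() and
release_pages() are hypothetical helpers, and struct page and struct bio_vec
are userspace stand-ins.  Each page-sized chunk of the message gets its own
freshly allocated page and bio_vec; in the kernel, such an array would
typically be wrapped in a bvec iov_iter and passed to sendmsg() with
MSG_SPLICE_PAGES.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Userspace stand-ins. */
struct page {
	unsigned char data[PAGE_SIZE];
};

struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Free the pages behind the first n bio_vecs. */
static void release_pages(struct bio_vec *bv, unsigned int n)
{
	while (n--)
		free(bv[n].bv_page);
}

/* Copy len bytes from buf into new pages, one bio_vec per page.  Returns
 * the number of bio_vecs filled, or a negative errno; on failure, any
 * pages already allocated are released. */
static int copy_into_pages(const void *buf, size_t len,
			   struct bio_vec *bv, unsigned int max_bv)
{
	const unsigned char *src = buf;
	unsigned int n = 0;

	while (len > 0) {
		size_t chunk = len < PAGE_SIZE ? len : PAGE_SIZE;
		int err = 0;

		if (n == max_bv)
			err = -EMSGSIZE;	/* message needs more bio_vecs */
		else if (!(bv[n].bv_page = malloc(sizeof(struct page))))
			err = -ENOMEM;
		if (err) {
			release_pages(bv, n);
			return err;
		}
		memcpy(bv[n].bv_page->data, src, chunk);
		bv[n].bv_offset = 0;
		bv[n].bv_len = (unsigned int)chunk;
		src += chunk;
		len -= chunk;
		n++;
	}
	return (int)n;
}
```

The cost this illustrates is one page allocation and one full copy per chunk,
which is what a fragment allocator or netmem would aim to avoid.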

David

Thread overview: 4+ messages
2025-08-08 13:16 David Howells [this message]
2025-08-08 17:57 ` Mina Almasry
2025-08-08 20:16 ` David Howells
2025-08-08 23:28   ` Mina Almasry