From: Jason Gunthorpe <jgg@nvidia.com>
To: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Andrew Lunn <andrew@lunn.ch>, Eric Dumazet <edumazet@google.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
David Hildenbrand <david@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
Mina Almasry <almasrymina@google.com>,
willy@infradead.org, Christian Brauner <brauner@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
netdev@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Leon Romanovsky <leon@kernel.org>,
Logan Gunthorpe <logang@deltatee.com>
Subject: Re: How to handle P2P DMA with only {physaddr,len} in bio_vec?
Date: Tue, 24 Jun 2025 09:18:46 -0300
Message-ID: <20250624121846.GE17127@nvidia.com>
In-Reply-To: <1143687.1750755725@warthog.procyon.org.uk>

On Tue, Jun 24, 2025 at 10:02:05AM +0100, David Howells wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Mon, Jun 23, 2025 at 11:50:58AM +0100, David Howells wrote:
> > > What's the best way to manage this without having to go back to the page
> > > struct for every DMA mapping we want to make?
> >
> > There isn't a very easy way. Also because if you actually need to do
> > peer to peer transfers, you right now absolutely need the page to find
> > the pgmap that has the information on how to perform the peer to peer
> > transfer.
>
> Are you expecting P2P to become particularly common?

It is becoming commonplace in certain kinds of server systems. If half
the system's memory is behind PCI on a GPU or something then you need
P2P.

> Do we actually need 32 bits for bv_len, especially given that MAX_RW_COUNT is
> capped at a bit less than 2GiB? Could we, say, do:
>
> struct bio_vec {
> phys_addr_t bv_phys;
> u32 bv_len:31;
> u32 bv_use_p2p:1;
> } __packed;
>
> And rather than storing the how-to-do-P2P info in the page struct, does it
> make sense to hold it separately, keyed on bv_phys?

I thought we had agreed these sorts of 'mixed transfers' were not
desirable and we want things to be uniform at this lowest level.

So, I suggest the bio_vec should be entirely uniform, either it is all
CPU memory or it is all P2P from the same source. This is what the
block stack is doing by holding the P2P flag in the bio and splitting
the bios when they are constructed.
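
As a rough illustration of that splitting step, here is a userspace C
sketch, not the block layer's code. `struct seg`, `uniform_run()` and
`count_runs()` are made-up names, and representing the source as an
opaque pointer (NULL for CPU memory) is an assumption made only for the
example. It walks a mixed sequence and cuts it wherever the source
changes, so each resulting run could go into its own container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical input segment. src == NULL means ordinary CPU memory;
 * any other value identifies one P2P source (assumption for this sketch).
 */
struct seg {
	uint64_t phys;
	uint32_t len;
	const void *src;
};

/*
 * Length of the uniform run starting at segs[start]: the number of
 * consecutive segments that share the same source.
 */
static size_t uniform_run(const struct seg *segs, size_t n, size_t start)
{
	size_t i = start + 1;

	assert(start < n);
	while (i < n && segs[i].src == segs[start].src)
		i++;
	return i - start;
}

/* Number of uniform containers a mixed sequence would be split into. */
static size_t count_runs(const struct seg *segs, size_t n)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i += uniform_run(segs, n, i))
		runs++;
	return runs;
}
```

With a sequence of CPU, CPU, device A, CPU, device B, device B this
gives four runs, so four containers would be built.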

My intention, for a more general but less performant API, was to copy
what bio is doing and have a list of bio_vecs, each bio_vec having the
same properties.

The struct enclosing the bio_vec (the bio, etc) would have the flag
saying whether it is p2p and some way to get the needed p2p source
metadata.

The bio_vec itself would just store physical addresses and lengths. No
need for complicated bit slicing.
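
A minimal sketch of that layout, again in userspace C with hypothetical
names (`struct pvec`, `struct pvec_list`, `pvec_list_add()`). The fixed
capacity of 8 and the opaque `p2p_src` pointer are assumptions for
illustration, not a proposed kernel interface. The per-entry struct has
no flag bits, and the container rejects any entry whose source differs
from its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PVEC_MAX 8	/* arbitrary capacity for the sketch */

/* Per-entry data: only a physical address and a length. */
struct pvec {
	uint64_t phys;
	uint32_t len;
};

/* Hypothetical container playing the role of the bio. */
struct pvec_list {
	bool is_p2p;		/* applies to every entry in vecs[] */
	const void *p2p_src;	/* opaque source metadata, NULL for CPU memory */
	size_t nr;
	struct pvec vecs[PVEC_MAX];
};

static void pvec_list_init(struct pvec_list *l, const void *p2p_src)
{
	l->is_p2p = p2p_src != NULL;
	l->p2p_src = p2p_src;
	l->nr = 0;
}

/*
 * Append one entry. Returns false if the list is full or the memory
 * comes from a different source; the caller would then start a new list.
 */
static bool pvec_list_add(struct pvec_list *l, const void *src,
			  uint64_t phys, uint32_t len)
{
	if (src != l->p2p_src || l->nr == PVEC_MAX)
		return false;
	l->vecs[l->nr++] = (struct pvec){ .phys = phys, .len = len };
	return true;
}

/* Total byte count; no per-entry mode check is needed while iterating. */
static uint64_t pvec_list_bytes(const struct pvec_list *l)
{
	uint64_t total = 0;

	assert(l->nr <= PVEC_MAX);
	for (size_t i = 0; i < l->nr; i++)
		total += l->vecs[i].len;
	return total;
}
```

A consumer would look at `is_p2p` and `p2p_src` once per list and then
walk `vecs[]` as plain address/length pairs.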

I think this is important because the new DMA API really doesn't want
to be changing modes on a per-item basis.

Jason