From: Harry Yoo <harry.yoo@oracle.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Byungchul Park <byungchul@sk.com>,
willy@infradead.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com, almasrymina@google.com,
ilias.apalodimas@linaro.org, hawk@kernel.org,
akpm@linux-foundation.org, davem@davemloft.net,
john.fastabend@gmail.com, andrew+netdev@lunn.ch,
asml.silence@gmail.com, toke@redhat.com, tariqt@nvidia.com,
edumazet@google.com, pabeni@redhat.com, saeedm@nvidia.com,
leon@kernel.org, ast@kernel.org, daniel@iogearbox.net,
david@redhat.com, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, horms@kernel.org,
linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
vishal.moola@gmail.com, hannes@cmpxchg.org, ziy@nvidia.com,
jackmanb@google.com
Subject: Re: [PATCH net-next v7 1/7] netmem: introduce struct netmem_desc mirroring struct page
Date: Mon, 30 Jun 2025 08:34:48 +0900
Message-ID: <aGHNmKRng9H6kTqz@hyeyoo>
In-Reply-To: <20250627173730.15b25a8c@kernel.org>
On Fri, Jun 27, 2025 at 05:37:30PM -0700, Jakub Kicinski wrote:
> On Fri, 27 Jun 2025 12:54:05 +0900 Byungchul Park wrote:
> > On Thu, Jun 26, 2025 at 05:49:04PM -0700, Jakub Kicinski wrote:
> > > On Wed, 25 Jun 2025 13:33:44 +0900 Byungchul Park wrote:
> > > > +/* A memory descriptor representing abstract networking I/O vectors,
> > > > + * generally for non-pages memory that doesn't have its corresponding
> > > > + * struct page and needs to be explicitly allocated through slab.
> > >
> > > I still don't get what your final object set is going to be.
> >
> > The ultimate goal is:
> >
> > Remove the pp fields from struct page
> >
> > The second important goal is:
> >
> > Introduce a network pp descriptor, netmem_desc
> >
> > While working on these two goals, I added some extra patches too, to
> > clean up related code if it's obvious e.g. patches for renaming and so
> > on.
>
> Object set. Not objective.
>
> > > We have
> > > - CPU-readable buffers (struct page)
> > > - un-readable buffers (struct net_iov)
> > > - abstract reference which can be a pointer to either of the
> > > above two (bitwise netmem_ref)
> > >
> > > You say you want to evacuate page pool state from struct page
> > > so I'd expect you to add a type which can always be fed into
> > > some form of $type_to_virt(). A type which can always be cast
> > > to net_iov, but not vice versa. So why are you putting things
> > > inside net_iov, not outside.
> >
> > The type, struct netmem_desc, is declared outside. Even though it's
> > used overlaying on struct page *for now*, it will be dynamically
> > allocated through slab shortly - it's also one of mm's plan.
> >
> > As you know, net_iov is working with the assumption that it overlays on
> > struct page *for now* indeed, when it comes to netmem_ref. See the
> > following APIs as example:
> >
> > static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > {
> >         return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> > }
> >
> > static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> > {
> >         __netmem_clear_lsb(netmem)->pp = pool;
> > }
> >
> > I'd say, I replaced the overlaying (on struct page) part with a
> > well-defined struct, netmem_desc that will play the role of struct page
> > for pp usage, instead of a set of the current overlaying fields of
> > net_iov.
> >
> > This 'introduction of netmem_desc' patch can be the base for network
> > code to use netmem_desc as pp descriptor instead of struct page. That's
> > what I meant.
> >
> > Am I missing something or got you wrong? If yes, please explain in more
> > detail then I will get back with the answer.
>
> Ugh, you keep explaining the mechanics to me. Our goal here is not
> just to move fields around and make it still compile :/
>
> Let me ask you this way: you said "netmem_desc" will be allocated
> thru slab "shortly". How will calling the equivalent of page_address()
> on netmem_desc work at that stage? Feel free to refer me to the existing
> docs if its covered..

https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
https://kernelnewbies.org/MatthewWilcox/Memdescs

These may not be the exact documents you're looking for,
but based on them, here is how I imagine it:
- The ultimate goal is to eventually shrink struct page from 64 bytes
  to 8 bytes, by statically allocating only the minimum required
  metadata per 4k page and moving the rest of the metadata to
  dynamically allocated descriptors (netmem_desc, anon, file, ptdesc,
  zpdesc, etc.) that come from slab at page allocation time.

- We can't achieve that goal just yet, because several subsystems
  still use struct page fields for their own purposes.
  To get there, each of these subsystems needs to define its own
  descriptor, which, for now, overlays struct page, and the subsystem
  code should be converted to use the new descriptor.
  Eventually, these descriptors will be allocated using slab.

- For CPU-readable buffers, page->memdesc will point to a netmem_desc,
  with a low bit set to indicate that it's a netmem_desc rather than
  another type. Networking code will need to cast it to
  (struct netmem_desc *) and dereference it to access
  networking-specific fields.

- The struct page array (vmemmap) will still be statically allocated
  at boot time (or at memory hotplug time).
  So there is no change in how page_address() works.
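
To make the points above concrete, here is a rough userspace sketch of
how I picture it. This is only a model under my own assumptions, not
kernel code: the tag width (MEMDESC_TYPE_MASK), the tag values, the
helpers page_set_netmem_desc() and page_to_netmem_desc(), and the
memory[] array standing in for the direct map are all made up for
illustration. The only things taken from the articles are that struct
page shrinks to a single memdesc word and that its low bits encode the
descriptor type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                           /* 4k pages, as in the text */
#define MEMDESC_TYPE_MASK ((uintptr_t)0xf)      /* hypothetical: low bits carry the type */

enum memdesc_type {                             /* hypothetical tag values */
	MEMDESC_NONE = 0,
	MEMDESC_NETMEM = 1,
};

/* Stand-in for the dynamically allocated descriptor; aligned so that
 * the low bits of its address are free to hold the type tag. */
struct netmem_desc {
	_Alignas(16) void *pp;                  /* owning page pool, opaque here */
	unsigned long dma_addr;
};

/* The shrunken struct page: a single tagged word. */
struct page {
	uintptr_t memdesc;
};

#define NR_PAGES 4
static struct page vmemmap[NR_PAGES];                   /* still a static array */
static unsigned char memory[NR_PAGES << PAGE_SHIFT];    /* models the direct map */

/* Attach a netmem_desc to a page by storing a tagged pointer. */
static inline void page_set_netmem_desc(struct page *page,
					struct netmem_desc *desc)
{
	assert(((uintptr_t)desc & MEMDESC_TYPE_MASK) == 0);
	page->memdesc = (uintptr_t)desc | MEMDESC_NETMEM;
}

/* Return the netmem_desc behind a page, or NULL if the page is of
 * another type. */
static inline struct netmem_desc *page_to_netmem_desc(const struct page *page)
{
	if ((page->memdesc & MEMDESC_TYPE_MASK) != MEMDESC_NETMEM)
		return NULL;
	return (struct netmem_desc *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}

/* The address is derived from the page's index in vmemmap, so it does
 * not depend on where the descriptor lives. */
static inline void *page_address(const struct page *page)
{
	return memory + ((size_t)(page - vmemmap) << PAGE_SHIFT);
}
```

In this model page_address() never looks at the descriptor, which is
why moving netmem_desc out of struct page would not change it.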

net_iovs will continue to have no associated struct pages,
as their buffers don't have corresponding struct pages.
net_iovs are already allocated using slab.
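
For the net_iov side, a simplified userspace model of the low-bit
tagging in netmem_ref might look like the sketch below. The types
toy_page and toy_net_iov and their fields are hypothetical stand-ins,
and netmem_ref is modelled as a plain integer rather than the kernel's
bitwise type. The helper names follow include/net/netmem.h, but the
bodies here are my own simplification, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NET_IOV ((uintptr_t)0x1)   /* low bit of a netmem_ref marks a net_iov */

typedef uintptr_t netmem_ref;      /* simplified: plain integer, no __bitwise */

struct toy_page {                  /* hypothetical: CPU-readable buffer */
	void *vaddr;
};

struct toy_net_iov {               /* hypothetical: buffer with no CPU mapping */
	unsigned long dma_addr;
};

static inline netmem_ref page_to_netmem(struct toy_page *page)
{
	return (netmem_ref)page;
}

static inline netmem_ref net_iov_to_netmem(struct toy_net_iov *niov)
{
	return (netmem_ref)niov | NET_IOV;
}

static inline int netmem_is_net_iov(netmem_ref netmem)
{
	return (netmem & NET_IOV) != 0;
}

static inline struct toy_net_iov *netmem_to_net_iov(netmem_ref netmem)
{
	return (struct toy_net_iov *)(netmem & ~NET_IOV);
}

/* Only page-backed netmem has a CPU address; a net_iov yields NULL. */
static inline void *netmem_address(netmem_ref netmem)
{
	if (netmem_is_net_iov(netmem))
		return NULL;
	return ((struct toy_page *)netmem)->vaddr;
}
```

The point of the model is that the address lookup is only meaningful
for the page-backed case, which matches the split described above.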
HTH
--
Cheers,
Harry / Hyeonggon