From: Mina Almasry <almasrymina@google.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
davem@davemloft.net, pabeni@redhat.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Willem de Bruijn" <willemb@google.com>,
"Kaiyuan Zhang" <kaiyuanz@google.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Eric Dumazet" <edumazet@google.com>,
"Christian König" <christian.koenig@amd.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Matthew Wilcox" <willy@infradead.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider
Date: Tue, 14 Nov 2023 04:58:05 -0800
Message-ID: <CAHS8izMj_89dMVaMr73r1-3Kewgc1YL3A1mjvixoax2War8kUg@mail.gmail.com>
In-Reply-To: <fa5d2f4c-5ccc-e23e-1926-2d7625b66b91@huawei.com>

On Tue, Nov 14, 2023 at 4:49 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/11/14 20:21, Mina Almasry wrote:
> > On Tue, Nov 14, 2023 at 12:23 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> +cc Christian, Jason and Willy
> >>
> >> On 2023/11/14 7:05, Jakub Kicinski wrote:
> >>> On Mon, 13 Nov 2023 05:42:16 -0800 Mina Almasry wrote:
> >>>> You're doing exactly what I think you're doing, and what was nacked in RFC v1.
> >>>>
> >>>> You've converted 'struct page_pool_iov' to essentially become a
> >>>> duplicate of 'struct page'. Then, you're casting page_pool_iov* into
> >>>> struct page* in mp_dmabuf_devmem_alloc_pages(), then, you're calling
> >>>> mm APIs like page_ref_*() on the page_pool_iov* because you've fooled
> >>>> the mm stack into thinking dma-buf memory is a struct page.
> >>
> >> Yes, something like above, but I am not sure about the 'fooled the mm
> >> stack into thinking dma-buf memory is a struct page' part, because:
> >> 1. We never let the 'struct page' for devmem leak out of the net stack,
> >> thanks to the 'not kmap()able and not readable' checking in your patchset.
> >
> > RFC never used dma-buf pages outside the net stack, so that is the same.
> >
> > You are not able to get rid of the 'not kmap()able and not readable'
> > checking with this approach, because dma-buf memory is fundamentally
> > unkmapable and unreadable. This approach would still need
> > skb_frags_not_readable checks in net stack, so that is also the same.
>
> Yes, I agree that the checking is still needed whatever the proposal is.
>
> >
> >> 2. We initialize page->_refcount for devmem to one and it remains as one,
> >> we will never call page_ref_inc()/page_ref_dec()/get_page()/put_page(),
> >> instead, we use page pool's pp_frag_count to do reference counting for
> >> devmem page in patch 6.
> >>
> >
> > I'm not sure that moves the needle in terms of allowing dma-buf
> > memory to look like struct pages.
> >
> >>>>
> >>>> RFC v1 was almost exactly the same, except instead of creating a
> >>>> duplicate definition of struct page, it just allocated 'struct page'
> >>>> instead of allocating another struct that is identical to struct page
> >>>> and casting it into struct page.
> >>
> >> Perhaps it is more accurate to say this is something between RFC v1 and
> >> RFC v3, in order to decouple 'struct page' for devmem from mm subsystem,
> >> but still have most unified handling for both normal memory and devmem
> >> in page pool and net stack.
> >>
> >> The main difference between this patchset and RFC v1:
> >> 1. The mm subsystem is not supposed to see the 'struct page' for devmem
> >> in this patchset, I guess we could say it is decoupled from the mm
> >> subsystem even though we still call PageTail()/page_ref_count()/
> >> page_is_pfmemalloc() on 'struct page' for devmem.
> >>
> >
> > In this patchset you pretty much allocate a struct page for your
> > dma-buf memory, and then cast it into a struct page, so all the mm
> > calls in page_pool.c are seeing a struct page when it's really dma-buf
> > memory.
> >
> > 'even though we still call
> > PageTail()/page_ref_count()/page_is_pfmemalloc() on 'struct page' for
> > devmem' is basically making dma-buf memory look like struct pages.
> >
> > Actually because you put the 'struct page for devmem' in
> > skb->bv_frag, the net stack will grab the 'struct page' for devmem
> > using skb_frag_page() then call things like page_address(), kmap,
> > get_page, put_page, etc, etc, etc.
>
> Yes, as above, skb_frags_not_readable() checking is still needed for
> kmap() and page_address().
>
> get_page, put_page related calling is avoided in page_pool_frag_ref()
> and napi_pp_put_page() for devmem page as the above checking is true
> for devmem page:
> (pp_iov->pp_magic & ~0x3UL) == PP_SIGNATURE
>

So devmem still needs special handling (an if statement) for
refcounting, even after using struct pages for devmem, which is not
allowed (IIUC, per the dma-buf maintainer).
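To make the point about the extra branch concrete, here is a minimal
userspace sketch, not code from either patchset. The struct toy_buf
type, the TOY_PP_SIGNATURE value, and the toy_* helpers are all
hypothetical; only the mask-and-compare expression is taken from the
check quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in value; the kernel defines PP_SIGNATURE itself. */
#define TOY_PP_SIGNATURE 0x40UL

/* Toy descriptor holding only the fields this sketch needs. */
struct toy_buf {
	unsigned long pp_magic;  /* signature; the mask ignores the low two bits */
	int page_refs;           /* models page->_refcount */
	long pp_frag_count;      /* models the page pool's own counter */
};

/* Same shape as the quoted check: mask the low two bits, then compare. */
static bool toy_is_pp_buf(const struct toy_buf *b)
{
	return (b->pp_magic & ~0x3UL) == TOY_PP_SIGNATURE;
}

/* Taking a reference has to branch on the buffer type. */
static void toy_get_ref(struct toy_buf *b)
{
	if (toy_is_pp_buf(b))
		b->pp_frag_count++;  /* pool-owned: page_refs is left alone */
	else
		b->page_refs++;      /* ordinary page: get_page()-style */
}

/* Small usage example covering both branches. */
static void toy_selfcheck(void)
{
	struct toy_buf pooled = { .pp_magic = TOY_PP_SIGNATURE | 0x1UL, .page_refs = 1 };
	struct toy_buf plain = { .pp_magic = 0, .page_refs = 1 };

	toy_get_ref(&pooled);
	toy_get_ref(&plain);
	assert(pooled.page_refs == 1 && pooled.pp_frag_count == 1);
	assert(plain.page_refs == 2 && plain.pp_frag_count == 0);
}
```

The branch in toy_get_ref() is the special handling in question: giving
the buffer a struct-page-shaped layout does not remove it.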
> >
> >> The main difference between this patchset and RFC v3:
> >> 1. It reuses the 'struct page' to have more unified handling between
> >> normal page and devmem page for net stack.
> >
> > This is what was nacked in RFC v1.
> >
> >> 2. It relies on the page->pp_frag_count to do reference counting.
> >>
> >
> > I don't see you change any of the page_ref_* calls in page_pool.c, for
> > example this one:
> >
> > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L601
> >
> > So the reference the page_pool is seeing is actually page->_refcount,
> > not page->pp_frag_count? I'm confused here. Is this a bug in the
> > patchset?
>
> page->_refcount is the same as page_pool_iov->_refcount for devmem, which
> is ensured by the 'PAGE_POOL_MATCH(_refcount, _refcount);', and
> page_pool_iov->_refcount is set to one in mp_dmabuf_devmem_alloc_pages()
> by calling 'refcount_set(&ppiov->_refcount, 1)' and always remains as one.
>
> So the 'page_ref_count(page) == 1' checking is always true for devmem page.

Which, of course, is a bug in the patchset; it only works because
this is a POC. Devmem pages (which shouldn't exist according to
the dma-buf maintainer, IIUC) can't always be recycled. See the
SO_DEVMEM_DONTNEED patch in my RFC and the refcounting that devmem needs.
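As a rough illustration of why a refcount pinned at one is a problem,
here is a toy userspace model. It is a sketch under stated assumptions,
not how either patchset tracks references: struct toy_devmem_buf, the
user_holds field, and both toy_can_recycle*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy devmem buffer. Both fields are illustrative only. */
struct toy_devmem_buf {
	int refcount;    /* models the mirrored _refcount, pinned at 1 */
	int user_holds;  /* hypothetical: references still held by userspace */
};

/* Recycle test in the style of 'page_ref_count(page) == 1'. */
static bool toy_can_recycle(const struct toy_devmem_buf *b)
{
	return b->refcount == 1;
}

/* Hypothetical stricter test that also consults the user-held count. */
static bool toy_can_recycle_tracked(const struct toy_devmem_buf *b)
{
	return b->refcount == 1 && b->user_holds == 0;
}

/* Small usage example: one buffer that userspace has not released yet. */
static void toy_recycle_selfcheck(void)
{
	struct toy_devmem_buf b = { .refcount = 1, .user_holds = 1 };

	assert(toy_can_recycle(&b));           /* passes despite the user hold */
	assert(!toy_can_recycle_tracked(&b));  /* blocked until it is released */
	b.user_holds = 0;
	assert(toy_can_recycle_tracked(&b));
}
```

With the count pinned at one, the first test can never say no, so
something else has to record that the buffer is still in use.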
--
Thanks,
Mina