From: Pavel Begunkov <asml.silence@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Mina Almasry" <almasrymina@google.com>,
"David Wei" <dw@davidwei.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
"David S. Miller" <davem@davemloft.net>,
Yunsheng Lin <linyunsheng@huawei.com>,
Yonglong Liu <liuyonglong@huawei.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
Date: Tue, 11 Mar 2025 13:25:11 +0000 [thread overview]
Message-ID: <edc407d1-bd76-4c6b-a2b1-0f1313ca3be7@gmail.com> (raw)
In-Reply-To: <87cyeqml3d.fsf@toke.dk>
On 3/9/25 12:42, Toke Høiland-Jørgensen wrote:
> Mina Almasry <almasrymina@google.com> writes:
>
>> On Sat, Mar 8, 2025 at 6:55 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>> they are released from the pool, to avoid the overhead of re-mapping the
>>> pages every time they are used. This causes problems when a device is
>>> torn down, because the page pool can't unmap the pages until they are
>>> returned to the pool. This causes resource leaks and/or crashes when
>>> there are pages still outstanding while the device is torn down, because
>>> page_pool will attempt an unmap of a non-existent DMA device on the
>>> subsequent page return.
>>>
>>> To fix this, implement a simple tracking of outstanding dma-mapped pages
>>> in page pool using an xarray. This was first suggested by Mina[0], and
>>> turns out to be fairly straight forward: We simply store pointers to
>>> pages directly in the xarray with xa_alloc() when they are first DMA
>>> mapped, and remove them from the array on unmap. Then, when a page pool
>>> is torn down, it can simply walk the xarray and unmap all pages still
>>> present there before returning, which also allows us to get rid of the
>>> get/put_device() calls in page_pool.
>>
>> THANK YOU!! I had been looking at the other proposals to fix this here
>> and there and I had similar feelings to you. They add lots of code
>> changes and the code changes themselves were hard for me to
>> understand. I hope we can make this simpler approach work.
>
> You're welcome :)
> And yeah, me too!
>
>>> Using xa_cmpxchg(), no additional
>>> synchronisation is needed, as a page will only ever be unmapped once.
>>>
>>
>> Very clever. I had been wondering how to handle the concurrency. I
>> also think this works.
>
> Thanks!
>
>>> To avoid having to walk the entire xarray on unmap to find the page
>>> reference, we stash the ID assigned by xa_alloc() into the page
>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>> the page_pool struct inside struct page. This field overlaps with the
>>> page->mapping pointer, which may turn out to be problematic, so an
>>> alternative is probably needed. Sticking the ID into some of the upper
>>> bits of page->pp_magic may work as an alternative, but that requires
>>> further investigation. Using the 'mapping' field works well enough as
>>> a demonstration for this RFC, though.
>>>
>>
>> I'm unsure about this. I think page->mapping may be used when we map
>> the page to the userspace in TCP zerocopy, but I'm really not sure.
>> Yes, finding somewhere else to put the id would be ideal. Do we really
>> need a full unsigned long for the pp_magic?
>
> No, pp_magic was also my backup plan (see the other thread). Tried
> actually doing that now, and while there's a bit of complication due to
> the varying definitions of POISON_POINTER_DELTA across architectures,
> but it seems that this can be defined at compile time. I'll send a v2
> RFC with this change.
FWIW, personally I like this one much more than an extra indirection
to pp.
If we're out of space in the page, why can't we use struct page *
as indices into the xarray? Ala
struct page *p = ...;
xa_store(xarray, index=(unsigned long)p, p);
Indices wouldn't be nicely packed, but it's still a map. Is there
a problem with that I didn't consider?
--
Pavel Begunkov
next prev parent reply other threads:[~2025-03-11 13:24 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-08 14:54 Toke Høiland-Jørgensen
2025-03-08 19:22 ` Mina Almasry
2025-03-09 12:42 ` Toke Høiland-Jørgensen
2025-03-11 13:25 ` Pavel Begunkov [this message]
2025-03-11 13:44 ` Toke Høiland-Jørgensen
2025-03-11 13:56 ` Pavel Begunkov
2025-03-11 15:49 ` Toke Høiland-Jørgensen
2025-03-11 16:46 ` Matthew Wilcox
2025-03-12 10:56 ` Toke Høiland-Jørgensen
2025-03-12 12:55 ` Pavel Begunkov
2025-03-10 7:17 ` Pavel Begunkov
2025-03-09 13:27 ` Yunsheng Lin
2025-03-10 9:13 ` Toke Høiland-Jørgensen
2025-03-10 12:35 ` Yunsheng Lin
2025-03-10 15:24 ` Toke Høiland-Jørgensen
2025-03-11 12:19 ` Yunsheng Lin
2025-03-11 13:26 ` Toke Høiland-Jørgensen
2025-03-12 12:04 ` Yunsheng Lin
2025-03-12 12:27 ` Toke Høiland-Jørgensen
2025-03-12 12:53 ` Pavel Begunkov
2025-03-10 15:42 ` Matthew Wilcox
2025-03-10 17:26 ` Toke Høiland-Jørgensen
2025-03-11 15:03 ` Matthew Wilcox
2025-03-11 15:32 ` Toke Høiland-Jørgensen
2025-03-11 12:25 ` Yunsheng Lin
2025-03-11 15:11 ` Matthew Wilcox
2025-03-12 12:05 ` Yunsheng Lin
2025-03-12 18:35 ` Shuah
2025-03-12 18:48 ` shuah
2025-03-12 18:56 ` Matthew Wilcox
2025-03-12 22:25 ` Shuah Khan
2025-03-14 18:09 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=edc407d1-bd76-4c6b-a2b1-0f1313ca3be7@gmail.com \
--to=asml.silence@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=davem@davemloft.net \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=linux-mm@kvack.org \
--cc=linyunsheng@huawei.com \
--cc=liuyonglong@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox