linux-mm.kvack.org archive mirror
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Pavel Begunkov <asml.silence@gmail.com>,
	Mina Almasry <almasrymina@google.com>, David Wei <dw@davidwei.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	"David S. Miller" <davem@davemloft.net>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Yonglong Liu <liuyonglong@huawei.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
Date: Tue, 11 Mar 2025 14:44:15 +0100	[thread overview]
Message-ID: <87tt7ziswg.fsf@toke.dk> (raw)
In-Reply-To: <edc407d1-bd76-4c6b-a2b1-0f1313ca3be7@gmail.com>

Pavel Begunkov <asml.silence@gmail.com> writes:

> On 3/9/25 12:42, Toke Høiland-Jørgensen wrote:
>> Mina Almasry <almasrymina@google.com> writes:
>> 
>>> On Sat, Mar 8, 2025 at 6:55 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
>>>> they are released from the pool, to avoid the overhead of re-mapping the
>>>> pages every time they are used. This causes problems when a device is
>>>> torn down, because the page pool can't unmap the pages until they are
>>>> returned to the pool. This causes resource leaks and/or crashes when
>>>> there are pages still outstanding while the device is torn down, because
>>>> page_pool will attempt an unmap of a non-existent DMA device on the
>>>> subsequent page return.
>>>>
>>>> To fix this, implement a simple tracking of outstanding dma-mapped pages
>>>> in page pool using an xarray. This was first suggested by Mina[0], and
>>>> turns out to be fairly straightforward: We simply store pointers to
>>>> pages directly in the xarray with xa_alloc() when they are first DMA
>>>> mapped, and remove them from the array on unmap. Then, when a page pool
>>>> is torn down, it can simply walk the xarray and unmap all pages still
>>>> present there before returning, which also allows us to get rid of the
>>>> get/put_device() calls in page_pool.
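
A rough sketch of the tracking scheme described above, using the generic
xarray API. The dma_mapped field and the page_pool_set_dma_id() /
__page_pool_dma_unmap() helpers are made-up names for illustration, not
what the actual patch uses:

  #include <linux/xarray.h>
  #include <linux/mm_types.h>

  /* Illustrative stand-in for the real struct page_pool. */
  struct page_pool_sketch {
          struct xarray dma_mapped;       /* initialised with XA_FLAGS_ALLOC */
          /* ... */
  };

  /* Made-up helpers: stash the ID in struct page, do the actual DMA unmap. */
  static void page_pool_set_dma_id(struct page *page, u32 id);
  static void __page_pool_dma_unmap(struct page_pool_sketch *pool,
                                    struct page *page);

  static int track_dma_mapped_page(struct page_pool_sketch *pool,
                                   struct page *page)
  {
          u32 id;
          int err;

          /* Remember the page until it is unmapped; xa_alloc() hands back
           * a small integer ID for it.
           */
          err = xa_alloc(&pool->dma_mapped, &id, page, xa_limit_32b,
                         GFP_ATOMIC);
          if (err)
                  return err;

          page_pool_set_dma_id(page, id);
          return 0;
  }

  static void page_pool_unmap_leftovers(struct page_pool_sketch *pool)
  {
          struct page *page;
          unsigned long id;

          /* On pool teardown, walk whatever is still outstanding and
           * unmap it before the DMA device goes away.
           */
          xa_for_each(&pool->dma_mapped, id, page)
                  __page_pool_dma_unmap(pool, page);

          xa_destroy(&pool->dma_mapped);
  }

The ID handed back by xa_alloc() is what gets stashed in struct page, so
the entry can be found again at unmap time without walking the array.
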
>>>
>>> THANK YOU!! I had been looking at the other proposals to fix this here
>>> and there and I had similar feelings to you. They add lots of code
>>> changes and the code changes themselves were hard for me to
>>> understand. I hope we can make this simpler approach work.
>> 
>> You're welcome :)
>> And yeah, me too!
>> 
>>>> Using xa_cmpxchg(), no additional
>>>> synchronisation is needed, as a page will only ever be unmapped once.
>>>>
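
To illustrate the unmap-once point, the release path can lean on
xa_cmpxchg(): only the caller that actually swaps the entry out performs
the DMA unmap. Same made-up names as in the sketch above:

  static void untrack_and_unmap(struct page_pool_sketch *pool,
                                struct page *page)
  {
          u32 id = page_pool_get_dma_id(page);    /* made-up accessor */

          /* Whoever swaps the page out of its slot does the unmap; a
           * concurrent teardown (or a repeated call) finds the slot
           * already empty and does nothing.
           */
          if (xa_cmpxchg(&pool->dma_mapped, id, page, NULL, GFP_ATOMIC) == page)
                  __page_pool_dma_unmap(pool, page);
  }
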
>>>
>>> Very clever. I had been wondering how to handle the concurrency. I
>>> also think this works.
>> 
>> Thanks!
>> 
>>>> To avoid having to walk the entire xarray on unmap to find the page
>>>> reference, we stash the ID assigned by xa_alloc() into the page
>>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>>> the page_pool struct inside struct page. This field overlaps with the
>>>> page->mapping pointer, which may turn out to be problematic, so an
>>>> alternative is probably needed. Sticking the ID into some of the upper
>>>> bits of page->pp_magic may work as an alternative, but that requires
>>>> further investigation. Using the 'mapping' field works well enough as
>>>> a demonstration for this RFC, though.
>>>>
>>>
>>> I'm unsure about this. I think page->mapping may be used when we map
>>> the page to the userspace in TCP zerocopy, but I'm really not sure.
>>> Yes, finding somewhere else to put the id would be ideal. Do we really
>>> need a full unsigned long for the pp_magic?
>> 
>> No, pp_magic was also my backup plan (see the other thread). Tried
>> actually doing that now, and while there's a bit of complication due to
>> the varying definitions of POISON_POINTER_DELTA across architectures,
>> it seems that this can be defined at compile time. I'll send a v2
>> RFC with this change.
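
Purely as an illustration of the pp_magic idea, assuming a 64-bit
unsigned long and made-up bit positions (the real layout would have to be
reconciled with PP_SIGNATURE and the per-architecture POISON_POINTER_DELTA
values, which this sketch glosses over):

  #include <linux/bits.h>
  #include <linux/mm_types.h>

  /* Made-up layout: the signature stays in the low bits of page->pp_magic,
   * the xarray ID rides in the upper 32 bits.
   */
  #define PP_DMA_ID_SHIFT 32
  #define PP_DMA_ID_MASK  GENMASK_ULL(63, PP_DMA_ID_SHIFT)

  static void pp_magic_set_dma_id(struct page *page, u32 id)
  {
          page->pp_magic = (page->pp_magic & ~PP_DMA_ID_MASK) |
                           ((unsigned long)id << PP_DMA_ID_SHIFT);
  }

  static u32 pp_magic_get_dma_id(const struct page *page)
  {
          return (page->pp_magic & PP_DMA_ID_MASK) >> PP_DMA_ID_SHIFT;
  }
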
>
> FWIW, personally I like this one much more than an extra indirection
> to pp.
>
> If we're out of space in the page, why can't we use struct page *
> as indices into the xarray? Ala
>
> struct page *p = ...;
> xa_store(xarray, index=(unsigned long)p, p);
>
> Indices wouldn't be nicely packed, but it's still a map. Is there
> a problem with that I didn't consider?
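
Spelled out slightly, the pointer-keyed variant being suggested could look
roughly like the following, with the same made-up dma_mapped xarray as in
the earlier sketches (no ID to stash anywhere, but sparse keys):

  static int track_page_by_ptr(struct xarray *dma_mapped, struct page *page)
  {
          /* The page pointer itself is the key, so nothing has to be
           * stored in struct page. The keys are sparse, though, so the
           * tree ends up deeper than with xa_alloc()'s dense IDs.
           */
          return xa_err(xa_store(dma_mapped, (unsigned long)page, page,
                                 GFP_ATOMIC));
  }

  static void untrack_page_by_ptr(struct xarray *dma_mapped, struct page *page)
  {
          xa_erase(dma_mapped, (unsigned long)page);
  }
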

Huh. As I just replied to Yunsheng, I was under the impression that this
was not supported. But since you're now the second person to suggest
this, I looked again, and it looks like I was wrong. There do indeed
seem to be other places in the kernel that do this.

As you say, the indices won't be as densely packed, though. So I'm
wondering whether using the bits in pp_magic would be better in any case,
to get the denser packing. I guess we can try benchmarking both
approaches and see if there's a measurable difference.

-Toke



Thread overview: 32+ messages
2025-03-08 14:54 Toke Høiland-Jørgensen
2025-03-08 19:22 ` Mina Almasry
2025-03-09 12:42   ` Toke Høiland-Jørgensen
2025-03-11 13:25     ` Pavel Begunkov
2025-03-11 13:44       ` Toke Høiland-Jørgensen [this message]
2025-03-11 13:56         ` Pavel Begunkov
2025-03-11 15:49           ` Toke Høiland-Jørgensen
2025-03-11 16:46         ` Matthew Wilcox
2025-03-12 10:56           ` Toke Høiland-Jørgensen
2025-03-12 12:55           ` Pavel Begunkov
2025-03-10  7:17   ` Pavel Begunkov
2025-03-09 13:27 ` Yunsheng Lin
2025-03-10  9:13   ` Toke Høiland-Jørgensen
2025-03-10 12:35     ` Yunsheng Lin
2025-03-10 15:24       ` Toke Høiland-Jørgensen
2025-03-11 12:19         ` Yunsheng Lin
2025-03-11 13:26           ` Toke Høiland-Jørgensen
2025-03-12 12:04             ` Yunsheng Lin
2025-03-12 12:27               ` Toke Høiland-Jørgensen
2025-03-12 12:53                 ` Pavel Begunkov
2025-03-10 15:42     ` Matthew Wilcox
2025-03-10 17:26       ` Toke Høiland-Jørgensen
2025-03-11 15:03         ` Matthew Wilcox
2025-03-11 15:32           ` Toke Høiland-Jørgensen
2025-03-11 12:25       ` Yunsheng Lin
2025-03-11 15:11         ` Matthew Wilcox
2025-03-12 12:05           ` Yunsheng Lin
2025-03-12 18:35             ` Shuah
2025-03-12 18:48               ` shuah
2025-03-12 18:56                 ` Matthew Wilcox
2025-03-12 22:25                   ` Shuah Khan
2025-03-14 18:09                 ` Matthew Wilcox
