linux-mm.kvack.org archive mirror
From: Gur Stavi <gur.stavi@huawei.com>
To: <linyunsheng@huawei.com>
Cc: <akpm@linux-foundation.org>, <aleksander.lobakin@intel.com>,
	<alexander.duyck@gmail.com>,
	<angelogioacchino.delregno@collabora.com>,
	<anthony.l.nguyen@intel.com>, <ast@kernel.org>,
	<bpf@vger.kernel.org>, <daniel@iogearbox.net>,
	<davem@davemloft.net>, <edumazet@google.com>,
	<fanghaiqing@huawei.com>, <hawk@kernel.org>,
	<ilias.apalodimas@linaro.org>, <imx@lists.linux.dev>,
	<intel-wired-lan@lists.osuosl.org>, <iommu@lists.linux.dev>,
	<john.fastabend@gmail.com>, <kuba@kernel.org>, <kvalo@kernel.org>,
	<leon@kernel.org>, <linux-arm-kernel@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-mediatek@lists.infradead.org>, <linux-mm@kvack.org>,
	<linux-rdma@vger.kernel.org>, <linux-wireless@vger.kernel.org>,
	<liuyonglong@huawei.com>, <lorenzo@kernel.org>,
	<matthias.bgg@gmail.com>, <nbd@nbd.name>,
	<netdev@vger.kernel.org>, <pabeni@redhat.com>,
	<przemyslaw.kitszel@intel.com>, <robin.murphy@arm.com>,
	<ryder.lee@mediatek.com>, <saeedm@nvidia.com>,
	<sean.wang@mediatek.com>, <shayne.chen@mediatek.com>,
	<shenwei.wang@nxp.com>, <tariqt@nvidia.com>, <wei.fang@nxp.com>,
	<xiaoning.wang@nxp.com>, <zhangkun09@huawei.com>
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
Date: Tue, 24 Sep 2024 09:45:59 +0300
Message-ID: <20240924064559.1681488-1-gur.stavi@huawei.com>
In-Reply-To: <2fb8d278-62e0-4a81-a537-8f601f61e81d@huawei.com>

>>>> With all the caching in the network stack, some pages may be
>>>> held in the network stack without returning to the page_pool
>>>> soon enough. When a VF disable causes the driver to unbind,
>>>> the page_pool does not stop the driver from doing its
>>>> unbinding work. Instead, page_pool uses a workqueue to check
>>>> periodically whether any pages have come back from the network
>>>> stack and, if so, does the DMA unmapping related cleanup work.
>>>>
>>>> As mentioned in [1], attempting DMA unmaps after the driver
>>>> has already unbound may leak resources or at worst corrupt
>>>> memory. Fundamentally, the page pool code cannot allow DMA
>>>> mappings to outlive the driver they belong to.
>>>>
>>>> Currently there seem to be at least two cases in which a page
>>>> is not released fast enough, so that the DMA unmapping is done
>>>> after the driver has already unbound:
>>>> 1. ipv4 packet defragmentation timeout: this seems to cause
>>>>    a delay of up to 30 secs:
>>>>
>>>> 2. skb_defer_free_flush(): this may cause infinite delay if
>>>>    nothing triggers net_rx_action().
>>>>
>>>> To avoid doing the DMA unmapping after the driver has already
>>>> unbound, and to avoid stalling the unloading of the networking
>>>> driver, add the pool->items array to record all the pages,
>>>> including the ones handed over to the network stack, so that
>>>> page_pool can do the DMA unmapping for those pages when
>>>> page_pool_destroy() is called.
>>>
>>> So, I was thinking of a very similar idea. But what do you mean by
>>> "all"? The pages that are still in caches (slow or fast) of the pool
>>> will be unmapped during page_pool_destroy().
>>
>> Yes, it includes the ones in pool->alloc and pool->ring.
>
> It is worth mentioning that there is a semantic change here:
> Before this patch, there can be an almost unlimited number of inflight
> pages used by the driver and network stack, as page_pool doesn't really
> track those pages. After this patch, as we use a fixed-size pool->items
> array to track the inflight pages, the number of inflight pages is
> limited by the size of pool->items. Currently the size of the
> pool->items array is calculated as below in this patch:
>
> +#define PAGE_POOL_MIN_ITEM_CNT	512
> +	unsigned int item_cnt = (params->pool_size ? : 1024) +
> +				PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT;
>
> Personally I would consider it an advantage to limit how many pages are
> used by the driver and network stack. The problem seems to be how to choose
> the limit on pages used by the network stack so that performance is not
> impacted.

In theory, with respect to the specific problem at hand, you only have
a limit on the number of mapped pages inflight. Once you reach this
limit, you can unmap the old pages, forget about them, and remember
new ones.
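
To make the idea concrete, here is a rough userspace sketch, not
page_pool code. It assumes a fixed-capacity FIFO of tracked pages in
which the oldest entry is unmapped and forgotten once the limit is hit.
All names (TRACK_CAP, struct tracked_page, struct inflight_tracker,
fake_dma_unmap(), tracker_remember(), tracker_destroy()) are
hypothetical, and fake_dma_unmap() only stands in for a real DMA unmap.
Removing a page from the tracker when it returns to the pool is left out
for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on how many mapped inflight pages are remembered. */
#define TRACK_CAP 4

/* Stand-in for a page that carries a DMA mapping. */
struct tracked_page {
	uintptr_t dma_addr;	/* pretend DMA handle */
	int mapped;		/* 1 while the mapping is live */
};

/* Fixed-size FIFO of mapped pages handed out to the stack. */
struct inflight_tracker {
	struct tracked_page *slots[TRACK_CAP];
	size_t head;		/* index of the oldest remembered page */
	size_t count;		/* number of remembered pages */
	size_t evictions;	/* pages unmapped early because of the limit */
};

/* Placeholder for the real unmap; it only flips the bookkeeping. */
static void fake_dma_unmap(struct tracked_page *p)
{
	p->mapped = 0;
	p->dma_addr = 0;
}

/*
 * Remember a newly mapped page. If the tracker is full, unmap and
 * forget the oldest page first, so the number of mapped inflight
 * pages never exceeds TRACK_CAP.
 */
static void tracker_remember(struct inflight_tracker *t,
			     struct tracked_page *p)
{
	if (t->count == TRACK_CAP) {
		fake_dma_unmap(t->slots[t->head]);
		t->head = (t->head + 1) % TRACK_CAP;
		t->count--;
		t->evictions++;
	}
	t->slots[(t->head + t->count) % TRACK_CAP] = p;
	t->count++;
	assert(t->count <= TRACK_CAP);
}

/* At destroy time, unmap whatever is still remembered. */
static void tracker_destroy(struct inflight_tracker *t)
{
	while (t->count) {
		fake_dma_unmap(t->slots[t->head]);
		t->head = (t->head + 1) % TRACK_CAP;
		t->count--;
	}
}
```

With this scheme the unmapped pages themselves can stay inflight for as
long as the stack holds them. Only the number of live mappings is
bounded, and at destroy time there are at most TRACK_CAP mappings left
to tear down.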

