From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <64730d70-e5b7-4117-9ee8-43f23543eafd@huawei.com>
Date: Tue, 24 Sep 2024 15:48:16 +0800
MIME-Version: 1.0
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Gur Stavi
References: <2fb8d278-62e0-4a81-a537-8f601f61e81d@huawei.com> <20240924064559.1681488-1-gur.stavi@huawei.com>
In-Reply-To: <20240924064559.1681488-1-gur.stavi@huawei.com>
Content-Type: text/plain; charset="UTF-8"
On 2024/9/24 14:45, Gur Stavi wrote:
>>>>> With all the caching in the network stack, some pages may be
>>>>> held in the network stack without returning to the page_pool
>>>>> soon enough, and with a VF disable causing the driver to unbind,
>>>>> the page_pool does not stall the driver's unbinding; instead,
>>>>> page_pool uses a workqueue to periodically check whether some
>>>>> pages have come back from the network stack, and if so, it does
>>>>> the DMA-unmapping-related cleanup work for them.
>>>>>
>>>>> As mentioned in [1], attempting DMA unmaps after the driver
>>>>> has already unbound may leak resources or at worst corrupt
>>>>> memory. Fundamentally, the page pool code cannot allow DMA
>>>>> mappings to outlive the driver they belong to.
>>>>>
>>>>> Currently there seem to be at least two cases where a page is
>>>>> not released fast enough, causing the DMA unmapping to be done
>>>>> after the driver has already unbound:
>>>>> 1. IPv4 packet defragmentation timeout: this seems to cause
>>>>>    a delay of up to 30 secs.
>>>>> 2. skb_defer_free_flush(): this may cause an infinite delay if
>>>>>    nothing triggers net_rx_action().
>>>>>
>>>>> In order not to do the DMA unmapping after the driver has
>>>>> already unbound, and not to stall the unloading of the
>>>>> networking driver, add the pool->items array to record all the
>>>>> pages, including the ones handed over to the network stack, so
>>>>> the page_pool can do the DMA unmapping for those pages when
>>>>> page_pool_destroy() is called.
>>>>
>>>> So, I was thinking of a very similar idea. But what do you mean by
>>>> "all"? The pages that are still in the caches (slow or fast) of the
>>>> pool will be unmapped during page_pool_destroy().
>>>
>>> Yes, it includes the ones in pool->alloc and pool->ring.
>>
>> It is worth mentioning that there is a semantic change here:
>> before this patch, there could be an almost unlimited number of
>> inflight pages used by the driver and the network stack, as
>> page_pool didn't really track those pages.
>> After this patch, as we use a fixed-size pool->items array to track
>> the inflight pages, the number of inflight pages is limited by
>> pool->items; currently the size of the pool->items array is
>> calculated as below in this patch:
>>
>> +#define PAGE_POOL_MIN_ITEM_CNT	512
>> +	unsigned int item_cnt = (params->pool_size ? : 1024) +
>> +				PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT;
>>
>> Personally I would consider it an advantage to limit how many pages
>> are used by the driver and the network stack; the problem seems to be
>> how to decide that limit on the number of pages used by the network
>> stack so that performance is not impacted.
>
> In theory, with respect to the specific problem at hand, you only have
> a limit on the number of mapped pages inflight. Once you reach this
> limit you can unmap these old pages, forget about them and remember
> new ones.

Yes, it can be done in theory. The tricky part seems to be how to
handle the concurrency when we evict the old pages while those same
pages may concurrently be returned to the page_pool, without adding
any locking or lockless operation to the fast path.

For now each item has only one producer and one consumer, so we don't
really have to worry about that concurrency before
page_pool_item_uninit() is called in page_pool_destroy() to do the
unmapping for the inflight pages, using the newly added 'destroy_lock'.