Message-ID: <2fb8d278-62e0-4a81-a537-8f601f61e81d@huawei.com>
Date: Mon, 23 Sep 2024 15:01:12 +0800
Subject: Re: [PATCH net 2/2] page_pool: fix IOMMU crash when driver has already unbound
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Ilias Apalodimas
CC: Robin Murphy, Alexander Duyck, IOMMU, Wei Fang, Shenwei Wang, Clark Wang, Eric Dumazet, Tony Nguyen, Przemek Kitszel, Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger, AngeloGioacchino Del Regno, Andrew Morton
References: <20240918111826.863596-1-linyunsheng@huawei.com> <20240918111826.863596-3-linyunsheng@huawei.com> <50a463d5-a5a1-422f-a4f7-d3587b12c265@huawei.com>
In-Reply-To: <50a463d5-a5a1-422f-a4f7-d3587b12c265@huawei.com>
On 2024/9/19 18:54, Yunsheng Lin wrote:
> On 2024/9/19 1:06, Ilias Apalodimas wrote:
>> Hi Yunsheng,
>>
>> Thanks for looking into this!
>>
>> On Wed, 18 Sept 2024 at 14:24, Yunsheng Lin wrote:
>>>
>>> A networking driver with page_pool support may hand pages that
>>> still hold a DMA mapping over to the network stack, and reuse
>>> those pages after the network stack is done with them and passes
>>> them back to the page_pool, in order to avoid the penalty of
>>> repeated DMA mapping/unmapping.
>>
>> I think you can shorten this to "If recycling and DMA mapping are
>> enabled during the pool creation"
>
> I am not sure I understand the 'recycling' part here. Is it
> referring to whether skb_mark_for_recycle() is called to enable
> recycling for the skb? Is there still any driver with page_pool
> support that doesn't call skb_mark_for_recycle() when handing
> pages over to the network stack?
>
> For the 'DMA mapping' part: as there is no space in 'struct page'
> to track the inflight pages, 'pp' in 'struct page' is renamed to
> 'pp_item' to enable tracking of inflight pages. I tried limiting
> this to the 'pool->dma_map being false' case when coding, but
> interpreting the same field in 'struct page' differently depending
> on 'pool->dma_map' doesn't make much sense, as it would mean adding
> a union in 'struct page' for that to work, plus extra checking to
> decide whether the field is 'pp' or 'pp_item'.
>
>>
>>> With all the caching in the network stack, some pages may be
>>> held in the network stack without being returned to the
>>> page_pool soon enough, and with a VF disable causing the driver
>>> to unbind, the page_pool does not stop the driver from doing
>>> its unbinding work; instead the page_pool uses a workqueue to
>>> periodically check whether some pages have come back from the
>>> network stack, and if so, it does the DMA-unmapping-related
>>> cleanup work.
>>>
>>> As mentioned in [1], attempting DMA unmaps after the driver
>>> has already unbound may leak resources or at worst corrupt
>>> memory. Fundamentally, the page pool code cannot allow DMA
>>> mappings to outlive the driver they belong to.
>>>
>>> Currently there seem to be at least two cases where a page is
>>> not released fast enough, causing the DMA unmapping to be done
>>> after the driver has already unbound:
>>> 1. IPv4 packet defragmentation timeout: this seems to cause a
>>>    delay of up to 30 secs:
>>>
>>> 2. skb_defer_free_flush(): this may cause an infinite delay if
>>>    nothing triggers net_rx_action().
>>>
>>> In order not to do the DMA unmapping after the driver has
>>> already unbound, and not to stall the unloading of the
>>> networking driver, add the pool->items array to record all the
>>> pages, including the ones handed over to the network stack, so
>>> the page_pool can do the DMA unmapping for those pages when
>>> page_pool_destroy() is called.
>>
>> So, I was thinking of a very similar idea. But what do you mean by
>> "all"? The pages that are still in caches (slow or fast) of the pool
>> will be unmapped during page_pool_destroy().
>
> Yes, it includes the ones in pool->alloc and pool->ring.

It is worth mentioning that there is a semantic change here:
before this patch, there could be an almost unlimited number of
inflight pages in use by the driver and network stack, as the
page_pool doesn't really track those pages.
After this patch, as we use a fixed-size pool->items array to track
the inflight pages, the number of inflight pages is limited by
pool->items; currently the size of the pool->items array is
calculated as below in this patch:

+#define PAGE_POOL_MIN_ITEM_CNT	512
+	unsigned int item_cnt = (params->pool_size ? : 1024) +
+				PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_ITEM_CNT;

Personally I would consider it an advantage to limit how many pages
are used by the driver and network stack; the problem seems to be
how to decide the limit on the number of pages used by the network
stack so that performance is not impacted.

>
>> Don't we 'just' need a list of the inflight packets and their pages
>> or fragments? What we could do is go through that list and unmap
>> these pages during page_pool_destroy().
>
> The main reason for that is to avoid the overhead of
> page_pool_item_del() and page_pool_item_add() when allocating/freeing
> pages from/to pool->alloc and pool->ring.
>
> Yes, including the pages in pool->ring seems to make pool->ring
> somewhat redundant; maybe we can remove pool->ring if we can show
> that 'pool->items' performs better than pool->ring in the future?
>
>> I'll have a closer look at the patch tomorrow
>
> Thanks for reviewing.
>
>> Thanks!
>> /Ilias
>