From: Yunsheng Lin
Date: Mon, 10 Mar 2025 20:35:31 +0800
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
To: Toke Høiland-Jørgensen, Yunsheng Lin, Andrew Morton, Jesper Dangaard Brouer, Ilias Apalodimas, "David S. Miller"
CC: Yonglong Liu, Mina Almasry, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
Message-ID: <2c363f6a-f9e4-4dd2-941d-db446c501885@huawei.com>
In-Reply-To: <87cyepxn7n.fsf@toke.dk>
References: <20250308145500.14046-1-toke@redhat.com> <87cyepxn7n.fsf@toke.dk>

On 2025/3/10 17:13, Toke Høiland-Jørgensen wrote:

...

>
>> Also, we might need a similar locking or synchronisation for the dma
>> sync API in order to skip the dma sync API when page_pool_destroy() is
>> called too.
>
> Good point, but that seems a separate issue? And simpler to solve (just

If I understand the comment from DMA experts correctly, the dma_sync API
should not be allowed to be called after the dma_unmap API.

> set pool->dma_sync to false when destroying?).

Without locking or synchronisation, there is a small window between
checking pool->dma_sync and calling the dma sync API, during which the
driver might already have been unbound.

>
>>> To avoid having to walk the entire xarray on unmap to find the page
>>> reference, we stash the ID assigned by xa_alloc() into the page
>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>> the page_pool struct inside struct page. This field overlaps with the
>>> page->mapping pointer, which may turn out to be problematic, so an
>>> alternative is probably needed. Sticking the ID into some of the upper
>>> bits of page->pp_magic may work as an alternative, but that requires
>>> further investigation. Using the 'mapping' field works well enough as
>>> a demonstration for this RFC, though.
>> page->pp_magic seems to only have 16 bits space available at most when
>> trying to reuse some unused bits in page->pp_magic, as BPF_PTR_POISON
>> and STACK_DEPOT_POISON seems to already use 16 bits, so:
>> 1. For 32 bits system, it seems there is only 16 bits left even if the
>>    ILLEGAL_POINTER_VALUE is defined as 0x0.
>> 2. For 64 bits system, it seems there is only 12 bits left for powerpc
>>    as ILLEGAL_POINTER_VALUE is defined as 0x5deadbeef0000000, see
>>    arch/powerpc/Kconfig.
>>
>> So it seems we might need to limit the dma mapping count to 4096 or
>> 65536?
>>
>> And to be honest, I am not sure if those 'unused' 12/16 bits can really
>> be reused or not, I guess we might need suggestion from mm and arch
>> experts here.
>
> Why do we need to care about BPF_PTR_POISON and STACK_DEPOT_POISON?
> AFAICT, we only need to make sure we preserve the PP_SIGNATURE value.
> See v2 of the RFC patch, the bit arithmetic there gives me:
>
> - 24 bits on 32-bit architectures
> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
> - 32 bits on other 64-bit architectures
>
> Which seems to be plenty?

I really doubt it is that simple, but we can always hear from the
experts if it isn't :)

>
>>> Since all the tracking is performed on DMA map/unmap, no additional code
>>> is needed in the fast path, meaning the performance overhead of this
>>> tracking is negligible. The extra memory needed to track the pages is
>>> neatly encapsulated inside xarray, which uses the 'struct xa_node'
>>> structure to track items. This structure is 576 bytes long, with slots
>>> for 64 items, meaning that a full node incurs only 9 bytes of overhead
>>> per slot it tracks (in practice, it probably won't be this efficient,
>>> but in any case it should be an acceptable overhead).
>>
>> Even if items is stored sequentially in xa_node at first, is it possible
>> that there may be fragmentation in those xa_node when pages are released
>> and allocated many times during packet processing? If yes, is there any
>> fragmentation info about those xa_node?
>
> Some (that's what I mean with "not as efficient"). AFAICT, xarray does
> do some rebalancing of the underlying radix tree, freeing nodes when
> they are no longer used. However, I am not too familiar with the xarray
> code, so I don't know exactly how efficient this is in practice.

I guess that is one of the disadvantages of using an advanced structure
like xarray :(

>
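
For reference, the bit counts discussed above map directly onto a cap on
how many DMA-mapped pages an ID stored in page->pp_magic could tell apart.
Below is a minimal userspace sketch of that arithmetic, not kernel code.
The helpers pp_id_capacity(), pp_magic_set_id() and pp_magic_get_id() are
hypothetical names made up for illustration, and the field position
('shift') is an assumption; the real layout is whatever v2 of the RFC
patch defines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: number of distinct IDs that
 * fit into 'bits' spare bits. Only valid for 0..63 bits.
 */
static inline uint64_t pp_id_capacity(unsigned int bits)
{
	assert(bits < 64);
	return UINT64_C(1) << bits;
}

/*
 * Hypothetical: store 'id' in a 'bits'-wide field starting at bit
 * 'shift', leaving every bit outside that field (for example the low
 * signature bits) untouched. The caller picks 'shift'; it is an
 * assumption here, not the layout used by the patch.
 */
static inline uint64_t pp_magic_set_id(uint64_t magic, uint64_t id,
				       unsigned int shift, unsigned int bits)
{
	uint64_t mask;

	assert(shift + bits <= 64);
	assert(id < pp_id_capacity(bits));

	mask = (pp_id_capacity(bits) - 1) << shift;
	return (magic & ~mask) | (id << shift);
}

/* Hypothetical: read back the ID stored by pp_magic_set_id(). */
static inline uint64_t pp_magic_get_id(uint64_t magic, unsigned int shift,
				       unsigned int bits)
{
	assert(shift + bits <= 64);
	return (magic >> shift) & (pp_id_capacity(bits) - 1);
}
```

With the widths mentioned in this thread, pp_id_capacity() gives 4096 for
12 bits, 65536 for 16 bits, 2097152 for 21 bits, 16777216 for 24 bits and
4294967296 for 32 bits. Whether those bits are really free to use is the
open question above; the sketch only shows what each answer would imply.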
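
To put a rough number on the fragmentation concern, the per-page cost of
a partially filled xa_node can be estimated from the figures quoted above
(576 bytes per node, 64 slots). This is a back-of-the-envelope sketch
under those assumptions only: xa_bytes_per_item() is a hypothetical helper,
both constants depend on kernel version and config, and interior nodes of
the tree are ignored.

```c
#include <assert.h>

/* Figures quoted in this thread; assumptions, not taken from the code. */
#define XA_NODE_BYTES 576u
#define XA_NODE_SLOTS 64u

/*
 * Hypothetical helper: bytes of leaf-node memory attributable to each
 * tracked page when 'used' of the slots in one node are occupied.
 */
static inline double xa_bytes_per_item(unsigned int used)
{
	assert(used >= 1 && used <= XA_NODE_SLOTS);
	return (double)XA_NODE_BYTES / used;
}
```

A fully packed node costs 9 bytes per page, a node with 16 of 64 slots in
use costs 36 bytes per page, and a node kept alive by a single entry costs
the full 576 bytes for that one page. How often the last case happens in
practice depends on how well xarray frees sparsely used nodes, which is
what is not clear from this discussion.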