From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mina Almasry <almasrymina@google.com>
Date: Thu, 26 Sep 2024 11:15:15 -0700
Subject: Re: [PATCH net v2 2/2] page_pool: fix IOMMU crash when driver has already unbound
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
 liuyonglong@huawei.com, fanghaiqing@huawei.com, zhangkun09@huawei.com,
 Robin Murphy, Alexander Duyck, IOMMU, Wei Fang, Shenwei Wang, Clark Wang,
 Eric Dumazet, Tony Nguyen, Przemek Kitszel, Alexander Lobakin,
 Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
 John Fastabend, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
 Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang,
 Kalle Valo, Matthias Brugger, AngeloGioacchino Del Regno, Andrew Morton,
 Ilias Apalodimas, imx@lists.linux.dev, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
 bpf@vger.kernel.org, linux-rdma@vger.kernel.org,
 linux-wireless@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-mediatek@lists.infradead.org, linux-mm@kvack.org
In-Reply-To: <20240925075707.3970187-3-linyunsheng@huawei.com>
References: <20240925075707.3970187-1-linyunsheng@huawei.com> <20240925075707.3970187-3-linyunsheng@huawei.com>

On Wed, Sep 25, 2024 at 1:03 AM
Yunsheng Lin wrote:
>
> Networking drivers with page_pool support may hand pages that
> still carry a DMA mapping over to the network stack, and reuse
> those pages after the network stack is done with them and passes
> them back to the page_pool, to avoid the penalty of DMA
> mapping/unmapping. With all the caching in the network stack,
> some pages may be held in the network stack without returning
> to the page_pool soon enough, and when a VF disable causes the
> driver to unbind, the page_pool does not stop the driver from
> doing its unbinding work; instead, the page_pool uses a
> workqueue to periodically check whether pages have come back
> from the network stack, and if so, does the DMA-unmapping
> related cleanup work for them.
>
> As mentioned in [1], attempting DMA unmaps after the driver
> has already unbound may leak resources or at worst corrupt
> memory. Fundamentally, the page pool code cannot allow DMA
> mappings to outlive the driver they belong to.
>
> Currently there seem to be at least two cases where a page is
> not released fast enough, causing the DMA unmapping to happen
> after the driver has already unbound:
> 1. IPv4 packet defragmentation timeout: this seems to cause a
>    delay of up to 30 secs.
> 2. skb_defer_free_flush(): this may cause an infinite delay if
>    nothing triggers net_rx_action().
>

I think additionally this is dependent on user behavior, right?
AFAIU, frags allocated by the page_pool will remain in the socket
receive queue until the user calls recvmsg(), and AFAIU they can be
stuck there arbitrarily long.

> In order not to do the DMA unmapping after the driver has
> already unbound, and to avoid stalling the unloading of the
> networking driver, add a pool->items array to record all the
> pages, including the ones handed over to the network stack, so
> the page_pool can do the DMA unmapping for those pages when
> page_pool_destroy() is called.
One thing I could not understand from looking at the code: if the
items array is in struct page_pool, why do you need to modify the
page_pool entry in struct page and in struct net_iov? I think the
code could be made much simpler if you removed those changes, and
then you wouldn't need to modify the public API of the page_pool.

> As the pool->items need to be large enough to avoid
> performance degradation, add an 'item_full' stat to indicate
> allocation failures due to unavailability of pool->items.
>

I'm not sure there is any way to size the pool->items array
correctly. Can you use a data structure here that can grow? Linked
list or xarray?

AFAIU what we want is: when the page_pool allocates a netmem it
adds the netmem to the items array, and when the pp releases a
netmem it removes it from the array. Both of these operations are
slow paths, right? So the performance of a data structure more
complicated than an array may be OK. bench_page_pool_simple will
tell for sure.

> Note, the devmem patchset seems to make the bug harder to fix,
> and may make backporting harder too. As there is no actual user
> of devmem and the fix for devmem is unclear for now, this patch
> does not consider fixing the devmem case yet.
>

net_iovs don't hit this bug; dma_unmap_page_attrs() is never called
on them, so no special handling is really needed. However, for code
quality reasons let's try to minimize the number of devmem or memory
provider checks in the code, if possible.

-- 
Thanks,
Mina