From: Mina Almasry <almasrymina@google.com>
Date: Thu, 26 Sep 2024 22:54:01 -0700
Subject: Re: [PATCH net v2 2/2] page_pool: fix IOMMU crash when driver has already unbound
To: Yunsheng Lin
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, liuyonglong@huawei.com, fanghaiqing@huawei.com, zhangkun09@huawei.com, Robin Murphy, Alexander Duyck, IOMMU, Wei Fang, Shenwei Wang, Clark Wang, Eric Dumazet, Tony Nguyen, Przemek Kitszel, Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger, AngeloGioacchino Del Regno, Andrew Morton, Ilias Apalodimas, imx@lists.linux.dev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, bpf@vger.kernel.org, linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-mm@kvack.org
In-Reply-To: <842c8cc6-f716-437a-bc98-70bc26d6fd38@huawei.com>
References: <20240925075707.3970187-1-linyunsheng@huawei.com> <20240925075707.3970187-3-linyunsheng@huawei.com> <842c8cc6-f716-437a-bc98-70bc26d6fd38@huawei.com>
On Thu, Sep 26, 2024 at 8:58 PM Yunsheng Lin wrote:
>
> On 2024/9/27 2:15, Mina Almasry wrote:
> >
> >> In order not to do the dma unmapping after the driver has already
> >> unbound, and so stall the unloading of the networking driver, add
> >> the pool->items array to record all the pages, including the ones
> >> which are handed over to the network stack, so that the page_pool can
> >> do the dma unmapping for those pages when page_pool_destroy()
> >> is called.
> >
> > One thing I could not understand from looking at the code: if the
> > items array is in the struct page_pool, why do you need to modify the
> > page_pool entry in the struct page and in the struct net_iov? I think
> > the code could be made much simpler if you removed these changes,
> > and you wouldn't need to modify the public API of the page_pool.
>
> As mentioned in [1]:
> "There is no space in 'struct page' to track the inflight pages, so
> 'pp' in 'struct page' is renamed to 'pp_item' to enable the tracking
> of inflight pages."
>
> As we still need to reach the 'struct page_pool' in the
> page_pool_put_page() related APIs, the container_of() trick is used
> to get the pp from the pp_item.
>
> As you had changed 'struct net_iov' to mirror 'struct page', the
> 'struct net_iov' part is changed accordingly.
>
> 1. https://lore.kernel.org/all/50a463d5-a5a1-422f-a4f7-d3587b12c265@huawei.com/
>

I'm not sure we need the pages themselves to carry the list of pages that
need to be dma-unmapped on page_pool_destroy(). The pool can hold the list
of pages that need to be unmapped on page_pool_destroy(), and the
individual pages need not track it, unless I'm missing something.

> >
> >> As the pool->items need to be large enough to avoid
> >> performance degradation, add an 'item_full' stat to indicate
> >> allocation failure due to unavailability of pool->items.
> >>
> >
> > I'm not sure there is any way to size the pool->items array correctly.
>
> Currently the size of pool->items is calculated in page_pool_create_percpu()
> as below, to make sure the size of pool->items is roughly twice the size
> of pool->ring, so that the number of pages sitting in the driver's rx
> ring waiting for a new packet is similar to the number of pages still
> being handled in the network stack, as most drivers seem to set
> pool->pool_size according to their rx ring size:
>
> +#define PAGE_POOL_MIN_INFLIGHT_ITEMS 512
> +       unsigned int item_cnt = (params->pool_size ? : 1024) +
> +                               PP_ALLOC_CACHE_SIZE + PAGE_POOL_MIN_INFLIGHT_ITEMS;
> +       item_cnt = roundup_pow_of_two(item_cnt);
>

I'm not sure it's OK to add a limitation to the page_pool that it can
only allocate N pages. At the moment, AFAIU, N is unlimited, and it may
become a regression if we add a limitation.

> > Can you use a data structure here that can grow? Linked list or
> > xarray?
> >
> > AFAIU what we want is that when the page pool allocates a netmem it
> > will add the netmem to the items array, and when the pp releases a
> > netmem it will remove it from the array. Both of these operations are
> > slow paths, right? So the performance of a data structure more
> > complicated than an array may be OK. bench_page_pool_simple will tell
> > for sure.
>
> The question would be why we need pool->items to grow, with the
> additional overhead and complication of dynamically allocating items,
> a more complicated data structure, and concurrent handling?
>
> As mentioned in [2], it was the existing semantics, but that does not
> mean we need to keep it. The change of semantics seems like an advantage
> to me, as we become able to limit how many pages are allowed to be used
> by a page_pool instance.
>
> 2.
> https://lore.kernel.org/all/2fb8d278-62e0-4a81-a537-8f601f61e81d@huawei.com/
>
> >
> >> Note, the devmem patchset seems to make the bug harder to fix,
> >> and may make backporting harder too. As there is no actual user
> >> for the devmem and the fixing for devmem is unclear for now,
> >> this patch does not consider fixing the case for devmem yet.
> >>
> >
> > net_iovs don't hit this bug, dma_unmap_page_attrs() is never called on
> > them, so no special handling is needed really. However for code
>
> I am really doubtful about your above claim. As at least the below
> implementation of dma_buf_unmap_attachment_unlocked() called in
> __net_devmem_dmabuf_binding_free() seems to be using the DMA API directly:
>
> https://elixir.bootlin.com/linux/v6.7-rc8/source/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c#L215
>
> Or am I missing something obvious here?
>

I mean that currently net_iovs don't hit the __page_pool_release_page_dma()
function that causes the crash in the stack trace. The dmabuf layer handles
the unmapping when the dmabuf dies (I assume correctly).

--
Thanks,
Mina