From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43754C36008 for ; Thu, 27 Mar 2025 01:37:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38B202800BD; Wed, 26 Mar 2025 21:37:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 320BC2800A5; Wed, 26 Mar 2025 21:37:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 118812800BD; Wed, 26 Mar 2025 21:37:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E49352800A5 for ; Wed, 26 Mar 2025 21:37:55 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id A4776BAFF9 for ; Thu, 27 Mar 2025 01:37:56 +0000 (UTC) X-FDA: 83265619752.17.860A153 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) by imf19.hostedemail.com (Postfix) with ESMTP id A0AB11A0005 for ; Thu, 27 Mar 2025 01:37:54 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=2nOQSBTr; spf=pass (imf19.hostedemail.com: domain of almasrymina@google.com designates 209.85.214.171 as permitted sender) smtp.mailfrom=almasrymina@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743039474; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cHwMT+3FgNwEQkr669gNNkCyBOJVEsIajaRLpg15NhE=; b=K1hgDg1gFJPCb0pNkhzQ2pAThr5gi1xrS5YjG9TD2+ZmpguAjuWTjZMdRnRYhqs+1ffrvY EiiI9r9XMnzdaFAMMVdmikkWNlfGn0nxzLqB+B+UQnq6l+GTSqQH5FpE7RIw2enwVRegH2 QHkb0XBFsuo+Qrx20/DNKkr1axR776s= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=2nOQSBTr; spf=pass (imf19.hostedemail.com: domain of almasrymina@google.com designates 209.85.214.171 as permitted sender) smtp.mailfrom=almasrymina@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743039474; a=rsa-sha256; cv=none; b=MabLA4+NVUrQEdDbh30EP2ZnlyomkecvdEjo5CxvpIbP2GFs9sIT2f+B4Ke/HLOI2Mpv6L 9S5vUauZVZ1hBs68eccxrMfAFLmVR8BZ2GCxv4QsYhn6db2bVRWlEOxRhCEhA/irTaPNha b0iDtlz5ick/tkne/QiWqfiP0eTFMPU= Received: by mail-pl1-f171.google.com with SMTP id d9443c01a7336-2264c9d0295so103315ad.0 for ; Wed, 26 Mar 2025 18:37:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1743039473; x=1743644273; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=cHwMT+3FgNwEQkr669gNNkCyBOJVEsIajaRLpg15NhE=; b=2nOQSBTrxvVnKlqNFivzeX8jAtPqMhduUITp8Szb08F4S5bt1+9zgD6aubEdwjzw/g 1V+muLZGAzSJ70ub71Q3Bf0FU0YNdB4BMFlMpmHlDzvpM33jukDi67cQLzxZ0F9g1ap2 ETpl3GShxdAlQgHT+ob9ffQJfBegTYO9SvmAJfIymcPH/ZFceaes4rpsETSozhxZfx+M ou2kl7IJazCpOEm7JgLSqugF07uNzKTExojWslD1cZTSXInKmkMtEniIQ42F8YY096tE tHfb1SBTWr/lbcoBNj1JfoZwPlNVs9wVuu3LGtdOtOE3ayg4jHMIprY0nodF5ywliDvO ql0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743039473; x=1743644273; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cHwMT+3FgNwEQkr669gNNkCyBOJVEsIajaRLpg15NhE=; b=uSHKCoqbpB9OC2C7VZDWbVncRehEcfVHapWl8A5yH+vroLf8QJNAadK7IfSnsRwoVE 2ymV24hdkNMo7PdIUozAj81JPeYcR/ebBoUb5/msL36GtddLHiW4IbxMUrGxXzh8Ckj8 eeY8CovM1/j/GRomBlsoxBQVEpjLuXzjxf1SSdnKVUv5vGfn0Y/5PzR85E+F+wc6Q361 YIRnu3v4+WsusIW37PVVP9rsIEpHRpkpmB/gDsW+gnSCOabIWOvzI8rG0Bz9o7dYMyPY rVtv2cCKmmoVpCs5JGktI5abdi7xP5tdcWH87uxTAg4RU+npj7avSPS4Yi8uCKcmPpfP Fs+g== X-Forwarded-Encrypted: i=1; AJvYcCXpyf8KHA5wGxq1OvbUmCkcNyqc59lDtovAAE862Bb29QrXtjsJ9HnBn6+bftP7kLAv9372ToSPWg==@kvack.org X-Gm-Message-State: AOJu0YyrFxUjMHf47cQ+qBq8LMdReWkizaghQd83+XgZPUJWhoQWwsc2 001aFHpCX6a95VTyJ0xpqRIN7rlsts52ug221ITlWlbmFMr07veUFppE/x/CbY8SQ+5UuN6hFr1 D8u1WQtJnu4e4mcVMHGpmEHJLMfz+0kTqnDKr X-Gm-Gg: ASbGncuIh+uTgnFiK9L9vqzoCz5K8Ytp1SAiBLq96lutRL28WK60woHAVQxDiIsp6/A cQ8+bSyNu3o78tnbDRGj44jrbtv6RkDlqZLp2Qp6cjp1hBAwlxTv7oBINJVyJArqfhjnK727smK sRxNi59iX2zzAN/alEytbukuPyZfw= X-Google-Smtp-Source: AGHT+IGyejTjcOqm9GhRjNM27uprdZ+w9l6Vw5AAoXb7lkUEV1bgap3tiBcG/R07VPwIsNrbRpfXRtioIUysEK92hYU= X-Received: by 2002:a17:902:d4d1:b0:216:2839:145 with SMTP id d9443c01a7336-22808f6d3b0mr343815ad.1.1743039473091; Wed, 26 Mar 2025 18:37:53 -0700 (PDT) MIME-Version: 1.0 References: <20250325-page-pool-track-dma-v2-0-113ebc1946f3@redhat.com> <20250325-page-pool-track-dma-v2-3-113ebc1946f3@redhat.com> In-Reply-To: From: Mina Almasry Date: Wed, 26 Mar 2025 18:37:38 -0700 X-Gm-Features: AQ5f1JqcogOvbg_4h2M8tFNkIPcgLEda3D9T4-VFCwVABZiq4bBI80W1nsox_KU Message-ID: Subject: Re: [PATCH net-next v2 3/3] page_pool: Track DMA-mapped pages and unmap them when destroying the pool To: Saeed Mahameed Cc: =?UTF-8?B?VG9rZSBIw7hpbGFuZC1Kw7hyZ2Vuc2Vu?= , "David S. Miller" , Jakub Kicinski , Jesper Dangaard Brouer , Leon Romanovsky , Tariq Toukan , Andrew Lunn , Eric Dumazet , Paolo Abeni , Ilias Apalodimas , Simon Horman , Andrew Morton , Yonglong Liu , Yunsheng Lin , Pavel Begunkov , Matthew Wilcox , netdev@vger.kernel.org, bpf@vger.kernel.org, linux-rdma@vger.kernel.org, linux-mm@kvack.org, Qiuling Ren , Yuying Ma Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: A0AB11A0005 X-Stat-Signature: oy96hnn5n7nyjt91yeh5cxxnykkfxkwk X-Rspam-User: X-Rspamd-Server: rspam12 X-HE-Tag: 1743039474-948221 X-HE-Meta: U2FsdGVkX19vUdmumio4zSgHq9pq8okYvxam5ElNtrEAE2DsQLsD0RVK7+RVgauhJHEsMP9Z3SaNSTSHDKAizZlW/1AfNB0NQwf+9/8bhQAKpse1xjD5XV6KtNjP0LozsbkRAgQDp2NkKenxSZAbqX6EYK1ecX/nnFQ4zHczKhEd+scB7wKqyIRnVAVSgtOv9sUKi4BgpZTQzSnE4FTh5+LD1J0hYT6L1QfKNfeLHuxlGfSIs85kN8XQ/HtxwqxiIWJtJhT1iw1zDu6vdmgmI8qhRhiGyVyx6izw75v2nGM72rTrFq2I2tXRLuUxLYRpIfd0sDJvmHKfsQ/q1WxnT3pYM9aSRnVxBWgyyY9LcBuD46ALWuh7ZydyvAEfEzRo8wv7q8oSIqUFbd+VXu8aRNMV8SbHeTh9fgAchZ6lA9TbTPIhSUiUi78yAmsIvjFobAJ9jWtJjLvxIL4njC8R5VYc64QVjB7fr7M2Z91PG3bdfJ9Zba6FnvHb0FSqgIBUobAyuwXQCNTlKCHU6uAyW7R0+a8TBgOFTZLCcn4DuL5ICyH486y+YJ9+wwJ7WNrZH5kugvyAdLqeVcJEPJSCJs+pb95P2wXEXEi386S9CxFQCbbTkjf0qGuXeeIV0a6hxL7Iy78jVjnkj4AiICN93b7AeeC5ouetrQL7rIqZZhxHnuMELanXLYxsVWiUnPkbGoWi+nZGMlsH+XOHB99/Z+c/+J4f+PIi4kmZuqLcFyDqRbEpgZa2q0hPcrX2wjdJv9mI+JihSk5vcGsv2bQ9HMYoqwilFHXV3q+uH8+2Dy5bZUaO3W1Dgs/wCzXmy2mHfRE/L73mRwHFGNb+0tzg2GuqpfDolT98gOhXTQ3DN6/ultbSSf7irpWhREHnf4LpltROXd3Gudiob1cTaCYlXIieBAYnIXi+memqAMRnSu9jHLSQGnilD/s8fp4gX6T+plVSIk1eOUK4Ondpom8 FEfjz28u dwtemaOxy6ASVUnGwp7J9e07jpSdGxI4bpTCIdTgZr72NkijvOe9wv0bKDjCym4AhBg6gf5D3NnBSEnggXIZ+MtKvnTdrZTNmSjx4mBJcfSrRTT61tndgBDVI+31bYikJgW7zBIlzfoCGqgIocHQnv2MztT0r2hk+/P05Ca1bcy7yT/FbhkxYgfrbQHGjBGm1rTU4RF+NXNAuCIpr64OR3/Z+z6/80t1nQQ5GanXjpxnSALWJ8DtvqejktGBvmObYlzoChcsn8gpZqc4bGPxjIPLmOkjBPF7V9s3PVd3HK4Dep1JlBVIM/enMxCMqJIbSo8W+buvNdMFzW8S8jioVEQcunh3hDdReWw+UOhg5zKl/rSKdjb7uAr7xIQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 26, 2025 at 5:29=E2=80=AFPM Saeed Mahameed = wrote: > > On 26 Mar 13:02, Mina Almasry wrote: > >On Wed, Mar 26, 2025 at 11:22=E2=80=AFAM Saeed Mahameed wrote: > >> > >> On 25 Mar 16:45, Toke H=C3=B8iland-J=C3=B8rgensen wrote: > >> >When enabling DMA mapping in page_pool, pages are kept DMA mapped unt= il > >> >they are released from the pool, to avoid the overhead of re-mapping = the > >> >pages every time they are used. This causes resource leaks and/or > >> >crashes when there are pages still outstanding while the device is to= rn > >> >down, because page_pool will attempt an unmap through a non-existent = DMA > >> >device on the subsequent page return. > >> > > >> > >> Why dynamically track when it is guaranteed the page_pool consumer (dr= iver) > >> will return all outstanding pages before disabling the DMA device. > >> When a page pool is destroyed by the driver, just mark it as "DMA-inac= tive", > >> and on page_pool_return_page() if DMA-inactive don't recycle those pag= es > >> and immediately DMA unmap and release them. > > > >That doesn't work, AFAIU. DMA unmaping after page_pool_destroy has > >been called in what's causing the very bug this series is trying to > >fix. What happens is: > > > >1. Driver calls page_pool_destroy, > > Here the driver should already have returned all inflight pages to the > pool, the problem is that those returned pages are recycled back, instead > of being dma unmapped, all we need to do is to mark page_pool as "don't > recycle" after the driver destroyed it. > > >2. Driver removes the net_device (and I guess the associated iommu > >structs go away with it). > > if the driver had not yet released those pages at this point then there i= s a > more series leak than just dma leak. If the pool has those pages, which i= s > probably the case, then they were already release by the driver, the prob= lem > is that they were recycled back. > > >3. Page-pool tries to unmap after page_pool_destroy is called, trying > >to fetch iommu resources that have been freed due to the netdevice > >gone away =3D bad stuff. > > > >(but maybe I misunderstood your suggestion) > > Yes, see above, but I just double checked the code and I though that the > page_pool_destroy would wait for all inflight pages to be release before = it > returns back to the caller, but apparently I was mistaken, if the pages a= re > still being held by stack/user-space then they will still be considered > "inflight" although the driver is done with them :/. > Right, I think we're on the same page now. page_pool_destroy doesn't (currently) wait for all the inflight pages to be returned before it returns to the driver calling it, even if the driver is fully done with all the pages. There could be pages still held by the userspace/net stack. > So yes tracking "ALL" pages is one way to solve it, but I still think tha= t > the correct way to deal with this is to hold the driver/netdev while ther= e > are inflight pages, but no strong opinion if we expect pages to remain in > userspace for "too" long, then I admit, tracking is the only way. > AFAICT, the amount of time the userspace can hold onto an inflight page is actually indefinite. The pathological edge case is the userspace opens a receive socket, never closes it, and never recvmsg()'s it. In that case skbs will wait in the socket's receive queue forever. FWIW Jakub did suggest a fix where the page_pool will stall the netdevice removal while there are inflight pages. I never provided Reviewed-by because I was nervous about GVE failing to soft reset or unload or something because some userspace is holding onto a page_pool page, but maybe you like it better: https://lore.kernel.org/netdev/20240809205717.0c966bad@kernel.org/T/#mbca7f= 9391ba44840444e747c9038ef6617142048 My personal feeling is that adding overhead to the slow page_alloc + dma mapping path is preferable to blocking netdev unregister. --=20 Thanks, Mina