From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6280BC282EC for ; Tue, 11 Mar 2025 13:44:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38A49280002; Tue, 11 Mar 2025 09:44:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2EDC7280001; Tue, 11 Mar 2025 09:44:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 11B12280002; Tue, 11 Mar 2025 09:44:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E2D3E280001 for ; Tue, 11 Mar 2025 09:44:26 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 04DDB58282 for ; Tue, 11 Mar 2025 13:44:27 +0000 (UTC) X-FDA: 83209389816.27.7F81969 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf09.hostedemail.com (Postfix) with ESMTP id 711BB14000E for ; Tue, 11 Mar 2025 13:44:25 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=WDTRLt7k; spf=pass (imf09.hostedemail.com: domain of toke@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=toke@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741700665; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nnlkdy9LotJFQ0UxmvrGKU1FaCCohrRSPeHBox0BLM0=; b=fMubWdWqqiPEFSCb405EznE+GN1J4q3BqCvAKEvNLZ4y1iF4o3KJ2sLwP1BOLCb7uDgW2S k9AXks3aa9Lu0bX0tRukbXeIIIpXgybUz+rta8YaREKxXeKp4Id1Eh0clGv2ObKgroKsmC mvlNbqLgwDcsFkhEahJr4HFhOjQ3X4k= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741700665; a=rsa-sha256; cv=none; b=QLjFMGrEp+4LNKZzH04Gq1Mz1bpQsdpq9Zz8/SZPW0BXD4y5RlNwCVZI911KFyF2GpZIUC MFujwDPecfeInfnb8WbThT95t6y4dNDMmrA2O0gclmy93nna/mKQSDiPy+JEqAfe7eF1HQ N0IJdKdk681bvMaGXL5c9W8WbEoLbtQ= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=WDTRLt7k; spf=pass (imf09.hostedemail.com: domain of toke@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=toke@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1741700663; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nnlkdy9LotJFQ0UxmvrGKU1FaCCohrRSPeHBox0BLM0=; b=WDTRLt7k1w++esV6i+TdoQTeLHeopRC7yH0IYVpwgzCUYXHIsXldRpa0NnEaa3hoLiVamF xkxGsVPvhMiKOamXSWqkwAXhv4z0TxfuB4ECVrLw4vN6JhYsXRV/2oC06UekDN1R8+97WH EBqSHqWSs41kfLZ2rPPA1N5mOXZMTMQ= Received: from mail-lj1-f199.google.com (mail-lj1-f199.google.com [209.85.208.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-601-YuU6r517NQGB4jEriSSlBQ-1; Tue, 11 Mar 2025 09:44:22 -0400 X-MC-Unique: YuU6r517NQGB4jEriSSlBQ-1 X-Mimecast-MFC-AGG-ID: YuU6r517NQGB4jEriSSlBQ_1741700660 Received: by mail-lj1-f199.google.com with SMTP id 38308e7fff4ca-3072f9103d7so38409761fa.3 for ; Tue, 11 Mar 2025 06:44:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741700660; x=1742305460; h=content-transfer-encoding:mime-version:message-id:date:references :in-reply-to:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZDvh640mXUQQF6buBwSlPYU4wimDdWWpP0QsQuvfiII=; b=rcn1/8F8fi5XLvZsrjz1zTVokYwEjH6kufRGzvNxJVVnJzhPrUVPJ0Ac8oPexoZuor kEl5pFic4XkNRy2FoNoIpPbMyNuWYzQIiOKAj0SymJjeGGV05C/Q1LC4RK0ltDuaET7z KRTzjZxdSvD0vcM5pcCVE9Xd/fJG18pxQXFXCbZBlso2PAU7Ot2tlTCovu3kg8mU2tQ9 vA9Thy4udaYTQNnpo2Hl4LF4Q8Dsbdu0zcOWnad90v7wAOXxbY4hZfObqFP5yvyT52Bg t0CizWpsLqlmMPVCi7cS67Go0CwoeWvBtmykXR8kTIvmxG5X8BoYIuL3a8fYrP98AzKs LRkQ== X-Forwarded-Encrypted: i=1; AJvYcCVivJahMi8gQiCwSOU8bVNeZfa8LAMMWJk4BPglfCUjutb8C/UozP52q3IQHQu95bLYq0kor0mmmA==@kvack.org X-Gm-Message-State: AOJu0YyIbWwkD0nCI0LdjaonlVCP0J/YO/1qF5eobOVCRB4tWpRUOJ5a OyatQqVDkrqCzpPsan/g7xn8QdB99NYavzSFEowK+pLjdXjEvr+8gy+PCOFUBS40topcIbNxMm+ U6KXcGSuoOY+lKIV3fKVD20rsUkOag2b/57YBsuB0b0N7OpZp X-Gm-Gg: ASbGnctrJnz2i5EuGyBT2qeXv7oDI/KzLDYegm2fXFN/aZmB2zvoSUzj3uEYbBPlZAZ yr8X79ubDSGnvD9RAl4Sk4xqvOoZTakc/6f1h7uOxKekcUUeZNQ4O6E0y+jgGc3mZBQnHY5/59J 80QYLUE+4aZV48F/+L/MLDZAdHCdtEXF1E9TTrX6IjbDX6qo6+0wh1cnBBd2IhZne/ylsImmqzx RQU74DxceaGaeJ8jCez/rP7V17ByG23UTjztpNeaFBubctzBLwTIY1LCbXNdx0Mli0/x133IrzT 1TJDhH2VlrWb X-Received: by 2002:a2e:b8d2:0:b0:308:eb58:6580 with SMTP id 38308e7fff4ca-30bf46387a6mr58649221fa.33.1741700660362; Tue, 11 Mar 2025 06:44:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG1tGX9fEpRk8FaOlpVoYgpzUcN+2qhGPldpH8b1ULOUF6Trv2KpWAfRh7Mt31RQi4bHuMV0w== X-Received: by 2002:a2e:b8d2:0:b0:308:eb58:6580 with SMTP id 38308e7fff4ca-30bf46387a6mr58648961fa.33.1741700659848; Tue, 11 Mar 2025 06:44:19 -0700 (PDT) Received: from alrua-x1.borgediget.toke.dk ([2a0c:4d80:42:443::2]) by smtp.gmail.com with ESMTPSA id 38308e7fff4ca-30bfd2737a6sm14329771fa.103.2025.03.11.06.44.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Mar 2025 06:44:18 -0700 (PDT) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 9882818FA5AA; Tue, 11 Mar 2025 14:44:15 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Pavel Begunkov , Mina Almasry , David Wei Cc: Andrew Morton , Jesper Dangaard Brouer , Ilias Apalodimas , "David S. Miller" , Yunsheng Lin , Yonglong Liu , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Simon Horman , linux-mm@kvack.org, netdev@vger.kernel.org Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool In-Reply-To: References: <20250308145500.14046-1-toke@redhat.com> <87cyeqml3d.fsf@toke.dk> X-Clacks-Overhead: GNU Terry Pratchett Date: Tue, 11 Mar 2025 14:44:15 +0100 Message-ID: <87tt7ziswg.fsf@toke.dk> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: ALW8KwCpz6W2A1jfa_-hn26Tv_wSE-_0ZfBmzoeErWY_1741700660 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: 711BB14000E X-Stat-Signature: ihmu5bsxoskjx1gai3h5xunc6thrwjae X-Rspamd-Server: rspam10 X-HE-Tag: 1741700665-81349 X-HE-Meta: U2FsdGVkX1+xsc+1hxFo6L/78uz8U8uojSZHgqnf8wLD1dVcAkytyHsF07sms30oc3L+GX6WL5ae+mAVTB6Lfl+c3mJbD9TLU9/V7B/I8a+p74rfa+bwrQ70lLukWl4Nl421SGDRhiwZY2NcWPxw/3l1BEYPatJbai4ahU3xg5c9GAj+kISBlp6cKZ8V8Y+6HUfuP1EZutXKJnK8gzfKFPhxpavbWIrT9ezz/EpyQCS3VNfqr854Cc06UrTZlvmKxQE6CZmq3RcqGdYldiACpTrDQ7ajhsHnedp+cAJ5RL0YHBRNnXoWn6s32vjg8zqjHKJKZe6xa/iASIMWTaH5kvMb7oHiovSeeM57F/1WW6a+xrR7awWS/Y24uBfJ3ryE5VJW3TiWTVnTANabk7ouoGjDax4TaU+YzsvzXpxIMmqoinMWTaK+eqkUiYWQeJanDMiCLAARf5wd8HgDUKsyYaIZefpe+Ucmy1gKhfwBLEV9RZluqgxDR8CwNflqDWNoPffP4phvyqlhqIHFYJgYEq8iW4b/XcSe0ynVRvjiGcYbY0C9KbOat7PVKk3HZrFilk8DXVbnPZ3ENH01Qq6fuNhHmf9EAMvyXADDzToNl6M0D1FHo7AfNrCfLPmWJjROp90Ly6StfUM5ZQsAKIyWjKZfpBL6aIMA+70n+lbLLMZmLKmEk2kaprETDdF/X0aZ3oZKQi1JAc19JTegcUJV42ABo38tUM0iPG2US7C80zZZI3ABhHC5lDuRNYPZnuhd4oBcRCIsw4oUejLUcpRoRYAb+pHzSSelzq7iKg6C/6fiw1FbUK49mXsG+anOIqjZKlzG4PXRybklx+1bNJ/MwTnLrgnqx6oc7mXUI9o/8g0J0V8Xhs0E2J2yJl/1JOgBG6uKCp3xUAua7sOl2PtAbAOQH7egYMQ7FvSa4rON/XHkNUqquFJ7SC98Ep9pYQL6yQv9mLHh4/PS7nhpgMW R8rl4gCe upkNeirdZE/au7Kgnq65gkmhjC8TeTZ453VkytZWCxxKROPXa7GFfVO+i1sRAq+1IdnguGWlwZ0Y+MWCuXdzhfvpU993V8ajTGQgfAXmJGchj+SJoMnD1EgOgMB4iEfMmCbqeCMx3t/GmUxN1qfDBUdxgxSkyGBlSKrRqaHBjYn30S5fwPwLs4hA1rdhlHKjj4nIJdJVwHXTh6oLuoY8HHuJ8M9I/8vYImBuGUQwOlyY+yF+FTy9XO21krKcd0hU4YxakzCWKiRrpY/MFvhLGDL8SyXjF895MyXCMpJ55ZWmCS7Ou9Z53svojdSp55BtZf5EhhnVEy9t8wmYZ9dqG9G3Jf3CXn7k2OtnUjgUHAn14mtOJafLkrx15lwbdy62vMFJSIyZvPlXtSeaf/3+KRQkmHVlNEDONz7nV+0zxnT1GMafKlUMz7OhGnY2fFd7P2TbubIbdbc6Q6djVZ65exyxFmhLLQ1FBW4xbfUcaUUrTwTfXJBWB3hRC08tQxehnX9ammGIY7Vg7a0Lzmb0M9ZrqotPwVNXpupKRg5qXarmjWXLrzjT9eEHimnT7B4MMSvhJ3AdyCcKF0Aky1VpzDvudGA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Pavel Begunkov writes: > On 3/9/25 12:42, Toke H=C3=B8iland-J=C3=B8rgensen wrote: >> Mina Almasry writes: >>=20 >>> On Sat, Mar 8, 2025 at 6:55=E2=80=AFAM Toke H=C3=B8iland-J=C3=B8rgensen= wrote: >>>> >>>> When enabling DMA mapping in page_pool, pages are kept DMA mapped unti= l >>>> they are released from the pool, to avoid the overhead of re-mapping t= he >>>> pages every time they are used. This causes problems when a device is >>>> torn down, because the page pool can't unmap the pages until they are >>>> returned to the pool. This causes resource leaks and/or crashes when >>>> there are pages still outstanding while the device is torn down, becau= se >>>> page_pool will attempt an unmap of a non-existent DMA device on the >>>> subsequent page return. >>>> >>>> To fix this, implement a simple tracking of outstanding dma-mapped pag= es >>>> in page pool using an xarray. This was first suggested by Mina[0], and >>>> turns out to be fairly straight forward: We simply store pointers to >>>> pages directly in the xarray with xa_alloc() when they are first DMA >>>> mapped, and remove them from the array on unmap. Then, when a page poo= l >>>> is torn down, it can simply walk the xarray and unmap all pages still >>>> present there before returning, which also allows us to get rid of the >>>> get/put_device() calls in page_pool. >>> >>> THANK YOU!! I had been looking at the other proposals to fix this here >>> and there and I had similar feelings to you. They add lots of code >>> changes and the code changes themselves were hard for me to >>> understand. I hope we can make this simpler approach work. >>=20 >> You're welcome :) >> And yeah, me too! >>=20 >>>> Using xa_cmpxchg(), no additional >>>> synchronisation is needed, as a page will only ever be unmapped once. >>>> >>> >>> Very clever. I had been wondering how to handle the concurrency. I >>> also think this works. >>=20 >> Thanks! >>=20 >>>> To avoid having to walk the entire xarray on unmap to find the page >>>> reference, we stash the ID assigned by xa_alloc() into the page >>>> structure itself, in the field previously called '_pp_mapping_pad' in >>>> the page_pool struct inside struct page. This field overlaps with the >>>> page->mapping pointer, which may turn out to be problematic, so an >>>> alternative is probably needed. Sticking the ID into some of the upper >>>> bits of page->pp_magic may work as an alternative, but that requires >>>> further investigation. Using the 'mapping' field works well enough as >>>> a demonstration for this RFC, though. >>>> >>> >>> I'm unsure about this. I think page->mapping may be used when we map >>> the page to the userspace in TCP zerocopy, but I'm really not sure. >>> Yes, finding somewhere else to put the id would be ideal. Do we really >>> need a full unsigned long for the pp_magic? >>=20 >> No, pp_magic was also my backup plan (see the other thread). Tried >> actually doing that now, and while there's a bit of complication due to >> the varying definitions of POISON_POINTER_DELTA across architectures, >> but it seems that this can be defined at compile time. I'll send a v2 >> RFC with this change. > > FWIW, personally I like this one much more than an extra indirection > to pp. > > If we're out of space in the page, why can't we use struct page * > as indices into the xarray? Ala > > struct page *p =3D ...; > xa_store(xarray, index=3D(unsigned long)p, p); > > Indices wouldn't be nicely packed, but it's still a map. Is there > a problem with that I didn't consider? Huh. As I just replied to Yunsheng, I was under the impression that this was not supported. But since you're now the second person to suggest this, I looked again, and it looks like I was wrong. There does indeed seem to be other places in the kernel that does this. As you say the indices won't be as densely packed, though. So I'm wondering if using the bits in pp_magic would be better in any case to get the better packing? I guess we can try benchmarking both approaches and see if there's a measurable difference. -Toke