From: Toke Høiland-Jørgensen
To: Yunsheng Lin, Andrew Morton, Jesper Dangaard Brouer, Ilias Apalodimas, "David S. Miller"
Cc: Yonglong Liu, Mina Almasry, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, linux-mm@kvack.org, netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
In-Reply-To: <2c363f6a-f9e4-4dd2-941d-db446c501885@huawei.com>
References: <20250308145500.14046-1-toke@redhat.com> <87cyepxn7n.fsf@toke.dk> <2c363f6a-f9e4-4dd2-941d-db446c501885@huawei.com>
Date: Mon, 10 Mar 2025 16:24:05 +0100
Message-ID: <875xkgykmi.fsf@toke.dk>

Yunsheng Lin writes:

> On 2025/3/10 17:13, Toke Høiland-Jørgensen wrote:
>
> ...
>
>>
>>> Also, we might need a similar locking or synchronisation for the dma
>>> sync API in order to skip the dma sync API when page_pool_destroy() is
>>> called too.
>>
>> Good point, but that seems a separate issue? And simpler to solve (just
>
> If I understand the comment from DMA experts correctly, the dma_sync API
> should not be allowed to be called after the dma_unmap API.

Ah, right, I see what you mean; will add a check for this.

>> set pool->dma_sync to false when destroying?).
>
> Without locking or synchronisation, there is some small window between
> pool->dma_sync checking and dma sync API calling, during which the driver
> might have already unbound.
>
>>
>>>> To avoid having to walk the entire xarray on unmap to find the page
>>>> reference, we stash the ID assigned by xa_alloc() into the page
>>>> structure itself, in the field previously called '_pp_mapping_pad' in
>>>> the page_pool struct inside struct page. This field overlaps with the
>>>> page->mapping pointer, which may turn out to be problematic, so an
>>>> alternative is probably needed. Sticking the ID into some of the upper
>>>> bits of page->pp_magic may work as an alternative, but that requires
>>>> further investigation. Using the 'mapping' field works well enough as
>>>> a demonstration for this RFC, though.
>>> page->pp_magic seems to only have 16 bits space available at most when
>>> trying to reuse some unused bits in page->pp_magic, as BPF_PTR_POISON
>>> and STACK_DEPOT_POISON seems to already use 16 bits, so:
>>> 1. For 32 bits system, it seems there is only 16 bits left even if the
>>>    ILLEGAL_POINTER_VALUE is defined as 0x0.
>>> 2. For 64 bits system, it seems there is only 12 bits left for powerpc
>>>    as ILLEGAL_POINTER_VALUE is defined as 0x5deadbeef0000000, see
>>>    arch/powerpc/Kconfig.
>>>
>>> So it seems we might need to limit the dma mapping count to 4096 or
>>> 65536?
>>>
>>> And to be honest, I am not sure if those 'unused' 12/16 bits can really
>>> be reused or not, I guess we might need suggestion from mm and arch
>>> experts here.
>>
>> Why do we need to care about BPF_PTR_POISON and STACK_DEPOT_POISON?
>> AFAICT, we only need to make sure we preserve the PP_SIGNATURE value.
>> See v2 of the RFC patch, the bit arithmetic there gives me:
>>
>> - 24 bits on 32-bit architectures
>> - 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
>> - 32 bits on other 64-bit architectures
>>
>> Which seems to be plenty?
>
> I am really doubtful it is that simple, but we always can hear from the
> experts if it isn't:)

Do you have any specific reason to think so? :)

>>>> Since all the tracking is performed on DMA map/unmap, no additional code
>>>> is needed in the fast path, meaning the performance overhead of this
>>>> tracking is negligible. The extra memory needed to track the pages is
>>>> neatly encapsulated inside xarray, which uses the 'struct xa_node'
>>>> structure to track items. This structure is 576 bytes long, with slots
>>>> for 64 items, meaning that a full node incurs only 9 bytes of overhead
>>>> per slot it tracks (in practice, it probably won't be this efficient,
>>>> but in any case it should be an acceptable overhead).
>>>
>>> Even if items is stored sequentially in xa_node at first, is it possible
>>> that there may be fragmentation in those xa_node when pages are released
>>> and allocated many times during packet processing? If yes, is there any
>>> fragmentation info about those xa_node?
>>
>> Some (that's what I mean with "not as efficient"). AFAICT, xarray does
>> do some rebalancing of the underlying radix tree, freeing nodes when
>> they are no longer used. However, I am not too familiar with the xarray
>> code, so I don't know exactly how efficient this is in practice.
>
> I guess that is one of the disadvantages that an advanced struct like
> Xarray is used:(

Sure, there will be some overhead from using xarray, but I think the
simplicity makes up for it; especially since we can limit this to the
cases where it's absolutely needed.

-Toke
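
As a rough cross-check of the xa_node overhead figures quoted in the thread, here is a minimal sketch of the per-entry cost at a few occupancy levels. The 576-byte node size and the 64 slots are taken from the discussion above; the helper name and the chosen occupancy levels are illustrative assumptions, and the model only looks at a single leaf node, ignoring interior nodes of the tree.

```python
# Figures quoted in the thread for struct xa_node.
XA_NODE_SIZE_BYTES = 576
XA_NODE_SLOTS = 64


def overhead_per_entry(used_slots: int) -> float:
    """Bytes of xa_node memory attributable to each tracked entry.

    Hypothetical helper: models a single leaf node only and ignores
    interior nodes of the tree.
    """
    if not 1 <= used_slots <= XA_NODE_SLOTS:
        raise ValueError("used_slots must be between 1 and XA_NODE_SLOTS")
    return XA_NODE_SIZE_BYTES / used_slots


if __name__ == "__main__":
    # Full node, half-full node, quarter-full node.
    for used in (64, 32, 16):
        print(f"{used:2d}/{XA_NODE_SLOTS} slots used: "
              f"{overhead_per_entry(used):.1f} bytes per entry")
```

Under this simplified model a full node gives the 9 bytes per slot mentioned in the patch description, while a node left only a quarter full by fragmentation would cost 36 bytes per tracked page.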