Date: Sun, 9 Mar 2025 21:27:17 +0800
From: Yunsheng Lin
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap them when destroying the pool
To: Toke Høiland-Jørgensen, Andrew Morton, Jesper Dangaard Brouer, Ilias Apalodimas, "David S. Miller"
Cc: Yunsheng Lin, Yonglong Liu, Mina Almasry, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, linux-mm@kvack.org, netdev@vger.kernel.org
In-Reply-To: <20250308145500.14046-1-toke@redhat.com>
References: <20250308145500.14046-1-toke@redhat.com>

On 3/8/2025 10:54 PM, Toke Høiland-Jørgensen wrote:
> When enabling DMA mapping in page_pool, pages are kept DMA mapped until
> they are released from the pool, to avoid the overhead of re-mapping the
> pages every time they are used. This causes problems when a device is
> torn down, because the page pool can't unmap the pages until they are
> returned to the pool. This causes resource leaks and/or crashes when
> there are pages still outstanding while the device is torn down, because
> page_pool will attempt an unmap of a non-existent DMA device on the
> subsequent page return.
>
> To fix this, implement a simple tracking of outstanding dma-mapped pages
> in page pool using an xarray. This was first suggested by Mina[0], and
> turns out to be fairly straight forward: We simply store pointers to
> pages directly in the xarray with xa_alloc() when they are first DMA
> mapped, and remove them from the array on unmap. Then, when a page pool
> is torn down, it can simply walk the xarray and unmap all pages still
> present there before returning, which also allows us to get rid of the
> get/put_device() calls in page_pool.
> Using xa_cmpxchg(), no additional
> synchronisation is needed, as a page will only ever be unmapped once.

The implementation of xa_cmpxchg() seems to take the xa_lock, which seems to
be a per-xarray spinlock. Yes, if we were to take a per-xarray lock
unconditionally, additional synchronisation such as RCU doesn't seem to be
needed. But isn't that excessive overhead for normal packet processing, when
page_pool_destroy() has not been called yet?

Also, we might need similar locking or synchronisation for the DMA sync API,
so that the DMA sync can also be skipped once page_pool_destroy() has been
called.

>
> To avoid having to walk the entire xarray on unmap to find the page
> reference, we stash the ID assigned by xa_alloc() into the page
> structure itself, in the field previously called '_pp_mapping_pad' in
> the page_pool struct inside struct page. This field overlaps with the
> page->mapping pointer, which may turn out to be problematic, so an
> alternative is probably needed. Sticking the ID into some of the upper
> bits of page->pp_magic may work as an alternative, but that requires
> further investigation. Using the 'mapping' field works well enough as
> a demonstration for this RFC, though.

page->pp_magic seems to have at most 16 bits available when trying to reuse
its unused bits, as BPF_PTR_POISON and STACK_DEPOT_POISON already seem to
use 16 bits, so:
1. On 32-bit systems, there seem to be only 16 bits left, even if
   ILLEGAL_POINTER_VALUE is defined as 0x0.
2. On 64-bit systems, there seem to be only 12 bits left on powerpc, as
   ILLEGAL_POINTER_VALUE is defined as 0x5deadbeef0000000, see
   arch/powerpc/Kconfig.

So it seems we might need to limit the DMA mapping count to 4096 or 65536?

And to be honest, I am not sure whether those 'unused' 12/16 bits can really
be reused. I guess we might need suggestions from the mm and arch experts
here.
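
To make the 12/16-bit arithmetic above easier to check, here is a minimal
userspace sketch (not kernel code). It assumes, as argued above, that the low
16 bits are already taken by existing poison values, and that the usable
range ends at the lowest set bit of the ILLEGAL_POINTER_VALUE delta.
free_id_bits() and max_tracked_ids() are hypothetical helper names used only
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the text above: the low 16 bits of pp_magic are treated
 * as already occupied by existing poison values. */
#define LOW_BITS_IN_USE 16u

/*
 * Hypothetical helper: number of bits between LOW_BITS_IN_USE and the lowest
 * set bit of the poison delta. With a zero delta, every bit up to word_bits
 * counts as free.
 */
static unsigned int free_id_bits(uint64_t poison_delta, unsigned int word_bits)
{
	unsigned int top = word_bits;
	unsigned int bit;

	assert(word_bits <= 64);

	/* Find the lowest bit that the poison delta already uses. */
	for (bit = 0; bit < word_bits; bit++) {
		if (poison_delta & (UINT64_C(1) << bit)) {
			top = bit;
			break;
		}
	}

	return top > LOW_BITS_IN_USE ? top - LOW_BITS_IN_USE : 0;
}

/* Largest number of distinct IDs that fit into the free bits. */
static uint64_t max_tracked_ids(uint64_t poison_delta, unsigned int word_bits)
{
	return UINT64_C(1) << free_id_bits(poison_delta, word_bits);
}
```

With a 32-bit word and a delta of 0x0 this gives 16 bits (65536 IDs); with a
64-bit word and the powerpc delta 0x5deadbeef0000000 it gives 12 bits (4096
IDs), matching the two limits mentioned above. Whether those bits are really
safe to reuse is a separate question that this sketch does not answer.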
>
> Since all the tracking is performed on DMA map/unmap, no additional code
> is needed in the fast path, meaning the performance overhead of this
> tracking is negligible. The extra memory needed to track the pages is
> neatly encapsulated inside xarray, which uses the 'struct xa_node'
> structure to track items. This structure is 576 bytes long, with slots
> for 64 items, meaning that a full node occurs only 9 bytes of overhead
> per slot it tracks (in practice, it probably won't be this efficient,
> but in any case it should be an acceptable overhead).

Even if items are stored sequentially in the xa_node at first, is it possible
for those xa_nodes to become fragmented as pages are released and allocated
many times during packet processing? If so, is there any information about
that fragmentation?

>
> [0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/
>
> Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
> Reported-by: Yonglong Liu
> Suggested-by: Mina Almasry
> Reviewed-by: Jesper Dangaard Brouer
> Tested-by: Jesper Dangaard Brouer
> Signed-off-by: Toke Høiland-Jørgensen
> ---
> This is an alternative to Yunsheng's series. Yunsheng requested I send
> this as an RFC to better be able to discuss the different approaches; see
> some initial discussion in[1], also regarding where to store the ID as
> alluded to above.

As mentioned before, I am not really convinced there is any space left in
'struct page'; otherwise we might already have used that space to fix the
problem of DMA addresses wider than 32 bits on 32-bit systems, see
page_pool_set_dma_addr_netmem().

Also, using more space in 'struct page' for page_pool seems to couple
page_pool more tightly to the mm subsystem. That does not seem to align with
the folio work, which in the long term is trying to decouple non-mm
subsystems from the mm subsystem by keeping other subsystems from using more
of 'struct page' as metadata.
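
For readers not familiar with the 32-bit DMA address problem mentioned above,
here is a rough userspace model of the idea as I understand it: a 64-bit DMA
address is stored in a 32-bit field by shifting out the page-offset bits, and
the caller is told when the address does not survive the round trip. The
names (struct model_page, model_set_dma_addr(), MODEL_PAGE_SHIFT) and the
4 KiB page size are assumptions for illustration only, not the kernel
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size of 4 KiB for this model. */
#define MODEL_PAGE_SHIFT 12

/* Stand-in for the single 32-bit slot available on a 32-bit system. */
struct model_page {
	uint32_t dma_addr;
};

/*
 * Store a 64-bit DMA address in the 32-bit slot by dropping the low
 * MODEL_PAGE_SHIFT bits. Returns true when the address cannot be
 * represented (unaligned, or too wide even after the shift), in which case
 * the caller would have to give up on the mapping.
 */
static bool model_set_dma_addr(struct model_page *page, uint64_t addr)
{
	page->dma_addr = (uint32_t)(addr >> MODEL_PAGE_SHIFT);

	return addr != ((uint64_t)page->dma_addr << MODEL_PAGE_SHIFT);
}

/* Recover the full DMA address from the compressed slot. */
static uint64_t model_get_dma_addr(const struct model_page *page)
{
	return (uint64_t)page->dma_addr << MODEL_PAGE_SHIFT;
}
```

The point of the model is that the existing slot is already fully used for
the shifted address, so no spare bits remain in it for an additional ID.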
>
> -Toke
>
> [1] https://lore.kernel.org/r/40b33879-509a-4c4a-873b-b5d3573b6e14@gmail.com
>