References: <770012.1748618092@warthog.procyon.org.uk> <1097885.1749048961@warthog.procyon.org.uk>
In-Reply-To: <1097885.1749048961@warthog.procyon.org.uk>
From: Mina Almasry
Date: Thu, 5 Jun 2025 11:59:24 -0700
Subject: Re: Device mem changes vs pinning/zerocopy changes
To: David Howells
Cc: Stanislav Fomichev, willy@infradead.org, hch@infradead.org,
 Jakub Kicinski, Eric Dumazet, netdev@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, Jun 4, 2025 at 7:56 AM David Howells wrote:
>
> Stanislav Fomichev wrote:
>
> > >  (1) Separate fragment lifetime management from sk_buff.  No more wangling
> > >      of refcounts in the skbuff code.  If you clone an skb, you stick an
> > >      extra ref on the lifetime management struct, not the page.
> >
> > For device memory TCP we already have this: net_devmem_dmabuf_binding
> > is the owner of the frags. And when we reference skb frag we reference
> > only this owner, not individual chunks: __skb_frag_ref -> get_netmem ->
> > net_devmem_get_net_iov (ref on the binding).
> >
> > Will it be possible to generalize this to cover MSG_ZEROCOPY and splice
> > cases? From what I can tell, this is somewhat equivalent of your net_txbuf.
>
> Yes and no.  The net_devmem stuff that's now upstream still manages refs on a
> per-skb-frag basis.

Actually Stan may be right here: something similar to the net_devmem
model may be what you want. The net_devmem code never grabs references
on the frags themselves, as Stan explained (which is what you want).
We have an object, 'net_devmem_dmabuf_binding', which represents a
chunk of pinned devmem passed from userspace. When the net stack asks
for a ref on a frag, we grab a ref on the binding the frag belongs to,
in the call path that Stan pointed to:

__skb_frag_ref -> get_netmem -> net_devmem_get_net_iov (ref on the binding).

This sounds eerily similar to what you want to do. You could have a
new struct (net_zcopy_mem) which represents a chunk of zerocopy memory
that you've pinned using GUP or whatever the correct API is. Then when
the net stack wants a ref on a frag, you (somehow) figure out which
net_zcopy_mem it belongs to, and you grab a ref on the struct rather
than the frag.

Then when the refcount of net_zcopy_mem hits 0, you know you can
un-GUP the zcopy memory. I think that model in general may work. But
it may also be a case of everything looking like a nail to someone
with a hammer.

Better yet, we already have in the code a struct that represents
zerocopy memory, struct ubuf_info_msgzc. Instead of inventing a new
struct, could you reuse this one to do the memory pinning and
refcounting on behalf of the memory underneath?

-- 
Thanks,
Mina
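
For illustration only, the ownership model described above can be
sketched in userspace C. This is not kernel code: net_zcopy_mem is the
hypothetical struct named in the mail, and zc_frag, the helper names,
and the 'pinned' flag (standing in for GUP pin/unpin) are all
assumptions made up for the sketch. The point it shows is that a frag
holds no count of its own; every frag ref lands on the owner, and the
owner unpins when its count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner of one pinned zerocopy region (not a kernel struct). */
struct net_zcopy_mem {
	atomic_int refcount;	/* creator's ref plus one per frag ref */
	bool pinned;		/* stands in for pages pinned via GUP */
};

/* Hypothetical frag: a back-pointer to its owner, no refcount of its own. */
struct zc_frag {
	struct net_zcopy_mem *owner;
	size_t offset;
	size_t len;
};

static void net_zcopy_mem_init(struct net_zcopy_mem *mem)
{
	atomic_init(&mem->refcount, 1);	/* creator's reference */
	mem->pinned = true;		/* real code would pin the pages here */
}

static void net_zcopy_mem_get(struct net_zcopy_mem *mem)
{
	atomic_fetch_add(&mem->refcount, 1);
}

/* Returns true if this put dropped the last reference and unpinned. */
static bool net_zcopy_mem_put(struct net_zcopy_mem *mem)
{
	int old = atomic_fetch_sub(&mem->refcount, 1);

	assert(old > 0);		/* catch unbalanced puts */
	if (old == 1) {
		mem->pinned = false;	/* real code would unpin here */
		return true;
	}
	return false;
}

/* A ref on a frag is a ref on the owner, never on the frag or its page. */
static void zc_frag_ref(struct zc_frag *frag)
{
	net_zcopy_mem_get(frag->owner);
}

static bool zc_frag_unref(struct zc_frag *frag)
{
	return net_zcopy_mem_put(frag->owner);
}
```

In this sketch, cloning an skb would amount to one more zc_frag_ref()
per frag, and the memory stays pinned until the last zc_frag_unref()
(or the creator's own put) returns true.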