From: Mina Almasry
Date: Thu, 5 Jun 2025 12:27:49 -0700
Subject: Re: Device mem changes vs pinning/zerocopy changes
To: David Howells
Cc: willy@infradead.org, hch@infradead.org, Jakub Kicinski, Eric Dumazet,
    netdev@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <770012.1748618092@warthog.procyon.org.uk>
    <1098853.1749051265@warthog.procyon.org.uk>
In-Reply-To: <1098853.1749051265@warthog.procyon.org.uk>

On Wed, Jun 4, 2025 at 8:34 AM David Howells wrote:
>
> > FWIW, my initial gut feeling is that the work doesn't conflict that much.
> > The tcp devmem netmem/net_iov stuff is designed to follow the page stuff,
> > and as the usage of struct page changes we're happy moving net_iovs and
> > netmems to do the same thing. My read is that it will take a small amount of
> > extra work, but there are no in-principle design conflicts, at least AFAICT
> > so far.
>
> The problem is more the code you changed in the current merge window I'm also
> wanting to change, so merge conflicts will arise.
>
> However, I'm also looking to move the points at which refs are taken/dropped
> which will directly impinge on the design of the code that's currently
> upstream.
>
> Would it help if I created some diagrams to show what I'm thinking of?
>

I think I understand what you want to do, but I'm happy looking at
diagrams or jumping on a call if needed.

[snip]

> > I think to accomplish what you're describing we need to modify
> > skb_frag_ref to do something else other than taking a reference on the
> > page or net_iov. I think maybe taking a reference on the skb itself
> > may be acceptable, and the skb can 'guarantee' that the individual
> > frags underneath it don't disappear while these functions are
> > executing.
>
> Maybe.  There is an issue with that, though it may not be insurmountable: If a
> userspace process does, say, a MSG_ZEROCOPY send of a page worth of data over
> TCP, under a typicalish MTU, say, 1500, this will be split across at least
> three skbuffs.
>
> This would involve making a call into GUP to get a pin - but we'd need a
> separate pin for each skbuff and we might (in fact we currently do) end up
> calling into GUP thrice to do the address translation and page pinning.
>
> What I want to do is to put this outside of the skbuff so that GUP pin can be
> shared - but if, instead, we attach a pin to each skbuff, we need to get that
> extra pin in some way.  Now, it may be reasonable to add a "get me an extra
> pin for such-and-such a range" thing and store the {physaddr,len} in the
> skbuff fragment, but we also have to be careful not to overrun the pin count -
> if there's even a pin count per se.
>

I think I understand. Currently the GUP is done in this call stack
(some helpers omitted), right?
tcp_send_message_locked
  skb_zerocopy_iter_stream
    zerocopy_fill_skb_from_iter
      iov_iter_get_pages2
        get_user_pages_fast

I think maybe the extra ref management you're referring to can be
tacked on to ubuf_info_msgzc? I still don't understand the need for a
completely new net_txbuf when the existing one seems to be almost what
you need, but I may be missing something.

I'm thinking, very roughly (and I'm probably missing a lot of details):

1. Move the GUP call to msg_zerocopy_realloc, and save the pages array there.
2. Pass the ubuf_info_msgzc down to zerocopy_fill_skb_from_iter, and
have it fill the skb with pages from the GUP.
3. Modify skb_frag_ref such that if we want a reference on a frag that
belongs to a ubuf_info_msgzc, we grab a reference on the ubuf rather
than the frag.
4. Once the ubuf_info_msgzc refcount hits 0, you can un-GUP the memory?

-- 
Thanks,
Mina
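
For illustration only, steps 1-4 above can be modelled in plain userspace C.
This is a rough sketch, not kernel code: struct zc_ubuf and the zc_*
helpers are hypothetical names standing in for ubuf_info_msgzc and its ref
handling, "pinning" is just a flag, and the 1448-byte payload per skb is an
assumption for a 1500 MTU. It only shows the shape of the idea: pin once,
let each skb frag hold a ref on the shared ubuf, and unpin when the last ref
is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ubuf_info_msgzc; not the kernel structure. */
struct zc_ubuf {
	int refcnt;     /* sender's ref plus one ref per skb frag in flight */
	int gup_calls;  /* how many times the user pages were "pinned" */
	int pinned;     /* 1 while the user pages are considered pinned */
};

/* Step 1: pin once, up front, and take the sender's initial ref. */
static void zc_ubuf_pin(struct zc_ubuf *u)
{
	u->pinned = 1;
	u->gup_calls++;
	u->refcnt = 1;
}

/* Step 3: a frag backed by the ubuf takes a ref on the ubuf, not the page. */
static void zc_ubuf_get(struct zc_ubuf *u)
{
	assert(u->pinned);
	u->refcnt++;
}

/* Step 4: once the last ref goes away, the pages can be unpinned. */
static void zc_ubuf_put(struct zc_ubuf *u)
{
	assert(u->refcnt > 0);
	if (--u->refcnt == 0)
		u->pinned = 0;
}

/* Number of skbs needed to carry len bytes at payload bytes per skb. */
static size_t zc_nr_skbs(size_t len, size_t payload)
{
	return (len + payload - 1) / payload;
}

/*
 * Step 2: "fill" each skb from the already-pinned pages. Every skb takes
 * one ref on the shared ubuf; the sender's own ref is dropped at the end.
 * Returns the number of skbs that now hold a ref.
 */
static size_t zc_send(struct zc_ubuf *u, size_t len, size_t payload)
{
	size_t i, n = zc_nr_skbs(len, payload);

	zc_ubuf_pin(u);
	for (i = 0; i < n; i++)
		zc_ubuf_get(u);
	zc_ubuf_put(u);
	return n;
}
```

With a 4096-byte send and the assumed 1448-byte payload, zc_send() yields
three skbs that share a single pin, versus the three separate GUP calls
David describes. Each completing skb would call zc_ubuf_put(), and the last
one clears the pin.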