From: Mina Almasry <almasrymina@google.com>
Date: Wed, 10 Jul 2024 13:29:03 -0700
Subject: Re: [PATCH net-next v16 05/13] page_pool: devmem support
To: Jakub Kicinski
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-mm@kvack.org, Donald Hunter, "David S. Miller",
 Eric Dumazet, Paolo Abeni, Jonathan Corbet, Richard Henderson, Ivan Kokshaysky,
 Matt Turner, Thomas Bogendoerfer, "James E.J. Bottomley", Helge Deller, Andreas Larsson,
 Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Arnd Bergmann, Steffen Klassert, Herbert Xu, David Ahern,
 Willem de Bruijn, Shuah Khan, Sumit Semwal, Christian König, Bagas Sanjaya,
 Christoph Hellwig, Nikolay Aleksandrov, Taehee Yoo, Pavel Begunkov, David Wei,
 Jason Gunthorpe, Yunsheng Lin, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt,
 Jeroen de Borst, Praveen Kaligineedi, Matthew Wilcox
References: <20240710001749.1388631-1-almasrymina@google.com> <20240710001749.1388631-6-almasrymina@google.com> <20240710094900.0f808684@kernel.org>
In-Reply-To: <20240710094900.0f808684@kernel.org>
On Wed, Jul 10, 2024 at 9:49 AM Jakub Kicinski wrote:
>
> On Wed, 10 Jul 2024 00:17:38 +0000 Mina Almasry wrote:
> > @@ -68,17 +107,103 @@ static inline netmem_ref page_to_netmem(struct page *page)
> >
> >  static inline int netmem_ref_count(netmem_ref netmem)
> >  {
> > +	/* The non-pp refcount of net_iov is always 1. On net_iov, we only
> > +	 * support pp refcounting which uses the pp_ref_count field.
> > +	 */
> > +	if (netmem_is_net_iov(netmem))
> > +		return 1;
> > +
> >  	return page_ref_count(netmem_to_page(netmem));
> >  }
>
> How can this work if we had to revert the patch which made all of
> the networking stack take pp-aware refs? Maybe we should add the
> refcount, and let it be bumped, but WARN() if the net_iov is released
> with refcount other than 1? Or we need a very solid explanation why
> the conversion had to be reverted and this is fine.
>

Right, as you are aware, page refcounting is based on two refcounts: pp
refs and full page refs. To be honest I find the two-ref flow confusing,
and I made an effort to avoid porting it to net_iovs.

net_iovs support just one refcount: the pp-ref. My intention is that
when a reference is needed on a net_iov we obtain a pp-ref, and when we
drop a reference on a net_iov we drop a pp-ref. This works for net_iovs
but not for pages, because (as you explained to me) pages can be
inserted into the net stack with full page refs.
So when it comes to refcounting pages, we need to be careful about which
ref to obtain or drop, depending on is_pp_netmem() and skb->pp_recycle
(as pp_recycle serves as a concurrency check, like you explained).

AFAICT, since net_iovs always originate from the net stack, we can make
the simplification that they are always seeded with 1 pp-ref and never
support non-pp refs. This simplifies the refcounting such that:

1. net_iovs are always is_pp_netmem (they are never disconnected from
   the pp, as they never have an elevated non-pp refcount), and

2. net_iov refcounting doesn't need to check skb->pp_recycle, because
   we can be sure the caller always holds a pp-ref (since it's the only
   ref type supported).

Currently, as written, I just realized I did not add net_iov support to
__skb_frag_ref(). net_iov does support skb_pp_frag_ref(), though, so
there is no way to increment a non-pp ref on a net_iov. If we want to
add __skb_frag_ref() support for net_iov, I suggest something like:

diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488a..02f7f4c7d4821 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,13 @@
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	netmem_ref netmem = skb_frag_netmem(frag);
+
+	/* netmem always uses pp-refs for refcounting. Never non-pp refs. */
+	if (!netmem_is_net_iov(netmem))
+		get_page(netmem_to_page(netmem));
+	else
+		page_pool_ref_netmem(netmem);
 }

If you don't like the 1-ref simplification, I can definitely add a
second refcount as you suggest, but AFAICT the simplification is safe
due to how net_iovs originate, and maybe also because devmem usage in
the net stack is limited by all the skb_is_readable() checks, so it's
possible that the edge cases don't reproduce.
I was looking to find a concrete bug report with devmem before taking a
hammer and adding a secondary refcount, rather than doing it
preemptively, but I'm happy to look into it if you insist.

> >  static inline unsigned long netmem_to_pfn(netmem_ref netmem)
> >  {
> > +	if (netmem_is_net_iov(netmem))
> > +		return 0;
> > +
> >  	return page_to_pfn(netmem_to_page(netmem));
> >  }
>
> Can we move this out and rename it to netmem_pfn_trace() ?
> Silently returning 0 is not generally okay, but since it's only
> for tracing we don't care.
>

Yes, I will do.

> > +static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > +{
> > +	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> > +}
> > +
> > +static inline unsigned long netmem_get_pp_magic(netmem_ref netmem)
> > +{
> > +	return __netmem_clear_lsb(netmem)->pp_magic;
> > +}
> > +
> > +static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
> > +{
> > +	__netmem_clear_lsb(netmem)->pp_magic |= pp_magic;
> > +}
> > +
> > +static inline void netmem_clear_pp_magic(netmem_ref netmem)
> > +{
> > +	__netmem_clear_lsb(netmem)->pp_magic = 0;
> > +}
> > +
> > +static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
> > +{
> > +	return __netmem_clear_lsb(netmem)->pp;
> > +}
> > +
> > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
> > +{
> > +	__netmem_clear_lsb(netmem)->pp = pool;
> > +}
>
> Why is all this stuff in the main header? It's really low level.
> Please put helpers which are only used by the core in a header
> under net/core/, like net/core/dev.h

Sorry, will do.

--
Thanks,
Mina