From: Mina Almasry
Date: Wed, 15 Nov 2023 11:05:31 -0800
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider
To: Yunsheng Lin
Cc: Willem de Bruijn, Jakub Kicinski, davem@davemloft.net, pabeni@redhat.com,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Willem de Bruijn,
 Kaiyuan Zhang, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
 Christian König, Jason Gunthorpe, Matthew Wilcox, Linux-MM
References: <20231113130041.58124-1-linyunsheng@huawei.com>
 <20231113130041.58124-4-linyunsheng@huawei.com>
 <20231113180554.1d1c6b1a@kernel.org>
 <0c39bd57-5d67-3255-9da2-3f3194ee5a66@huawei.com>
 <3ff54a20-7e5f-562a-ca2e-b078cc4b4120@huawei.com>
 <6553954141762_1245c529423@willemb.c.googlers.com.notmuch>
 <8b7d25eb-1f10-3e37-8753-92b42da3fb34@huawei.com>

On Wed, Nov 15, 2023 at 10:07 AM Mina Almasry wrote:
>
> On Wed, Nov 15, 2023 at 1:29 AM Yunsheng Lin wrote:
> >
> > On 2023/11/14 23:41, Willem de Bruijn wrote:
> > >>
> > >> I am not sure dma-buf maintainer's concern is still there with this patchset.
> > >>
> > >> Whatever name you call it for the struct, however you arrange each field
> > >> in the struct, some metadata is always needed for dmabuf to integrate into
> > >> page pool.
> > >>
> > >> If the above is true, why not utilize the 'struct page' to have more unified
> > >> handling?
> > >
> > > My understanding is that there is a general preference to simplify struct
> > > page, and at the least not move in the other direction by overloading the
> > > struct in new ways.
> >
> > As my understanding, the new struct is just mirroring the struct page pool
> > is already using, see:
> > https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119
> >
> > If there is simplifying to the struct page_pool is using, I think the new
> > struct the devmem memory provider is using can adjust accordingly.
> >
> > As a matter of fact, I think the way 'struct page' for devmem is decoupled
> > from the mm subsystem may provide a way to simplify or decouple the already
> > existing 'struct page' used in the netstack from the mm subsystem. Before this
> > patchset, it seems we have the below types of 'struct page':
> > 1. page allocated in the netstack using the page pool.
> > 2. page allocated in the netstack using the buddy allocator.
> > 3. page allocated in another subsystem and passed to the netstack, such as
> >    a zcopy or spliced page?
> >
> > If we can decouple 'struct page' for devmem from the mm subsystem, we may be
> > able to decouple the above 'struct page' from the mm subsystem one by one.
> >
> > >
> > > If using struct page for something that is not memory, there is ZONE_DEVICE.
> > > But using that correctly is non-trivial:
> > >
> > > https://lore.kernel.org/all/ZKyZBbKEpmkFkpWV@ziepe.ca/
> > >
> > > Since all we need is a handle that does not leave the network stack,
> > > a network specific struct like page_pool_iov entirely avoids this issue.
> >
> > Yes, I agree about the network specific struct.
> > I am wondering if we can make the struct more generic if we want to
> > integrate it into page_pool and use it in the net stack.
> >
> > > RFC v3 seems like a good simplification over RFC v1 in that regard to me.
> > > I was also pleasantly surprised how minimal the change to the users of
> > > skb_frag_t actually proved to be.
> >
> > Yes, I agree about that too. Maybe we can make it simpler by using
> > a more abstract struct as page_pool, and utilize some features of
> > page_pool too.
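
To make the "some metadata is always needed" point above concrete, here is a
rough, purely illustrative sketch of what such a netstack-private handle has
to carry. The name and field layout below are made up for illustration and
are not the RFC's actual struct page_pool_iov definition:

    /* Illustrative only -- NOT the RFC's struct page_pool_iov layout. */
    struct page_pool;   /* from net/page_pool/types.h */
    struct gen_pool;    /* dma-buf chunk allocator, linux/genalloc.h */

    struct netmem_handle {                  /* hypothetical name */
            unsigned long     pp_magic;     /* marks pool-managed memory */
            struct page_pool *pp;           /* owning pool, for recycling */
            unsigned long     dma_addr;     /* device address of the chunk */
            atomic_long_t     pp_ref_count; /* pp_frag_count equivalent */
            struct gen_pool  *owner;        /* chunk pool it came from */
    };

The point is that this mostly mirrors the page_pool fields already overlaid
on 'struct page' (the mm_types.h link above), plus a reference back to the
dma-buf chunk pool, and because it never leaves the network stack it avoids
the ZONE_DEVICE complications mentioned above.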
> >
> > For example, from the page_pool doc, page_pool has a fast cache and a
> > ptr-ring cache as below, but if napi_frag_unref() calls
> > page_pool_page_put_many() and returns the dmabuf chunk directly to the
> > gen_pool in the memory provider, then it seems we are bypassing the
> > below caches in the page_pool.
> >
>
> I think you're just misunderstanding the code. The page recycling
> works with my patchset. napi_frag_unref() calls napi_pp_put_page() if
> recycle == true, and that works the same with devmem as with regular
> pages.
>
> If recycle == false, we call page_pool_page_put_many(), which will call
> put_page() for regular pages and page_pool_iov_put_many() for devmem
> pages. So, the memory recycling works exactly the same for devmem as
> for regular pages. In my tests I do see the devmem being recycled
> correctly. We are not bypassing any caches.
>

Ah, taking a closer look here, the devmem recycling works for me, but I
think that's a side effect of the fact that the page_pool support I
implemented in GVE is unusual: I currently allocate pages from the
page_pool but do not set skb_mark_for_recycle(). The page recycling
still happens when GVE is done with the page and calls
page_pool_put_full_page(), as that eventually checks the refcount on the
devmem and recycles it.

I will fix up GVE to call skb_mark_for_recycle() and ensure the
napi_pp_put_page() path recycles the devmem or page correctly in the
next version. A rough sketch of the driver-side flow I have in mind is
at the end of this mail, after the quoted diagram.

> >   +------------------+
> >   |      Driver      |
> >   +------------------+
> >            ^
> >            |
> >            |
> >            |
> >            v
> >   +--------------------------------------------+
> >   |               request memory               |
> >   +--------------------------------------------+
> >       ^                                  ^
> >       |                                  |
> >       | Pool empty                       | Pool has entries
> >       |                                  |
> >       v                                  v
> >   +-----------------------+      +------------------------+
> >   | alloc (and map) pages |      |   get page from cache  |
> >   +-----------------------+      +------------------------+
> >                                      ^                  ^
> >                                      |                  |
> >                                      | cache available  | No entries, refill
> >                                      |                  | from ptr-ring
> >                                      |                  |
> >                                      v                  v
> >                              +-----------------+      +------------------+
> >                              |    Fast cache   |      |  ptr-ring cache  |
> >                              +-----------------+      +------------------+
> >
> > .
> >
>
> --
> Thanks,
> Mina

--
Thanks,
Mina
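
P.S. the driver-side flow I mean above, as a simplified sketch (not the
actual GVE code; it only uses the stock page_pool helpers, and error
handling is trimmed):

    /* Simplified sketch: the rx path builds an skb from a pool page and
     * marks it so the core napi_pp_put_page() path hands the buffer back
     * to the page_pool instead of dropping the last reference.
     * Needs <linux/skbuff.h> and <net/page_pool/helpers.h>.
     */
    static struct sk_buff *rx_build_skb(struct page_pool *pool,
                                        struct napi_struct *napi,
                                        unsigned int len)
    {
            struct page *page = page_pool_dev_alloc_pages(pool);
            struct sk_buff *skb;

            if (!page)
                    return NULL;

            skb = napi_alloc_skb(napi, 0);
            if (!skb) {
                    /* Never attached to an skb: hand it straight back. */
                    page_pool_put_full_page(pool, page, false);
                    return NULL;
            }

            skb_add_rx_frag(skb, 0, page, 0, len, PAGE_SIZE);
            skb_mark_for_recycle(skb);  /* let napi_pp_put_page() recycle */
            return skb;
    }

With that in place the napi_pp_put_page() path should recycle a devmem
handle the same way it recycles a page, which is what I want to verify in
the next version.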