From: Mina Almasry <almasrymina@google.com>
Date: Thu, 16 Nov 2023 03:30:53 -0800
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Willem de Bruijn, Jakub Kicinski, davem@davemloft.net,
    pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Kaiyuan Zhang, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
    Christian König, Jason Gunthorpe, Matthew Wilcox, Linux-MM
In-Reply-To: <0366d9e8-7796-b15a-d309-d2fd81f9d700@huawei.com>

On Thu, Nov 16, 2023 at 3:12 AM Yunsheng Lin wrote:
>
> On 2023/11/16 3:05, Mina Almasry wrote:
> > On Wed, Nov 15, 2023 at 10:07 AM Mina Almasry wrote:
> >>
> >> On Wed, Nov 15, 2023 at 1:29 AM Yunsheng Lin wrote:
> >>>
> >>> On 2023/11/14 23:41, Willem de Bruijn wrote:
> >>>>>
> >>>>> I am not sure the dma-buf maintainers' concern still applies to
> >>>>> this patchset.
> >>>>>
> >>>>> Whatever name you give the struct, and however you arrange each
> >>>>> field in it, some metadata is always needed for dmabuf to
> >>>>> integrate into the page pool.
> >>>>>
> >>>>> If the above is true, why not utilize 'struct page' to get more
> >>>>> unified handling?
> >>>>
> >>>> My understanding is that there is a general preference to simplify
> >>>> struct page, and at the least not to move in the other direction by
> >>>> overloading the struct in new ways.
> >>>
> >>> As I understand it, the new struct just mirrors the struct the page
> >>> pool is already using, see:
> >>> https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119
> >>>
> >>> If the struct page_pool is using gets simplified, I think the new
> >>> struct the devmem memory provider is using can adjust accordingly.
> >>>
> >>> As a matter of fact, I think the way 'struct page' for devmem is
> >>> decoupled from the mm subsystem may provide a way to simplify or
> >>> decouple the existing 'struct page' usage in the netstack from the
> >>> mm subsystem. Before this patchset, it seems we have the below types
> >>> of 'struct page':
> >>> 1. pages allocated in the netstack using the page pool.
> >>> 2. pages allocated in the netstack using the buddy allocator.
> >>> 3. pages allocated in other subsystems and passed to the netstack,
> >>>    such as zcopy or spliced pages?
> >>>
> >>> If we can decouple 'struct page' for devmem from the mm subsystem,
> >>> we may be able to decouple the above types of 'struct page' from the
> >>> mm subsystem one by one.
> >>>
> >>>>
> >>>> If using struct page for something that is not memory, there is
> >>>> ZONE_DEVICE. But using that correctly is non-trivial:
> >>>>
> >>>> https://lore.kernel.org/all/ZKyZBbKEpmkFkpWV@ziepe.ca/
> >>>>
> >>>> Since all we need is a handle that does not leave the network stack,
> >>>> a network-specific struct like page_pool_iov entirely avoids this
> >>>> issue.
> >>>
> >>> Yes, I agree about the network-specific struct.
> >>> I am wondering if we can make the struct more generic if we want to
> >>> integrate it into the page_pool and use it in the net stack.
> >>>
> >>>> RFC v3 seems like a good simplification over RFC v1 in that regard
> >>>> to me. I was also pleasantly surprised how minimal the change to
> >>>> the users of skb_frag_t actually proved to be.
> >>>
> >>> Yes, I agree about that too. Maybe we can make it simpler by using
> >>> a more abstract struct as the page_pool does, and utilize some
> >>> features of the page_pool too.
> >>>
> >>> For example, from the page_pool doc, the page_pool has a fast cache
> >>> and a ptr-ring cache, but if napi_frag_unref() calls
> >>> page_pool_page_put_many() and returns the dmabuf chunk directly to
> >>> the gen_pool in the memory provider, then it seems we are bypassing
> >>> those caches in the page_pool.
> >>>
> >>
> >> I think you're just misunderstanding the code. The page recycling
> >> works with my patchset. napi_frag_unref() calls napi_pp_put_page() if
> >> recycle == true, and that works the same with devmem as with regular
> >> pages.
> >>
> >> If recycle == false, we call page_pool_page_put_many(), which will
> >> call put_page() for regular pages and page_pool_iov_put_many() for
> >> devmem pages. So the memory recycling works exactly the same with
> >> devmem as with regular pages. In my tests I do see the devmem being
> >> recycled correctly. We are not bypassing any caches.
> >>
> >
> > Ah, taking a closer look here, the devmem recycling works for me, but
> > I think that's a side effect of the fact that the page_pool support I
> > implemented in GVE is unusual. I currently allocate pages from the
> > page_pool but do not set skb_mark_for_recycle(). The page recycling
> > still happens when GVE is done with the page and calls
> > page_pool_put_full_page(), as that eventually checks the refcount on
> > the devmem and recycles it.
> >
> > I will fix up GVE to call skb_mark_for_recycle() and ensure the
> > napi_pp_put_page() path recycles the devmem or page correctly in the
> > next version.
>
> What about other features? Like the dma mapping optimization and frag
> support in the page pool.
>

PP_FLAG_DMA_MAP will be supported and required in the next version per
Jakub's request.

Frag support is something I disabled in the initial versions of the
patchset, but only out of convenience and to simplify the initial
implementation. At Google we typically use a page-aligned MSS, so frag
support isn't really that useful for us. I don't see an issue extending
frag support to devmem and iovs in the future if needed. We'd probably
add the pp_frag_count field to page_pool_iov and handle it similarly to
how it's handled for pages.
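To make that concrete, a rough sketch of what such an extension could
look like (illustrative only; these are not the exact field names or
layout in the posted RFC, just the mirroring idea):

	/* Sketch: page_pool_iov extended with frag support, mirroring
	 * the page_pool fields of struct page. Field names illustrative.
	 */
	struct page_pool_iov {
		unsigned long pp_magic;		/* recycling magic, as for pages */
		struct page_pool *pp;		/* pool this iov belongs to */
		struct dmabuf_genpool_chunk_owner *owner; /* backing dmabuf chunk */
		unsigned long dma_addr;		/* only needed per-iov if a provider
						 * can't derive it from the chunk */
		atomic_long_t pp_frag_count;	/* mirrors pp_frag_count for pages */
		refcount_t refcount;
	};

The iov frag helpers could then mirror what page_pool_defrag_page()
does for pages today.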
> I understand that you use some trick in the gen_pool to avoid the
> per-chunk dma_addr storage in 'struct page_pool_iov' and do not need
> frag support for now.
>
> But for other memory providers, if they need those features, we
> probably need to extend 'struct page_pool_iov' to support them, and
> then we may have a 'struct page_pool_iov' that looks just like the
> page_pool-specific union in 'struct page', see:
>
> https://elixir.free-electrons.com/linux/v6.7-rc1/source/include/linux/mm_types.h#L119
>
> As I understand it, we currently don't have a way to decouple the
> existing 'struct page' from the mm subsystem yet; if we don't mirror
> the above union in 'struct page', we may have more 'if' checking added
> in the future.
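For reference, the page_pool part of that union looks roughly like the
below in v6.7-rc1 (abridged from the link above; see it for the exact
layout):

	struct {	/* page_pool used by netstack */
		unsigned long pp_magic;	/* avoids recycling pages that were
					 * not page_pool allocated */
		struct page_pool *pp;
		unsigned long _pp_mapping_pad;
		unsigned long dma_addr;
		atomic_long_t pp_frag_count;
	};

--
Thanks,
Mina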