Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider
From: Yunsheng Lin
To: Mina Almasry
CC: Jakub Kicinski, Willem de Bruijn, Kaiyuan Zhang, Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet, Christian König, Jason Gunthorpe, Matthew Wilcox, Linux-MM
Date: Tue, 14 Nov 2023 20:49:08 +0800
References: <20231113130041.58124-1-linyunsheng@huawei.com> <20231113130041.58124-4-linyunsheng@huawei.com> <20231113180554.1d1c6b1a@kernel.org> <0c39bd57-5d67-3255-9da2-3f3194ee5a66@huawei.com>

On 2023/11/14 20:21, Mina Almasry wrote:
> On Tue, Nov 14, 2023 at 12:23 AM Yunsheng Lin wrote:
>>
>> +cc Christian, Jason and Willy
>>
>> On 2023/11/14 7:05, Jakub Kicinski wrote:
>>> On Mon, 13 Nov 2023 05:42:16 -0800 Mina Almasry wrote:
>>>> You're doing exactly what I think you're doing, and what was nacked in RFC v1.
>>>>
>>>> You've converted 'struct page_pool_iov' to essentially become a
>>>> duplicate of 'struct page'. Then, you're casting page_pool_iov* into
>>>> struct page* in mp_dmabuf_devmem_alloc_pages(), then, you're calling
>>>> mm APIs like page_ref_*() on the page_pool_iov* because you've fooled
>>>> the mm stack into thinking dma-buf memory is a struct page.
>>
>> Yes, something like above, but I am not sure about the 'fooled the mm
>> stack into thinking dma-buf memory is a struct page' part, because:
>> 1. We never let the 'struct page' for devmem leak out of the net stack,
>>    through the 'not kmap()able and not readable' checking in your patchset.
>
> RFC never used dma-buf pages outside the net stack, so that is the same.
>
> You are not able to get rid of the 'not kmap()able and not readable'
> checking with this approach, because dma-buf memory is fundamentally
> unkmapable and unreadable. This approach would still need
> skb_frags_not_readable checks in net stack, so that is also the same.

Yes, I agree that the check is still needed whatever the proposal is.

>
>> 2. We initialize page->_refcount for devmem to one and it remains as one,
>>    we will never call page_ref_inc()/page_ref_dec()/get_page()/put_page(),
>>    instead, we use page pool's pp_frag_count to do reference counting for
>>    devmem page in patch 6.
>>
>
> I'm not sure that moves the needle in terms of allowing dma-buf
> memory to look like struct pages.
>
>>>>
>>>> RFC v1 was almost exactly the same, except instead of creating a
>>>> duplicate definition of struct page, it just allocated 'struct page'
>>>> instead of allocating another struct that is identical to struct page
>>>> and casting it into struct page.
>>
>> Perhaps it is more accurate to say this is something between RFC v1 and
>> RFC v3, in order to decouple 'struct page' for devmem from mm subsystem,
>> but still have most unified handling for both normal memory and devmem
>> in page pool and net stack.
>>
>> The main difference between this patchset and RFC v1:
>> 1. The mm subsystem is not supposed to see the 'struct page' for devmem
>>    in this patchset, I guess we could say it is decoupled from the mm
>>    subsystem even though we still call PageTail()/page_ref_count()/
>>    page_is_pfmemalloc() on 'struct page' for devmem.
>>
>
> In this patchset you pretty much allocate a struct page for your
> dma-buf memory, and then cast it into a struct page, so all the mm
> calls in page_pool.c are seeing a struct page when it's really dma-buf
> memory.
>
> 'even though we still call
> PageTail()/page_ref_count()/page_is_pfmemalloc() on 'struct page' for
> devmem' is basically making dma-buf memory look like struct pages.
>
> Actually because you put the 'struct page for devmem' in
> skb->bv_frag, the net stack will grab the 'struct page' for devmem
> using skb_frag_page() then call things like page_address(), kmap,
> get_page, put_page, etc, etc, etc.

Yes, as above, the skb_frags_not_readable() check is still needed for
kmap() and page_address().

The get_page()/put_page() calls are avoided in page_pool_frag_ref() and
napi_pp_put_page() for a devmem page, because the following check is true
for a devmem page:

(pp_iov->pp_magic & ~0x3UL) == PP_SIGNATURE

>
>> The main difference between this patchset and RFC v3:
>> 1. It reuses the 'struct page' to have more unified handling between
>>    normal page and devmem page for net stack.
>
> This is what was nacked in RFC v1.
>
>> 2. It relies on the page->pp_frag_count to do reference counting.
>>
>
> I don't see you change any of the page_ref_* calls in page_pool.c, for
> example this one:
>
> https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L601
>
> So the reference the page_pool is seeing is actually page->_refcount,
> not page->pp_frag_count? I'm confused here. Is this a bug in the
> patchset?

For devmem, page->_refcount is the same field as page_pool_iov->_refcount,
which is ensured by 'PAGE_POOL_MATCH(_refcount, _refcount);'.
page_pool_iov->_refcount is set to one in mp_dmabuf_devmem_alloc_pages()
by calling 'refcount_set(&ppiov->_refcount, 1)' and always remains one.
So the 'page_ref_count(page) == 1' check is always true for a devmem page.
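
To make the point above concrete, here is a minimal userspace sketch of the
idea, not the code from the patchset. The struct layouts, the
LAYOUT_MATCH() macro, the PP_SIGNATURE_SKETCH value and all helper names
are simplified assumptions for illustration only. The real definitions are
'struct page', 'struct page_pool_iov', PAGE_POOL_MATCH() and PP_SIGNATURE.
The sketch shows a compile-time offset check that lets both views share
_refcount, a refcount that is set to one at allocation, and a pp_magic test
that ignores the two low bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value for illustration; not taken from the kernel headers. */
#define PP_SIGNATURE_SKETCH 0x40UL

/* Hypothetical, simplified stand-in for the page pool fields of struct page. */
struct page_sketch {
	unsigned long flags;
	unsigned long pp_magic;
	void *pp;
	unsigned long dma_addr;
	long pp_frag_count;
	int _refcount;
};

/* Hypothetical stand-in for page_pool_iov, laid out to mirror page_sketch. */
struct page_pool_iov_sketch {
	unsigned long res0;
	unsigned long pp_magic;
	void *pp;
	unsigned long dma_addr;
	long pp_frag_count;
	int _refcount;
};

/* Compile-time check in the spirit of PAGE_POOL_MATCH(): same field offset. */
#define LAYOUT_MATCH(pg, iov)                                     \
	static_assert(offsetof(struct page_sketch, pg) ==         \
		      offsetof(struct page_pool_iov_sketch, iov), \
		      "page and iov field offsets differ")

LAYOUT_MATCH(pp_magic, pp_magic);
LAYOUT_MATCH(pp_frag_count, pp_frag_count);
LAYOUT_MATCH(_refcount, _refcount);

/*
 * The kernel patch casts between the two pointer types. A union is used
 * here so that the userspace sketch stays well-defined C.
 */
union devmem_slot {
	struct page_pool_iov_sketch iov;
	struct page_sketch page;
};

/* Mirrors the allocation step: refcount is set to one and never changed. */
static inline void devmem_init(union devmem_slot *slot)
{
	slot->iov.res0 = 0;
	slot->iov.pp_magic = PP_SIGNATURE_SKETCH;
	slot->iov.pp = NULL;
	slot->iov.dma_addr = 0;
	slot->iov.pp_frag_count = 1;
	slot->iov._refcount = 1;
}

/* The signature test, ignoring the two low bits of pp_magic. */
static inline bool is_pp_page(const struct page_sketch *page)
{
	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE_SKETCH;
}

/* Stand-in for the 'page_ref_count(page) == 1' check. */
static inline bool page_ref_is_one(const struct page_sketch *page)
{
	return page->_refcount == 1;
}
```

With this layout, calling devmem_init() on a slot and then passing
&slot.page to is_pp_page() and page_ref_is_one() makes both checks hold
through the page view, although only the iov view was written.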