From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEA26C4332F for ; Tue, 14 Nov 2023 12:58:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 73DB86B01FF; Tue, 14 Nov 2023 07:58:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6ED856B0201; Tue, 14 Nov 2023 07:58:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 58E8C6B0202; Tue, 14 Nov 2023 07:58:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 4171A6B01FF for ; Tue, 14 Nov 2023 07:58:21 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 2323880A38 for ; Tue, 14 Nov 2023 12:58:21 +0000 (UTC) X-FDA: 81456563202.20.AAD1954 Received: from mail-vs1-f45.google.com (mail-vs1-f45.google.com [209.85.217.45]) by imf21.hostedemail.com (Postfix) with ESMTP id 6ECB81C0011 for ; Tue, 14 Nov 2023 12:58:19 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="m4WVwJm/"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of almasrymina@google.com designates 209.85.217.45 as permitted sender) smtp.mailfrom=almasrymina@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1699966699; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FS8IAn5PULatUOjoN0juShog4xVcwSnYkfc1t7O7Fqc=; b=ddO/EWogc+3xef47HVxNfbB8S+ch0GARUZNtiYSn03lXZVMQs/vjnFmTKRTdStSJ4UnGRy foSMFMFEeniZ2uja4KneI9/bsqKv899lyo3Qma3rY7lA2F1l9w7FQvyRhRlSiNXSpAFTwE u1BFIvymvCJOvcm8QqKkJBCSkDZDcBY= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="m4WVwJm/"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of almasrymina@google.com designates 209.85.217.45 as permitted sender) smtp.mailfrom=almasrymina@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1699966699; a=rsa-sha256; cv=none; b=CTP3nsUPH4S8RrS4BaSP2OFqQfn0Z/JpydNlqEwLM697S/mlAO3JuKuV+JnfeRyyb1735M SenaDYyvQQ0jgEvyeDFGbqBnvgm8i3J1RU0KI1SVyzp5rS20dG7t/ca5dXcJkPH6ckNYW7 WaI9k1f9lsR1tGZA4qPrWN3OY9/S0MA= Received: by mail-vs1-f45.google.com with SMTP id ada2fe7eead31-45da75867d3so1993640137.2 for ; Tue, 14 Nov 2023 04:58:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699966698; x=1700571498; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=FS8IAn5PULatUOjoN0juShog4xVcwSnYkfc1t7O7Fqc=; b=m4WVwJm/rfpSUW+ceFUOeoYmPW1wTfo7KzW/BrZMG55oMuMQl1rQLcvdGE9LludAi7 BOfrUufauPBNk8fJclobO5WIjuQ+tH7F9vPyT59qnOPU6zLkb/DNxHmXTk90gy9a0mXC iJhmIOW7/dxao3WsEKkMs6gz1rFg0XF1OX8HvDfPXw2TCKY2DSe+QP/RhM1joaO0Laul NsxKrkxchEwNL5iWZl3T5qbV5QgmncRaZKIUAEjzsBD/ms4xuLKnPR8I/u++WFQb50yd T4sHshzFV+NrCyHgygNbPNWcKloSlXcuAcpO5iTw77COyDbZX2K5eDrQesUBytw6hG0+ LRgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699966698; x=1700571498; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FS8IAn5PULatUOjoN0juShog4xVcwSnYkfc1t7O7Fqc=; b=q6NeYZG5hY0ZPqrp0DZoUpHF+OuPTbnSNnsg535T8rytE5X4nx9uIHNFt0a4/Ea2yo MX18z7R+IyU7EsnT7OUGSNM13ZYAnYyo81I2VGlsG2n2XBEtJPdIkxawcEjLhJK6DhfQ LNT0PBpgxDsBHD21j3h+E9pGH9mHU388L2X6p3aqNs/LoKsULz59VEUAoIn1o6uISX7V lcf9e41WWdmYleoG24S/7GwVjgjkX/B6ymxKZUNrv0f9YfnEE5uObqW8LXZJXfCIHJhi h8rKy+XkOmZ3eIGhkMGBp+VRwKFaybWEw03acmn3KdwUerUJ5KIjwoy/w36kA8er97cU 0cxQ== X-Gm-Message-State: AOJu0YwAnaiMeToM+djjUvUYvcXHznYw1aD7bW0Ij1KKhzkRQG0Rm6vF 4XmqL7C1Dd1RtnnxtwA7pTC6a/UYCzRJd3wr8DZ46Q== X-Google-Smtp-Source: AGHT+IEenboxEXPbs90yp1639VqsCglLDY6FsP46EbOdp/50wGHYhS7OCQcFdZpT4KtLJ77fxxytJq6E2le6U4eGtFw= X-Received: by 2002:a67:c999:0:b0:45f:1d2:30d7 with SMTP id y25-20020a67c999000000b0045f01d230d7mr7570368vsk.8.1699966698373; Tue, 14 Nov 2023 04:58:18 -0800 (PST) MIME-Version: 1.0 References: <20231113130041.58124-1-linyunsheng@huawei.com> <20231113130041.58124-4-linyunsheng@huawei.com> <20231113180554.1d1c6b1a@kernel.org> <0c39bd57-5d67-3255-9da2-3f3194ee5a66@huawei.com> In-Reply-To: From: Mina Almasry Date: Tue, 14 Nov 2023 04:58:05 -0800 Message-ID: Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider To: Yunsheng Lin Cc: Jakub Kicinski , davem@davemloft.net, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Willem de Bruijn , Kaiyuan Zhang , Jesper Dangaard Brouer , Ilias Apalodimas , Eric Dumazet , =?UTF-8?Q?Christian_K=C3=B6nig?= , Jason Gunthorpe , Matthew Wilcox , Linux-MM Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 6ECB81C0011 X-Stat-Signature: oumabpbbk9jyfnf5u6x4ane1wc5jgctj X-HE-Tag: 1699966699-253561 X-HE-Meta: U2FsdGVkX1/PTSZOQnFsEVaLJP8e04cmK9KhZjW0gp+by0lTX09Nn1ujaMNAA6Sa3LyTCDagUjp7IJo8EBskCvXXU85boBYEsmEYFW2k0EhnccaPORX435zCgI9kYp3QR+IPaeYOgEx+ZGQflxU9cuj5s77nlwGqS59SQ3OskTcOstey5rod3MytU1x/xa5zB4NWUppj4eSAzXdOF6y/97xgcCCnd4HI+dzuqENpAtfnz+LgvqSoWlv2xoOshpniM4/M8RiZX71QeDOywlj+AtdqqEGcc6TOSlBdxMZvFg1dneIDKGahXq9q8FX9tRa1yW7Cxhd82V9m7rjKpAqt3H4DRPPFe7P2B8bdOaf1MSCq0yCuL4SHUL3VNwlq6caRQsxUrOG9YNexEVWrVCH+1asui6eqQFyxg/BR6oDX2m0uZJO7TpCFUtHw9Bi47t/u4/D/rFugROq1/h13BiC+Mz44PhEeU029HrUXZA3TmByY/EDRTkYe+PWCyPGTJTLm9wuKpRZa2CIQIYV+/VJv+Gji4M6nASl5wQunAliVh8wRK0c2+wxFlsCGGy/w/FBU3eA+5ti8jTazBMVHOfN9+fG3WMLCD8FLsOH4If6VmQi3QO/w4XHn0ibkAVYT8uNipfuVujMbC6hK+djW/NZvGU6qe3LObVmj4DeNFEZ+IcR7I6VkX6m/bM+Uct/e7IntwnKtfPf3JYDqWpvGcc6Pt9GH/orD0BFQyId8+ZSwAr8qcdK7xRb1srL0SdqQ53mRfs1YwHCuELHacqfmaPkI+R7fbVl05FGfGYAiv6gMfhhEaQq5lIOW/8e+yzSm7ucl3r2kRiq86CVNrbywU0jQPlEXg7D3IhuSExHElFpP+O1kTLLhCTFdb/v1J3rHJs7F/EfkNUTOTrcTtQAOBmmcbUlZ+AeVwDT7v6FOJcCWwjP+9OH0nuf7RN8oBh3Eph/VqxFf82iOHrFUQo/PLit 4UWtVxmA oK20lhuIf+eMC3WNzoxFBGBrAUv4nN94980XBvh3fFpYKW8SuMsyg80+Y7eNvNqFwtMoBMF0A4tmMuvPaiNpvDARTpfLGcrJ/uDmY3Bh5yuVB+e9qiL81XvvGFcCTfioo3tFE89INDR6ptOQt8IrLC5otEd66TTOlQszH8L5leRGaKsI73P4JNeUJtlkICcPPEGwqwXAXl8fSTZWSDLv/3ZtoUiJqcyQWpQw4lIT6F/1nHHps9TIdNI2a2YNe8n9yq2YxtqJPfvLntz4cUhe83y7ZlzuMoDCG/bCPkrCNHHN1aVPKuoUAFakLKA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 14, 2023 at 4:49=E2=80=AFAM Yunsheng Lin wrote: > > On 2023/11/14 20:21, Mina Almasry wrote: > > On Tue, Nov 14, 2023 at 12:23=E2=80=AFAM Yunsheng Lin wrote: > >> > >> +cc Christian, Jason and Willy > >> > >> On 2023/11/14 7:05, Jakub Kicinski wrote: > >>> On Mon, 13 Nov 2023 05:42:16 -0800 Mina Almasry wrote: > >>>> You're doing exactly what I think you're doing, and what was nacked = in RFC v1. > >>>> > >>>> You've converted 'struct page_pool_iov' to essentially become a > >>>> duplicate of 'struct page'. Then, you're casting page_pool_iov* into > >>>> struct page* in mp_dmabuf_devmem_alloc_pages(), then, you're calling > >>>> mm APIs like page_ref_*() on the page_pool_iov* because you've foole= d > >>>> the mm stack into thinking dma-buf memory is a struct page. > >> > >> Yes, something like above, but I am not sure about the 'fooled the mm > >> stack into thinking dma-buf memory is a struct page' part, because: > >> 1. We never let the 'struct page' for devmem leaking out of net stacki= ng > >> through the 'not kmap()able and not readable' checking in your patc= hset. > > > > RFC never used dma-buf pages outside the net stack, so that is the same= . > > > > You are not able to get rid of the 'net kmap()able and not readable' > > checking with this approach, because dma-buf memory is fundamentally > > unkmapable and unreadable. This approach would still need > > skb_frags_not_readable checks in net stack, so that is also the same. > > Yes, I am agreed that checking is still needed whatever the proposal is. > > > > >> 2. We inititiate page->_refcount for devmem to one and it remains as o= ne, > >> we will never call page_ref_inc()/page_ref_dec()/get_page()/put_pag= e(), > >> instead, we use page pool's pp_frag_count to do reference counting = for > >> devmem page in patch 6. > >> > > > > I'm not sure that moves the needle in terms of allowing dma-buf > > memory to look like struct pages. > > > >>>> > >>>> RFC v1 was almost exactly the same, except instead of creating a > >>>> duplicate definition of struct page, it just allocated 'struct page' > >>>> instead of allocating another struct that is identical to struct pag= e > >>>> and casting it into struct page. > >> > >> Perhaps it is more accurate to say this is something between RFC v1 an= d > >> RFC v3, in order to decouple 'struct page' for devmem from mm subsyste= m, > >> but still have most unified handling for both normal memory and devmem > >> in page pool and net stack. > >> > >> The main difference between this patchset and RFC v1: > >> 1. The mm subsystem is not supposed to see the 'struct page' for devme= m > >> in this patchset, I guess we could say it is decoupled from the mm > >> subsystem even though we still call PageTail()/page_ref_count()/ > >> page_is_pfmemalloc() on 'struct page' for devmem. > >> > > > > In this patchset you pretty much allocate a struct page for your > > dma-buf memory, and then cast it into a struct page, so all the mm > > calls in page_pool.c are seeing a struct page when it's really dma-buf > > memory. > > > > 'even though we still call > > PageTail()/page_ref_count()/page_is_pfmemalloc() on 'struct page' for > > devmem' is basically making dma-buf memory look like struct pages. > > > > Actually because you put the 'strtuct page for devmem' in > > skb->bv_frag, the net stack will grab the 'struct page' for devmem > > using skb_frag_page() then call things like page_address(), kmap, > > get_page, put_page, etc, etc, etc. > > Yes, as above, skb_frags_not_readable() checking is still needed for > kmap() and page_address(). > > get_page, put_page related calling is avoided in page_pool_frag_ref() > and napi_pp_put_page() for devmem page as the above checking is true > for devmem page: > (pp_iov->pp_magic & ~0x3UL) =3D=3D PP_SIGNATURE > So, devmem needs special handling with if statement for refcounting, even after using struct pages for devmem, which is not allowed (IIUC the dma-buf maintainer). > > > >> The main difference between this patchset and RFC v3: > >> 1. It reuses the 'struct page' to have more unified handling between > >> normal page and devmem page for net stack. > > > > This is what was nacked in RFC v1. > > > >> 2. It relies on the page->pp_frag_count to do reference counting. > >> > > > > I don't see you change any of the page_ref_* calls in page_pool.c, for > > example this one: > > > > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L60= 1 > > > > So the reference the page_pool is seeing is actually page->_refcount, > > not page->pp_frag_count? I'm confused here. Is this a bug in the > > patchset? > > page->_refcount is the same as page_pool_iov->_refcount for devmem, which > is ensured by the 'PAGE_POOL_MATCH(_refcount, _refcount);', and > page_pool_iov->_refcount is set to one in mp_dmabuf_devmem_alloc_pages() > by calling 'refcount_set(&ppiov->_refcount, 1)' and always remains as one= . > > So the 'page_ref_count(page) =3D=3D 1' checking is always true for devmem= page. Which, of course, is a bug in the patchset, and it only works because it's a POC for you. devmem pages (which shouldn't exist according to the dma-buf maintainer, IIUC) can't be recycled all the time. See SO_DEVMEM_DONTNEED patch in my RFC and refcounting needed for devmem. --=20 Thanks, Mina