Date: Thu, 18 Apr 2024 22:19:33 +0200
From: Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
To: Xuan Zhuo, Jason Wang
Cc: virtualization@lists.linux.dev, "Michael S. Tsirkin", "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org,
 Linux-MM, Matthew Wilcox, Ilias Apalodimas, Mel Gorman
References: <20240411025127.51945-1-xuanzhuo@linux.alibaba.com>
 <20240411025127.51945-4-xuanzhuo@linux.alibaba.com>
 <1712900153.3715405-1-xuanzhuo@linux.alibaba.com>
 <1713146919.8867755-1-xuanzhuo@linux.alibaba.com>
 <1713170201.06163-2-xuanzhuo@linux.alibaba.com>
 <1713171554.2423792-1-xuanzhuo@linux.alibaba.com>
 <1713317444.7698638-1-xuanzhuo@linux.alibaba.com>
 <1713342055.436048-1-xuanzhuo@linux.alibaba.com>
In-Reply-To: <1713342055.436048-1-xuanzhuo@linux.alibaba.com>

On 17/04/2024 10.20, Xuan Zhuo wrote:
> On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang wrote:
>> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo wrote:
>>>
>>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang wrote:
>>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo wrote:
>>>>>
>>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang wrote:
>>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo wrote:
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang wrote:
>>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo wrote:
>>>>>>>>>
>>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang wrote:
>>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, we chain the pages of big mode via the page's private
>>>>>>>>>>>>> field. But a subsequent patch aims to make big mode support
>>>>>>>>>>>>> premapped mode, which requires additional space to store the
>>>>>>>>>>>>> DMA address.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Within the sub-struct that contains 'private', there is no
>>>>>>>>>>>>> suitable variable for storing the DMA address:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct {        /* Page cache and anonymous pages */
>>>>>>>>>>>>>         /**
>>>>>>>>>>>>>          * @lru: Pageout list, eg. active_list protected by
>>>>>>>>>>>>>          * lruvec->lru_lock.  Sometimes used as a generic list
>>>>>>>>>>>>>          * by the page owner.
>>>>>>>>>>>>>          */
>>>>>>>>>>>>>         union {
>>>>>>>>>>>>>                 struct list_head lru;
>>>>>>>>>>>>>
>>>>>>>>>>>>>                 /* Or, for the Unevictable "LRU list" slot */
>>>>>>>>>>>>>                 struct {
>>>>>>>>>>>>>                         /* Always even, to negate PageTail */
>>>>>>>>>>>>>                         void *__filler;
>>>>>>>>>>>>>                         /* Count page's or folio's mlocks */
>>>>>>>>>>>>>                         unsigned int mlock_count;
>>>>>>>>>>>>>                 };
>>>>>>>>>>>>>
>>>>>>>>>>>>>                 /* Or, free page */
>>>>>>>>>>>>>                 struct list_head buddy_list;
>>>>>>>>>>>>>                 struct list_head pcp_list;
>>>>>>>>>>>>>         };
>>>>>>>>>>>>>         /* See page-flags.h for PAGE_MAPPING_FLAGS */
>>>>>>>>>>>>>         struct address_space *mapping;
>>>>>>>>>>>>>         union {
>>>>>>>>>>>>>                 pgoff_t index;        /* Our offset within mapping. */
>>>>>>>>>>>>>                 unsigned long share;  /* share count for fsdax */
>>>>>>>>>>>>>         };
>>>>>>>>>>>>>         /**
>>>>>>>>>>>>>          * @private: Mapping-private opaque data.
>>>>>>>>>>>>>          * Usually used for buffer_heads if PagePrivate.
>>>>>>>>>>>>>          * Used for swp_entry_t if PageSwapCache.
>>>>>>>>>>>>>          * Indicates order in the buddy system if PageBuddy.
>>>>>>>>>>>>>          */
>>>>>>>>>>>>>         unsigned long private;
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> But the page_pool sub-struct has a variable, dma_addr, that is
>>>>>>>>>>>>> appropriate for storing a DMA address, and that sub-struct is
>>>>>>>>>>>>> used by the netstack. That works to our advantage:
>>>>>>>>>>>>>
>>>>>>>>>>>>> struct {        /* page_pool used by netstack */
>>>>>>>>>>>>>         /**
>>>>>>>>>>>>>          * @pp_magic: magic value to avoid recycling non
>>>>>>>>>>>>>          * page_pool allocated pages.
>>>>>>>>>>>>>          */
>>>>>>>>>>>>>         unsigned long pp_magic;
>>>>>>>>>>>>>         struct page_pool *pp;
>>>>>>>>>>>>>         unsigned long _pp_mapping_pad;
>>>>>>>>>>>>>         unsigned long dma_addr;
>>>>>>>>>>>>>         atomic_long_t pp_ref_count;
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> Besides, we should only use variables from the same sub-struct,
>>>>>>>>>>>>> so this patch replaces "private" with "pp".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Instead of doing a customized version of page pool, can we
>>>>>>>>>>>> simply switch to using the page pool for big mode? Then we
>>>>>>>>>>>> don't need to bother with the DMA stuff.
>>>>>>>>>>>
>>>>>>>>>>> The page pool needs to do the DMA via the DMA APIs,
>>>>>>>>>>> so we cannot use the page pool directly.
>>>>>>>>>>
>>>>>>>>>> I found this:
>>>>>>>>>>
>>>>>>>>>> #define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>>>>>>>>>>                                         * map/unmap
>>>>>>>>>>
>>>>>>>>>> It seems to work here?
>>>>>>>>>
>>>>>>>>> I have studied the page pool mechanism and believe that we cannot
>>>>>>>>> use it directly. We can, however, make the page pool bypass the
>>>>>>>>> DMA operations.
>>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated
>>>>>>>>> from the page pool. Furthermore, we can use page pool helpers to
>>>>>>>>> associate the DMA address with the page.
>>>>>>>>>
>>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want
>>>>>>>>> to return the mapped pages to the page pool and reuse them, so we
>>>>>>>>> can omit the unmapping and remapping steps.
>>>>>>>>>
>>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it
>>>>>>>>> disconnects and releases the pages. When the pool hits its
>>>>>>>>> capacity, pages are relinquished without a chance to be unmapped.

Could Jakub's memory provider for PP help your use-case?

See:
 [1] https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
Or:
 [2] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/

[...]

>>>>>>
>>>>>> Adding Jesper for some comments.
>>>>>>
>>>>>>>
>>>>>>> Back to this patch set: I think we should keep virtio-net managing
>>>>>>> the pages.

For context, the patch:
 [3] https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/

>>>>>>> What do you think?
>>>>>>
>>>>>> I might be wrong, but I think we need to either
>>>>>>
>>>>>> 1) seek a way to manage the pages yourself without touching the page
>>>>>> pool metadata (or Jesper is fine with this)
>>>>>
>>>>> Do you mean working with the page pool or not?
>>>>>
>>>>
>>>> I meant: whether Jesper is fine with reusing page pool metadata like
>>>> this patch does.
>>>>
>>>>> If we manage the pages ourselves (no page pool), we do not care
>>>>> whether the metadata is for the page pool or not. We just use the
>>>>> space of the pages, like "private".
>>>>
>>>> That's also fine.
>>>>

I'm not sure it is "fine" to explicitly choose not to use page pool, and
then (ab)use the `struct page` member (pp) that was intended for
page_pool for other purposes (in this case, to create a linked list of
pages):

 +#define page_chain_next(p) ((struct page *)((p)->pp))
 +#define page_chain_add(p, n) ((p)->pp = (void *)n)

I'm not sure that I (as PP maintainer) can actually make this call, as I
think this area belongs with the MM "page" maintainers (Cc'ed MM-list +
people) to judge.

Inventing new ways to use struct page fields without adding your
use-case to struct page will make it harder for the MM people to
maintain (e.g. to make future changes).

--Jesper
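
PS: For anyone following from the MM side, the pattern at stake is just a
singly linked list threaded through the page_pool-owned `pp` slot, where
big mode previously threaded it through `private`. Below is a minimal
user-space sketch of that pattern: `struct page` is mocked down to the
page_pool sub-struct quoted above, the two macros are the ones from the
patch, and everything else is illustrative scaffolding rather than
virtio-net code.

/*
 * Mock of the page_pool sub-struct quoted above; the real struct page
 * lives in include/linux/mm_types.h.  This stand-in exists only so the
 * chaining pattern can be compiled and run in user space.
 */
#include <stdio.h>

struct page {
        unsigned long pp_magic;
        void *pp;                       /* really: struct page_pool *pp */
        unsigned long _pp_mapping_pad;
        unsigned long dma_addr;
};

/* The two macros from the patch: */
#define page_chain_next(p)      ((struct page *)((p)->pp))
#define page_chain_add(p, n)    ((p)->pp = (void *)n)

int main(void)
{
        struct page pages[3] = { { 0 } };
        struct page *p;

        /* Build the chain: pages[0] -> pages[1] -> pages[2] -> NULL */
        page_chain_add(&pages[2], NULL);
        page_chain_add(&pages[1], &pages[2]);
        page_chain_add(&pages[0], &pages[1]);

        /* Walk it the way an rx refill/teardown loop would */
        for (p = &pages[0]; p != NULL; p = page_chain_next(p))
                printf("page %td in chain\n", p - pages);

        return 0;
}

The mechanics clearly work; the objection above is that nothing in the
real struct page documents this reuse, so a future MM-side change to the
page_pool sub-struct could silently break the chaining.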