linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>, Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux.dev,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	Matthew Wilcox <willy@infradead.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
Date: Thu, 18 Apr 2024 22:19:33 +0200	[thread overview]
Message-ID: <ad98cb14-cc1b-4a01-aacc-8fb53445049e@kernel.org> (raw)
In-Reply-To: <1713342055.436048-1-xuanzhuo@linux.alibaba.com>



On 17/04/2024 10.20, Xuan Zhuo wrote:
> On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>
>>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>
>>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>
>>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now, we chain the pages of big mode by the page's private variable.
>>>>>>>>>>>>> But a subsequent patch aims to make the big mode to support
>>>>>>>>>>>>> premapped mode. This requires additional space to store the dma addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Within the sub-struct that contains the 'private', there is no suitable
>>>>>>>>>>>>> variable for storing the DMA addr.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* Page cache and anonymous pages */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @lru: Pageout list, eg. active_list protected by
>>>>>>>>>>>>>                           * lruvec->lru_lock.  Sometimes used as a generic list
>>>>>>>>>>>>>                           * by the page owner.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  struct list_head lru;
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, for the Unevictable "LRU list" slot */
>>>>>>>>>>>>>                                  struct {
>>>>>>>>>>>>>                                          /* Always even, to negate PageTail */
>>>>>>>>>>>>>                                          void *__filler;
>>>>>>>>>>>>>                                          /* Count page's or folio's mlocks */
>>>>>>>>>>>>>                                          unsigned int mlock_count;
>>>>>>>>>>>>>                                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>>                                  /* Or, free page */
>>>>>>>>>>>>>                                  struct list_head buddy_list;
>>>>>>>>>>>>>                                  struct list_head pcp_list;
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /* See page-flags.h for PAGE_MAPPING_FLAGS */
>>>>>>>>>>>>>                          struct address_space *mapping;
>>>>>>>>>>>>>                          union {
>>>>>>>>>>>>>                                  pgoff_t index;          /* Our offset within mapping. */
>>>>>>>>>>>>>                                  unsigned long share;    /* share count for fsdax */
>>>>>>>>>>>>>                          };
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @private: Mapping-private opaque data.
>>>>>>>>>>>>>                           * Usually used for buffer_heads if PagePrivate.
>>>>>>>>>>>>>                           * Used for swp_entry_t if PageSwapCache.
>>>>>>>>>>>>>                           * Indicates order in the buddy system if PageBuddy.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long private;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> But within the page pool struct, we have a variable called
>>>>>>>>>>>>> dma_addr that is appropriate for storing dma addr.
>>>>>>>>>>>>> And that struct is used by netstack. That works to our advantage.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                  struct {        /* page_pool used by netstack */
>>>>>>>>>>>>>                          /**
>>>>>>>>>>>>>                           * @pp_magic: magic value to avoid recycling non
>>>>>>>>>>>>>                           * page_pool allocated pages.
>>>>>>>>>>>>>                           */
>>>>>>>>>>>>>                          unsigned long pp_magic;
>>>>>>>>>>>>>                          struct page_pool *pp;
>>>>>>>>>>>>>                          unsigned long _pp_mapping_pad;
>>>>>>>>>>>>>                          unsigned long dma_addr;
>>>>>>>>>>>>>                          atomic_long_t pp_ref_count;
>>>>>>>>>>>>>                  };
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other side, we should use variables from the same sub-struct.
>>>>>>>>>>>>> So this patch replaces the "private" with "pp".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
>>>>>>>>>>>> switch to use page pool for big mode instead? Then we don't need to
>>>>>>>>>>>> bother the dma stuffs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The page pool needs to do the dma by the DMA APIs.
>>>>>>>>>>> So we can not use the page pool directly.
>>>>>>>>>>
>>>>>>>>>> I found this:
>>>>>>>>>>
>>>>>>>>>> define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
>>>>>>>>>>                                          * map/unmap
>>>>>>>>>>
>>>>>>>>>> It seems to work here?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have studied the page pool mechanism and believe that we cannot use it
>>>>>>>>> directly. We can make the page pool to bypass the DMA operations.
>>>>>>>>> This allows us to handle DMA within virtio-net for pages allocated from the page
>>>>>>>>> pool. Furthermore, we can utilize page pool helpers to associate the DMA address
>>>>>>>>> to the page.
>>>>>>>>>
>>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we want to return
>>>>>>>>> the mapped pages to the page pool and reuse them. In doing so, we can omit the
>>>>>>>>> unmapping and remapping steps.
>>>>>>>>>
>>>>>>>>> Currently, there's a caveat: when the page pool cache is full, it disconnects
>>>>>>>>> and releases the pages. When the pool hits its capacity, pages are relinquished
>>>>>>>>> without a chance for unmapping.

Could Jakub's memory provider for PP help your use-case?

See: [1] 
https://lore.kernel.org/all/20240403002053.2376017-3-almasrymina@google.com/
Or: [2]
https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.com/T/


[...]
>>>>>>
>>>>>> Adding Jesper for some comments.
>>>>>>
>>>>>>>
>>>>>>> Back to this patch set, I think we should keep the virtio-net to manage
>>>>>>> the pages.
>>>>>>>

For context the patch:
  [3] 
https://lore.kernel.org/all/20240411025127.51945-4-xuanzhuo@linux.alibaba.com/

>>>>>>> What do you think?
>>>>>>
>>>>>> I might be wrong, but I think if we need to either
>>>>>>
>>>>>> 1) seek a way to manage the pages by yourself but not touching page
>>>>>> pool metadata (or Jesper is fine with this)
>>>>>
>>>>> Do you mean working with page pool or not?
>>>>>
>>>>
>>>> I meant if Jesper is fine with reusing page pool metadata like this patch.
>>>>
>>>>> If we manage the pages by self(no page pool), we do not care the metadata is for
>>>>> page pool or not. We just use the space of pages like the "private".
>>>>
>>>> That's also fine.
>>>>

I'm not sure it is "fine" to, explicitly choosing not to use page pool,
and then (ab)use `struct page` member (pp) that intended for page_pool
for other stuff. (In this case create a linked list of pages).

  +#define page_chain_next(p)	((struct page *)((p)->pp))
  +#define page_chain_add(p, n)	((p)->pp = (void *)n)

I'm not sure that I (as PP maintainer) can make this call actually, as I
think this area belong with the MM "page" maintainers (Cc MM-list +
people) to judge.

Just invention new ways to use struct page fields without adding your
use-case to struct page, will make it harder for MM people to maintain
(e.g. make future change).

--Jesper




       reply	other threads:[~2024-04-18 20:19 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240411025127.51945-1-xuanzhuo@linux.alibaba.com>
     [not found] ` <20240411025127.51945-4-xuanzhuo@linux.alibaba.com>
     [not found]   ` <CACGkMEsC7AEi2SOmqNOo6KJDpx92raGWYwYzxZ_MVhmnco_LYQ@mail.gmail.com>
     [not found]     ` <1712900153.3715405-1-xuanzhuo@linux.alibaba.com>
     [not found]       ` <CACGkMEvKC6JpsznW57GgxFBMhmMSk4eCZPvESpew9j5qfp9=RA@mail.gmail.com>
     [not found]         ` <1713146919.8867755-1-xuanzhuo@linux.alibaba.com>
     [not found]           ` <CACGkMEvmaH9NE-5VDBPpZOpAAg4bX39Lf0-iGiYzxdV5JuZWww@mail.gmail.com>
     [not found]             ` <1713170201.06163-2-xuanzhuo@linux.alibaba.com>
     [not found]               ` <CACGkMEvsXN+7HpeirxzR2qek_znHp8GtjiT+8hmt3tHHM9Zbgg@mail.gmail.com>
     [not found]                 ` <1713171554.2423792-1-xuanzhuo@linux.alibaba.com>
     [not found]                   ` <CACGkMEuK0VkqtNfZ1BUw+SW=gdasEegTMfufS-47NV4bCh3Seg@mail.gmail.com>
     [not found]                     ` <1713317444.7698638-1-xuanzhuo@linux.alibaba.com>
     [not found]                       ` <CACGkMEvjwXpF_mLR3H8ZW9PUE+3spcxKMQV1VvUARb0-Lt7NKQ@mail.gmail.com>
     [not found]                         ` <1713342055.436048-1-xuanzhuo@linux.alibaba.com>
2024-04-18 20:19                           ` Jesper Dangaard Brouer [this message]
2024-04-18 21:56                             ` Matthew Wilcox
2024-04-19  7:11                             ` Xuan Zhuo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad98cb14-cc1b-4a01-aacc-8fb53445049e@kernel.org \
    --to=hawk@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=virtualization@lists.linux.dev \
    --cc=willy@infradead.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox