linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Huan Yang <link@vivo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Harry Yoo <harry.yoo@oracle.com>, Xu Xin <xu.xin16@zte.com.cn>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Joshua Hahn <joshua.hahnjy@gmail.com>,
	Rakie Kim <rakie.kim@sk.com>, Byungchul Park <byungchul@sk.com>,
	Gregory Price <gourry@gourry.net>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	Alistair Popple <apopple@nvidia.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Christian Brauner <brauner@kernel.org>,
	Usama Arif <usamaarif642@gmail.com>, Yu Zhao <yuzhao@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type
Date: Thu, 24 Jul 2025 10:15:08 +0100	[thread overview]
Message-ID: <764c48ad-8869-4f69-898e-0a1c58684f7d@lucifer.local> (raw)
In-Reply-To: <20250724084441.380404-1-link@vivo.com>

NAK. This series is completely un-upstreamable in any form.

David has responded to you already, but to underline.

The lesson here is that you really ought to discuss things with people in
the subsystem you are changing in advance of spending a lot of time doing
work like this which you intend to upstream.

On Thu, Jul 24, 2025 at 04:44:28PM +0800, Huan Yang wrote:
> Summary
> ==
> This patchset reuses page_type to store migrate entry count during the
> period from migrate entry setup to removal, enabling accelerated VMA
> traversal when removing migrate entries, following a similar principle to
> early termination when folio is unmapped in try_to_migrate.
>
> In my self-constructed test scenario, the migration time can be reduced
> from over 150+ms to around 30+ms, achieving nearly a 70% performance
> improvement. Additionally, the flame graph shows that the proportion of
> remove_migration_ptes can be reduced from 80%+ to 60%+.

This sounds completely contrived. I don't even know if you have a use case
here.

>
> Notice: migrate entry specifically refers to migrate PTE entry, as large
> folio are not supported page type and 0 mapcount reuse.
>
> Principle
> ==
> When a page removes all PTEs in try_to_migrate and sets up a migrate PTE
> entry, we can determine whether the traversal of remaining VMAs can be
> terminated early by checking if mapcount is zero. This optimization
> helps improve performance during migration.
>
> However, when removing migrate PTE entries and setting up PTEs for the
> destination folio in remove_migration_ptes, there is no such information
> available to assist in deciding whether the traversal of remaining VMAs
> can be ended early. Therefore, it is necessary to traversal all VMAs
> associated with this folio.
>
> In reality, when a folio is fully unmapped and before all migrate PTE
> entries are removed, the mapcount will always be zero. Since page_type
> and mapcount share a union, and referring to folio_mapcount, we can
> reuse page_type to record the number of migrate PTE entries of the
> current folio in the system as long as it's not a large folio. This
> reuse does not affect calls to folio_mapcount, which will always return
> zero.

OK so - if you ever find yourself thinking this way, please stop. We are in
the midst of fundamentally changing how folios and pages work.

There is absolutely ZERO room for reusing arbitrary fields in this way. Any
series that attempts to do this will be rejected.

Again, I must say - if you had raised this ahead of time we could have
saved you some effort.

>
> Therefore, we can set the folio's page_type to PGTY_mgt_entry when
> try_to_migrate completes, the folio is already unmapped, and it's not a
> large folio. The remaining 24 bits can then be used to record the number
> of migrate PTE entries generated by try_to_migrate.

I mean there's so much wrong here. The future is large folios. Making some
fundamental change that relies on not-large folio is a mistake. 24
bits... I mean no.

>
> Then, in remove_migration_ptes, when the nr_mgt_entry count drops to
> zero, we can terminate the VMA traversal early.
>
> It's important to note that we need to initialize the folio's page_type
> to PGTY_mgt_entry and set the migrate entry count only while holding the
> rmap walk lock.This is because during the lock period, we can prevent
> new VMA fork (which would increase migrate entries) and VMA unmap
> (which would decrease migrate entries).

No, no no. NO.

You are not introducing new locking complexity for this.

I could go on, but there's no point.

This series is not upstreamable, NAK.


  parent reply	other threads:[~2025-07-24  9:15 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-24  8:44 Huan Yang
2025-07-24  8:44 ` [RFC PATCH 1/9] mm: introduce PAGE_TYPE_SHIFT Huan Yang
2025-07-24  8:44 ` [RFC PATCH 2/9] mm: add page_type value helper Huan Yang
2025-07-24  8:44 ` [RFC PATCH 3/9] mm/rmap: simplify rmap_walk invoke Huan Yang
2025-07-24  8:44 ` [RFC PATCH 4/9] mm/rmap: add args in rmap_walk_control done hook Huan Yang
2025-07-24  8:44 ` [RFC PATCH 5/9] mm/rmap: introduce exit hook Huan Yang
2025-07-24  8:44 ` [RFC PATCH 6/9] mm/rmap: introduce migrate_walk_arg Huan Yang
2025-07-24  8:44 ` [RFC PATCH 7/9] mm/migrate: rename rmap_walk_arg folio Huan Yang
2025-07-24  8:44 ` [RFC PATCH 8/9] mm/migrate: infrastructure for migrate entry page_type Huan Yang
2025-07-24  8:44 ` [RFC PATCH 9/9] mm/migrate: apply " Huan Yang
2025-07-24  8:59 ` [RFC PATCH 0/9] introduce PGTY_mgt_entry page_type David Hildenbrand
2025-07-24  9:09   ` Huan Yang
2025-07-24  9:12     ` David Hildenbrand
2025-07-24  9:20       ` David Hildenbrand
2025-07-24  9:32         ` David Hildenbrand
2025-07-24  9:36           ` Huan Yang
2025-07-24  9:45             ` Lorenzo Stoakes
2025-07-24  9:56               ` Huan Yang
2025-07-24  9:58                 ` Lorenzo Stoakes
2025-07-24 10:01                   ` Huan Yang
2025-07-24  9:15 ` Lorenzo Stoakes [this message]
2025-07-24  9:29   ` Huan Yang
2025-07-25  1:37     ` Huang, Ying
2025-07-25  1:47       ` Huan Yang
2025-07-25  9:26         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=764c48ad-8869-4f69-898e-0a1c58684f7d@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=brauner@kernel.org \
    --cc=byungchul@sk.com \
    --cc=chengming.zhou@linux.dev \
    --cc=david@redhat.com \
    --cc=gourry@gourry.net \
    --cc=harry.yoo@oracle.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=link@vivo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=rakie.kim@sk.com \
    --cc=riel@surriel.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=xu.xin16@zte.com.cn \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yuzhao@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox