From: Bharata B Rao <bharata@amd.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
gourry@gourry.net, hannes@cmpxchg.org,
mgorman@techsingularity.net, mingo@redhat.com,
peterz@infradead.org, raghavendra.kt@amd.com, riel@surriel.com,
rientjes@google.com, sj@kernel.org, weixugc@google.com,
ying.huang@linux.alibaba.com, ziy@nvidia.com, dave@stgolabs.net,
nifan.cxl@gmail.com, xuezhengchu@huawei.com, yiannis@zptcorp.com,
akpm@linux-foundation.org, david@redhat.com
Subject: Re: page_ext and memdescs
Date: Tue, 17 Jun 2025 13:58:13 +0530
Message-ID: <9f9ce455-262a-4d55-829f-ff485f67dc7a@amd.com>
In-Reply-To: <aFAkkOzJius6XiO6@casper.infradead.org>

On 16-Jun-25 7:35 PM, Matthew Wilcox wrote:
> On Mon, Jun 16, 2025 at 07:09:30PM +0530, Bharata B Rao wrote:
<snip>
>> +#define PAGE_EXT_MIG_NID_MASK ((1UL << PAGE_EXT_MIG_NID_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_FREQ_MASK ((1UL << PAGE_EXT_MIG_FREQ_SHIFT) - 1)
>> +#define PAGE_EXT_MIG_TIME_MASK ((1UL << PAGE_EXT_MIG_TIME_SHIFT) - 1)
>
> OK, so we need to have a conversation about page_ext. Sorry this is
> happening to you. I've kind of skipped over page_ext when talking
> about folios and memdescs up to now, so it's not that you've missed
> anything.
>
> As the comment says,
>
> * Page Extension can be considered as an extended mem_map.
>
> and we need to do this because we don't want to grow struct page beyond
> 64 bytes. But memdescs are dynamically allocated, so we don't need
> page_ext any more, and all that code can go away.
>
> lib/alloc_tag.c:struct page_ext_operations page_alloc_tagging_ops = {
> mm/page_ext.c:static struct page_ext_operations page_idle_ops __initdata = {
> mm/page_ext.c:static struct page_ext_operations *page_ext_ops[] __initdata = {
> mm/page_owner.c:struct page_ext_operations page_owner_ops = {
> mm/page_table_check.c:struct page_ext_operations page_table_check_ops = {
>
> I think all of these are actually per-memdesc things and not per-page
> things, so we can get rid of them all. That means I don't want to see
> new per-page data being added to page_ext.

Fair point.

>
> So, what's this really used for? It seems like it's really
> per-allocation, not per-page. Does it need to be preserved across
> alloc/free or can it be reset at free time?

The context here is tracking pages that need to be migrated. Whether it
is for NUMA balancing or for any other subsystem that needs to migrate
(or promote) pages across nodes, I am trying to come up with a kernel
thread based migrator that migrates the identified pages in an async and
batched manner. For this, the basic piece of information required for
each ready-to-be-migrated page is the target NID. Since I have chosen to
walk the zones and the PFNs within each zone to iterate over pages, the
other piece of per-page information I need is an indication that the
page is indeed ready to be migrated by the migrator thread.
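
Roughly, the scan I have in mind looks like the below (just a sketch to
illustrate the idea; migrate_ready(), migrate_target_nid() and
queue_for_migration() are hypothetical helpers over the per-page state,
and locking, batching and error handling are omitted):

#include <linux/mmzone.h>
#include <linux/memory_hotplug.h>

/*
 * Sketch only: walk every populated zone and pick up pages that have
 * been marked ready-to-migrate by the hotness sources.
 */
static void kmigrated_scan(void)
{
	struct zone *zone;
	unsigned long pfn;

	for_each_populated_zone(zone) {
		for (pfn = zone->zone_start_pfn; pfn < zone_end_pfn(zone); pfn++) {
			struct page *page = pfn_to_online_page(pfn);

			if (!page)
				continue;

			if (migrate_ready(page))
				queue_for_migration(page, migrate_target_nid(page));
		}
	}
}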

In addition to these two things, if we want to carve out a single
subsystem (like the kpromoted approach) that takes input from multiple
page hotness sources and maintains heuristics to decide when exactly to
migrate/promote a page, then it would be good to store a few other
pieces of information for such pages as well (like access frequency,
access timestamp, etc.).
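
To make that concrete, what I have in mind is packing all of this into a
single per-page word, along the lines of the masks quoted above (the
field widths and names below are purely illustrative, not the exact
layout from the patch):

/* Illustrative layout only: widths and names are placeholders. */
#define MIG_NID_BITS	10	/* target node id */
#define MIG_FREQ_BITS	8	/* access frequency counter */
#define MIG_TIME_BITS	45	/* coarse access timestamp */
#define MIG_READY_BIT	63	/* ready-to-migrate indication */

#define MIG_NID_MASK	((1UL << MIG_NID_BITS) - 1)
#define MIG_FREQ_MASK	((1UL << MIG_FREQ_BITS) - 1)
#define MIG_TIME_MASK	((1UL << MIG_TIME_BITS) - 1)

static inline unsigned long mig_pack(int nid, unsigned int freq,
				     unsigned long time)
{
	return ((unsigned long)nid & MIG_NID_MASK) |
	       (((unsigned long)freq & MIG_FREQ_MASK) << MIG_NID_BITS) |
	       ((time & MIG_TIME_MASK) << (MIG_NID_BITS + MIG_FREQ_BITS)) |
	       (1UL << MIG_READY_BIT);
}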

With that background, I am looking for an optimal place to store this
information. In my earlier approaches I maintained a global list of such
hot pages, but realized that the approach would not scale, so in the
current approach I tie the information to the page itself. With that,
there is no overhead of maintaining such a list, no synchronization
between the producers and the migrator thread, and no allocation for
each tracked page. Hence it appeared to me that pre-allocated per-page
info would be preferable, and at that point page extension looked like a
good place to keep it.
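
For completeness, the way this plugs into page_ext in the current
approach is the usual client registration, roughly as below (simplified;
"page_mig_info" and "page_mig_ops" are placeholder names, and the ops
would additionally have to be hooked into page_ext_ops[] in
mm/page_ext.c):

#include <linux/page_ext.h>

/* Per-page data hung off page_ext; sketch only. */
struct page_mig_info {
	unsigned long state;	/* packed nid/freq/time/ready word */
};

static bool need_page_mig(void)
{
	/* Could be gated on a boot parameter or on NUMA balancing. */
	return true;
}

struct page_ext_operations page_mig_ops = {
	.size = sizeof(struct page_mig_info),
	.need = need_page_mig,
};

static struct page_mig_info *get_page_mig_info(struct page_ext *page_ext)
{
	return (void *)page_ext + page_mig_ops.offset;
}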

Sorry for the long reply; coming to your specific question now.

I really need to maintain such data only for pages that can be migrated:
most anonymous pages, file-backed pages, pages that are mapped into user
page tables, THP pages, etc. are candidates. I wonder which memdesc
type/types would cover all such pages. Would "folio" as a memdesc
(https://kernelnewbies.org/MatthewWilcox/FolioAlloc) be a broad enough
type for this?

As you note, it appears to me that this could be per-allocation rather
than per-page, and the information needn't be preserved across
alloc/free.

Regards,
Bharata.