From: Raghavendra K T <raghavendra.kt@amd.com>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: AneeshKumar.KizhakeVeetil@arm.com, Hasan.Maruf@amd.com,
Michael.Day@amd.com, akpm@linux-foundation.org, bharata@amd.com,
dave.hansen@intel.com, david@redhat.com,
dongjoo.linux.dev@gmail.com, feng.tang@intel.com,
gourry@gourry.net, hannes@cmpxchg.org, honggyu.kim@sk.com,
hughd@google.com, jhubbard@nvidia.com, jon.grimm@amd.com,
k.shutemov@gmail.com, kbusch@meta.com, kmanaouil.dev@gmail.com,
leesuyeon0506@gmail.com, leillc@google.com,
liam.howlett@oracle.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mgorman@techsingularity.net,
mingo@redhat.com, nadav.amit@gmail.com, nphamcs@gmail.com,
peterz@infradead.org, riel@surriel.com, rientjes@google.com,
rppt@kernel.org, santosh.shukla@amd.com, shivankg@amd.com,
shy828301@gmail.com, sj@kernel.org, vbabka@suse.cz,
weixugc@google.com, willy@infradead.org,
ying.huang@linux.alibaba.com, ziy@nvidia.com,
Jonathan.Cameron@huawei.com, dave@stgolabs.net,
yuanchu@google.com, kinseyho@google.com, hdanton@sina.com
Subject: Re: [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list
Date: Thu, 26 Jun 2025 11:57:12 +0530 [thread overview]
Message-ID: <02c6c58f-2c2c-42eb-bde6-175de71f7d47@amd.com> (raw)
In-Reply-To: <aFxzEFhTDOyL4y0e@hyeyoo>
On 6/26/2025 3:37 AM, Harry Yoo wrote:
> On Tue, Jun 24, 2025 at 05:56:07AM +0000, Raghavendra K T wrote:
>> Since we already have the list of mm_struct in the system, add a module to
>> scan each mm that walks VMAs of each mm_struct and scan all the pages
>> associated with that.
>>
>> In the scan path: Check for the recently acccessed pages (folios) belonging
>> to slowtier nodes. Add all those folios to a list.
>>
>> Signed-off-by: Raghavendra K T <raghavendra.kt@amd.com>
>> ---
>
> Hi, just taking a quick look...
Hello Harry,
Thanks for taking a look at the patches.
>
>> mm/kscand.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 318 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/kscand.c b/mm/kscand.c
>> index d5b0d3041b0f..0edec1b7730d 100644
>> --- a/mm/kscand.c
>> +++ b/mm/kscand.c
>> @@ -42,6 +55,8 @@ static struct kmem_cache *kscand_slot_cache __read_mostly;
>> @@ -84,11 +122,275 @@ static void kscand_wait_work(void)
>> scan_sleep_jiffies);
>> }
>>
>> +static inline bool is_valid_folio(struct folio *folio)
>> +{
>> + if (!folio || folio_test_unevictable(folio) || !folio_mapped(folio) ||
>> + folio_is_zone_device(folio) || folio_maybe_mapped_shared(folio))
>> + return false;
>> +
>> + return true;
>> +}
>
> What makes it undesirable to migrate shared folios?
This was mostly to avoid shared libraries, but yes this also
should have accompanied with EXEC flag to refine further.
This also avoids moving around shared data. I will experiment more
and add additional filters OR remove the check.
>> +static bool folio_idle_clear_pte_refs_one(struct folio *folio,
>> + struct vm_area_struct *vma,
>> + unsigned long addr,
>> + pte_t *ptep)
>> +{
>> + bool referenced = false;
>> + struct mm_struct *mm = vma->vm_mm;
>> + pmd_t *pmd = pmd_off(mm, addr);
>> +
>> + if (ptep) {
>> + if (ptep_clear_young_notify(vma, addr, ptep))
>> + referenced = true;
>> + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>> + if (!pmd_present(*pmd))
>> + WARN_ON_ONCE(1);
>> + if (pmdp_clear_young_notify(vma, addr, pmd))
>> + referenced = true;
>> + } else {
>> + WARN_ON_ONCE(1);
>> + }
>
> This does not look good.
>
> I think pmd entry handling should be handled in
> mm_walk_ops.pmd_entry callback?
>
Thanks, Let me check on this. Some part came when I was referring
to idle page tracking.
>> +
>> + if (referenced) {
>> + folio_clear_idle(folio);
>> + folio_set_young(folio);
>> + }
>> +
>> + return true;
>> +}
>> +
>> +static void page_idle_clear_pte_refs(struct page *page, pte_t *pte, struct mm_walk *walk)
>> +{
>> + bool need_lock;
>> + struct folio *folio = page_folio(page);
>> + unsigned long address;
>> +
>> + if (!folio_mapped(folio) || !folio_raw_mapping(folio))
>> + return;
>> +
>> + need_lock = !folio_test_anon(folio) || folio_test_ksm(folio);
>> + if (need_lock && !folio_trylock(folio))
>> + return;
>
> Why acquire folio lock here?
>
> And I'm not even sure if it's safe to acquire it?
> The locking order is folio_lock -> pte_lock
>
> page walk should have already acquired pte_lock before calling
> ->pte_entry() callback.
>
I saw you clarified later.
>> + address = vma_address(walk->vma, page_pgoff(folio, page), compound_nr(page));
>> + VM_BUG_ON_VMA(address == -EFAULT, walk->vma);
>> + folio_idle_clear_pte_refs_one(folio, walk->vma, address, pte);
>> +
>> + if (need_lock)
>> + folio_unlock(folio);
>> +}
>> +
>> +static const struct mm_walk_ops hot_vma_set_idle_ops = {
>> + .pte_entry = hot_vma_idle_pte_entry,
>> + .walk_lock = PGWALK_RDLOCK,
>> +};
>> +
>> +static void kscand_walk_page_vma(struct vm_area_struct *vma, struct kscand_scanctrl *scanctrl)
>> +{
>> + if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
>> + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
>> + return;
>> + }
>> + if (!vma->vm_mm ||
>> + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ)))
>> + return;
>
> Why not walk writable file VMAs?
>
This is mainly followed from numa balancing logic
" Avoid hinting faults in read-only file-backed mappings or the vDSO
as migrating the pages will be of marginal benefit. "
But I have not measured benefits either way. Let me try once.
>> + if (!vma_is_accessible(vma))
>> + return;
>> +
>> + walk_page_vma(vma, &hot_vma_set_idle_ops, scanctrl);
>> +}
>
Regards
- Raghu
next prev parent reply other threads:[~2025-06-26 6:27 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-24 5:56 [RFC PATCH V2 00/13] mm: slowtier page promotion based on PTE A bit Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 01/13] mm: Add kscand kthread for PTE A bit scan Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 02/13] mm: Maintain mm_struct list in the system Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list Raghavendra K T
2025-06-25 22:07 ` Harry Yoo
2025-06-25 23:05 ` Harry Yoo
2025-06-26 6:27 ` Raghavendra K T [this message]
2025-07-08 11:17 ` Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 04/13] mm: Create a separate kthread for migration Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 05/13] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 06/13] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 07/13] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 08/13] mm: Add initial scan delay Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 09/13] mm: Add a heuristic to calculate target node Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 10/13] sysfs: Add sysfs support to tune scanning Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 11/13] vmstat: Add vmstat counters Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 12/13] trace/kscand: Add tracing of scanning and migration Raghavendra K T
2025-06-24 7:09 ` Masami Hiramatsu
2025-06-24 7:50 ` Raghavendra K T
2025-06-24 5:56 ` [RFC PATCH V2 13/13] prctl: Introduce new prctl to control scanning Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=02c6c58f-2c2c-42eb-bde6-175de71f7d47@amd.com \
--to=raghavendra.kt@amd.com \
--cc=AneeshKumar.KizhakeVeetil@arm.com \
--cc=Hasan.Maruf@amd.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=Michael.Day@amd.com \
--cc=akpm@linux-foundation.org \
--cc=bharata@amd.com \
--cc=dave.hansen@intel.com \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dongjoo.linux.dev@gmail.com \
--cc=feng.tang@intel.com \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=hdanton@sina.com \
--cc=honggyu.kim@sk.com \
--cc=hughd@google.com \
--cc=jhubbard@nvidia.com \
--cc=jon.grimm@amd.com \
--cc=k.shutemov@gmail.com \
--cc=kbusch@meta.com \
--cc=kinseyho@google.com \
--cc=kmanaouil.dev@gmail.com \
--cc=leesuyeon0506@gmail.com \
--cc=leillc@google.com \
--cc=liam.howlett@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@redhat.com \
--cc=nadav.amit@gmail.com \
--cc=nphamcs@gmail.com \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=rppt@kernel.org \
--cc=santosh.shukla@amd.com \
--cc=shivankg@amd.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=willy@infradead.org \
--cc=ying.huang@linux.alibaba.com \
--cc=yuanchu@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox