From: Gregory Price <gourry@gourry.net>
To: Bharata B Rao <bharata@amd.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
	hannes@cmpxchg.org, mgorman@techsingularity.net,
	mingo@redhat.com, peterz@infradead.org, raghavendra.kt@amd.com,
	riel@surriel.com, rientjes@google.com, sj@kernel.org,
	weixugc@google.com, willy@infradead.org,
	ying.huang@linux.alibaba.com, ziy@nvidia.com, dave@stgolabs.net,
	nifan.cxl@gmail.com, joshua.hahnjy@gmail.com,
	xuezhengchu@huawei.com, yiannis@zptcorp.com,
	akpm@linux-foundation.org, david@redhat.com
Subject: Re: [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages
Date: Wed, 21 May 2025 23:55:36 -0400	[thread overview]
Message-ID: <aC6gOFBrO0mduHrl@gourry-fedora-PF4VCD3F> (raw)
In-Reply-To: <20250521080238.209678-3-bharata@amd.com>

On Wed, May 21, 2025 at 01:32:38PM +0530, Bharata B Rao wrote:
>  
> +static void task_check_pending_migrations(struct task_struct *curr)
> +{
> +	struct callback_head *work = &curr->numa_mig_work;
> +
> +	if (work->next != work)
> +		return;
> +
> +	if (time_after(jiffies, curr->numa_mig_interval) ||
> +	    (curr->migrate_count > NUMAB_BATCH_MIGRATION_THRESHOLD)) {
> +		curr->numa_mig_interval = jiffies + HZ;
> +		task_work_add(curr, work, TWA_RESUME);
> +	}
> +}
> +
>  /*
>   * Drive the periodic memory faults..
>   */
> @@ -3610,6 +3672,8 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
>  	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
>  		return;
>  
> +	task_check_pending_migrations(curr);
> +

So I know this was discussed in the cover letter a bit and alluded to
in the patch, but I want to add my two cents from work on the unmapped
page cache set.

In that set, I chose to always schedule the task work on the next
return to user-space, rather than deferring to a tick like the current
numa-balance code does.  This was because of two concerns:

1) I didn't want to leave a potentially large number of isolated folios
   on a list that may not be reaped for an unknown period of time.

   I don't know the real limits on the number of isolated folios,
   but given what we have here I think we can put a mathematical
   worst case on the number of stranded folios.

   If N=1,000,000 tasks each hold M=511 isolated folios, we could have
   ~1.9TiB of 4K pages stranded on these lists - never to be migrated
   because the count never hits the threshold.  In practice this won't
   happen to that extreme, but it absolutely will happen for some
   chunk of tasks.

   So I chose to never leave kernel space with isolated folios on the
   task numa_mig_list.

   This discussion changes if the numa_mig_list is not on the
   task_struct and instead some per-cpu list routinely reaped by a
   kthread (kpromoted or whatever).
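
   A quick computation backs up that worst case (my assumptions, not
   numbers from the patch: 4KiB base pages, and every task sitting one
   folio short of the batch threshold so its list is never flushed):

   ```c
   #include <stdio.h>

   /* Worst case: N tasks each strand M isolated base pages on their
    * per-task list, all just under the batch threshold, so the batch
    * is never migrated.  4KiB base pages assumed. */
   static unsigned long long stranded_bytes(unsigned long long n_tasks,
                                            unsigned long long folios_per_task,
                                            unsigned long long page_size)
   {
           return n_tasks * folios_per_task * page_size;
   }

   int main(void)
   {
           unsigned long long bytes = stranded_bytes(1000000ULL, 511ULL, 4096ULL);

           /* ~2.09e12 bytes, i.e. roughly 1.9 TiB stranded */
           printf("%llu bytes (~%.1f TiB)\n", bytes,
                  (double)bytes / (1ULL << 40));
           return 0;
   }
   ```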
 

2) I was not confident I could directly measure the performance
   implications of the migrations when they were deferred.  When would
   I even know it happened?  The actual goal is to *not* know it
   happened, right?

   But now it might happen during a page fault, or any random syscall.

   This concerned me - so I just didn't defer.  That was largely out
   of a lack of confidence in my own understanding of the task_work
   system.


So I think this, as presented, is a half-measure - and I don't think
it's a good half-measure.  I think we might need to go all the way to
a set of per-cpu migration lists that a kernel worker can pluck the
head of on some interval.  That would bound the number of isolated
folios to the number of CPUs rather than the number of tasks.

~Gregory


Thread overview: 41+ messages
2025-05-21  8:02 [RFC PATCH v0 0/2] Batch migration for NUMA balancing Bharata B Rao
2025-05-21  8:02 ` [RFC PATCH v0 1/2] migrate: implement migrate_misplaced_folio_batch Bharata B Rao
2025-05-22 15:59   ` David Hildenbrand
2025-05-22 16:03     ` Gregory Price
2025-05-22 16:08       ` David Hildenbrand
2025-05-26  8:16   ` Huang, Ying
2025-05-21  8:02 ` [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages Bharata B Rao
2025-05-21 18:25   ` Donet Tom
2025-05-21 18:40     ` Zi Yan
2025-05-22  3:24       ` Gregory Price
2025-05-22  5:23         ` Bharata B Rao
2025-05-22  4:42       ` Bharata B Rao
2025-05-22  4:39     ` Bharata B Rao
2025-05-23  9:05       ` Donet Tom
2025-05-22  3:55   ` Gregory Price [this message]
2025-05-22  7:33     ` Bharata B Rao
2025-05-22 15:38       ` Gregory Price
2025-05-22 16:11   ` David Hildenbrand
2025-05-22 16:24     ` Zi Yan
2025-05-22 16:26       ` David Hildenbrand
2025-05-22 16:38         ` Zi Yan
2025-05-22 17:21           ` David Hildenbrand
2025-05-22 17:30             ` Zi Yan
2025-05-26  8:33               ` Huang, Ying
2025-05-26  9:29               ` David Hildenbrand
2025-05-26 14:20                 ` Zi Yan
2025-05-27  1:18                   ` Huang, Ying
2025-05-27  1:27                     ` Zi Yan
2025-05-28 12:25                   ` Karim Manaouil
2025-05-26  5:14     ` Bharata B Rao
2025-05-21 18:45 ` [RFC PATCH v0 0/2] Batch migration for NUMA balancing SeongJae Park
2025-05-22  3:08   ` Gregory Price
2025-05-22 16:30     ` SeongJae Park
2025-05-22 17:40       ` Gregory Price
2025-05-22 18:52         ` SeongJae Park
2025-05-22 18:43   ` Apologies and clarifications on DAMON-disruptions (was Re: [RFC PATCH v0 0/2] Batch migration for NUMA balancing) SeongJae Park
2025-05-26  5:20   ` [RFC PATCH v0 0/2] Batch migration for NUMA balancing Bharata B Rao
2025-05-27 18:50     ` SeongJae Park
2025-05-26  8:46 ` Huang, Ying
2025-05-27  8:53   ` Bharata B Rao
2025-05-27  9:05     ` Huang, Ying
