linux-mm.kvack.org archive mirror
From: "Chen, Yu C" <yu.c.chen@intel.com>
To: Libo Chen <libo.chen@oracle.com>
Cc: "Jain, Ayush" <ayushjai@amd.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ingo Molnar" <mingo@redhat.com>, "Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Mel Gorman" <mgormanmgorman@suse.de>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Chen, Tim C" <tim.c.chen@intel.com>,
	"Aubrey Li" <aubrey.li@intel.com>,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"K Prateek Nayak" <kprateek.nayak@amd.com>,
	"Madadi Vineeth Reddy" <vineethr@linux.ibm.com>,
	Neeraj.Upadhyay@amd.com, "Peter Zijlstra" <peterz@infradead.org>,
	"Michal Koutný" <mkoutny@suse.com>
Subject: Re: [PATCH v3] sched/numa: add statistics of numa balance task migration
Date: Tue, 6 May 2025 13:36:54 +0800
Message-ID: <bc93c650-ba55-4434-98f6-3b7f556ae44b@intel.com>
In-Reply-To: <c20fbc3b-5adf-488c-b6f3-0d4e3c9da5c3@oracle.com>

On 5/6/2025 5:57 AM, Libo Chen wrote:
> 
> 
> On 5/5/25 14:32, Libo Chen wrote:
>>
>>
>> On 5/5/25 11:49, Libo Chen wrote:
>>>
>>>
>>> On 5/5/25 11:27, Chen, Yu C wrote:
>>>> Hi Michal,
>>>>
>>>> On 5/6/2025 1:46 AM, Michal Koutný wrote:
>>>>> On Mon, May 05, 2025 at 11:03:10PM +0800, "Chen, Yu C" <yu.c.chen@intel.com> wrote:
>>>>>> According to this address,
>>>>>>      4c 8b af 50 09 00 00    mov    0x950(%rdi),%r13  <--- r13 = p->mm;
>>>>>>      49 8b bd 98 04 00 00    mov    0x498(%r13),%rdi  <--- p->mm->owner
>>>>>> It seems that the task to be swapped has a NULL mm_struct.
>>>>>
>>>>> So it's likely a kernel thread. Does it make sense to NUMA balance
>>>>> those? (I naïvely think it doesn't, please correct me.) ...
>>>>>
>>>>
>>>> I agree that kernel threads are not supposed to be covered by
>>>> NUMA balancing, because NUMA balancing currently only considers
>>>> user pages via VMAs. One question below:
>>>>
>>>>>>    static void __migrate_swap_task(struct task_struct *p, int cpu)
>>>>>>    {
>>>>>>           __schedstat_inc(p->stats.numa_task_swapped);
>>>>>> -       count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
>>>>>> +       if (p->mm)
>>>>>> +               count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
>>>>>
>>>>> ... proper fix should likely guard this earlier, like the guard in
>>>>> task_numa_fault() but for the other swapped task.
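A minimal userspace sketch of that idea, using stand-in structures (model_task, model_mm and every model_* function are hypothetical names, not kernel APIs): the mm-less task is rejected when it is picked as the swap partner, which in the kernel would roughly correspond to dropping cur in task_numa_compare(), so the per-mm accounting later never sees a NULL mm.

```c
#include <assert.h>
#include <stddef.h>

struct model_task;

/* Stand-in for mm_struct: only the owner field matters here. */
struct model_mm {
	struct model_task *owner;	/* memcg events are charged via mm->owner */
};

/* Stand-in for task_struct. */
struct model_task {
	struct model_mm *mm;			/* NULL for kernel threads */
	unsigned long numa_task_swapped;	/* like the schedstat counter */
	unsigned long memcg_swap_events;	/* like NUMA_TASK_SWAP in the owner's memcg */
};

/*
 * Models count_memcg_event_mm(): it reads mm->owner without checking mm.
 * The real helper has no assert; it simply faults on a NULL mm, as in
 * the 0x498(%r13) load above.
 */
static void model_count_memcg_event_mm(struct model_mm *mm)
{
	assert(mm != NULL && "a NULL mm would fault here");
	mm->owner->memcg_swap_events++;
}

/* Earlier guard: refuse an mm-less task as the swap partner. */
static struct model_task *model_pick_swap_partner(struct model_task *cur)
{
	if (cur && !cur->mm)
		return NULL;	/* kernel thread: keep it out of NUMA swaps */
	return cur;
}

/* Only reached with a partner that passed model_pick_swap_partner(). */
static void model_migrate_swap_task(struct model_task *p)
{
	p->numa_task_swapped++;
	model_count_memcg_event_mm(p->mm);
}
```

In this shape, a check like the `if (p->mm)` in the patch hunk quoted above would only be a safety net, since the partner is filtered before any accounting happens.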
>>>> I see. Task swapping in task_numa_compare()
>>>> is triggered when there are no idle CPUs on task A's
>>>> preferred node.
>>>> In this case, we choose a task B on A's preferred node
>>>> and swap B with A. This helps improve A's NUMA locality
>>>> without introducing load imbalance between nodes.
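To make the described flow concrete, here is a toy model under simplifying assumptions: CPUs are an array tagged with a node id and the running task (-1 when idle), and the "best" candidate is simply the first match. It leaves out the load and NUMA-fault scoring that task_numa_compare() really performs; model_place() and the other names are hypothetical.

```c
#include <stddef.h>

enum model_action { MODEL_NONE, MODEL_MOVE, MODEL_SWAP };

struct model_cpu {
	int nid;	/* NUMA node of this CPU */
	int curr;	/* id of the running task, or -1 if idle */
};

struct model_choice {
	enum model_action action;
	int cpu;	/* destination CPU on the preferred node */
	int swap_with;	/* task B when action == MODEL_SWAP */
};

/*
 * Simplified placement for task A: prefer an idle CPU on A's preferred
 * node; otherwise pick a task B running there and swap, which keeps the
 * number of running tasks per node unchanged.
 */
static struct model_choice model_place(const struct model_cpu *cpus, size_t n,
				       int a_id, int preferred_nid)
{
	struct model_choice c = { MODEL_NONE, -1, -1 };

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].nid == preferred_nid && cpus[i].curr < 0) {
			c.action = MODEL_MOVE;
			c.cpu = (int)i;
			return c;
		}
	}
	/* No idle CPU on the preferred node: fall back to a swap. */
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].nid == preferred_nid && cpus[i].curr != a_id) {
			c.action = MODEL_SWAP;
			c.cpu = (int)i;
			c.swap_with = cpus[i].curr;
			return c;
		}
	}
	return c;
}
```

Nothing in this selection looks at what kind of task B is, so a kernel thread on the preferred node is as good a candidate as any user task, which is the gap this thread is about.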
>>>>
>> Hi Chenyu
>>
>> There are two problems here:
>> 1. Many kthreads are pinned, so despite all the effort in task_numa_compare()
>> and task_numa_find_cpu(), the swap may not end up happening. I only see a
>> check on the source task, cpumask_test_cpu(cpu, env->p->cpus_ptr), but not on the dst task.
> 
> NVM, I was blind. There is a check on the dst task in task_numa_compare().
> 
>> 2. Assuming B is migratable, swapping can potentially make B worse, right? I think
>> some kthreads are quite cache-sensitive, yet we swap them as if their locality
>> doesn't matter.
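A swap needs both directions to be legal: A must be allowed on the destination CPU and B on the source CPU. A sketch with a 64-bit mask standing in for struct cpumask (model_cpumask, model_cpu_allowed() and model_swap_allowed() are hypothetical names); the source-task half mirrors the cpumask_test_cpu(cpu, env->p->cpus_ptr) check quoted above, and per the follow-up, task_numa_compare() has the destination-task half.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per CPU; a stand-in for struct cpumask, limited to 64 CPUs. */
typedef uint64_t model_cpumask;

static bool model_cpu_allowed(model_cpumask mask, unsigned int cpu)
{
	/* Bounds check first so the shift is never out of range. */
	return cpu < 64 && ((mask >> cpu) & 1u);
}

/*
 * A swap moves A from src_cpu to dst_cpu and B the other way, so both
 * affinity masks must allow the move. A pinned kthread B typically
 * fails the second test.
 */
static bool model_swap_allowed(model_cpumask a_allowed, model_cpumask b_allowed,
			       unsigned int src_cpu, unsigned int dst_cpu)
{
	return model_cpu_allowed(a_allowed, dst_cpu) &&
	       model_cpu_allowed(b_allowed, src_cpu);
}
```

A kthread pinned to a single CPU can only be a swap partner when the source CPU happens to be that CPU, so most candidate pairs involving it are rejected here.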

This makes sense. I wonder if it could be extended beyond kthreads.
We don't want to swap with a task B that has no explicit NUMA preference,
do we?

>>
>> Ideally we probably just want to stay away from kthreads; if we cannot find any other
>> task with a p->mm, just don't swap (?). That sounds like a brand new patch though.
>>
> 
> A change as simple as that should work:
> 
> @@ -2492,7 +2492,7 @@ static bool task_numa_compare(struct task_numa_env *env,
> 
>          rcu_read_lock();
>          cur = rcu_dereference(dst_rq->curr);
> -       if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
> +       if (cur && ((cur->flags & PF_EXITING) || !cur->mm || is_idle_task(cur)))

Something like:

if (cur && ((cur->flags & PF_EXITING) ||
            cur->numa_preferred_nid == NUMA_NO_NODE ||
            !cur->numa_faults || is_idle_task(cur)))
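The two candidate filters can be compared side by side in a small model. Field names follow task_struct, but the struct, the flag value and both helper names are stand-ins for illustration, and the NULL check on cur from the original condition is left to the caller. In the model, a kernel thread is represented with no mm, no preferred node and no fault statistics; that real kthreads always look like this is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_EXITING	0x1u	/* model value, not the kernel's */
#define MODEL_NUMA_NO_NODE	(-1)

struct model_task {
	unsigned int flags;
	void *mm;			/* NULL for kernel threads */
	int numa_preferred_nid;		/* MODEL_NUMA_NO_NODE if none */
	unsigned long *numa_faults;	/* NULL until NUMA faults are tracked */
	bool idle;
};

/* First variant: drop exiting, idle and mm-less (kernel thread) tasks. */
static bool model_reject_v1(const struct model_task *cur)
{
	return (cur->flags & MODEL_PF_EXITING) || !cur->mm || cur->idle;
}

/*
 * Second variant: also drop user tasks with no NUMA preference or no
 * fault statistics. In this model a kernel thread has neither, so it
 * is rejected as well.
 */
static bool model_reject_v2(const struct model_task *cur)
{
	return (cur->flags & MODEL_PF_EXITING) ||
	       cur->numa_preferred_nid == MODEL_NUMA_NO_NODE ||
	       !cur->numa_faults || cur->idle;
}
```

The second filter is stricter: besides kthreads it also refuses user tasks that have not built up a NUMA preference, which is the "beyond kthreads" extension.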

But overall it looks good to me. Would you like to post this as a
formal patch, or do you want me to fold your change into a patch set?

thanks,
Chenyu

>                  cur = NULL;
>

>>
>>
>> Libo
>>>> But B's NUMA node preference is not mandatory in
>>>> the current implementation IIUC, because B's load is mainly
>>>
>>> hmm, that doesn't seem right. Can we choose a B that
>>> is not a kthread from A's preferred node?
>>>
>>>> considered. That is to say, is it legit to swap a
>>>> NUMA-sensitive task A with a non-NUMA-sensitive kernel
>>>> thread B? If not, I think we can add a kernel thread
>>>> check in the task swap, like the guard in
>>>> task_tick_numa()/task_numa_fault().
>>>>
>>>
>>>
>>>> thanks,
>>>> Chenyu
>>>>
>>>>>
>>>>> Michal
>>>>
>>>
>>
> 



Thread overview: 15+ messages
2025-04-30 10:36 Chen Yu
2025-05-01  7:00 ` Libo Chen
2025-05-02  9:30   ` Chen, Yu C
2025-05-05  6:43 ` Jain, Ayush
2025-05-05 15:03   ` Chen, Yu C
2025-05-05 17:25     ` Venkat Rao Bagalkote
2025-05-07 11:36       ` Chen, Yu C
2025-05-05 17:46     ` Michal Koutný
2025-05-05 18:27       ` Chen, Yu C
2025-05-05 18:49         ` Libo Chen
2025-05-05 21:32           ` Libo Chen
2025-05-05 21:57             ` Libo Chen
2025-05-06  5:06               ` Jain, Ayush
2025-05-06  5:36               ` Chen, Yu C [this message]
2025-05-06  7:03                 ` Libo Chen
