From: Libo Chen <libo.chen@oracle.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>, "Michal Koutný" <mkoutny@suse.com>
Cc: "Jain, Ayush" <ayushjai@amd.com>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Jonathan Corbet <corbet@lwn.net>,
Mel Gorman <mgorman@suse.de>,
Michal Hocko <mhocko@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
"Chen, Tim C" <tim.c.chen@intel.com>,
Aubrey Li <aubrey.li@intel.com>,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
K Prateek Nayak <kprateek.nayak@amd.com>,
Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
Neeraj.Upadhyay@amd.com, Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v3] sched/numa: add statistics of numa balance task migration
Date: Mon, 5 May 2025 14:57:29 -0700
Message-ID: <c20fbc3b-5adf-488c-b6f3-0d4e3c9da5c3@oracle.com>
In-Reply-To: <c7444174-fa5e-44c1-bd16-c8971d118b1b@oracle.com>
On 5/5/25 14:32, Libo Chen wrote:
>
>
> On 5/5/25 11:49, Libo Chen wrote:
>>
>>
>> On 5/5/25 11:27, Chen, Yu C wrote:
>>> Hi Michal,
>>>
>>> On 5/6/2025 1:46 AM, Michal Koutný wrote:
>>>> On Mon, May 05, 2025 at 11:03:10PM +0800, "Chen, Yu C" <yu.c.chen@intel.com> wrote:
>>>>> According to this address,
>>>>> 4c 8b af 50 09 00 00 mov 0x950(%rdi),%r13 <--- r13 = p->mm;
>>>>> 49 8b bd 98 04 00 00 mov 0x498(%r13),%rdi <--- p->mm->owner
>>>>> It seems that this task to be swapped has NULL mm_struct.
>>>>
>>>> So it's likely a kernel thread. Does it make sense to NUMA balance
>>>> those? (I naïvely think it doesn't, please correct me.) ...
>>>>
>>>
>>> I agree kernel threads are not supposed to be covered by
>>> NUMA balance, because currently NUMA balance only considers
>>> user pages via VMAs, and one question below:
>>>
>>>>>  static void __migrate_swap_task(struct task_struct *p, int cpu)
>>>>>  {
>>>>>  	__schedstat_inc(p->stats.numa_task_swapped);
>>>>> -	count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
>>>>> +	if (p->mm)
>>>>> +		count_memcg_event_mm(p->mm, NUMA_TASK_SWAP);
>>>>
>>>> ... proper fix should likely guard this earlier, like the guard in
>>>> task_numa_fault() but for the other swapped task.
>>> I see. For task swapping in task_numa_compare(),
>>> it is triggered when there are no idle CPUs in task A's
>>> preferred node.
>>> In this case, we choose a task B on A's preferred node,
>>> and swap B with A. This helps improve A's NUMA locality
>>> without introducing a load imbalance between nodes.
>>>
> Hi Chenyu
>
> There are two problems here:
> 1. Many kthreads are pinned, so despite all the effort in task_numa_compare()
> and task_numa_find_cpu(), the swap may not end up happening. I only see a
> check on the source task, cpumask_test_cpu(cpu, env->p->cpus_ptr), but not on the dst task.
Never mind, I overlooked it. There is a check on the dst task in task_numa_compare().
> 2. Assuming B is migratable, that can potentially make B worse, right? I think
> some kthreads are quite cache-sensitive, and we swap like their locality doesn't
> matter.
>
> Ideally we probably just want to stay off kthreads: if we cannot find any other
> task with a non-NULL p->mm, just don't swap (?). That sounds like a brand new patch though.
>
A change as simple as this should work:
@@ -2492,7 +2492,7 @@ static bool task_numa_compare(struct task_numa_env *env,
 	rcu_read_lock();
 	cur = rcu_dereference(dst_rq->curr);
-	if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
+	if (cur && ((cur->flags & PF_EXITING) || !cur->mm || is_idle_task(cur)))
 		cur = NULL;
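
To make the intent of the new condition easier to see outside the kernel tree, here is a minimal user-space sketch of the same predicate. It is only an illustration, not kernel code: the struct layouts, the `idle` field (standing in for is_idle_task()), the PF_EXITING value and the helper names swap_candidate_ok() and swap_candidate_self_check() are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; the real flag is defined in include/linux/sched.h. */
#define PF_EXITING 0x4u

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct mm_struct {
	int unused;
};

struct task_struct {
	unsigned int flags;
	struct mm_struct *mm;	/* NULL for kernel threads */
	bool idle;		/* stand-in for is_idle_task() */
};

/*
 * Hypothetical helper mirroring the proposed condition: a task is a valid
 * swap candidate only if it exists, is not exiting, has an mm (so it is
 * not a kthread) and is not the idle task.
 */
static bool swap_candidate_ok(const struct task_struct *cur)
{
	if (!cur)
		return false;
	return !((cur->flags & PF_EXITING) || !cur->mm || cur->idle);
}

/* Small self-check that exercises each rejected case. */
void swap_candidate_self_check(void)
{
	struct mm_struct mm = { 0 };
	struct task_struct user    = { .flags = 0, .mm = &mm, .idle = false };
	struct task_struct kthread = { .flags = 0, .mm = NULL, .idle = false };
	struct task_struct exiting = { .flags = PF_EXITING, .mm = &mm, .idle = false };
	struct task_struct idle    = { .flags = 0, .mm = NULL, .idle = true };

	assert(swap_candidate_ok(&user));	/* ordinary user task: eligible */
	assert(!swap_candidate_ok(&kthread));	/* no mm: skipped */
	assert(!swap_candidate_ok(&exiting));	/* exiting: skipped */
	assert(!swap_candidate_ok(&idle));	/* idle task: skipped */
	assert(!swap_candidate_ok(NULL));	/* nothing running: skipped */
}
```

With cur filtered this way, a kthread on the destination CPU is treated the same as an exiting or idle task, so __migrate_swap_task() should never be reached with a NULL p->mm through this path.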
>
>
> Libo
>>> But B's NUMA node preference is not mandatory in the
>>> current implementation IIUC, because B's load is mainly
>>
>> hmm, that doesn't seem right; can we choose a B that
>> is not a kthread from A's preferred node?
>>
>>> considered. That is to say, is it legit to swap a
>>> NUMA-sensitive task A with a non-NUMA-sensitive kernel
>>> thread B? If not, I think we can add a kernel thread
>>> check in task swap like the guard in
>>> task_tick_numa()/task_numa_fault().
>>>
>>
>>
>>> thanks,
>>> Chenyu
>>>
>>>>
>>>> Michal
>>>
>>
>