Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process

From: Abel Wu <wuyun.abel@bytedance.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>,
	peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
Date: Tue, 23 Jul 2024 21:38:50 +0800
Message-ID: <e3a75483-d3f7-4963-9332-4893d22463ad@bytedance.com>
In-Reply-To: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>

Hi Zhongkun,

On 7/23/24 1:32 PM, Zhongkun He wrote:
> I found a problem in my test machine that the memory of a process is
> repeatedly migrated between two nodes and does not stop.
> 
> 1.Test step and the machines.
> ------------
> VM machine: 4 numa nodes and 10GB per node.
> 
> stress --vm 1 --vm-bytes 12g --vm-keep
> 
> The info of numa stat:
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208

I am curious what exactly caused the worker to migrate to N3 in
the first place. And later...

> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312

... why was it moved back to N2?

> anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> 
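For reference, the migration rate can be read off the counters quoted above. A minimal userspace C sketch, assuming the `anon N0=... N1=... N2=... N3=...` line layout shown in the quoted output; `parse_anon_line`, `node_delta` and `NODES` are hypothetical names introduced here for illustration, not kernel or cgroup interfaces:

```c
#include <assert.h>
#include <stdio.h>

#define NODES 4  /* assumption: four NUMA nodes, as in the quoted VM */

/* Parse one "anon N0=... N1=... N2=... N3=..." line into bytes[].
 * Returns 0 on success, -1 if the line does not match that layout. */
static int parse_anon_line(const char *line, unsigned long long bytes[NODES])
{
	int n = sscanf(line, "anon N0=%llu N1=%llu N2=%llu N3=%llu",
		       &bytes[0], &bytes[1], &bytes[2], &bytes[3]);
	return n == NODES ? 0 : -1;
}

/* Signed change in bytes on node nid between two consecutive samples. */
static long long node_delta(const unsigned long long prev[NODES],
			    const unsigned long long cur[NODES], int nid)
{
	assert(nid >= 0 && nid < NODES);
	return (long long)cur[nid] - (long long)prev[nid];
}
```

Applied to two consecutive samples from the quoted output (N2=9937256448 followed by N2=8863514624), N2 shrinks by 1073741824 bytes and N3 grows by the same amount, i.e. 1 GiB moves per 5-second sample during the steady phases.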
> 2. Root cause:
> Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> PTEs which are on the right node") the PTE of local pages will not be
> changed in change_pte_range() for single-threaded process, so no
> page_faults information will be generated in do_numa_page(). If a
> single-threaded process has memory on another node, it will
> unconditionally migrate all of its local memory to that node,
> even if the remote node has only one page.

IIUC the remote pages will be moved to the node where the worker
is running since local (private) PTEs are not set to protnone and
won't be faulted on.
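
To make the two readings concrete, here is a toy userspace model (not kernel code). It assumes, as the quoted root-cause description implies, that hinting faults are attributed to the node holding the page, that a single-threaded task never faults on pages already on its own node, and that the preferred node is simply the one with the most recorded faults. All names (`toy_task`, `record_scan`, `pick_preferred`) are hypothetical, and the model ignores fault decay, numa groups, and everything else task_numa_placement() does:

```c
#include <assert.h>

#define NODES 4  /* assumption: four nodes, as in the test VM */

struct toy_task {
	int cpu_node;                 /* node the task is running on */
	int single_threaded;          /* models mm_users == 1 */
	unsigned long faults[NODES];  /* hinting faults recorded per node */
};

/* Model one scan pass in which pages[nid] pages on each node are touched.
 * For a single-threaded task, pages already on cpu_node are skipped,
 * so they never generate a hinting fault. */
static void record_scan(struct toy_task *t, const unsigned long pages[NODES])
{
	assert(t->cpu_node >= 0 && t->cpu_node < NODES);

	for (int nid = 0; nid < NODES; nid++) {
		if (t->single_threaded && nid == t->cpu_node)
			continue;
		t->faults[nid] += pages[nid];
	}
}

/* Pick the node with the most recorded faults; -1 if none were recorded. */
static int pick_preferred(const struct toy_task *t)
{
	int max_nid = -1;
	unsigned long max_faults = 0;

	for (int nid = 0; nid < NODES; nid++) {
		if (t->faults[nid] > max_faults) {
			max_faults = t->faults[nid];
			max_nid = nid;
		}
	}
	return max_nid;
}
```

With a million pages on the local node 2 and a single page on node 3, the single-threaded task records faults only for node 3, so this model prefers node 3; with the skip disabled it prefers node 2. Whether the real kernel then moves the task or the pages is exactly the question above.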

> 
> So, let's fix it. The memory of single-threaded process should follow
> the cpu, not the numa faults info in order to avoid memory thrashing.

Don't forget the 'Fixes' tag for bugfix patches :)

> 
> ...
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 24dda708b699..d7cbbda568fb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
>   		numa_group_count_active_nodes(ng);
>   		spin_unlock_irq(group_lock);
>   		max_nid = preferred_group_nid(p, max_nid);
> +	} else if (atomic_read(&p->mm->mm_users) == 1) {
> +		/*
> +		 * The memory of a single-threaded process should
> +		 * follow the CPU in order to avoid memory thrashing.
> +		 */
> +		max_nid = numa_node_id();
>   	}
>   
>   	if (max_faults) {

Since you don't want to respect the faults info, can we simply
skip task placement?
