From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>,
peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
wuyun.abel@bytedance.com
Subject: Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
Date: Tue, 23 Jul 2024 11:45:46 +0530 [thread overview]
Message-ID: <ea121294-eeaf-42b1-bc1c-186f4ea7be1d@arm.com> (raw)
In-Reply-To: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
On 7/23/24 11:02, Zhongkun He wrote:
> I found a problem on my test machine where the memory of a process is
> repeatedly migrated back and forth between two nodes and never stops.
>
> 1. Test steps and machine setup
> ------------
> Virtual machine: 4 NUMA nodes with 10GB per node.
>
> stress --vm 1 --vm-bytes 12g --vm-keep
>
> The numa_stat info during the test:
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
>
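Unrelated to the fix itself, but in case it helps with reproducing: besides
memory.numa_stat, the query mode of move_pages(2) (nodes == NULL) reports
which node each page of the calling process currently sits on. A minimal,
purely hypothetical helper along these lines (assumes libnuma headers and
at most 8 nodes; build with "gcc -lnuma") can sample the placement of a
test buffer directly:

    /* where.c - report which NUMA node each page of a test buffer is on */
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        size_t npages = 1024;                   /* sample 4MB worth of pages */
        char *buf = malloc(npages * page_size);
        void **pages = malloc(npages * sizeof(void *));
        int *status = malloc(npages * sizeof(int));
        int counts[8] = { 0 };                  /* assumes <= 8 nodes */

        if (!buf || !pages || !status)
            return 1;

        memset(buf, 1, npages * page_size);     /* fault the pages in */
        for (size_t i = 0; i < npages; i++)
            pages[i] = buf + i * page_size;

        /* nodes == NULL: only query the current node of each page */
        if (move_pages(0, npages, pages, NULL, status, 0)) {
            perror("move_pages");
            return 1;
        }
        for (size_t i = 0; i < npages; i++)
            if (status[i] >= 0 && status[i] < 8)
                counts[status[i]]++;
        for (int n = 0; n < 8; n++)
            if (counts[n])
                printf("node %d: %d pages\n", n, counts[n]);
        return 0;
    }

Running something like this in a loop next to the stress workload should
show the same back-and-forth movement as the numa_stat snippet above.
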
> 2. Root cause:
> Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> PTEs which are on the right node"), the PTEs of local pages are not
> changed in change_pte_range() for a single-threaded process, so no
> NUMA fault information is generated for them in do_numa_page(). If a
> single-threaded process has memory on another node, it will
> unconditionally migrate all of its local memory to that node,
> even if the remote node holds only a single page.
>
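For readers who do not have that commit fresh in mind: since then the
prot_numa path in change_pte_range() picks numa_node_id() as the target
node for single-threaded private mappings and simply skips PTEs whose pages
are already on that node. A condensed sketch of that logic (from memory,
not verbatim; details differ between kernel versions):

    /* mm/mprotect.c, change_pte_range(), prot_numa path - rough sketch */
    if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
        atomic_read(&vma->vm_mm->mm_users) == 1)
        target_node = numa_node_id();   /* single-threaded task */

    ...

    if (prot_numa) {
        struct page *page = vm_normal_page(vma, addr, oldpte);

        if (!page || PageKsm(page))
            continue;

        /* Already PROT_NONE, avoid a useless TLB flush */
        if (pte_protnone(oldpte))
            continue;

        /*
         * Skip pages that already sit on the node the single-threaded
         * task runs on: they never take a hinting fault, so they never
         * show up in numa_faults either.
         */
        if (target_node == page_to_nid(page))
            continue;
    }

Which is exactly why task_numa_placement() ends up seeing faults only for
the remote pages in this scenario.
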
> So, let's fix it. The memory of a single-threaded process should follow
> the CPU rather than the NUMA fault statistics, in order to avoid memory
> thrashing.
>
> With this fix, no memory thrashing is observed from the very beginning,
> even after a long period of testing:
>
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
>
> V1:
> -- Add the test results (numa stats) from Ying's feedback
>
> Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
> Acked-by: "Huang, Ying" <ying.huang@intel.com>
> ---
> kernel/sched/fair.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 24dda708b699..d7cbbda568fb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
> numa_group_count_active_nodes(ng);
> spin_unlock_irq(group_lock);
> max_nid = preferred_group_nid(p, max_nid);
> + } else if (atomic_read(&p->mm->mm_users) == 1) {
> + /*
> + * The memory of a single-threaded process should
> + * follow the CPU in order to avoid memory thrashing.
> + */
> + max_nid = numa_node_id();
> }
>
> if (max_faults) {
This in fact makes sense for a single-threaded process, but just wondering:
could there be any other unwanted side effects?