From: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Tue, 23 Jul 2024 11:45:46 +0530
Subject: Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
To: Zhongkun He, peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com
References: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
In-Reply-To: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
On 7/23/24 11:02, Zhongkun He wrote:
> I found a problem on my test machine where the memory of a process is
> repeatedly migrated between two nodes and never stops.
>
> 1. Test steps and the machine.
> ------------
> VM machine: 4 NUMA nodes and 10GB per node.
>
> stress --vm 1 --vm-bytes 12g --vm-keep
>
> The numa stat info:
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
>
> 2. Root cause:
> Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> PTEs which are on the right node"), the PTEs of local pages are no longer
> changed in change_pte_range() for a single-threaded process, so no page
> fault information is generated in do_numa_page() for them. If a
> single-threaded process has memory on another node, it will
> unconditionally migrate all of its local memory to that node,
> even if the remote node holds only one page.
>
> So, let's fix it. The memory of a single-threaded process should follow
> the CPU, not the NUMA faults info, in order to avoid memory thrashing.
>
> After a long period of testing, there has been no memory thrashing
> from the beginning.
>
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
> anon N0=2548117504 N1=10336903168 N2=139264 N3=0
>
> V1:
> -- Add the test results (numa stats) from Ying's feedback
>
> Signed-off-by: Zhongkun He
> Acked-by: "Huang, Ying"
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 24dda708b699..d7cbbda568fb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
>  		numa_group_count_active_nodes(ng);
>  		spin_unlock_irq(group_lock);
>  		max_nid = preferred_group_nid(p, max_nid);
> +	} else if (atomic_read(&p->mm->mm_users) == 1) {
> +		/*
> +		 * The memory of a single-threaded process should
> +		 * follow the CPU in order to avoid memory thrashing.
> +		 */
> +		max_nid = numa_node_id();
>  	}
>
>  	if (max_faults) {

This in fact makes sense for a single-threaded process, but just wondering - could there be any other unwanted side effects?
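
For context on where the missing hinting faults come from: the single threaded
optimisation lives in the prot_numa path of change_pte_range() in mm/mprotect.c.
A rough sketch of that logic, paraphrased from commit 3e32158767b0 rather than
quoted verbatim from current mainline, looks something like this:

	int target_node = NUMA_NO_NODE;

	/* Pick a target node only for single threaded private mappings */
	if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
	    atomic_read(&vma->vm_mm->mm_users) == 1)
		target_node = numa_node_id();

	/* ... later, inside the PTE loop ... */
	if (prot_numa) {
		struct page *page;

		page = vm_normal_page(vma, addr, oldpte);
		if (!page || PageKsm(page))
			continue;

		/* Avoid TLB flush if possible */
		if (pte_protnone(oldpte))
			continue;

		/*
		 * Pages already on the node the single threaded task is
		 * running on are skipped, so they never take a NUMA
		 * hinting fault and never show up in the task's fault
		 * statistics - only remote accesses do.
		 */
		if (target_node == page_to_nid(page))
			continue;
	}

Because local pages never trip the hinting fault, task_numa_placement() only
ever sees faults for the remote node, keeps preferring it, and migrates the
much larger local working set towards it - which is the back and forth visible
in the numa_stat trace above.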