From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 23 Jul 2024 21:38:50 +0800
Subject: Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
To: Zhongkun He, peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
From: Abel Wu <wuyun.abel@bytedance.com>
In-Reply-To: <20240723053250.3263125-1-hezhongkun.hzk@bytedance.com>
Hi Zhongkun,

On 7/23/24 1:32 PM, Zhongkun He wrote:
> I found a problem on my test machine: the memory of a process is
> repeatedly migrated between two nodes and never stops.
>
> 1. Test steps and the machine.
> ------------
> VM machine: 4 NUMA nodes and 10GB per node.
>
> stress --vm 1 --vm-bytes 12g --vm-keep
>
> The info of numa_stat:
> while :;do cat memory.numa_stat | grep -w anon;sleep 5;done
> anon N0=98304 N1=0 N2=10250747904 N3=2634334208

I am curious what exactly made the worker migrate to N3? And later...

> anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> anon N0=98304 N1=0 N2=2515148800 N3=10369933312

.. and why was it moved back to N2?

> anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> anon N0=98304 N1=0 N2=10247872512 N3=2637209600
>
> 2. Root cause:
> Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> PTEs which are on the right node"), the PTEs of local pages are not
> changed in change_pte_range() for a single-threaded process, so no
> page-fault information is generated in do_numa_page(). If a
> single-threaded process has memory on another node, it will
> unconditionally migrate all of its local memory to that node,
> even if the remote node holds only one page.

IIUC the remote pages will be moved to the node where the worker is
running, since local (private) PTEs are not set to protnone and won't
be faulted on.

> So, let's fix it. The memory of a single-threaded process should
> follow the CPU, not the NUMA faults info, in order to avoid memory
> thrashing.
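
For reference, the behavior introduced by that commit is roughly the
following (a simplified sketch of the prot_numa path in
change_pte_range() as I read mm/mprotect.c; exact details vary across
kernel versions):

	/* Set once per range: private VMAs of single-threaded processes */
	if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
	    atomic_read(&vma->vm_mm->mm_users) == 1)
		target_node = numa_node_id();

	/* Then, for each PTE in the range: */
	if (prot_numa) {
		struct page *page = vm_normal_page(vma, addr, oldpte);

		if (!page || PageKsm(page))
			continue;

		/* Already PROT_NONE, avoid a useless fault + TLB flush */
		if (pte_protnone(oldpte))
			continue;

		/*
		 * Skip local pages of a single-threaded process. As a
		 * result do_numa_page() never records local (private)
		 * faults for them, and task_numa_placement() only ever
		 * sees the remote-node fault stats.
		 */
		if (target_node == page_to_nid(page))
			continue;
	}

So the faults stats end up biased entirely towards the remote node,
which is what keeps pulling the memory over.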
Don't forget the 'Fixes' tag for bugfix patches :)

> ...
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 24dda708b699..d7cbbda568fb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_struct *p)
>  		numa_group_count_active_nodes(ng);
>  		spin_unlock_irq(group_lock);
>  		max_nid = preferred_group_nid(p, max_nid);
> +	} else if (atomic_read(&p->mm->mm_users) == 1) {
> +		/*
> +		 * The memory of a single-threaded process should
> +		 * follow the CPU in order to avoid memory thrashing.
> +		 */
> +		max_nid = numa_node_id();
>  	}
>
>  	if (max_faults) {

Since you don't want to respect the faults info, can we simply skip
task placement?
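
To make that concrete, something like the below untested sketch is
what I have in mind (the numa_group interaction and locking are
hand-waved here, so please treat it as pseudo-code rather than a real
patch):

	static void task_numa_placement(struct task_struct *p)
	{
		...
		/*
		 * Hypothetical early-out: a single-threaded process never
		 * generates private faults (its local PTEs are skipped in
		 * change_pte_range()), so the stats below only reflect
		 * remote accesses. Just keep memory on the node the task
		 * is running on and don't evaluate the faults at all.
		 */
		if (!p->numa_group && atomic_read(&p->mm->mm_users) == 1) {
			if (p->numa_preferred_nid != numa_node_id())
				sched_setnuma(p, numa_node_id());
			return;
		}
		...
	}

That way the faults arrays are left alone instead of being fed into a
preferred-node selection that we already know is biased.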