From: Zhongkun He <hezhongkun.hzk@bytedance.com>
Date: Wed, 24 Jul 2024 11:55:30 +0800
Subject: Re: [PATCH v1] mm/numa_balancing: Fix the memory thrashing problem in the single-threaded process
To: Abel Wu
Cc: peterz@infradead.org, mgorman@suse.de, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
On Tue, Jul 23, 2024 at 9:39 PM Abel Wu wrote:
>
> Hi Zhongkun,
>
> On 7/23/24 1:32 PM, Zhongkun He wrote:
> > I found a problem on my test machine where the memory of a process
> > is repeatedly migrated between two nodes and never stops.
> >
> > 1. Test steps and the machine.
> > ------------
> > VM machine: 4 NUMA nodes, 10GB per node.
> >
> > stress --vm 1 --vm-bytes 12g --vm-keep
> >
> > The NUMA stat info:
> > while :; do cat memory.numa_stat | grep -w anon; sleep 5; done
> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
>
> I am curious what exactly caused the worker's memory to be migrated
> to N3? And later...

Each node holds at most 10GB, but the worker needs 12GB, so there is
always about 2GB on another node. With commit 3e32158767b0 (analyzed
below), NUMA hint faults are only taken on remote pages, never on
local ones, so we keep migrating pages toward the remote node because
p->numa_preferred_nid always points there.

> > anon N0=98304 N1=0 N2=10250747904 N3=2634334208
> > anon N0=98304 N1=0 N2=9937256448 N3=2947825664
> > anon N0=98304 N1=0 N2=8863514624 N3=4021567488
> > anon N0=98304 N1=0 N2=7789772800 N3=5095309312
> > anon N0=98304 N1=0 N2=6716030976 N3=6169051136
> > anon N0=98304 N1=0 N2=5642289152 N3=7242792960
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=4837007360 N3=8048074752
> > anon N0=98304 N1=0 N2=3763265536 N3=9121816576
> > anon N0=98304 N1=0 N2=2689523712 N3=10195558400
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > anon N0=98304 N1=0 N2=2515148800 N3=10369933312
> > ..
>
> ... and why was it moved back to N2?

Because the private page faults recorded on N2 outnumber those on N3.

> > anon N0=98304 N1=0 N2=3320455168 N3=9564626944
> > anon N0=98304 N1=0 N2=4394196992 N3=8490885120
> > anon N0=98304 N1=0 N2=5105442816 N3=7779639296
> > anon N0=98304 N1=0 N2=6174195712 N3=6710886400
> > anon N0=98304 N1=0 N2=7247937536 N3=5637144576
> > anon N0=98304 N1=0 N2=8321679360 N3=4563402752
> > anon N0=98304 N1=0 N2=9395421184 N3=3489660928
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> > anon N0=98304 N1=0 N2=10247872512 N3=2637209600
> >
> > 2. Root cause:
> > Since commit 3e32158767b0 ("mm/mprotect.c: don't touch single threaded
> > PTEs which are on the right node"), the PTEs of local pages are no
> > longer changed in change_pte_range() for a single-threaded process,
> > so no fault information is generated in do_numa_page(). If a
> > single-threaded process has memory on another node, it will
> > unconditionally migrate all of its local memory to that node,
> > even if the remote node holds only a single page.
>
> IIUC the remote pages will be moved to the node where the worker
> is running since local (private) PTEs are not set to protnone and
> won't be faulted on.

Yes.

> >
> > So let's fix it: the memory of a single-threaded process should follow
> > the CPU, not the NUMA fault statistics, in order to avoid memory
> > thrashing.
>
> Don't forget the 'Fixes' tag for bugfix patches :)

OK, thanks.
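For context, the behavior that commit introduced in change_pte_range()
in mm/mprotect.c looks roughly like this (a condensed sketch from
memory, not the exact upstream code):

        int target_node = NUMA_NO_NODE;

        /* Get the target node for single-threaded private VMAs */
        if (prot_numa && !(vma->vm_flags & VM_SHARED) &&
            atomic_read(&vma->vm_mm->mm_users) == 1)
                target_node = numa_node_id();
        ...
        if (prot_numa) {
                ...
                /*
                 * Leave the PTE untouched if the page already sits
                 * on the node the single-threaded process runs on,
                 * so no hint fault is ever taken for local pages.
                 */
                if (target_node == page_to_nid(page))
                        continue;
        }

Since local pages never take hint faults, task_numa_placement() only
ever sees remote faults for such a process.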
> > > > > ...> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 24dda708b699..d7cbbda568fb 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -2898,6 +2898,12 @@ static void task_numa_placement(struct task_stru= ct *p) > > numa_group_count_active_nodes(ng); > > spin_unlock_irq(group_lock); > > max_nid =3D preferred_group_nid(p, max_nid); > > + } else if (atomic_read(&p->mm->mm_users) =3D=3D 1) { > > + /* > > + * The memory of a single-threaded process should > > + * follow the CPU in order to avoid memory thrashing. > > + */ > > + max_nid =3D numa_node_id(); > > } > > > > if (max_faults) { > > Since you don't want to respect the faults info, can we simply > skip task placement? This is a good suggestion. It would be even better if there were some feedback from others.