Date: Fri, 16 Feb 2024 14:26:22 +0900
From: Byungchul Park
To: Phil Auld
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org
Subject: Re: [PATCH] sched/numa, mm: do not promote folios to nodes not set N_MEMORY
Message-ID: <20240216052621.GA32626@system.software.com>
References: <20240214035355.18335-1-byungchul@sk.com>
 <20240214123137.GA70927@lorien.usersys.redhat.com>
In-Reply-To: <20240214123137.GA70927@lorien.usersys.redhat.com>

On Wed, Feb 14, 2024 at 07:31:37AM -0500, Phil Auld wrote:
> Hi,
> 
> On Wed, Feb 14, 2024 at 12:53:55PM +0900 Byungchul Park wrote:
> > While running qemu with a configuration where some CPUs have no local
> > memory, and with kernel NUMA balancing enabled, the following oops
> > has been observed. It happens because the ->zone_pgdat pointers of
> > the zones on those memoryless nodes are NULL: they are not
> > initialized at boot time. So folios should not be promoted to nodes
> > that do not have N_MEMORY set.
> > 
> > > BUG: unable to handle page fault for address: 00000000000033f3
> > > #PF: supervisor read access in kernel mode
> > > #PF: error_code(0x0000) - not-present page
> > > PGD 0 P4D 0
> > > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> > > Code: (omitted)
> > > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
> > > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
> > > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
> > > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
> > > FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > PKRU: 55555554
> > > Call Trace:
> > > 
> > > ? __die
> > > ? page_fault_oops
> > > ? __pte_offset_map_lock
> > > ? exc_page_fault
> > > ? asm_exc_page_fault
> > > ? wakeup_kswapd
> > > migrate_misplaced_page
> > > __handle_mm_fault
> > > handle_mm_fault
> > > do_user_addr_fault
> > > exc_page_fault
> > > asm_exc_page_fault
> > > RIP: 0033:0x55b897ba0808
> > > Code: (omitted)
> > > RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287
> > > RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0
> > > RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0
> > > RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075
> > > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> > > R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000
> > > 
> > > Modules linked in:
> > > CR2: 00000000000033f3
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> > > Code: (omitted)
> > > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
> > > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
> > > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
> > > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
> > > FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > PKRU: 55555554
> > > note: masim[895] exited with irqs disabled
> > 
> I think you could trim this down a little bit.

Thank you for the feedback. I will.
> > 
> > Signed-off-by: Byungchul Park
> > Reported-by: hyeongtak.ji@sk.com
> > ---
> >  kernel/sched/fair.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d7a3c63a2171..6d215cc85f14 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1828,6 +1828,23 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> >  	int dst_nid = cpu_to_node(dst_cpu);
> >  	int last_cpupid, this_cpupid;
> >  
> > +	/*
> > +	 * A node of dst_nid might not have its local memory. Promoting
> > +	 * a folio to the node is meaningless. What's even worse, oops
> > +	 * can be observed by the null pointer of ->zone_pgdat in
> > +	 * various points of the code during migration.
> > +	 *
> 
> > +	 * For instance, oops has been observed at CPU2 while qemu'ing:
> > +	 *
> > +	 * {qemu} \
> > +	 *    -numa node,nodeid=0,mem=1G,cpus=0-1 \
> > +	 *    -numa node,nodeid=1,cpus=2-3 \
> > +	 *    -numa node,nodeid=2,mem=8G \
> > +	 *    ...
> 
> This part above should probably be in the commit message, not in the code.
> The first paragraph of the comment is plenty.

I will. Thanks.

I will respin it.

	Byungchul

> Otherwise, I think the check probably makes sense.
> 
> 
> Cheers,
> Phil
> 
> > +	 */
> > +	if (!node_state(dst_nid, N_MEMORY))
> > +		return false;
> > +
> >  	/*
> >  	 * The pages in slow memory node should be migrated according
> >  	 * to hot/cold instead of private/shared.
> > -- 
> > 2.17.1