From: Mario Casquero
Date: Wed, 18 Oct 2023 13:55:57 +0200
Subject: Re: [PATCH] x86/mm: drop 4MB restriction on minimal NUMA node size
To: Mike Rapoport
Cc: x86@kernel.org, Andrew Morton, Andy Lutomirski, Borislav Petkov,
    Dave Hansen, David Hildenbrand, "H. Peter Anvin", Ingo Molnar,
    Michal Hocko, Peter Zijlstra, Qi Zheng, Thomas Gleixner,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20231017062215.171670-1-rppt@kernel.org>
In-Reply-To: <20231017062215.171670-1-rppt@kernel.org>

This patch has been successfully tested by QE.
Start a VM with two NUMA nodes, one of them with less than 2M of memory.
Check there is no kernel panic and the VM boots up smoothly.

Tested-by: Mario Casquero

BR,
Mario

On Tue, Oct 17, 2023 at 8:24 AM Mike Rapoport wrote:
>
> From: "Mike Rapoport (IBM)"
>
> Qi Zheng reports crashes in a production environment and provides a
> simplified example as a reproducer:
>
>   For example, if we use qemu to start a two NUMA node kernel,
>   one of the nodes has 2M memory (less than NODE_MIN_SIZE),
>   and the other node has 2G, then we will encounter the
>   following panic:
>
>   [    0.149844] BUG: kernel NULL pointer dereference, address: 0000000000000000
>   [    0.150783] #PF: supervisor write access in kernel mode
>   [    0.151488] #PF: error_code(0x0002) - not-present page
>   <...>
>   [    0.156056] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
>   <...>
>   [    0.169781] Call Trace:
>   [    0.170159]
>   [    0.170448]  deactivate_slab+0x187/0x3c0
>   [    0.171031]  ? bootstrap+0x1b/0x10e
>   [    0.171559]  ? preempt_count_sub+0x9/0xa0
>   [    0.172145]  ? kmem_cache_alloc+0x12c/0x440
>   [    0.172735]  ? bootstrap+0x1b/0x10e
>   [    0.173236]  bootstrap+0x6b/0x10e
>   [    0.173720]  kmem_cache_init+0x10a/0x188
>   [    0.174240]  start_kernel+0x415/0x6ac
>   [    0.174738]  secondary_startup_64_no_verify+0xe0/0xeb
>   [    0.175417]
>   [    0.175713] Modules linked in:
>   [    0.176117] CR2: 0000000000000000
>
> The crashes happen because of inconsistency between nodemask that has
> nodes with less than 4MB as memoryless and the actual memory fed into
> core mm.
>
> The commit 9391a3f9c7f1 ("[PATCH] x86_64: Clear more state when ignoring
> empty node in SRAT parsing") that introduced minimal size of a NUMA node
> does not explain why a node size cannot be less than 4MB and what boot
> failures this restriction might fix.
>
> Since then a lot has changed and core mm won't confuse badly about small
> node sizes.
>
> Drop the limitation for the minimal node size.
>
> Reported-by: Qi Zheng
> Signed-off-by: Mike Rapoport (IBM)
> Acked-by: David Hildenbrand
> Acked-by: Michal Hocko
> Link: https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
> ---
>  arch/x86/include/asm/numa.h | 7 -------
>  arch/x86/mm/numa.c          | 7 -------
>  2 files changed, 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> index e3bae2b60a0d..ef2844d69173 100644
> --- a/arch/x86/include/asm/numa.h
> +++ b/arch/x86/include/asm/numa.h
> @@ -12,13 +12,6 @@
>
>  #define NR_NODE_MEMBLKS		(MAX_NUMNODES*2)
>
> -/*
> - * Too small node sizes may confuse the VM badly. Usually they
> - * result from BIOS bugs. So dont recognize nodes as standalone
> - * NUMA entities that have less than this amount of RAM listed:
> - */
> -#define NODE_MIN_SIZE (4*1024*1024)
> -
>  extern int numa_off;
>
>  /*
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 2aadb2019b4f..55e3d895f15c 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -601,13 +601,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
>  		if (start >= end)
>  			continue;
>
> -		/*
> -		 * Don't confuse VM with a node that doesn't have the
> -		 * minimum amount of memory:
> -		 */
> -		if (end && (end - start) < NODE_MIN_SIZE)
> -			continue;
> -
>  		alloc_node_data(nid);
>  	}
>
>
> base-commit: 94f6f0550c625fab1f373bb86a6669b45e9748b3
> --
> 2.39.2
>