Date: Mon, 6 Jul 2020 14:03:17 +0100
From: Jonathan Cameron
Organization: Huawei Technologies Research and Development (UK) Ltd.
To: Justin He
CC: Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport,
    Baoquan He, Chuhong Yuan, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kaly Xin,
    David Hildenbrand
Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
Message-ID: <20200706140317.00002f53@Huawei.com>

On Mon, 6 Jul 2020 12:47:51 +0000
Justin He wrote:

> Hi Jonathan, thanks for the comments.
>
> > -----Original Message-----
> > From: Jonathan Cameron
> > Sent: Monday, July 6, 2020 6:46 PM
> > To: Justin He
> > Cc: Catalin Marinas; Will Deacon; Andrew Morton; Mike Rapoport;
> > Baoquan He; Chuhong Yuan; linux-arm-kernel@lists.infradead.org;
> > linux-kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin
> > Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
> > is fake
> >
> > On Mon, 6 Jul 2020 11:29:21 +0100
> > Jonathan Cameron wrote:
> >
> > > On Mon, 6 Jul 2020 09:19:45 +0800
> > > Jia He wrote:
> > >
> > > Hi,
> > >
> > > > Previously, numa_off is set to true unconditionally in dummy_numa_init(),
> > > > even if there is a fake numa node.
> > > >
> > > > But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
> > > > because it regards numa_off as turning off the numa node.
> > >
> > > That is correct.  It is operating exactly as it should, if SRAT hasn't been parsed
> > > and you are on ACPI platform there are no nodes.  They cannot be created at
> > > some later date.  The dummy code doesn't change this.  It just does enough to carry
> > > on operating with no specified nodes.
> > >
> > > >
> > > > Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
> > > > isn't present.
> > > >
> > > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> > > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> > > > kmem: probe of dax0.0 failed with error -22
> > > >
> > > > This fixes it by setting numa_off to false.
> > >
> > > Without the SRAT protection patch [1] you may well run into problems
>
> Sorry, doesn't quite understand here. Do you mean your [1] can resolve this
> issue?
> But acpi_map_pxm_to_node() has returned with NUMA_NO_NODE after
> following check:
> 	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
> 		return NUMA_NO_NODE;

The point of that patch is that it will make it safe to remove the numa_off,
because any later accidental reference to a non-existent node (i.e. one not
defined in SRAT) will not blow up.

It doesn't fix your original problem.  What it does do is fix the new problem case
you introduce by removing numa_off below.  It ensures you still return NUMA_NO_NODE
in cases which should do so (i.e. all of them if you have no SRAT and are using ACPI).

Of course, you could just not remove the numa_off = true bit, and then you won't hit
that condition anyway.  There are plenty of other reasons for the SRAT patch though;
it just happens to close a problem you were introducing here as well.

For reference, we had an AMD platform that had no SRAT, but provided _PXM
for a few nodes in its DSDT.  That resulted in non-booting systems.  It only affected
x86 because ARM64 had that numa_off = true being set.  If we change the arm64 case
without the patch to ensure the underlying problem is fixed, you are very likely to
hit the equivalent problem.  There may well be platforms out there relying on that
quirk of what the code currently does.

> Seems even with your [1] patch, it is not helpful? Thanks for clarification
> if my understanding is wrong.
> [1] https://patchwork.kernel.org/patch/11632063/
>
> > > because someone somewhere will have _PXM in a DSDT but will
> > > have a non existent SRAT.   We had this happen on an AMD platform when we
> > > tried to introduce working _PXM support for PCI. [2]
> > >
> > > So whilst this seems superficially safe, I'd definitely be crossing your fingers.
> > > Note, at that time I proposed putting the numa_off = false into the x86 code
> > > path precisely to cut out that possibility (was rejected at the time, at least
> > > partly because the clarifications to the ACPI spec were not public.)
> > >
> > > The patch in [1] should sort things out however by ensuring we only create
> > > new domains where we should actually be doing so. However, in your case
> > > it will return NUMA_NO_NODE anyway so this isn't the right way to fix things.
>
> Okay, let me try to summarize, there might be 3 possible fixing ways:
> 1. this patch, seems it is not satisfied by you and David 😉
> 2. my previous proposal [2], similar as what David suggested

That looks like the correct approach to me as well.

> 3. remove numa_off check in acpi_map_pxm_to_node()

No way to that one.  The only right return value from acpi_map_pxm_to_node
when no node is provided (always the case if you have no SRAT) is NUMA_NO_NODE.
Do not paper over that - fix the caller to handle a perfectly valid return value.

Jonathan

> e.g.
> ...
> 	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS /*|| numa_off*/)
> 		return NUMA_NO_NODE;
>
> [2] https://lkml.org/lkml/2019/8/16/367
>
>
> --
> Cheers,
> Justin (Jia He)
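
To make the "fix the caller" point concrete, here is a minimal userspace
sketch.  It is not kernel code and is not taken from any patch in this
thread.  The first function mirrors only the check quoted above; the
sketch_ names, the lookup table, the small MAX_PXM_DOMAINS value and the
fallback policy in the second function are all assumptions made for
illustration.

```c
#include <stdbool.h>

#define NUMA_NO_NODE    (-1)  /* the thread quotes this as NUMA_NO_NODE(-1) */
#define MAX_PXM_DOMAINS 8     /* illustrative size only, not the kernel's value */

/*
 * Userspace stand-in for the check quoted in the thread.  The lookup
 * table is hypothetical: it stands in for whatever mapping firmware
 * tables would have provided.
 */
static int sketch_map_pxm_to_node(int pxm, bool numa_off, const int *pxm_to_node)
{
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
	return pxm_to_node[pxm];
}

/*
 * Hypothetical caller-side handling: accept NUMA_NO_NODE as a valid
 * answer and substitute a caller-chosen fallback node, rather than
 * changing what the mapping function returns.
 */
static int sketch_pick_target_node(int mapped_node, int fallback_node)
{
	return mapped_node == NUMA_NO_NODE ? fallback_node : mapped_node;
}
```

With numa_off set, the mapping returns NUMA_NO_NODE for every proximity
domain, and it is the caller that decides what to do next.  How a real
caller should choose its fallback node is outside what this sketch shows.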