From: "liupeng (DM)"
To: Andrew Morton
Date: Wed, 13 Apr 2022 14:27:54 +0800
Subject: Re: [PATCH v3 1/4] hugetlb: Fix wrong use of nr_online_nodes
Message-ID: <692ee24c-a705-0c54-7cad-a9ecf49a8f15@huawei.com>
In-Reply-To: <20220412214238.84c20437a052458f6967e9fd@linux-foundation.org>


On 2022/4/13 12:42, Andrew Morton wrote:
> On Wed, 13 Apr 2022 03:29:12 +0000 Peng Liu <liupeng256@huawei.com> wrote:
>
>> Certain systems are designed to have sparse/discontiguous nodes. In
>> this case, nr_online_nodes can not be used to walk through numa node.
>> Also, a valid node may be greater than nr_online_nodes.
>>
>> However, in hugetlb, it is assumed that nodes are contiguous. Recheck
>> all the places that use nr_online_nodes, and repair them one by one.
>>
> What are the runtime effects of this shortcoming?

For sparse/discontiguous nodes, the current code may treat a valid node
as invalid, and will then fail to allocate any hugepages on a valid node
whose nid >= nr_online_nodes.

As David suggested:
if (tmp >= nr_online_nodes)
	goto invalid;

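Just imagine node 0 and node 2 are online, and node 1 is offline. Then
nr_online_nodes is 2, so this check rejects node 2 even though it is
online. Assuming that every valid node satisfies "node < 2" is wrong.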
