From: Frank van der Linden
Date: Tue, 8 Apr 2025 08:48:33 -0700
Subject: Re: [PATCH] mm/hugetlb: use separate nodemask for bootmem allocations
To: Oscar Salvador
Cc: akpm@linux-foundation.org, muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, david@redhat.com, luizcap@redhat.com
References: <20250402205613.3086864-1-fvdl@google.com>

On Tue, Apr 8, 2025 at 6:54 AM Oscar Salvador wrote:
>
> On Wed, Apr 02, 2025 at 08:56:13PM +0000, Frank van der Linden wrote:
> > Hugetlb boot allocation has used online nodes for allocation since
> > commit de55996d7188 ("mm/hugetlb: use online nodes for bootmem
> > allocation"). This was needed to be able to do the allocations
> > earlier in boot, before N_MEMORY was set.
> >
> > This might lead to a different distribution of gigantic hugepages
> > across NUMA nodes if there are memoryless nodes in the system.
> >
> > What happens is that the memoryless nodes are tried, but then
> > the memblock allocation fails and falls back, which usually means
> > that the node that has the highest physical address available
> > will be used (top-down allocation). While this will end up
> > getting the same number of hugetlb pages, they might not be
> > distributed the same way. The fallback for each memoryless
> > node might not end up coming from the same node as the
> > successful round-robin allocation from N_MEMORY nodes.
> >
> > While administrators that rely on having a specific number of
> > hugepages per node should use the hugepages=N:X syntax, it's
> > better not to change the old behavior for the plain hugepages=N
> > case.
> >
> > To do this, construct a nodemask for hugetlb bootmem purposes
> > only, containing nodes that have memory. Then use that
> > for round-robin bootmem allocations.
> >
> > This saves some cycles, and the added advantage here is that
> > hugetlb_cma can use it too, avoiding the older issue of
> > pointless attempts to create a CMA area for memoryless nodes
> > (which will also cause the per-node CMA area size to be too
> > small).
>
> Hi Frank,
>
> Makes sense.

Hi Oscar, thanks for looking at the patch.

>
> There is something I do not quite understand, though.
>
> > @@ -5012,7 +5039,6 @@ void __init hugetlb_bootmem_alloc(void)
> >
> >         for_each_hstate(h) {
> >                 h->next_nid_to_alloc = first_online_node;
> > -               h->next_nid_to_free = first_online_node;
>
> Why are you unsetting next_nid_to_free? I guess it is because
> we do not use it during boot time and you already set it to
> first_memory_node further down the road in hugetlb_init_hstates.

Yes, that's exactly it: it's not used, so there was no need to set it
here, and I made sure it's set later.

>
> And the reason you are leaving next_nid_to_alloc set is to see if
> there is any chance that first_online_node is part of hugetlb_bootmem_nodes?
next_nid_to_alloc is how __alloc_bootmem_huge_page() remembers where it
left off, so that the next call continues at the node after the one that
was successfully allocated from.

The code there is a bit confusing, since the for_each_node_mask_to_alloc
macro is not really used as a for loop there, but simply as a way of
saying "try this node and remember the next one". I've been meaning to
clean that code up for several reasons but haven't gotten around to it
yet; that's a separate issue.

- Frank