From: Frank van der Linden
Date: Mon, 10 Feb 2025 10:56:50 -0800
Subject: Re: [PATCH v3 00/28] hugetlb/CMA improvements for large systems
To: Oscar Salvador
Cc: akpm@linux-foundation.org, muchun.song@linux.dev, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, yuzhao@google.com, usamaarif642@gmail.com,
 joao.m.martins@oracle.com, roman.gushchin@linux.dev
References: <20250206185109.1210657-1-fvdl@google.com>

On Mon, Feb 10, 2025 at 10:40 AM Oscar Salvador wrote:
>
> On Thu, Feb 06, 2025 at 06:50:40PM +0000, Frank van der Linden wrote:
> > v3:
> > * Fix SPDX comment include file format.
> > * Add new hugetlb_cma.* files to MAINTAINERS
> > * Document new ranges/ subdir in CMA debugfs.
> > * Fix powerpc compilation for config without HAVE_BOOTMEM_INFO_NODE
> > * Fix various other nits found by kernel test robot.
> > * Use a PFN value of -1 to indicate a non-mirrored mapping
> >   in sparse-vmemmap.c, not 0.
> > * Fix incorrect if() statement that got mangled in cma.c
> >
> > v2:
> > * Add missing CMA debugfs code.
> > * Minor cleanups in hugetlb_cma changes.
> > * Move hugetlb_cma code to its own file to further clean
> >   things up.
> >
> > On large systems, we observed some issues with hugetlb and CMA:
> >
> > 1) When specifying a large number of hugetlb boot pages (hugepages=
> >    on the commandline), the kernel may run out of memory before it
> >    even gets to HVO. For example, if you have a 3072G system, and
> >    want to use 3024 1G hugetlb pages for VMs, that should leave
> >    you plenty of space for the hypervisor, provided you have the
> >    hugetlb vmemmap optimization (HVO) enabled. However, since
> >    the vmemmap pages are always allocated first, and then later
> >    in boot freed, you will actually run yourself out of memory
> >    before you can do HVO. This means not getting all the hugetlb
> >    pages you want, and worse, failure to boot if there is an
> >    allocation failure in the system from which it can't recover.
> >
> > 2) There is a system setup where you might want to use hugetlb_cma
> >    with a large value (say, again, 3024 out of 3072G like above),
> >    and then lower that if system usage allows it, to make room
> >    for non-hugetlb processes. For this, a variation of the problem
> >    above applies: the kernel runs out of unmovable space to allocate
> >    from before you finish boot, since your CMA area takes up all
> >    the space.
> >
> > 3) CMA wants to use one big contiguous area for allocations. Which
> >    fails if you have the aforementioned 3T system with a gap in the
> >    middle of physical memory (like the < 40bits BIOS DMA area seen on
> >    some AMD systems). You then won't be able to set up a CMA area for
> >    one of the NUMA nodes, leading to loss of half of your hugetlb
> >    CMA area.
> >
> > 4) Under the scenario mentioned in 2), when trying to grow the
> >    number of hugetlb pages after dropping it for a while, new
> >    CMA allocations may fail occasionally. This is not unexpected,
> >    some transient references on pages may prevent cma_alloc
> >    from succeeding under memory pressure. However, the hugetlb
> >    code then falls back to a normal contiguous alloc, which may
> >    end up succeeding. This is not always desired behavior. If
> >    you have a large CMA area, then the kernel has a restricted
> >    amount of memory it can do unmovable allocations from (a well
> >    known issue). A normal contiguous alloc may eat further in to
> >    this space.
>
> Hi Frank,
>
> While I plan to keep reviewing the series, I think it would make sense
> to split this patchset into two smaller ones.
> The way I see it, we are trying to deal with two different problems and their
> solutions.
>
> 1) pre-hvo at boot time
> 2) multi-range support of CMA (only used for hugetlb)
>
> I did not go through the entire patchset yet, so I ignore whether the
> respective patches to tackle these two problems are really dependent on
> each other, but I think that would be very interesting to consider a
> patchset per solution if that is not the case.
>
> IMHO, it would ease review quite a lot.

Hi Oscar,

Thanks a lot for reviewing this series.

I certainly could split it up, but here are the dependencies (it's
actually 3 parts):

1. Multi-range CMA (used by hugetlb) (patches 1-4)
2. Pre-HVO for hugetlb bootmem pages (patches 5-22)
3. Enable hugepages= (and pre-HVO) for CMA (patches 23-28)

Parts 1 and 2 are independent of each other. Part 3 depends on both.

So I could post 1) and 2) simultaneously, and 3) would have to wait
until 1) and 2) are resolved.

Andrew, do you have any thoughts on splitting it up?

- Frank
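
For reference, the numbers behind problem 1) in the quoted cover letter
can be sketched roughly as below. This is only back-of-the-envelope
arithmetic, not anything taken from the series: it assumes 4 KiB base
pages and a 64-byte struct page (typical for x86-64), and it assumes
HVO leaves a single vmemmap page per hugetlb page. None of these values
are stated in the thread, and all names are made up for illustration.

```python
# Rough vmemmap arithmetic for the 3072G / 3024 x 1G hugetlb example.
# Assumptions (not from the thread): 4 KiB base pages, 64-byte
# struct page, and HVO keeping one vmemmap page per hugetlb page.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB      # assumed base page size
STRUCT_PAGE = 64         # assumed sizeof(struct page)
HUGE_PAGE = 1 * GIB      # 1G hugetlb pages, as in the example

TOTAL_RAM = 3072 * GIB
NR_HUGEPAGES = 3024


def vmemmap_bytes_per_hugepage():
    """vmemmap needed to describe one hugetlb page without HVO."""
    return (HUGE_PAGE // BASE_PAGE) * STRUCT_PAGE


# vmemmap that must be allocated up front, before HVO can free it.
vmemmap_before_hvo = NR_HUGEPAGES * vmemmap_bytes_per_hugepage()

# vmemmap left once HVO has run (one base page per hugetlb page).
vmemmap_after_hvo = NR_HUGEPAGES * BASE_PAGE

# Memory not handed to hugetlb, i.e. what the rest of the system gets.
non_hugetlb = TOTAL_RAM - NR_HUGEPAGES * HUGE_PAGE

print(f"non-hugetlb memory:    {non_hugetlb / GIB:.2f} GiB")
print(f"vmemmap before HVO:    {vmemmap_before_hvo / GIB:.2f} GiB")
print(f"vmemmap after HVO:     {vmemmap_after_hvo / MIB:.2f} MiB")
print(f"headroom before HVO:   {(non_hugetlb - vmemmap_before_hvo) / GIB:.2f} GiB")
```

Under those assumptions, almost all of the memory left outside hugetlb
is temporarily consumed by vmemmap until HVO runs, which is consistent
with the boot-time exhaustion described in 1).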