Date: Mon, 10 Feb 2025 19:39:59 +0100
From: Oscar Salvador
To: Frank van der Linden
Cc: akpm@linux-foundation.org,
 muchun.song@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 yuzhao@google.com, usamaarif642@gmail.com, joao.m.martins@oracle.com,
 roman.gushchin@linux.dev
Subject: Re: [PATCH v3 00/28] hugetlb/CMA improvements for large systems
References: <20250206185109.1210657-1-fvdl@google.com>
In-Reply-To: <20250206185109.1210657-1-fvdl@google.com>

On Thu, Feb 06, 2025 at 06:50:40PM +0000, Frank van der Linden wrote:
> v3:
> * Fix SPDX comment include file format.
> * Add new hugetlb_cma.* files to MAINTAINERS
> * Document new ranges/ subdir in CMA debugfs.
> * Fix powerpc compilation for config without HAVE_BOOTMEM_INFO_NODE
> * Fix various other nits found by kernel test robot.
> * Use a PFN value of -1 to indicate a non-mirrored mapping
>   in sparse-vmemmap.c, not 0.
> * Fix incorrect if() statement that got mangled in cma.c
>
> v2:
> * Add missing CMA debugfs code.
> * Minor cleanups in hugetlb_cma changes.
> * Move hugetlb_cma code to its own file to further clean
>   things up.
>
> On large systems, we observed some issues with hugetlb and CMA:
>
> 1) When specifying a large number of hugetlb boot pages (hugepages=
>    on the commandline), the kernel may run out of memory before it
>    even gets to HVO. For example, if you have a 3072G system, and
>    want to use 3024 1G hugetlb pages for VMs, that should leave
>    you plenty of space for the hypervisor, provided you have the
>    hugetlb vmemmap optimization (HVO) enabled. However, since
>    the vmemmap pages are always allocated first, and then later
>    in boot freed, you will actually run yourself out of memory
>    before you can do HVO. This means not getting all the hugetlb
>    pages you want, and worse, failure to boot if there is an
>    allocation failure in the system from which it can't recover.
>
> 2) There is a system setup where you might want to use hugetlb_cma
>    with a large value (say, again, 3024 out of 3072G like above),
>    and then lower that if system usage allows it, to make room
>    for non-hugetlb processes. For this, a variation of the problem
>    above applies: the kernel runs out of unmovable space to allocate
>    from before you finish boot, since your CMA area takes up all
>    the space.
>
> 3) CMA wants to use one big contiguous area for allocations. Which
>    fails if you have the aforementioned 3T system with a gap in the
>    middle of physical memory (like the < 40bits BIOS DMA area seen on
>    some AMD systems). You then won't be able to set up a CMA area for
>    one of the NUMA nodes, leading to loss of half of your hugetlb
>    CMA area.
>
> 4) Under the scenario mentioned in 2), when trying to grow the
>    number of hugetlb pages after dropping it for a while, new
>    CMA allocations may fail occasionally. This is not unexpected,
>    some transient references on pages may prevent cma_alloc
>    from succeeding under memory pressure. However, the hugetlb
>    code then falls back to a normal contiguous alloc, which may
>    end up succeeding. This is not always desired behavior. If
>    you have a large CMA area, then the kernel has a restricted
>    amount of memory it can do unmovable allocations from (a well
>    known issue). A normal contiguous alloc may eat further in to
>    this space.

Hi Frank,

While I plan to keep reviewing the series, I think it would make sense
to split this patchset into two smaller ones.
The way I see it, we are trying to deal with two different problems and
their solutions:

1) pre-HVO at boot time
2) multi-range support for CMA (only used for hugetlb)

I have not gone through the entire patchset yet, so I do not know whether
the patches that tackle these two problems really depend on each other.
If they do not, I think it would be well worth considering one patchset
per solution.

IMHO, it would ease review quite a lot.

-- 
Oscar Salvador
SUSE Labs
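
For reference, the numbers in problem 1) of the quoted cover letter can be
checked with a little arithmetic. The sketch below is only illustrative and
rests on assumptions that are not stated in the thread: a 4 KiB base page, a
64-byte struct page, "G" read as GiB, and HVO keeping a single vmemmap page
per hugetlb page. The helper names are made up for this example.

```python
GIB = 1 << 30
BASE_PAGE = 4096    # assumed base page size in bytes
STRUCT_PAGE = 64    # assumed sizeof(struct page) in bytes


def vmemmap_bytes(mem_bytes):
    """Bytes of struct page metadata needed to describe mem_bytes of RAM."""
    return (mem_bytes // BASE_PAGE) * STRUCT_PAGE


def headroom(total_gib, hugepages_1g, hvo_applied):
    """Bytes left for everything else once the 1G hugetlb pages and all
    vmemmap have been paid for (hypothetical helper)."""
    total = total_gib * GIB
    huge = hugepages_1g * GIB
    # vmemmap describing the memory that is not handed to hugetlb
    other_vmemmap = vmemmap_bytes(total - huge)
    if hvo_applied:
        # assumption: one vmemmap page is kept per hugetlb page
        huge_vmemmap = hugepages_1g * BASE_PAGE
    else:
        # full vmemmap for every hugetlb page, as allocated early in boot
        huge_vmemmap = vmemmap_bytes(huge)
    return total - huge - other_vmemmap - huge_vmemmap


before_hvo = headroom(3072, 3024, hvo_applied=False)
after_hvo = headroom(3072, 3024, hvo_applied=True)
print(f"headroom before HVO: {before_hvo / GIB:.2f} GiB")
print(f"headroom after HVO:  {after_hvo / GIB:.2f} GiB")
```

Under these assumptions each 1G page needs 16 MiB of vmemmap, so 3024 of
them need about 47.25 GiB up front. That alone nearly consumes the 48 GiB
that is not reserved for hugetlb, which is consistent with the kernel
running out of memory before HVO can free most of it.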