linux-mm.kvack.org archive mirror
From: Oscar Salvador <osalvador@suse.de>
To: Frank van der Linden <fvdl@google.com>
Cc: akpm@linux-foundation.org, muchun.song@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	yuzhao@google.com, usamaarif642@gmail.com,
	joao.m.martins@oracle.com, roman.gushchin@linux.dev
Subject: Re: [PATCH v3 00/28] hugetlb/CMA improvements for large systems
Date: Mon, 10 Feb 2025 19:39:59 +0100
Message-ID: <Z6pH_9X-kFhwPz2-@localhost.localdomain>
In-Reply-To: <20250206185109.1210657-1-fvdl@google.com>

On Thu, Feb 06, 2025 at 06:50:40PM +0000, Frank van der Linden wrote:
> v3:
> * Fix SPDX comment include file format.
> * Add new hugetlb_cma.* files to MAINTAINERS
> * Document new ranges/ subdir in CMA debugfs.
> * Fix powerpc compilation for config without HAVE_BOOTMEM_INFO_NODE
> * Fix various other nits found by kernel test robot.
> * Use a PFN value of -1 to indicate a non-mirrored mapping
>   in sparse-vmemmap.c, not 0.
> * Fix incorrect if() statement that got mangled in cma.c
> 
> v2:
> * Add missing CMA debugfs code.
> * Minor cleanups in hugetlb_cma changes.
> * Move hugetlb_cma code to its own file to further clean
>   things up.
> 
> On large systems, we observed some issues with hugetlb and CMA:
> 
> 1) When specifying a large number of hugetlb boot pages (hugepages=
>    on the commandline), the kernel may run out of memory before it
>    even gets to HVO. For example, if you have a 3072G system, and
>    want to use 3024 1G hugetlb pages for VMs, that should leave
>    you plenty of space for the hypervisor, provided you have the
>    hugetlb vmemmap optimization (HVO) enabled. However, since
>    the vmemmap pages are always allocated first, and then later
>    in boot freed, you will actually run yourself out of memory
>    before you can do HVO. This means not getting all the hugetlb
>    pages you want, and worse, failure to boot if there is an
>    allocation failure in the system from which it can't recover.
> 
> 2) There is a system setup where you might want to use hugetlb_cma
>    with a large value (say, again, 3024 out of 3072G like above),
>    and then lower that if system usage allows it, to make room
>    for non-hugetlb processes. For this, a variation of the problem
>    above applies: the kernel runs out of unmovable space to allocate
>    from before you finish boot, since your CMA area takes up all
>    the space.
> 
> 3) CMA wants to use one big contiguous area for allocations, which
>    fails if you have the aforementioned 3T system with a gap in the
>    middle of physical memory (like the < 40-bit BIOS DMA area seen on
>    some AMD systems). You then won't be able to set up a CMA area for
>    one of the NUMA nodes, leading to the loss of half of your hugetlb
>    CMA area.
> 
> 4) Under the scenario mentioned in 2), when trying to grow the
>    number of hugetlb pages after dropping it for a while, new
>    CMA allocations may fail occasionally. This is not unexpected:
>    some transient references on pages may prevent cma_alloc
>    from succeeding under memory pressure. However, the hugetlb
>    code then falls back to a normal contiguous alloc, which may
>    end up succeeding. This is not always the desired behavior. If
>    you have a large CMA area, then the kernel has a restricted
>    amount of memory it can do unmovable allocations from (a
>    well-known issue). A normal contiguous alloc may eat further
>    into this space.
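
To make the arithmetic behind item 1) concrete, here is a rough
back-of-the-envelope sketch. It is not taken from the patchset. It
assumes 4 KiB base pages, a 64-byte struct page, and that HVO keeps a
single vmemmap page per 1G hugetlb page; the helper names are
hypothetical, and it ignores the kernel image and every other boot-time
allocation.

```python
# Back-of-the-envelope vmemmap cost for the 3072G example in item 1).
# Assumptions (not stated in the thread): 4 KiB base pages, a 64-byte
# struct page, and HVO keeping one vmemmap page per 1G hugetlb page.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

BASE_PAGE = 4 * KIB        # assumed base page size
STRUCT_PAGE = 64           # assumed sizeof(struct page)
HVO_PAGES_KEPT = 1         # assumed vmemmap pages kept per hugetlb page


def vmemmap_bytes(mem_bytes):
    """Bytes of struct page metadata needed to describe mem_bytes."""
    return (mem_bytes // BASE_PAGE) * STRUCT_PAGE


def boot_budget(total_gib, hugepages_1g):
    """Return (peak_free, free_after_hvo) in bytes for the given setup."""
    total = total_gib * GIB
    huge = hugepages_1g * GIB
    # Without pre-HVO, vmemmap for all of memory exists at the same time
    # as the hugetlb pages themselves.
    peak_free = total - huge - vmemmap_bytes(total)
    # After HVO, each 1G page keeps only HVO_PAGES_KEPT vmemmap pages.
    huge_vmemmap_after = hugepages_1g * HVO_PAGES_KEPT * BASE_PAGE
    non_huge_vmemmap = vmemmap_bytes(total - huge)
    free_after_hvo = total - huge - non_huge_vmemmap - huge_vmemmap_after
    return peak_free, free_after_hvo


peak_free, free_after_hvo = boot_budget(3072, 3024)
print(f"vmemmap per 1G page: {vmemmap_bytes(GIB) // MIB} MiB")
print(f"free before HVO:     {peak_free / GIB:.2f} GiB")
print(f"free after HVO:      {free_after_hvo / GIB:.2f} GiB")
```

Under these assumptions each 1G page costs 16 MiB of vmemmap, and the
3024 hugetlb pages plus the full vmemmap already add up to the entire
3072G before HVO can free anything. That is the squeeze the cover
letter describes; the real numbers depend on the configuration.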

Hi Frank,

While I plan to keep reviewing the series, I think it would make sense
to split this patchset into two smaller ones.
The way I see it, we are trying to deal with two different problems and their
solutions.

1) pre-HVO at boot time
2) multi-range support of CMA (only used for hugetlb)

I have not gone through the entire patchset yet, so I do not know whether
the respective patches to tackle these two problems are really dependent on
each other, but I think it would be well worth considering a patchset per
solution if that is not the case.

IMHO, it would ease review quite a lot.


-- 
Oscar Salvador
SUSE Labs


Thread overview: 39+ messages
2025-02-06 18:50 Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 01/28] mm/cma: export total and free number of pages for CMA areas Frank van der Linden
2025-02-10 10:22   ` Oscar Salvador
2025-02-10 18:18     ` Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 02/28] mm, cma: support multiple contiguous ranges, if requested Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 03/28] mm/cma: introduce cma_intersects function Frank van der Linden
2025-02-14 10:02   ` Alexander Gordeev
2025-02-06 18:50 ` [PATCH v3 04/28] mm, hugetlb: use cma_declare_contiguous_multi Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 05/28] mm/hugetlb: fix round-robin bootmem allocation Frank van der Linden
2025-02-10 12:57   ` Oscar Salvador
2025-02-10 18:30     ` Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 06/28] mm/hugetlb: remove redundant __ClearPageReserved Frank van der Linden
2025-02-10 13:14   ` Oscar Salvador
2025-02-06 18:50 ` [PATCH v3 07/28] mm/hugetlb: use online nodes for bootmem allocation Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 08/28] mm/hugetlb: convert cmdline parameters from setup to early Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 09/28] x86/mm: make register_page_bootmem_memmap handle PTE mappings Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 10/28] mm/bootmem_info: export register_page_bootmem_memmap Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 11/28] mm/sparse: allow for alternate vmemmap section init at boot Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 12/28] mm/hugetlb: set migratetype for bootmem folios Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 13/28] mm: define __init_reserved_page_zone function Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 14/28] mm/hugetlb: check bootmem pages for zone intersections Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 15/28] mm/sparse: add vmemmap_*_hvo functions Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 16/28] mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 17/28] mm/hugetlb: move huge_boot_pages list init " Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 18/28] mm/hugetlb: add pre-HVO framework Frank van der Linden
2025-02-06 18:50 ` [PATCH v3 19/28] mm/hugetlb_vmemmap: fix hugetlb_vmemmap_restore_folios definition Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 20/28] mm/hugetlb: do pre-HVO for bootmem allocated pages Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 21/28] x86/setup: call hugetlb_bootmem_alloc early Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 22/28] x86/mm: set ARCH_WANT_SPARSEMEM_VMEMMAP_PREINIT Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 23/28] mm/cma: simplify zone intersection check Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 24/28] mm/cma: introduce a cma validate function Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 25/28] mm/cma: introduce interface for early reservations Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 26/28] mm/hugetlb: add hugetlb_cma_only cmdline option Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 27/28] mm/hugetlb: enable bootmem allocation from CMA areas Frank van der Linden
2025-02-06 18:51 ` [PATCH v3 28/28] mm/hugetlb: move hugetlb CMA code in to its own file Frank van der Linden
2025-02-10 18:39 ` Oscar Salvador [this message]
2025-02-10 18:56   ` [PATCH v3 00/28] hugetlb/CMA improvements for large systems Frank van der Linden
2025-02-10 23:28     ` Andrew Morton
2025-02-11 17:21       ` Frank van der Linden
