linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: "Björn Töpel" <bjorn@kernel.org>,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	linux-riscv@lists.infradead.org
Cc: "Björn Töpel" <bjorn@rivosinc.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Oscar Salvador" <osalvador@suse.de>,
	virtualization@lists.linux-foundation.org, linux@rivosinc.com,
	"Alexandre Ghiti" <alexghiti@rivosinc.com>
Subject: Re: [PATCH 0/7] riscv: Memory Hot(Un)Plug support
Date: Wed, 17 May 2023 15:49:27 +0200	[thread overview]
Message-ID: <9aa7d030-19b5-01df-70c0-86d8d6ab86a6@redhat.com> (raw)
In-Reply-To: <20230512145737.985671-1-bjorn@kernel.org>

On 12.05.23 16:57, Björn Töpel wrote:
> From: Björn Töpel <bjorn@rivosinc.com>
> 
> Memory Hot(Un)Plug support for the RISC-V port
> ==============================================
> 
> Introduction
> ------------
> 
> To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory
> hot(un)plug allows for increasing and decreasing the size of physical
> memory available to a machine at runtime."
> 
> This series attempts to add memory hot(un)plug support for the RISC-V
> Linux port.
> 
> I'm sending the series as a v1, but it's borderline RFC. It definitely
> needs more testing time, but it would be nice to get some early input.
> 
> Implementation
> --------------
> 
> From an arch perspective, a couple of callbacks need to be
> implemented to support hot plugging:
> 
> arch_add_memory()
> This callback is responsible for updating the linear/direct map, and
> for calling into the memory hot plugging generic code via __add_pages().
> 
> arch_remove_memory()
> In this callback the linear/direct map is torn down.
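To make the ordering of these two callbacks concrete, here is a userspace model in C (not kernel code). The names model_arch_add_memory(), model_map_linear() and so on are hypothetical stand-ins, and the two bitmaps only mimic the linear map and the core-mm state. The add path maps first, hands the range to the generic code, and unwinds the mapping if that fails; the remove path dropping the range from the core before tearing down the mapping is an assumption that mirrors the add path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_PFNS 1024u               /* size of the modelled PFN space */

static bool direct_mapped[MODEL_NR_PFNS];    /* stand-in for the linear map */
static bool added_to_core_mm[MODEL_NR_PFNS]; /* stand-in for generic hotplug state */
static bool model_fail_add_pages;            /* test hook: simulate a core failure */

/* Map [start_pfn, start_pfn + nr) in the "linear map"; refuse overlaps. */
static int model_map_linear(uint64_t start_pfn, uint64_t nr)
{
    if (nr == 0 || start_pfn + nr > MODEL_NR_PFNS)
        return -1;
    for (uint64_t i = 0; i < nr; i++)
        if (direct_mapped[start_pfn + i])
            return -1;                    /* leave existing memory untouched */
    for (uint64_t i = 0; i < nr; i++)
        direct_mapped[start_pfn + i] = true;
    return 0;
}

static void model_unmap_linear(uint64_t start_pfn, uint64_t nr)
{
    for (uint64_t i = 0; i < nr; i++)
        direct_mapped[start_pfn + i] = false;
}

/* Hypothetical stand-in for handing the range to the generic code. */
static int model_add_pages(uint64_t start_pfn, uint64_t nr)
{
    if (model_fail_add_pages)
        return -1;
    for (uint64_t i = 0; i < nr; i++)
        added_to_core_mm[start_pfn + i] = true;
    return 0;
}

/* Add: map first, then register with the core; unwind the map on failure. */
static int model_arch_add_memory(uint64_t start_pfn, uint64_t nr)
{
    int ret = model_map_linear(start_pfn, nr);
    if (ret)
        return ret;
    ret = model_add_pages(start_pfn, nr);
    if (ret)
        model_unmap_linear(start_pfn, nr); /* don't leak the new mapping */
    return ret;
}

/* Remove: drop the range from the core, then tear down the linear map. */
static void model_arch_remove_memory(uint64_t start_pfn, uint64_t nr)
{
    assert(start_pfn + nr <= MODEL_NR_PFNS);
    for (uint64_t i = 0; i < nr; i++)
        added_to_core_mm[start_pfn + i] = false;
    model_unmap_linear(start_pfn, nr);
}
```

The overlap check in model_map_linear() exists so that a failed add can never unmap memory that an earlier, successful add put in place.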
> 
> vmemmap_free()
> The function tears down the vmemmap mappings (if
> CONFIG_SPARSEMEM_VMEMMAP is in-use), and also deallocates the backing
> vmemmap pages. Note that for persistent memory, an alternative
> allocator for the backing pages can be used -- the vmem_altmap. This
> means that when the backing pages are freed, extra care is needed so
> that the matching deallocation method is used. Also note that RISC-V
> populates the vmemmap using vmemmap_populate_basepages(), so currently
> no hugepages are used for the backing store.
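A back-of-the-envelope sketch of what populating the vmemmap with base pages costs. It assumes 4 KiB pages and a 64-byte struct page (typical on 64-bit builds, but config dependent), and uses 128 MiB purely as an example range size, not as the RISC-V section size. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE        4096ull  /* assumed base page size */
#define MODEL_STRUCT_PAGE_SIZE 64ull    /* assumed sizeof(struct page) */

/* Bytes of struct page array needed to describe `bytes` of RAM. */
static uint64_t vmemmap_bytes(uint64_t bytes)
{
    assert(bytes % MODEL_PAGE_SIZE == 0);
    return (bytes / MODEL_PAGE_SIZE) * MODEL_STRUCT_PAGE_SIZE;
}

/* Number of base pages needed to back that array (rounded up). */
static uint64_t vmemmap_base_pages(uint64_t bytes)
{
    uint64_t v = vmemmap_bytes(bytes);
    return (v + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

Under those assumptions, 128 MiB of hot-added RAM needs 2 MiB of vmemmap, backed by 512 individual base-page mappings rather than one suitably aligned 2 MiB PMD-level mapping.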
> 
> The page table unmap/teardown functions are heavily based (copied!)
> from the x86 tree. The same remove_pgd_mapping() is used in both
> vmemmap_free() and arch_remove_memory(), but in the latter function
> the backing pages are not removed.
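A hypothetical single-level model of the shared teardown described above. One walker, model_remove_mapping() (a made-up name loosely patterned on remove_pgd_mapping()), clears the mappings, and a free_backing flag, assumed here to be a plain boolean, decides whether the pages behind them are released as well (the vmemmap_free() case) or left alone (the arch_remove_memory() case). When freeing, it picks the deallocator matching where the page came from, which is the altmap caveat above. Real page-table levels, TLB flushing and freeing of empty table pages are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES 64u

struct model_pte {
    bool present;
    bool from_altmap;   /* backing page came from an altmap-like pool */
};

struct model_counters {
    size_t freed_to_allocator;  /* pages returned to the page allocator */
    size_t returned_to_altmap;  /* pages returned to the altmap-like pool */
};

/* Clear mappings in [start, end); optionally free what backs them. */
static void model_remove_mapping(struct model_pte *ptes, size_t start, size_t end,
                                 bool free_backing, struct model_counters *c)
{
    assert(start <= end && end <= MODEL_NR_PTES);
    for (size_t i = start; i < end; i++) {
        if (!ptes[i].present)
            continue;
        if (free_backing) {
            /* Use the deallocator that matches the allocator used. */
            if (ptes[i].from_altmap)
                c->returned_to_altmap++;
            else
                c->freed_to_allocator++;
        }
        ptes[i].present = false;
    }
}
```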
> 
> On RISC-V, the PGD level kernel mappings need to be synchronized with
> all page-tables (e.g. via sync_kernel_mappings()). Synchronization
> involves special care, like locking. Instead, this patch series takes
> a different approach (introduced by Jörg Rödel in the x86 tree):
> pre-allocate the PGD-leaves (P4D, PUD, or PMD depending on the paging
> setup) at mem_init(), for vmemmap and the direct map.
> 
> Pre-allocating the PGD-leaves wastes some memory, but is only enabled
> for CONFIG_MEMORY_HOTPLUG. The number of potentially unused pages is
> ~128, i.e. ~128 * 4K = 512K.
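The cost of pre-allocation is one next-level table page per PGD entry that the vmemmap and direct-map ranges span. A minimal sketch of that count, assuming 4 KiB pages, where a PGD entry covers 1 GiB under Sv39 (shift 30), 512 GiB under Sv48 (shift 39) and 256 TiB under Sv57 (shift 48). The helper names and the example ranges are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TABLE_SIZE 4096ull   /* one page-table page */

/* Number of PGD entries touched by the virtual range [start, end). */
static uint64_t pgd_entries_spanned(uint64_t start, uint64_t end, unsigned int pgdir_shift)
{
    assert(start < end);
    uint64_t first = start >> pgdir_shift;     /* entry holding first byte */
    uint64_t last = (end - 1) >> pgdir_shift;  /* entry holding last byte */
    return last - first + 1;
}

/* Memory spent on next-level tables pre-allocated for that range. */
static uint64_t prealloc_bytes(uint64_t start, uint64_t end, unsigned int pgdir_shift)
{
    return pgd_entries_spanned(start, end, pgdir_shift) * MODEL_TABLE_SIZE;
}
```

For instance, a 128 GiB range under Sv39 spans 128 PGD entries, i.e. 128 * 4 KiB = 512 KiB of tables. This is only an illustration and not necessarily how the ~128 figure above was derived.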
> 
> Patch 1: Preparation for hotplugging support, by pre-allocating the
>           PGD leaves.
> 
> Patch 2: Changes the __init attribute to __meminit, so that the
>           functions are not discarded after init. __meminit keeps the
>           functions after init, if memory hotplugging is enabled for
>           the build.
>           
> Patch 3: Refactor the direct map setup, so it can be used for hot add.
> 
> Patch 4: The actual add/remove code. Mostly a page-table-walk
>           exercise.
> 
> Patch 5: Turn on the arch support in Kconfig
> 
> Patch 6: Now that memory hotplugging is enabled, make virtio-mem
>           usable for RISC-V
>           
> Patch 7: Pre-allocate vmalloc PGD-leaves as well, which removes the
>           need for vmalloc faulting.
>           
> RFC
> ---
> 
>   * TLB flushes. The current series uses Big Hammer flush-it-all.
>   * Pre-allocation vs explicit syncs
> 
> Testing
> -------
> 
> ACPI support is still in the making for RISC-V, so tests that involve
> CXL and similar fanciness are currently not possible. Virtio-mem,
> however, works without proper ACPI support. In order to try this out
> in Qemu, some additional patches for Qemu are needed:
> 
>   * Enable virtio-mem for RISC-V
>   * Add proper hotplug support for virtio-mem
>   
> The Qemu patch is commit 5d90a7ef1bc0
> ("hw/riscv/virt: Support for virtio-mem-pci"), which can be found here:
> 
>    https://github.com/bjoto/qemu/tree/riscv-virtio-mem
> 
> I will try to upstream that work in parallel with this.
>    
> Thanks to David Hildenbrand for valuable input for the Qemu side of
> things.
> 
> The series is based on the RISC-V fixes tree
>    https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/log/?h=fixes
> 

Cool stuff! I'm fairly busy right now, so some high-level questions upfront:

What is the memory section size (which implies the memory block size)?
This implies the minimum DIMM granularity and the high-level
granularity in which virtio-mem adds memory.

What is the pageblock size, implying the minimum granularity that 
virtio-mem can operate on?
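A minimal sketch of the two granularity rules behind these questions, taking the relationships as stated: a DIMM-style device must cover a whole number of memory blocks, and virtio-mem cannot operate below a pageblock. The helper names are hypothetical, and combining the device block size and the pageblock size with a maximum is an assumption about how the two interact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_multiple_of(uint64_t size, uint64_t granule)
{
    assert(granule != 0);
    return size != 0 && size % granule == 0;
}

/* A DIMM-style device has to consist of whole memory blocks. */
static bool dimm_size_ok(uint64_t dimm_size, uint64_t memory_block_size)
{
    return is_multiple_of(dimm_size, memory_block_size);
}

/* Smallest unit virtio-mem could (un)plug: never below a pageblock. */
static uint64_t virtio_mem_min_unit(uint64_t device_block_size, uint64_t pageblock_size)
{
    return device_block_size > pageblock_size ? device_block_size : pageblock_size;
}
```

With, say, 128 MiB memory blocks and 2 MiB pageblocks (example values only, not confirmed for RISC-V), a 1 GiB DIMM would be acceptable, a 64 MiB DIMM would not, and virtio-mem could work in 2 MiB steps inside a block.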

On x86-64 and arm64 we currently use the ACPI SRAT to expose the maximum 
physical address where we can see memory getting hotplugged. [1] From 
that, we can derive the "max_possible_pfn" and prepare the kernel
virtual memory layout (especially, the direct map).
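A sketch of deriving a maximum possible PFN from firmware-described hotpluggable ranges. struct model_mem_range and model_max_possible_pfn() are hypothetical stand-ins, not the SRAT layout or a kernel API; 4 KiB pages are assumed, and the result is treated as an exclusive end PFN, rounded up in the style of PFN_UP().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumed 4 KiB pages */

struct model_mem_range {
    uint64_t base;    /* physical start address */
    uint64_t length;  /* size in bytes */
};

/* End PFN (one past the last page) covering boot memory and all ranges. */
static uint64_t model_max_possible_pfn(const struct model_mem_range *r, size_t n,
                                       uint64_t boot_max_pfn)
{
    uint64_t max_pfn = boot_max_pfn;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].length != 0);
        uint64_t end = r[i].base + r[i].length;  /* exclusive end address */
        /* Round up so a partial last page is still covered. */
        uint64_t end_pfn = (end + (1ull << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
        if (end_pfn > max_pfn)
            max_pfn = end_pfn;
    }
    return max_pfn;
}
```

The direct map would then have to be able to cover physical addresses up to max_pfn << 12, even before any of that memory is actually plugged.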

Is something similar required on RISC-V? On s390x, I'm planning on 
adding a paravirtualized mechanism to detect where memory devices might 
be located. (I had a running RFC, but was distracted by all other kinds 
of stuff)


[1] https://virtio-mem.gitlab.io/developer-guide.html

-- 
Thanks,

David / dhildenb




Thread overview: 14+ messages
2023-05-12 14:57 Björn Töpel
2023-05-12 14:57 ` [PATCH 1/7] riscv: mm: Pre-allocate PGD leaves to avoid synchronization Björn Töpel
2023-05-12 14:57 ` [PATCH 2/7] riscv: mm: Change attribute from __init to __meminit for page functions Björn Töpel
2023-05-12 14:57 ` [PATCH 3/7] riscv: mm: Refactor create_linear_mapping_range() for hot add Björn Töpel
2023-06-21 23:56   ` Palmer Dabbelt
2023-06-22  4:56     ` Björn Töpel
2023-05-12 14:57 ` [PATCH 4/7] riscv: mm: Add memory hot add/remove support Björn Töpel
2023-05-12 14:57 ` [PATCH 5/7] riscv: Enable memory hot add/remove arch kbuild support Björn Töpel
2023-05-12 14:57 ` [PATCH 6/7] virtio-mem: Enable virtio-mem for RISC-V Björn Töpel
2023-05-12 14:57 ` [PATCH 7/7] riscv: mm: Pre-allocate vmalloc PGD leaves Björn Töpel
2023-05-17 13:49 ` David Hildenbrand [this message]
2023-05-17 18:53   ` [PATCH 0/7] riscv: Memory Hot(Un)Plug support Björn Töpel
2023-05-21  9:15     ` Björn Töpel
2023-05-22  8:21       ` David Hildenbrand
