From: Isaku Yamahata <isaku.yamahata@gmail.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: akpm@linux-foundation.org, mike.kravetz@oracle.com,
muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com,
shuah@kernel.org, willy@infradead.org, brauner@kernel.org,
chao.p.peng@linux.intel.com, coltonlewis@google.com,
david@redhat.com, dhildenb@redhat.com, dmatlack@google.com,
erdemaktas@google.com, hughd@google.com,
isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com,
joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com,
kirill.shutemov@linux.intel.com, liam.merwick@oracle.com,
mail@maciej.szmigiero.name, mhocko@suse.com,
michael.roth@amd.com, qperret@google.com, rientjes@google.com,
rppt@kernel.org, steven.price@arm.com, tabba@google.com,
vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com,
vkuznets@redhat.com, wei.w.wang@intel.com,
yu.c.zhang@linux.intel.com, kvm@vger.kernel.org,
linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org
Subject: Re: [RFC PATCH 00/19] hugetlb support for KVM guest_mem
Date: Wed, 7 Jun 2023 21:38:10 -0700 [thread overview]
Message-ID: <20230608043810.GJ2244082@ls.amr.corp.intel.com> (raw)
In-Reply-To: <cover.1686077275.git.ackerleytng@google.com>
On Tue, Jun 06, 2023 at 07:03:45PM +0000,
Ackerley Tng <ackerleytng@google.com> wrote:
> Hello,
>
> This patchset builds upon a soon-to-be-published WIP patchset that Sean
> published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, mentioned
> at [1].
>
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
>
> In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is introduced,
> allowing VM private memory (for confidential computing) to be backed by hugetlb
> pages.
>
> guest_mem provides userspace with a handle, with which userspace can allocate
> and deallocate memory for confidential VMs without mapping the memory into
> userspace.
>
> Why use hugetlb instead of introducing a new allocator, like gmem does for 4K
> and transparent hugepages?
>
> + hugetlb provides the following useful functionality, which would otherwise
> have to be reimplemented:
> + Allocation of hugetlb pages at boot time, including
> + Parsing of kernel boot parameters to configure hugetlb
> + Tracking of usage in hstate
> + gmem will share the same system-wide pool of hugetlb pages, so users
> don't have to have separate pools for hugetlb and gmem
> + Page accounting with subpools
> + hugetlb pages are tracked in subpools, which gmem uses to reserve
> pages from the global hstate
> + Memory charging
> + hugetlb provides code that charges memory to cgroups
> + Reporting: hugetlb usage and availability are available at /proc/meminfo,
> etc
>
> The first 11 patches in this patchset are a series of refactoring patches
> to decouple hugetlb and hugetlbfs.
>
> The central thread binding the refactoring is that some functions (like
> inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on the
> hugetlbfs convention that the resv_map, subpool and hstate are stored in
> specific fields of a hugetlbfs inode.
>
> Refactoring to parametrize functions by hstate, subpool, resv_map will allow
> hugetlb to be used by gmem and in other places where these data structures
> aren't necessarily stored in the same positions in the inode.
>
> The refactoring proposed here is just the minimum required to get a
> proof-of-concept working with gmem. I would like to get opinions on this
> approach before doing further refactoring. (See TODOs)
>
> TODOs:
>
> + hugetlb/hugetlbfs refactoring
> + remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs
> specific and used only in inode.c
> + remove_mapping_hugepages(), remove_inode_single_folio(),
> hugetlb_unreserve_pages() shouldn't need to take inode as a parameter
> + Updating inode->i_blocks can be refactored to a separate function and
> called from hugetlbfs and gmem
> + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by
> vma
> + hugetlb_reserve_pages() should be refactored to be symmetric with
> hugetlb_unreserve_pages()
> + It should be parametrized by resv_map
> + alloc_hugetlb_folio_from_subpool() could perhaps use
> hugetlb_reserve_pages()?
> + gmem
> + Figure out if resv_map should be used by gmem at all
> + Probably needs more refactoring to decouple resv_map from hugetlb
> functions
Hi. If kvm gmem is compiled as a kernel module, many symbols fail to link.
You need to add EXPORT_SYMBOL{,_GPL} for the symbols it uses.
Or should it be built into the kernel instead of as a module?
Thanks,
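For example, symbols that virt/kvm/guest_mem.c calls would have to be exported along these lines (a sketch only; the actual list depends on what the module links against, and only the two function names already mentioned in the cover letter are used here):

```
/* mm/hugetlb.c (sketch, not part of the posted patches): helpers used
 * by a modular kvm must be exported; EXPORT_SYMBOL_GPL restricts use
 * to GPL-compatible modules. */
EXPORT_SYMBOL_GPL(alloc_hugetlb_folio_from_subpool);
EXPORT_SYMBOL_GPL(hugetlb_unreserve_pages);
```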
> Questions for the community:
>
> 1. In this patchset, every gmem file backed with hugetlb is given a new
> subpool. Is that desirable?
> + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one
> mount per hugetlb size (2M, 1G, etc)
> + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so it
> (rightfully) uses the hugetlbfs kernel mounts and their subpools
> + I gave each file a subpool mostly to speed up implementation and still be
> able to reserve hugetlb pages from the global hstate based on the gmem
> file size.
> + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
> + Should there be multiple mounts, one for each hugetlb size?
> + Will the mounts be initialized on boot or on first gmem file creation?
> + Or is one subpool per gmem file fine?
> 2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
> reservations?
>
> [1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@google.com/
>
> ---
>
> Ackerley Tng (19):
> mm: hugetlb: Expose get_hstate_idx()
> mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
> mm: hugetlb: Expose remove_inode_hugepages
> mm: hugetlb: Decouple hstate, subpool from inode
> mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
> and hstate
> mm: hugetlb: Provide hugetlb_filemap_add_folio()
> mm: hugetlb: Refactor vma_*_reservation functions
> mm: hugetlb: Refactor restore_reserve_on_error
> mm: hugetlb: Use restore_reserve_on_error directly in filesystems
> mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by
> resv_map
> mm: hugetlb: Parametrize hugetlb functions by resv_map
> mm: truncate: Expose preparation steps for truncate_inode_pages_final
> KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers
> KVM: guest_mem: Refactor cleanup to separate inode and file cleanup
> KVM: guest_mem: hugetlb: initialization and cleanup
> KVM: guest_mem: hugetlb: allocate and truncate from hugetlb
> KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem
> KVM: selftests: Support various types of backing sources for private
> memory
> KVM: selftests: Update test for various private memory backing source
> types
>
> fs/hugetlbfs/inode.c | 102 ++--
> include/linux/hugetlb.h | 86 ++-
> include/linux/mm.h | 1 +
> include/uapi/linux/kvm.h | 25 +
> mm/hugetlb.c | 324 +++++++-----
> mm/truncate.c | 24 +-
> .../testing/selftests/kvm/guest_memfd_test.c | 33 +-
> .../testing/selftests/kvm/include/test_util.h | 14 +
> tools/testing/selftests/kvm/lib/test_util.c | 74 +++
> .../kvm/x86_64/private_mem_conversions_test.c | 38 +-
> virt/kvm/guest_mem.c | 488 ++++++++++++++----
> 11 files changed, 882 insertions(+), 327 deletions(-)
>
> --
> 2.41.0.rc0.172.g3f132b7071-goog
--
Isaku Yamahata <isaku.yamahata@gmail.com>
Thread overview: 3+ messages (in reply to <cover.1686077275.git.ackerleytng@google.com>)
2023-06-08  4:38 ` Isaku Yamahata [this message]
2023-06-16 18:28 ` Mike Kravetz
2023-06-21  9:01 ` Vishal Annapurve