linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ackerley Tng <ackerleytng@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
	 jgg@nvidia.com, peterx@redhat.com, david@redhat.com,
	rientjes@google.com,  fvdl@google.com, jthoughton@google.com,
	seanjc@google.com,  pbonzini@redhat.com, zhiquan1.li@intel.com,
	fan.du@intel.com,  jun.miao@intel.com, isaku.yamahata@intel.com,
	muchun.song@linux.dev,  mike.kravetz@oracle.com,
	erdemaktas@google.com, qperret@google.com,  jhubbard@nvidia.com,
	willy@infradead.org, shuah@kernel.org,  brauner@kernel.org,
	bfoster@redhat.com, kent.overstreet@linux.dev,  pvorel@suse.cz,
	rppt@kernel.org, richard.weiyang@gmail.com,  anup@brainfault.org,
	haibo1.xu@intel.com, ajones@ventanamicro.com,
	 vkuznets@redhat.com, maciej.wieczor-retman@intel.com,
	pgonda@google.com,  oliver.upton@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-fsdevel@kvack.org
Subject: Re: [RFC PATCH 14/39] KVM: guest_memfd: hugetlb: initialization and cleanup
Date: Tue, 01 Oct 2024 23:00:38 +0000	[thread overview]
Message-ID: <diqz7cartomh.fsf@ackerleytng-ctop.c.googlers.com> (raw)
In-Reply-To: <CAGtprH-xw9DFwheTicNStXKhMJTsuXviBnq1PwvrxEHMNkb83A@mail.gmail.com> (message from Vishal Annapurve on Fri, 20 Sep 2024 11:17:45 +0200)

Vishal Annapurve <vannapurve@google.com> writes:

> On Wed, Sep 11, 2024 at 1:44 AM Ackerley Tng <ackerleytng@google.com> wrote:
>>
>> ...
>> +}
>> +
>> +static void kvm_gmem_evict_inode(struct inode *inode)
>> +{
>> +       u64 flags = (u64)inode->i_private;
>> +
>> +       if (flags & KVM_GUEST_MEMFD_HUGETLB)
>> +               kvm_gmem_hugetlb_teardown(inode);
>> +       else
>> +               truncate_inode_pages_final(inode->i_mapping);
>> +
>> +       clear_inode(inode);
>> +}
>> +
>>  static const struct super_operations kvm_gmem_super_operations = {
>>         .statfs         = simple_statfs,
>> +       .evict_inode    = kvm_gmem_evict_inode,
>
> Ackerley, can we use free_inode[1] callback to free any special
> metadata associated with the inode instead of relying on
> super_operations?
>
> [1] https://elixir.bootlin.com/linux/v6.11/source/include/linux/fs.h#L719
>

.free_inode() is not a direct replacement for .evict_inode().

If the .free_inode() op is NULL, free_inode_nonrcu(inode) handles freeing the
struct inode itself. Hence, the .free_inode() op is meant for freeing the inode
struct.

.free_inode() should undo what .alloc_inode() does.

There's more information about the ops free_inode() here
https://docs.kernel.org/filesystems/porting.html, specifically

| Rules for inode destruction:
|
| + if ->destroy_inode() is non-NULL, it gets called
| + if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
| + combination of NULL ->destroy_inode and NULL ->free_inode is treated as
|   NULL/free_inode_nonrcu, to preserve the compatibility.

The common setup is to have a larger containing struct containing a struct
inode, and the .free_inode() op will then free the larger struct. In our case,
we're not using a containing struct for the metadata, so .free_inode() isn't the
appropriate op.

I think this question might be related to Sean's question at LPC about whether
it is necessary for guest_memfd to have its own mount, as opposed to using the
anon_inode_mnt.

I believe having its own mount is the correct approach, my reasoning is as
follows

1. We want to clean up these inode metadata when the last reference to the inode
   is dropped
2. That means using some op on the iput_final() path.
3. All the ops on the iput_final() path are in struct super_operations, which is
   part of struct super_block
4. struct super_block should be used together with a mount

Hence, I think it is correct to have a guest_memfd mount. I guess it might be
possible to have a static super_block without a mount, but that seems hacky and
brittle, and I'm not aware of any precedent for a static super_block.

Sean, what are your concerns with having a guest_memfd mount?

Comparing the callbacks along the iput_final() path, we have these:

+ .drop_inode() determines whether to evict the inode, so that's not the
  approprate op.
+ .evict_inode() is the current proposal, which is a place where the inode's
  fields are cleaned up. HugeTLB uses this to clean up resv_map, which it also
  stores in inode->i_mapping->i_private_data.
+ .destroy_inode() should clean up inode allocation if inode allocation involves
  a containing struct (like shmem_inode_info). Shmem uses this to clean up a
  struct shared_policy, which we will eventually need to store as well.
+ .free_inode() is the rcu-delayed part that completes inode cleanup.

Using .free_inode() implies using a containing struct to follow the
convention. Between putting metadata in a containing struct and using
inode->i_mapping->i_private_data, I think using inode->i_mapping->i_private_data
is less complex since it avoids needing a custom .alloc_inode() op.

Other than using inode->i_mapping->i_private_data, there's the option of
combining the metadata with guest_memfd flags, and storing everything in
inode->i_private.

Because inode->i_mapping actually points to inode->i_data and i_data is
a part of the inode (not a pointer), .evict_inode() is still the op to
use to clean both inode->i_mapping->i_private_data and inode->i_private.

I think we should stick with storing metadata (faultability xarray and hugetlb
pool reference) in inode->i_mapping->i_private_data because both of these are
properties of the page cache/filemap.

When we need to store a memory policy, we might want to use .destroy_inode() to
align with shmem.

What do you all think?

And there's no way to set inode->free_inode directly and skip copying
from inode->i_sb->s_op. All the code paths going to i_callback() copy
inode->i_sb->s_op->free_inode to inode->free_inode before calling
.free_inode() in i_callback() to complete the inode cleanup.


  reply	other threads:[~2024-10-01 23:00 UTC|newest]

Thread overview: 130+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-10 23:43 [RFC PATCH 00/39] 1G page support for guest_memfd Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 01/39] mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma() Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 02/39] mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv() Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 03/39] mm: hugetlb: Remove unnecessary check for avoid_reserve Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 04/39] mm: mempolicy: Refactor out policy_node_nodemask() Ackerley Tng
2024-09-11 16:46   ` Gregory Price
2024-09-10 23:43 ` [RFC PATCH 05/39] mm: hugetlb: Refactor alloc_buddy_hugetlb_folio_with_mpol() to interpret mempolicy instead of vma Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 06/39] mm: hugetlb: Refactor dequeue_hugetlb_folio_vma() to use mpol Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 07/39] mm: hugetlb: Refactor out hugetlb_alloc_folio Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 08/39] mm: truncate: Expose preparation steps for truncate_inode_pages_final Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 09/39] mm: hugetlb: Expose hugetlb_subpool_{get,put}_pages() Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 10/39] mm: hugetlb: Add option to create new subpool without using surplus Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 11/39] mm: hugetlb: Expose hugetlb_acct_memory() Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 12/39] mm: hugetlb: Move and expose hugetlb_zero_partial_page() Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 13/39] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes Ackerley Tng
2025-04-02  4:01   ` Yan Zhao
2025-04-23 20:22     ` Ackerley Tng
2025-04-24  3:53       ` Yan Zhao
2024-09-10 23:43 ` [RFC PATCH 14/39] KVM: guest_memfd: hugetlb: initialization and cleanup Ackerley Tng
2024-09-20  9:17   ` Vishal Annapurve
2024-10-01 23:00     ` Ackerley Tng [this message]
2024-12-01 17:59   ` Peter Xu
2025-02-13  9:47     ` Ackerley Tng
2025-02-26 18:55       ` Ackerley Tng
2025-03-06 17:33   ` Peter Xu
2024-09-10 23:43 ` [RFC PATCH 15/39] KVM: guest_memfd: hugetlb: allocate and truncate from hugetlb Ackerley Tng
2024-09-13 22:26   ` Elliot Berman
2024-10-03 20:23     ` Ackerley Tng
2024-10-30  9:01   ` Jun Miao
2025-02-11  1:21     ` Ackerley Tng
2024-12-01 17:55   ` Peter Xu
2025-02-13  7:52     ` Ackerley Tng
2025-02-13 16:48       ` Peter Xu
2024-09-10 23:43 ` [RFC PATCH 16/39] KVM: guest_memfd: Add page alignment check for hugetlb guest_memfd Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 17/39] KVM: selftests: Add basic selftests for hugetlb-backed guest_memfd Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 18/39] KVM: selftests: Support various types of backing sources for private memory Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 19/39] KVM: selftests: Update test for various private memory backing source types Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 20/39] KVM: selftests: Add private_mem_conversions_test.sh Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 21/39] KVM: selftests: Test that guest_memfd usage is reported via hugetlb Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 22/39] mm: hugetlb: Expose vmemmap optimization functions Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 23/39] mm: hugetlb: Expose HugeTLB functions for promoting/demoting pages Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 24/39] mm: hugetlb: Add functions to add/move/remove from hugetlb lists Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 25/39] KVM: guest_memfd: Split HugeTLB pages for guest_memfd use Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a struct kvm_gmem_private Ackerley Tng
2024-10-10 16:06   ` Peter Xu
2024-10-11 23:32     ` Ackerley Tng
2024-10-15 21:34       ` Peter Xu
2024-10-15 23:42         ` Ackerley Tng
2024-10-16  8:45           ` David Hildenbrand
2024-10-16 20:16             ` Peter Xu
2024-10-16 22:51               ` Jason Gunthorpe
2024-10-16 23:49                 ` Peter Xu
2024-10-16 23:54                   ` Jason Gunthorpe
2024-10-17 14:58                     ` Peter Xu
2024-10-17 16:47                       ` Jason Gunthorpe
2024-10-17 17:05                         ` Peter Xu
2024-10-17 17:10                           ` Jason Gunthorpe
2024-10-17 19:11                             ` Peter Xu
2024-10-17 19:18                               ` Jason Gunthorpe
2024-10-17 19:29                                 ` David Hildenbrand
2024-10-18  7:15                                 ` Patrick Roy
2024-10-18  7:50                                   ` David Hildenbrand
2024-10-18  9:34                                     ` Patrick Roy
2024-10-17 17:11                         ` David Hildenbrand
2024-10-17 17:16                           ` Jason Gunthorpe
2024-10-17 17:55                             ` David Hildenbrand
2024-10-17 18:26                             ` Vishal Annapurve
2024-10-17 14:56                   ` David Hildenbrand
2024-10-17 15:02               ` David Hildenbrand
2024-10-16  8:50           ` David Hildenbrand
2024-10-16 10:48             ` Vishal Annapurve
2024-10-16 11:54               ` David Hildenbrand
2024-10-16 11:57                 ` Jason Gunthorpe
2025-02-25 20:37   ` Peter Xu
2025-04-23 22:07     ` Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files Ackerley Tng
2025-01-20 22:42   ` Peter Xu
2025-04-23 20:25     ` Ackerley Tng
2025-03-04 23:24   ` Peter Xu
2025-04-02  4:07   ` Yan Zhao
2025-04-23 20:28     ` Ackerley Tng
2024-09-10 23:43 ` [RFC PATCH 28/39] KVM: guest_memfd: Use vm_type to determine default faultability Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 29/39] KVM: Handle conversions in the SET_MEMORY_ATTRIBUTES ioctl Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for guest_memfd mmap Ackerley Tng
2024-09-16 20:00   ` Elliot Berman
2024-10-03 21:32     ` Ackerley Tng
2024-10-03 23:43       ` Ackerley Tng
2024-10-08 19:30         ` Sean Christopherson
2024-10-07 15:56       ` Patrick Roy
2024-10-08 18:07         ` Ackerley Tng
2024-10-08 19:56           ` Sean Christopherson
2024-10-09  3:51             ` Manwaring, Derek
2024-10-09 13:52               ` Andrew Cooper
2024-10-10 16:21             ` Patrick Roy
2024-10-10 19:27               ` Manwaring, Derek
2024-10-17 23:16               ` Ackerley Tng
2024-10-18  7:10                 ` Patrick Roy
2024-09-10 23:44 ` [RFC PATCH 31/39] KVM: selftests: Allow vm_set_memory_attributes to be used without asserting return value of 0 Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 32/39] KVM: selftests: Test using guest_memfd memory from userspace Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 33/39] KVM: selftests: Test guest_memfd memory sharing between guest and host Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 34/39] KVM: selftests: Add notes in private_mem_kvm_exits_test for mmap-able guest_memfd Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 35/39] KVM: selftests: Test that pinned pages block KVM from setting memory attributes to PRIVATE Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 36/39] KVM: selftests: Refactor vm_mem_add to be more flexible Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 37/39] KVM: selftests: Add helper to perform madvise by memslots Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 38/39] KVM: selftests: Update private_mem_conversions_test for mmap()able guest_memfd Ackerley Tng
2024-09-10 23:44 ` [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct HugeTLB page Ackerley Tng
2025-04-03 12:33   ` Yan Zhao
2025-04-23 22:02     ` Ackerley Tng
2025-04-24  1:09       ` Yan Zhao
2025-04-24  4:25         ` Yan Zhao
2025-04-24  5:55           ` Chenyi Qiang
2025-04-24  8:13             ` Yan Zhao
2025-04-24 14:10               ` Vishal Annapurve
2025-04-24 18:15                 ` Ackerley Tng
2025-04-25  4:02                   ` Yan Zhao
2025-04-25 22:45                     ` Ackerley Tng
2025-04-28  1:05                       ` Yan Zhao
2025-04-28 19:02                         ` Vishal Annapurve
2025-04-30 20:09                         ` Ackerley Tng
2025-05-06  1:23                           ` Yan Zhao
2025-05-06 19:22                             ` Ackerley Tng
2025-05-07  3:15                               ` Yan Zhao
2025-05-13 17:33                                 ` Ackerley Tng
2024-09-11  6:56 ` [RFC PATCH 00/39] 1G page support for guest_memfd Michal Hocko
2024-09-14  1:08 ` Du, Fan
2024-09-14 13:34   ` Vishal Annapurve
2025-01-28  9:42 ` Amit Shah
2025-02-03  8:35   ` Ackerley Tng
2025-02-06 11:07     ` Amit Shah
2025-02-07  6:25       ` Ackerley Tng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=diqz7cartomh.fsf@ackerleytng-ctop.c.googlers.com \
    --to=ackerleytng@google.com \
    --cc=ajones@ventanamicro.com \
    --cc=anup@brainfault.org \
    --cc=bfoster@redhat.com \
    --cc=brauner@kernel.org \
    --cc=david@redhat.com \
    --cc=erdemaktas@google.com \
    --cc=fan.du@intel.com \
    --cc=fvdl@google.com \
    --cc=haibo1.xu@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=jun.miao@intel.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pgonda@google.com \
    --cc=pvorel@suse.cz \
    --cc=qperret@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rientjes@google.com \
    --cc=roypat@amazon.co.uk \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vkuznets@redhat.com \
    --cc=willy@infradead.org \
    --cc=zhiquan1.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox