linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Ackerley Tng <ackerleytng@google.com>, Fuad Tabba <tabba@google.com>
Cc: Elliot Berman <quic_eberman@quicinc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Patrick Roy <roypat@amazon.co.uk>,
	qperret@google.com, linux-coco@lists.linux.dev,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kvm@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages
Date: Sat, 17 Aug 2024 00:03:50 +0200	[thread overview]
Message-ID: <93a010dd-d938-4c49-8643-047c7c1b33b9@redhat.com> (raw)
In-Reply-To: <diqz4j7km8yu.fsf@ackerleytng-ctop.c.googlers.com>

On 16.08.24 23:52, Ackerley Tng wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 16.08.24 19:45, Ackerley Tng wrote:
>>>
>>> <snip>
>>>
>>> IIUC folio_lock() isn't a prerequisite for taking a refcount on the
>>> folio.
>>
>> Right, to do folio_lock() you only have to guarantee that the folio
>> cannot get freed concurrently. So you piggyback on another reference
>> (you hold indirectly).
>>
>>>
>>> Even if we are able to figure out a "safe" refcount, and check that the
>>> current refcount == "safe" refcount before removing from direct map,
>>> what's stopping some other part of the kernel from taking a refcount
>>> just after the check happens and causing trouble with the folio's
>>> removal from direct map?
>>
>> Once the page was unmapped from user space, and there were no additional
>> references (e.g., GUP, whatever), any new references can only be
>> (should, unless BUG :) ) temporary speculative references that should
>> not try accessing page content, and that should back off if the folio is
>> not deemed interesting or cannot be locked. (e.g., page
>> migration/compaction/offlining).
> 
> I thought about it again - I think the vmsplice() cases are taken care
> of once we check that the folios are not mapped into userspace, since
> vmsplice() reads from a mapping.
> 
> splice() reads from the fd directly, but that's taken care since
> guest_memfd doesn't have a .splice_read() handler.
> 
> Reading /proc/pid/mem also requires the pages to first be mapped, IIUC,
> otherwise the pages won't show up, so checking that there are no more
> mappings to userspace takes care of this.

You have a misconception.

You can map pages to user space, GUP them, and then unmap them from user 
space. A GUP reference can outlive your user space mappings, easily.

So once there is a raised refcount, it could as well just be from 
vmsplice, or a pending reference from /proc/pid/mem, O_DIRECT, ...

> 
>>
>> Of course, there are some corner cases (kgdb, hibernation, /proc/kcore),
>> but most of these can be dealt with in one way or the other (make these
>> back off and not read/write page content, similar to how we handled it
>> for secretmem).
> 
> Does that really leave us with these corner cases? And so perhaps we
> could get away with just taking the folio_lock() to keep away the
> speculative references? So something like
> 
>    1. Check that the folio is not mapped and not pinned.

To do that, you have to lookup the folio first. That currently requires 
a refcount increment, even if only temporarily. Maybe we could avoid 
that, if we can guarantee that we are the only one modifying the 
pageache here, and we sync against that ourselves.

>    2. folio_lock() all the folios about to be removed from direct map
>    -- With the lock, all other accesses should be speculative --
>    3. Check that the refcount == "safe" refcount
>        3a. Unlock and return to userspace with -EAGAIN
>    4. Remove from direct map
>    5. folio_unlock() all those folios
> 
> Perhaps a very naive question: can the "safe" refcount be statically
> determined by walking through the code and counting where refcount is
> expected to be incremented?


Depends on how we design it. But if you hand out "safe" references to 
KVM etc, you'd have to track that -- and how often -- somehow. At which 
point we are at "increment/decrement" safe reference to track that for you.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-08-16 22:04 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-05 18:34 [PATCH RFC 0/4] mm: Introduce guest_memfd library Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 1/4] mm: Introduce guest_memfd Elliot Berman
2024-08-06 13:48   ` David Hildenbrand
2024-08-08 18:39   ` Ackerley Tng
2024-08-05 18:34 ` [PATCH RFC 2/4] kvm: Convert to use mm/guest_memfd Elliot Berman
2024-08-05 18:34 ` [PATCH RFC 3/4] mm: guest_memfd: Add option to remove guest private memory from direct map Elliot Berman
2024-08-06 14:08   ` David Hildenbrand
     [not found]     ` <396fb134-f43e-4263-99a8-cfcef82bfd99@amazon.com>
2024-08-15 19:08       ` Manwaring, Derek
2024-08-06 15:39   ` Patrick Roy
     [not found]     ` <20240806104702482-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]       ` <a43ae745-9907-425f-b09d-a49405d6bc2d@amazon.co.uk>
     [not found]         ` <90886a03-ad62-4e98-bc05-63875faa9ccc@amazon.co.uk>
     [not found]           ` <20240807113514068-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]             ` <7166d51c-7757-44f2-a6f8-36da3e86bf90@amazon.co.uk>
2024-08-08 22:16               ` Elliot Berman
2024-08-09 15:02                 ` Patrick Roy
2024-08-19 10:09   ` Mike Rapoport
2024-08-20 16:56     ` Elliot Berman
2024-08-21 14:26       ` Mike Rapoport
2024-08-05 18:34 ` [PATCH RFC 4/4] mm: guest_memfd: Add ability for mmap'ing pages Elliot Berman
2024-08-06 13:51   ` David Hildenbrand
     [not found]     ` <20240806093625007-0700.eberman@hu-eberman-lv.qualcomm.com>
     [not found]       ` <a7c5bfc0-1648-4ae1-ba08-e706596e014b@redhat.com>
2024-08-08 21:41         ` Elliot Berman
2024-08-08 21:55           ` David Hildenbrand
2024-08-08 22:26             ` Elliot Berman
2024-08-09  7:16               ` David Hildenbrand
2024-08-15  7:24     ` Fuad Tabba
2024-08-16  9:48       ` David Hildenbrand
2024-08-16 11:19         ` Fuad Tabba
2024-08-16 17:45         ` Ackerley Tng
2024-08-16 18:08           ` David Hildenbrand
2024-08-16 21:52             ` Ackerley Tng
2024-08-16 22:03               ` David Hildenbrand [this message]
2024-08-16 23:52                 ` Elliot Berman
2024-08-06 15:48   ` Patrick Roy
2024-08-08 18:51   ` Ackerley Tng
2024-08-08 21:42     ` Elliot Berman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=93a010dd-d938-4c49-8643-047c7c1b33b9@redhat.com \
    --to=david@redhat.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_eberman@quicinc.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=tabba@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox