Re: [EXT] Re: COW in userspace

From: David Hildenbrand <david@redhat.com>
To: Ralf Ramsauer <ralf.ramsauer@oth-regensburg.de>, linux-mm@kvack.org
Cc: Wolfgang Mauerer <wolfgang.mauerer@oth-regensburg.de>,
	Mario Mintel <mario.mintel@st.oth-regensburg.de>
Subject: Re: [EXT] Re: COW in userspace
Date: Mon, 23 Aug 2021 12:33:45 +0200
Message-ID: <eadd41a9-8953-9f77-6e41-ce2301d4c3a3@redhat.com>
In-Reply-To: <bde0d9ae-4d29-13cf-8ecd-53f33bde6cc7@oth-regensburg.de>

On 23.08.21 12:16, Ralf Ramsauer wrote:
> 
> 
> On 23/08/2021 10:02, David Hildenbrand wrote:
>> On 20.08.21 15:13, Ralf Ramsauer wrote:
>>> Dear mm folks,
>>>
>>> I have an issue, where it would be great to have a COW-backed virtual
>>> memory area within an userspace process. I know there's the possibility
>>> to have a file-backed MAP_SHARED vma, which is later duplicated with
>>> MAP_PRIVATE, but that's not exactly what I'm looking for.
>>>
>>> Say I have an anonymous page-aligned VMA a, with MAP_PRIVATE and
>>> PROT_RW. Userspace happily writes to/reads from it. At some point in
>>> time, I want to 'snapshot' that single VMA within the context of the
>>> process and without the need to fork(). Say there's something like
>>>
>>>     a = mmap(0, len, PROT_RW, MAP_ANON | MAP_POPULATE, -1, 0);
>>>     [... fill a ...]
>>>
>>>     b = mmdup(a, len, PROT_READ);
>>>
>>> b shall be the new base pointer of a new VMA that is backed by COW
>>> mechanisms. After mmdup, those regular COW mechanisms do the rest: both
>>> VMAs (a and b) will fault on subsequent writes and duplicate the
>>> previously shared physical mapping, pretty much what cow_fault or
>>> shared_fault does.
>>>
>>> Afaict, this, or at least something like this is currently not supported
>>> by the kernel. Is that correct? If so, why? Generally spoken, is it a
>>> bad idea?
>>
>> Not sure if it helps (most probably not), QEMU uses uffd-wp for
>> background snapshots of VM memory. It's different, though, as you'll
>> only have a single mapping and will be catching modifications to your
>> single mapping, such that you can "save away" relevant snapshot pages
>> before any modifications.
> 
> Thanks for the pointer, David. I'll have a look.
> 
>>
>> You mention "both VMAs (a and b) will fault on subsequent writes", so
>> would you actually be allowing PROT_WRITE access to b ("snapshot")?
>>
> 
> In general, yes, both should be allowed to be PROT_WRITE. So no matter
> "which side" causes the fault, simply both will lead to duplication.
> 
> If it would make things easier, then it would also be absolutely fine to
> have the snapshot PROT_READ, which would suffice my requirements as well.

I recall that Redis has very similar requirements for live snapshotting.
As far as I was told, they used to handle it via fork(), just as you
described. I don't know whether they have switched to uffd-wp yet, but I
would guess they have, because they were another excellent use case for
uffd-wp:

https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg02955.html

That way, you can handle COW manually in user space by:

1. Creating a second anonymous mapping
2. Registering a UFFD-WP handler on the original mapping
3. WP-protecting the original mapping via UFFD
4. Tracking in a bitmap which pages were already copied

So when you get notified about a WP event, you copy the page manually to 
the second mapping, un-protect the page, and remember in the bitmap that 
the page has been copied.
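
To make the steps above a bit more concrete, here is a rough sketch in C.
It is only an illustration under assumptions: struct snap, snap_arm() and
snap_handle_one() are made-up names, the original mapping is assumed to be
page-aligned and already populated (e.g., MAP_POPULATE), the kernel and
headers must support uffd-wp on anonymous memory (Linux 5.7 or later), and
userfaultfd() may require privileges or vm.unprivileged_userfaultfd=1.
Cleanup on error paths is left out.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one snapshot; every name here is made up. */
struct snap {
    int uffd;              /* userfaultfd descriptor, -1 if unavailable */
    char *orig;            /* live mapping the application keeps writing to */
    char *copy;            /* step 1: second anonymous mapping */
    unsigned char *bitmap; /* step 4: bit set = page was saved to copy */
    size_t len, page;
};

/* Steps 1-4: set up the copy, the bitmap and WP on orig. Returns 0 or -1. */
int snap_arm(struct snap *s, void *orig, size_t len)
{
    struct uffdio_api api = { .api = UFFD_API,
                              .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP };
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)orig, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)orig, .len = len },
        .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };

    s->orig = orig;
    s->len = len;
    s->page = (size_t)sysconf(_SC_PAGESIZE);
    s->uffd = -1;
    s->bitmap = calloc((len / s->page + 7) / 8, 1);
    s->copy = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (!s->bitmap || s->copy == MAP_FAILED)
        return -1;

    s->uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
    if (s->uffd < 0)
        return -1;
    if (ioctl(s->uffd, UFFDIO_API, &api) ||
        ioctl(s->uffd, UFFDIO_REGISTER, &reg) ||
        ioctl(s->uffd, UFFDIO_WRITEPROTECT, &wp)) {
        close(s->uffd);
        s->uffd = -1;
        return -1;
    }
    return 0;
}

/* Handle one WP event: save the old page, mark it, then un-protect it. */
int snap_handle_one(struct snap *s)
{
    struct uffd_msg msg;

    if (read(s->uffd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        return -1;
    if (msg.event != UFFD_EVENT_PAGEFAULT ||
        !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP))
        return -1;

    uintptr_t addr = (uintptr_t)msg.arg.pagefault.address &
                     ~((uintptr_t)s->page - 1);
    size_t idx = (addr - (uintptr_t)s->orig) / s->page;

    /* Copy only once, so a late duplicate event cannot clobber the copy. */
    if (!(s->bitmap[idx / 8] & (1u << (idx % 8)))) {
        memcpy(s->copy + idx * s->page, (const void *)addr, s->page);
        s->bitmap[idx / 8] |= 1u << (idx % 8);
    }

    struct uffdio_writeprotect wp = {
        .range = { .start = addr, .len = s->page },
        .mode = 0, /* clear WP; this also wakes the faulting thread */
    };
    return ioctl(s->uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

snap_handle_one() has to run in a thread other than the one that writes to
the original mapping, because the writer stays blocked in the fault until
the page has been un-protected.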

When reading the snapshot, you have to look at the bitmap to figure out
whether to read a specific page from the original or from the second
mapping. However, you won't be able to simply read the second mapping on
its own. (The question is whether that is really required or can be
worked around.)
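
The bitmap lookup on the read side could look roughly like the following
sketch. snap_page() and snap_read() are hypothetical helper names, the
bitmap uses one bit per page (set = page was copied), and synchronization
against a concurrently running WP handler is deliberately ignored here.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the snapshot-time contents of page number idx. */
const char *snap_page(const char *orig, const char *copy,
                      const unsigned char *bitmap, size_t page, size_t idx)
{
    int copied = bitmap[idx / 8] & (1u << (idx % 8));

    /* Copied pages live in the second mapping, untouched ones in orig. */
    return (copied ? copy : orig) + idx * page;
}

/* Read len bytes at snapshot offset off into dst, one page at a time. */
void snap_read(char *dst, const char *orig, const char *copy,
               const unsigned char *bitmap, size_t page,
               size_t off, size_t len)
{
    while (len > 0) {
        size_t idx = off / page;
        size_t in_page = off % page;
        size_t n = page - in_page;

        if (n > len)
            n = len;
        memcpy(dst, snap_page(orig, copy, bitmap, page, idx) + in_page, n);
        dst += n;
        off += n;
        len -= n;
    }
}
```

A reader that needs the snapshot as one contiguous range would have to go
through something like snap_read() instead of dereferencing the second
mapping directly.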

-- 
Thanks,

David / dhildenb

