From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: akpm@linux-foundation.org, dan.j.williams@intel.com,
mhocko@suse.com, keith.busch@intel.com,
kirill.shutemov@linux.intel.com,
alexander.h.duyck@linux.intel.com, ira.weiny@intel.com,
andreyknvl@google.com, arunks@codeaurora.org, vbabka@suse.cz,
cl@linux.com, riel@surriel.com, keescook@chromium.org,
hannes@cmpxchg.org, npiggin@gmail.com,
mathieu.desnoyers@efficios.com, shakeelb@google.com, guro@fb.com,
aarcange@redhat.com, hughd@google.com, jglisse@redhat.com,
mgorman@techsingularity.net, daniel.m.jordan@oracle.com,
jannh@google.com, kilobyte@angband.pl, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 0/7] mm: process_vm_mmap() -- syscall for duplication a process mapping
Date: Tue, 28 May 2019 19:15:24 +0300
Message-ID: <20190528161524.tn5sqzhmhgyuwrmy@box>
In-Reply-To: <de6e4e89-66ac-da2f-48a6-4d98a728687a@virtuozzo.com>

On Tue, May 28, 2019 at 12:15:16PM +0300, Kirill Tkhai wrote:
> On 28.05.2019 02:30, Kirill A. Shutemov wrote:
> > On Fri, May 24, 2019 at 05:00:32PM +0300, Kirill Tkhai wrote:
> >> On 24.05.2019 14:52, Kirill A. Shutemov wrote:
> >>> On Fri, May 24, 2019 at 01:45:50PM +0300, Kirill Tkhai wrote:
> >>>> On 22.05.2019 18:22, Kirill A. Shutemov wrote:
> >>>>> On Mon, May 20, 2019 at 05:00:01PM +0300, Kirill Tkhai wrote:
> >>>>>> This patchset adds a new syscall, which makes it possible
> >>>>>> to clone a VMA from a process to the current process.
> >>>>>> The syscall supplements the functionality provided
> >>>>>> by the process_vm_writev() and process_vm_readv() syscalls,
> >>>>>> and it may be useful in many situations.
> >>>>>
> >>>>> Kirill, could you explain how the change affects rmap and why it is safe?
> >>>>>
> >>>>> My concern is that the patchset allows mapping the same page multiple times
> >>>>> within one process, or even mapping a page allocated by a child into the parent.
> >>>>>
> >>>>> It was not allowed before.
> >>>>>
> >>>>> In the best case it makes reasoning about rmap substantially more difficult.
> >>>>>
> >>>>> But I'm worried it will introduce hard-to-debug bugs, like those described in
> >>>>> https://lwn.net/Articles/383162/.
> >>>>
> >>>> Andy suggested unmapping the PTEs from the source page table, so that a single
> >>>> page is never mapped twice in the same process. This is OK for my use case,
> >>>> and here we would take just a small step, "allow a VMA to be inherited by a child process",
> >>>> which we didn't have before. If someone still needs to continue the work
> >>>> to allow the same page to be mapped twice in a single process in the future, that
> >>>> person will have a supported base from this small step. I believe something
> >>>> like a debugger may want this to take a fast snapshot of a process's private
> >>>> memory (when the task is stopped for a short time to get its memory). But for
> >>>> me remapping is enough at the moment.
> >>>>
> >>>> What do you think about this?
> >>>
> >>> I don't think that unmapping alone will do. Consider the following
> >>> scenario:
> >>>
> >>> 1. Task A creates and populates the mapping.
> >>> 2. Task A forks. Task B now maps the same pages, but
> >>> write-protected.
> >>> 3. Task B calls process_vm_mmap() and passes the mapping to the parent.
> >>>
> >>> After this Task A will have the same anon pages mapped twice.
> >>
> >> Ah, sure.
> >>
> >>> One possible way out would be to force CoW on all pages in the mapping,
> >>> before passing the mapping to the new process.
> >>
> >> This will pop all swapped pages up, which is the thing the patchset aims
> >> to prevent.
> >>
> >> Hm, what about allowing remapping only of a VMA whose anon_vma::rb_root contains
> >> only one chain and whose vma->anon_vma_chain contains a single entry? This is
> >> a VMA which was faulted, but whose mm was never duplicated (or whose
> >> forks have already died).
> >
> > The requirement for the VMA to be faulted (have any pages mapped) looks
> > excessive to me, but the general idea may work.
> >
> > One issue I see is that userspace may not have full control to create such
> > a VMA. vma_merge() can merge the VMA with the next one without any consent
> > from userspace, and you'll get an anon_vma inherited from the VMA you've
> > just merged with.
> >
> > I don't have any valid idea on how to get around this.
>
> Technically it is possible by creating boundary 1-page VMAs with a different protection,
> one above and one below the desired region, and then mapping the desired region. But this
> is not convenient.
>
> I don't think it's difficult to find a natural limitation that prevents mapping
> a single page twice, if we want to avoid this at least at the start. Another suggestion:
>
> prohibit mapping a remote process's VMA only if its vm_area_struct::anon_vma::root
> is the same as the root of one of the local process's VMAs.
>
> What about this?

I don't see anything immediately wrong with this, but it's still going to
produce puzzling errors for a user. How would you document such a limitation
in a way that makes sense to a userspace developer?

--
Kirill A. Shutemov
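
Editorial sketch, not part of the original message: the "boundary 1-page VMAs"
workaround quoted above can be approximated from userspace with mmap() and
mprotect(). The helper names map_with_guards() and unmap_with_guards() are
hypothetical. The sketch reserves the whole range as PROT_NONE and then opens
up the middle, which should leave a differently-protected page on each side so
the usable region cannot be merged with a neighbouring mapping of the same
protection. Whether that is enough to keep the region's anon_vma private is
the kernel-side question the thread is debating; the code only shows the
userspace half.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map at least len bytes of anonymous read/write memory
 * with one PROT_NONE page directly below and one directly above it.
 * On success returns the usable region and stores its page-rounded size in
 * *mapped_len; returns NULL on failure.
 */
static void *map_with_guards(size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t pg = (size_t)page;

    if (len == 0 || len > SIZE_MAX - 3 * pg)
        return NULL;

    size_t body = (len + pg - 1) / pg * pg;   /* round up to whole pages */
    size_t total = body + 2 * pg;             /* body plus two guard pages */

    /* Reserve the whole range with no access permissions. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Open up only the middle; the first and last page stay PROT_NONE. */
    if (mprotect(base + pg, body, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }

    *mapped_len = body;
    return base + pg;
}

/* Release a region returned by map_with_guards(), guard pages included. */
static int unmap_with_guards(void *region, size_t mapped_len)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    return munmap((char *)region - pg, mapped_len + 2 * pg);
}
```

A caller would use the returned pointer like any other anonymous mapping and
release it with unmap_with_guards(). The cost is two extra VMAs and two pages
of address space per region, which is presumably why the workaround is
described above as inconvenient.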