From: Andrea Arcangeli <andrea@qumranet.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>, Robin Holt <holt@sgi.com>,
Avi Kivity <avi@qumranet.com>, Izik Eidus <izike@qumranet.com>,
kvm-devel@lists.sourceforge.net,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
general@lists.openfabrics.org,
Steve Wise <swise@opengridcomputing.com>,
Roland Dreier <rdreier@cisco.com>,
Kanoj Sarcar <kanojsarcar@yahoo.com>,
steiner@sgi.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, daniel.blueman@quadrics.com,
Nick Piggin <npiggin@suse.de>
Subject: Re: EMM: disable other notifiers before register and unregister
Date: Thu, 3 Apr 2008 17:29:36 +0200 [thread overview]
Message-ID: <20080403151908.GB9603@duo.random> (raw)
In-Reply-To: <Pine.LNX.4.64.0804021821230.639@schroedinger.engr.sgi.com>
On Wed, Apr 02, 2008 at 06:24:15PM -0700, Christoph Lameter wrote:
> Ok lets forget about the single theaded thing to solve the registration
> races. As Andrea pointed out this still has ssues with other subscribed
> subsystems (and also try_to_unmap). We could do something like what
> stop_machine_run does: First disable all running subsystems before
> registering a new one.
>
> Maybe this is a possible solution.
It still doesn't solve this kernel crash.
CPU0 CPU1
range_start (mmu notifier chain is empty)
range_start returns
mmu_notifier_register
kvm_emm_stop (how kvm can ever know
the other cpu is in the middle of the critical section?)
kvm page fault (kvm thinks mmu_notifier_register serialized)
zap ptes
free_page mapped by spte/GRU and not pinned -> crash
There's no way the lowlevel can stop mmu_notifier_register and if
mmu_notifier_register returns, then sptes will be instantiated and
it'll corrupt memory the same way.
The seqlock was fine, what is wrong is the assumption that we can let
the lowlevel driver handle a range_end happening without range_begin
before it. The problem is that by design the lowlevel can't handle a
range_end happening without a range_begin before it. This is the core
kernel crashing problem we have (it's a kernel crashing issue only for
drivers that don't pin the pages, so XPMEM wouldn't crash but still it
would leak memory, which is a more graceful failure than random mm
corruption).
The basic trouble is that sometime range_begin/end critical sections
run outside the mmap_sem (see try_to_unmap_cluster in #v10 or even
try_to_unmap_one only in EMM-V2).
My attempt to fix this once and for all is to walk all vmas of the
"mm" inside mmu_notifier_register and take all anon_vma locks and
i_mmap_locks in virtual address order in a row. It's ok to take those
inside the mmap_sem. Supposedly if anybody will ever take a double
lock it'll do in order too. Then I can dump all the other locking and
remove the seqlock, and the driver is guaranteed there will be a
single call of range_begin followed by a single call of range_end the
whole time and no race could ever happen, and there won't be replied
calls of range_begin that would screwup a recursive semaphore
locking. The patch won't be pretty, I guess I'll vmalloc an array of
pointers to locks to reorder them. It doesn't need to be fast. Also
the locks can't go away from under us while we hold the
down_write(mmap_sem) because the vmas can be altered only with
down_write(mmap_sem) (modulo vm_start/vm_end that can be modified with
only down_read(mmap_sem) + page_table_lock like in growsdown page
faults). So it should be ok to take all those locks inside the
mmap_sem and implement a lock_vm(mm) unlock_vm(mm). I'll think more
about this hammer approach while I try to implement it...
next prev parent reply other threads:[~2008-04-03 15:29 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-01 20:55 [ofa-general] [patch 0/9] [RFC] EMM Notifier V2 Christoph Lameter
2008-04-01 20:55 ` [patch 1/9] EMM Notifier: The notifier calls Christoph Lameter
2008-04-02 6:49 ` [ofa-general] " Andrea Arcangeli
2008-04-02 10:59 ` Robin Holt
2008-04-02 11:16 ` Andrea Arcangeli
2008-04-02 14:26 ` Robin Holt
2008-04-02 17:59 ` [ofa-general] " Christoph Lameter
2008-04-02 19:03 ` [ofa-general] EMM: Fixup return value handling of emm_notify() Christoph Lameter
2008-04-02 21:25 ` [ofa-general] " Andrea Arcangeli
2008-04-02 21:33 ` Christoph Lameter
2008-04-03 10:40 ` Peter Zijlstra
2008-04-03 15:00 ` Andrea Arcangeli
2008-04-03 19:14 ` Christoph Lameter
2008-04-02 21:05 ` [ofa-general] EMM: Require single threadedness for registration Christoph Lameter
2008-04-02 22:01 ` [ofa-general] " Andrea Arcangeli
2008-04-02 22:06 ` Christoph Lameter
2008-04-02 22:17 ` Andrea Arcangeli
2008-04-02 22:41 ` Christoph Lameter
2008-04-03 1:24 ` EMM: disable other notifiers before register and unregister Christoph Lameter
2008-04-03 10:40 ` [ofa-general] " Peter Zijlstra
2008-04-03 15:29 ` Andrea Arcangeli [this message]
2008-04-03 19:20 ` Christoph Lameter
2008-04-03 20:23 ` Christoph Lameter
2008-04-04 12:30 ` Andrea Arcangeli
2008-04-04 20:20 ` [ofa-general] [PATCH] mmu notifier #v11 Andrea Arcangeli
2008-04-04 22:06 ` Christoph Lameter
2008-04-05 0:23 ` [ofa-general] " Andrea Arcangeli
2008-04-07 5:45 ` Christoph Lameter
2008-04-07 6:02 ` [ofa-general] " Andrea Arcangeli
2008-04-02 21:53 ` [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls Andrea Arcangeli
2008-04-02 21:54 ` Christoph Lameter
2008-04-02 22:09 ` Andrea Arcangeli
2008-04-02 23:04 ` Christoph Lameter
2008-04-01 20:55 ` [ofa-general] [patch 2/9] Move tlb flushing into free_pgtables Christoph Lameter
2008-04-01 20:55 ` [ofa-general] [patch 3/9] Convert i_mmap_lock to i_mmap_sem Christoph Lameter
2008-04-01 20:55 ` [patch 4/9] Remove tlb pointer from the parameters of unmap vmas Christoph Lameter
2008-04-01 20:55 ` [patch 5/9] Convert anon_vma lock to rw_sem and refcount Christoph Lameter
2008-04-02 17:50 ` Andrea Arcangeli
2008-04-02 18:15 ` Christoph Lameter
2008-04-02 21:56 ` [ofa-general] " Andrea Arcangeli
2008-04-02 21:56 ` Christoph Lameter
2008-04-02 22:12 ` Andrea Arcangeli
2008-04-01 20:55 ` [patch 6/9] This patch exports zap_page_range as it is needed by XPMEM Christoph Lameter
2008-04-01 20:55 ` [patch 7/9] Locking rules for taking multiple mmap_sem locks Christoph Lameter
2008-04-01 20:55 ` [patch 8/9] XPMEM: The device driver Christoph Lameter
2008-04-01 20:55 ` [patch 9/9] XPMEM: Simple example Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080403151908.GB9603@duo.random \
--to=andrea@qumranet.com \
--cc=a.p.zijlstra@chello.nl \
--cc=avi@qumranet.com \
--cc=clameter@sgi.com \
--cc=daniel.blueman@quadrics.com \
--cc=general@lists.openfabrics.org \
--cc=holt@sgi.com \
--cc=hugh@veritas.com \
--cc=izike@qumranet.com \
--cc=kanojsarcar@yahoo.com \
--cc=kvm-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
--cc=rdreier@cisco.com \
--cc=steiner@sgi.com \
--cc=swise@opengridcomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox