From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Sun, 27 Apr 2008 02:20:19 +0200 From: Andrea Arcangeli Subject: Re: mmu notifier #v14 Message-ID: <20080427002019.GL9514@duo.random> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <48137B8B.7010202@us.ibm.com> Sender: owner-linux-mm@kvack.org Return-Path: To: Anthony Liguori Cc: Rusty Russell , Robin Holt , Jack Steiner , Christoph Lameter , akpm@linux-foundation.org, Nick Piggin , Steve Wise , Peter Zijlstra , linux-mm@kvack.org, Kanoj Sarcar , Roland Dreier , linux-kernel@vger.kernel.org, Avi Kivity , kvm-devel@lists.sourceforge.net, general@lists.openfabrics.org, Hugh Dickins , Chris Wright , Marcelo Tosatti List-ID: On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: >> +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) >> +{ >> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >> PAGE_SHIFT); >> + get_page(page); >> > > You should not assume a struct page exists for any given spte. Instead, use > kvm_get_pfn() and kvm_release_pfn_clean(). Last email from muli@ibm in my inbox argues it's useless to build rmap on mmio regions, so the above is more efficient so put_page runs directly on the page without going back and forth between spte -> pfn -> page -> pfn -> page in a single function. Certainly if we start building rmap on mmio regions we'll have to change that. > Perhaps I just have a weak stomach but I am uneasy having a function that > takes a lock on exit. I walked through the logic and it doesn't appear to > be wrong but it also is pretty clear that you could defer the acquisition > of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the > update_pte assignment into kvm_mmu_pte_write. I agree out_lock is an uncommon exit path, the problem is that the code was buggy, and I tried to fix it with the smallest possible change and that resulting in an out_lock. That section likely need a refactoring, all those update_pte fields should be at least returned by the function guess_.... but I tried to reduce the changes to make the issue more readable, I didn't want to rewrite certain functions just to take a spinlock a few instructions ahead. > Worst case, you pass 4 more pointer arguments here and, take the spin lock, > and then depending on the result of mmu_guess_page_from_pte_write, update > vcpu->arch.update_pte. Yes that was my same idea, but that's left for a later patch. Fixing this bug mixed with the mmu notifier patch was perhaps excessive already ;). > Why move the destruction of the vm to the MMU notifier unregister hook? > Does anything else ever call mmu_notifier_unregister that would implicitly > destroy the VM? mmu notifier ->release can run at anytime before the filehandle is closed. ->release has to zap all sptes and freeze the mmu (hence all vcpus) to prevent any further page fault. After ->release returns all pages are freed (we'll never relay on the page pin to avoid the rmap_remove put_page to be a relevant unpin event). So the idea is that I wanted to maintain the same ordering of the current code in the vm destroy event, I didn't want to leave a partially shutdown VM on the vmlist. If the ordering is entirely irrelevant and the kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then I can avoid changes to kvm_main.c but I doubt. I've done it in a way that archs not needing mmu notifiers like s390 can simply add the kvm_destroy_common_vm at the top of their kvm_arch_destroy_vm. All others using mmu_notifiers have to invoke kvm_destroy_common_vm in the ->release of the mmu notifiers. This will ensure that everything will be ok regardless if exit_mmap is called before/after exit_files, and it won't make a whole lot of difference anymore, if the driver fd is pinned through vmas->vm_file released in exit_mmap or through the task filedescriptors relased in exit_files etc... Infact this allows to call mmu_notifier_unregister at anytime later after the task has already been killed, without any trouble (like if the mmu notifier owner isn't registering in current->mm but some other tasks mm). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org