From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id D73CF6B004F for ; Tue, 4 Aug 2009 14:58:18 -0400 (EDT) Message-ID: <4A788CBE.7080100@redhat.com> Date: Tue, 04 Aug 2009 22:32:14 +0300 From: Izik Eidus MIME-Version: 1.0 Subject: Re: [PATCH 9/12] ksm: fix oom deadlock References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Hugh Dickins Cc: Andrea Arcangeli , Rik van Riel , Chris Wright , Nick Piggin , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org List-ID: Hugh Dickins wrote: > There's a now-obvious deadlock in KSM's out-of-memory handling: > imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex, > trying to allocate a page to break KSM in an mm which becomes the > OOM victim (quite likely in the unmerge case): it's killed and goes > to exit, and hangs there waiting to acquire ksm_thread_mutex. > > Clearly we must not require ksm_thread_mutex in __ksm_exit, simple > though that made everything else: perhaps use mmap_sem somehow? > And part of the answer lies in the comments on unmerge_ksm_pages: > __ksm_exit should also leave all the rmap_item removal to ksmd. > > But there's a fundamental problem, that KSM relies upon mmap_sem to > guarantee the consistency of the mm it's dealing with, yet exit_mmap > tears down an mm without taking mmap_sem. And bumping mm_users won't > help at all, that just ensures that the pages the OOM killer assumes > are on their way to being freed will not be freed. > > The best answer seems to be, to move the ksm_exit callout from just > before exit_mmap, to the middle of exit_mmap: after the mm's pages > have been freed (if the mmu_gather is flushed), but before its page > tables and vma structures have been freed; and down_write,up_write > mmap_sem there to serialize with KSM's own reliance on mmap_sem. > > But KSM then needs to be careful, whenever it downs mmap_sem, to > check that the mm is not already exiting: there's a danger of using > find_vma on a layout that's being torn apart, or writing into page > tables which have been freed for reuse; and even do_anonymous_page > and __do_fault need to check they're not being called by break_ksm > to reinstate a pte after zap_pte_range has zapped that page table. > > Though it might be clearer to add an exiting flag, set while holding > mmap_sem in __ksm_exit, that wouldn't cover the issue of reinstating > a zapped pte. All we need is to check whether mm_users is 0 - but > must remember that ksmd may detect that before __ksm_exit is reached. > So, ksm_test_exit(mm) added to comment such checks on mm->mm_users. > > __ksm_exit now has to leave clearing up the rmap_items to ksmd, > that needs ksm_thread_mutex; but shift the exiting mm just after the > ksm_scan cursor so that it will soon be dealt with. __ksm_enter raise > mm_count to hold the mm_struct, ksmd's exit processing (exactly like > its processing when it finds all VM_MERGEABLEs unmapped) mmdrop it, > similar procedure for KSM_RUN_UNMERGE (which has stopped ksmd). > > But also give __ksm_exit a fast path: when there's no complication > (no rmap_items attached to mm and it's not at the ksm_scan cursor), > it can safely do all the exiting work itself. This is not just an > optimization: when ksmd is not running, the raised mm_count would > otherwise leak mm_structs. > > Signed-off-by: Hugh Dickins > --- > Acked-by: Izik Eidus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org