From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-we0-f170.google.com (mail-we0-f170.google.com [74.125.82.170])
    by kanga.kvack.org (Postfix) with ESMTP id 3E2A86B0035
    for ; Mon, 28 Apr 2014 05:25:51 -0400 (EDT)
Received: by mail-we0-f170.google.com with SMTP id w61so6143324wes.29
    for ; Mon, 28 Apr 2014 02:25:50 -0700 (PDT)
Received: from casper.infradead.org (casper.infradead.org. [2001:770:15f::2])
    by mx.google.com with ESMTPS id xw5si2679559wjc.12.2014.04.28.02.25.49
    for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
    Mon, 28 Apr 2014 02:25:49 -0700 (PDT)
Date: Mon, 28 Apr 2014 11:25:40 +0200
From: Peter Zijlstra
Subject: Re: Dirty/Access bits vs. page content
Message-ID: <20140428092540.GO11096@twins.programming.kicks-ass.net>
References: <5359CD7C.5020604@zytor.com>
    <20140425135101.GE11096@twins.programming.kicks-ass.net>
    <20140426180711.GM26782@laptop.programming.kicks-ass.net>
    <20140427072034.GC1429@laptop.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Hugh Dickins
Cc: Linus Torvalds, "H. Peter Anvin", Benjamin Herrenschmidt, Jan Kara,
    Dave Hansen, "linux-arch@vger.kernel.org", linux-mm,
    Russell King - ARM Linux, Tony Luck

On Sun, Apr 27, 2014 at 01:09:54PM -0700, Hugh Dickins wrote:
> On Sun, 27 Apr 2014, Hugh Dickins wrote:
> >
> > But woke with a panic attack that we have overlooked the question
> > of how page reclaim's page_mapped() checks are serialized.
> > Perhaps this concern will evaporate with the morning dew,
> > perhaps it will not...
>
> It was a real concern, but we happen to be rescued by the
> innocuous-looking is_page_cache_freeable() check at the beginning of
> pageout(): which will deserve its own comment, but that can follow later.
>
> My concern was with page reclaim's shrink_page_list() racing against
> munmap's or exit's (or madvise's) zap_pte_range() unmapping the page.
>
> Once zap_pte_range() has cleared the pte from a vma, neither
> try_to_unmap() nor page_mkclean() will see that vma as containing
> the page, so neither will do its own flush TLB of the cpus involved,
> before proceeding to writepage.
>
> Linus's patch (serializing with ptlock) or my patch (serializing
> with i_mmap_mutex) both almost fix that, but it seemed not entirely:
> because try_to_unmap() is only called when page_mapped(), and
> page_mkclean() quits early without taking locks when !page_mapped().

Argh!! Very good spotting that.

> So in the interval when zap_pte_range() has brought page_mapcount()
> down to 0, but not yet flushed TLB on all mapping cpus, it looked as
> if we still had a problem - neither try_to_unmap() nor page_mkclean()
> would take the lock either of us rely upon for serialization.
>
> But pageout()'s preliminary is_page_cache_freeable() check makes
> it safe in the end: although page_mapcount() has gone down to 0,
> page_count() remains raised until the free_pages_and_swap_cache()
> after the TLB flush.
>
> So I now believe we're safe after all with either patch, and happy
> for Linus to go ahead with his.

OK, so I'm just not seeing that atm. Will have another peek later,
hopefully when more fully awake.

> Peter, returning at last to your question of whether we could exempt
> shmem from the added overhead of either patch. Until just now I
> thought not, because of the possibility that the shmem_writepage()
> could occur while one of the mm's cpus remote from zap_pte_range()
> cpu was still modifying the page. But now that I see the role
> played by is_page_cache_freeable(), and of course the zapping end
> has never dropped its reference on the page before the TLB flush,
> however late that occurred, hmmm, maybe yes, shmem can be exempted.
>
> But I'd prefer to dwell on that a bit longer: we can add that as
> an optimization later if it holds up to scrutiny.

For sure.. No need to rush that. And if a (performance) regression shows
up in the meantime, we immediately have a good test case too :-)
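
For readers following the refcount argument quoted above, here is a minimal userspace sketch of why the is_page_cache_freeable() check closes the window. It is a toy model, not kernel code: every `toy_` name is hypothetical, and it assumes a file page with exactly one mapping and no buffer heads, so that the expected count for a freeable page is 2 (reclaim's isolation reference plus the page cache reference). The point it tries to show is only the ordering Hugh describes: the mapcount drops when the pte is cleared, but the zapping side keeps its page reference until after the TLB flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the counters discussed in the thread. */
struct toy_page {
    int refcount;     /* models page_count() */
    int mapcount;     /* models page_mapcount() */
    bool has_private; /* models page_has_private(), assumed false here */
};

/*
 * Assumed starting state: one page cache reference, one reference held
 * on behalf of the single mapping, one taken by reclaim when it
 * isolated the page.
 */
struct toy_page toy_mapped_page_isolated_by_reclaim(void)
{
    struct toy_page p = { .refcount = 3, .mapcount = 1, .has_private = false };
    return p;
}

bool toy_page_mapped(const struct toy_page *p)
{
    return p->mapcount > 0;
}

/* Freeable only if nobody but reclaim and the page cache holds a reference. */
bool toy_is_page_cache_freeable(const struct toy_page *p)
{
    return p->refcount - (p->has_private ? 1 : 0) == 2;
}

/* Zap side, step 1: pte cleared and rmap removed; the reference is kept. */
void toy_zap_clear_pte(struct toy_page *p)
{
    assert(p->mapcount > 0);
    p->mapcount--;
}

/* Zap side, step 2: TLB flushed, then the batched reference is dropped. */
void toy_zap_flush_and_put(struct toy_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}
```

In this model, after toy_zap_clear_pte() the page is no longer mapped, so a reclaim path that only locks when page_mapped() would take no lock; yet toy_is_page_cache_freeable() still returns false because the count is 3. It only becomes true after toy_zap_flush_and_put(), which is the step that stands in for free_pages_and_swap_cache() running after the TLB flush.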