Date: Thu, 24 Apr 2014 12:45:36 -0700
Subject: Re: Dirty/Access bits vs. page content
From: Linus Torvalds
To: Hugh Dickins
Cc: Peter Zijlstra, Jan Kara, Dave Hansen, "H. Peter Anvin", Benjamin Herrenschmidt, "linux-arch@vger.kernel.org", linux-mm, Russell King - ARM Linux, Tony Luck

On Thu, Apr 24, 2014 at 11:40 AM, Hugh Dickins wrote:
> safely with page_mkclean(), as it stands at present anyway.
>
> I think that (in the exceptional case when a shared file pte_dirty has
> been encountered, and this mm is active on other cpus) zap_pte_range()
> needs to flush TLB on other cpus of this mm, just before its
> pte_unmap_unlock(): then it respects the usual page_mkclean() protocol.
>
> Or has that already been rejected earlier in the thread,
> as too costly for some common case?

Hmm.
The problem is that right now we actually try very hard to batch as much as possible in order to avoid extra TLB flushes (we limit it to around 10k pages per batch, but that's still a *lot* of pages). The TLB flush IPI calls are noticeable under some loads. And it's certainly much too much to free 10k pages under a spinlock. The latencies would be horrendous.

We could add some special logic that only triggers for the dirty pages case, but it would still have to handle the case of "we batched up 9000 clean pages, and then we hit a dirty page", so it would get rather costly quickly.

Or we could have a separate array for dirty pages, and limit those to a much smaller number, and do just the dirty pages under the lock, and then the rest after releasing the lock. Again, a fair amount of new complexity.

I would almost prefer to have some special (per-mapping?) lock or something, and make page_mkclean() serialize with the unmapping case.

               Linus
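The split-batch idea above can be sketched in userspace C. This is only an illustration of the control flow, not kernel code: the structure name, the dirty-batch limit of 64, and the stub functions (`flush_tlb_mm`, `free_pages`) are all assumptions standing in for the real mmu_gather machinery. The point is that dirty pages go into a small batch that is flushed and freed while the pte lock is still held (so page_mkclean() stays serialized against the unmap, as Hugh's proposal requires), while the big clean batch is deferred until after the lock is dropped.

```c
#include <stddef.h>

/* Hypothetical batch sizes.  The email mentions ~10k pages per batch;
 * the 64-entry dirty batch is an assumed value chosen to keep the work
 * done under the page-table lock short. */
#define CLEAN_BATCH_MAX 10000
#define DIRTY_BATCH_MAX 64

struct zap_batch {
    void  *clean[CLEAN_BATCH_MAX];
    size_t nr_clean;
    void  *dirty[DIRTY_BATCH_MAX];
    size_t nr_dirty;
};

/* Stand-ins for the real kernel primitives. */
static int tlb_flushes;
static void flush_tlb_mm(void)              { tlb_flushes++; }
static void free_pages(void **p, size_t n)  { (void)p; (void)n; }

/* Called while the pte lock is held: queue the page in the right batch. */
static void batch_add(struct zap_batch *b, void *page, int dirty)
{
    if (dirty)
        b->dirty[b->nr_dirty++] = page;
    else
        b->clean[b->nr_clean++] = page;
}

/* Called just before pte_unmap_unlock(): flush the TLB on other CPUs of
 * this mm and free the dirty pages while still holding the lock, so a
 * concurrent page_mkclean() cannot miss a stale dirty mapping.
 * Returns nonzero if a flush happened under the lock. */
static int batch_finish_locked(struct zap_batch *b)
{
    if (b->nr_dirty == 0)
        return 0;
    flush_tlb_mm();                 /* the costly IPI, only for dirty pages */
    free_pages(b->dirty, b->nr_dirty);
    b->nr_dirty = 0;
    return 1;
}

/* Called after the lock is dropped: the large clean batch is flushed and
 * freed without making anyone wait on the spinlock. */
static void batch_finish_unlocked(struct zap_batch *b)
{
    if (b->nr_clean == 0)
        return;
    flush_tlb_mm();
    free_pages(b->clean, b->nr_clean);
    b->nr_clean = 0;
}
```

The cost Linus points out shows up directly here: a run of mostly-clean pages still pays an extra under-lock flush as soon as a single dirty page appears in the batch.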