From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f169.google.com (mail-pf0-f169.google.com [209.85.192.169]) by kanga.kvack.org (Postfix) with ESMTP id 498616B0254 for ; Thu, 10 Mar 2016 11:34:48 -0500 (EST) Received: by mail-pf0-f169.google.com with SMTP id u190so43236969pfb.3 for ; Thu, 10 Mar 2016 08:34:48 -0800 (PST) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id ut6si6948452pac.241.2016.03.10.08.34.47 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 Mar 2016 08:34:47 -0800 (PST) Date: Thu, 10 Mar 2016 17:34:39 +0100 From: Peter Zijlstra Subject: Re: [PATCH] thp, mm: remove comments on serializion of THP split vs. gup_fast Message-ID: <20160310163439.GS6356@twins.programming.kicks-ass.net> References: <1456329561-4319-1-git-send-email-kirill.shutemov@linux.intel.com> <20160224185025.65711ed6@thinkpad> <20160225150744.GA19707@node.shutemov.name> <20160310161035.GD30716@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160310161035.GD30716@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli Cc: Hugh Dickins , "Kirill A. Shutemov" , Gerald Schaefer , Steve Capper , Dann Frazier , Catalin Marinas , "Kirill A. Shutemov" , Andrew Morton , linux-mm@kvack.org, Martin Schwidefsky On Thu, Mar 10, 2016 at 05:10:35PM +0100, Andrea Arcangeli wrote: > On Thu, Feb 25, 2016 at 10:50:14PM -0800, Hugh Dickins wrote: > > It's a useful suggestion from Gerald, and your THP rework may have > > brought us closer to being able to rely on RCU locking rather than > > IRQ disablement there; but you were right just to delete the comment, > > there are other reasons why fast GUP still depends on IRQs disabled. > > > > For example, see the fallback tlb_remove_table_one() in mm/memory.c: > > that one uses smp_call_function() sending IPI to all CPUs concerned, > > without waiting an RCU grace period at all. > > I full agree, the refcounting change just drops the THP splitting from > the equation, but everything else remains. It's not like x86 is using > RCU for gup_fast when CONFIG_TRANSPARENT_HUGEPAGE=n. > > The main issue Peter also pointed out is how it can be faster to wait > a RCU grace period than sending an IPI to only the CPU that have an > active_mm matching the one the page belongs to Typically RCU (sched) grace periods take a relative 'forever' compared to sending IPIs. That is, synchronize_sched() is typically slower. But, on the upside, not sending IPIs will not perturb those other CPUs, which is something HPC/RT people like. > and I'm not exactly > sure the cost of disabling irqs in gup_fast is going to pay off. Entirely depends on the workload of course, but you can do a lot of gup_fast compared to munmap()s. So making gup_fast, faster, seems like a potential win. Also, is anybody really interested in munmap() performance? > It's > not just swap, large munmap should be able to free up pagetables or > pagetables would get a footprint out of proportion with the Rss of the > process, and in turn it'll have to either block synchronously for long > before returning to userland, or return to userland when the pagetable > memory is still not free, and userland may mmap again and munmap again > in a loop and being legit doing so too, with unclear side effects with > regard to false positive OOM. I'm not seeing that, the only point where this matters at all, is if the batch alloc fails, otherwise the RCU_TABLE_FREE stuff uses call_rcu_sched() and what you write above is true already. Now, RCU already has an oom_notifier to push work harder if we approach that. > Then there's another issue with synchronize_sched(), > __get_user_pages_fast has to safe to run from irq (note the > local_irq_save instead of local_irq_disable) and KVM leverages it. This is unchanged. synchronize_sched() serialized against anything that disables preemption, having IRQs disabled is very much included in that. So there should be no problem running this from IRQ context. > KVM > just requires it to be atomic so it can run from inside a preempt > disabled section (i.e. inside a spinlock), I'm fairly certain the > irq-safe guarantee could be dropped without pain and > rcu_read_lock_sched() would be enough, but the documentation of the > IRQ-safe guarantees provided by __get_user_pages_fast should be also > altered if we were to use synchronize_sched() and that's a symbol > exported to GPL modules too. No changes needed. > Overall my main concern in switching x86 to RCU gup-fast is the > performance of synchronize_sched in large munmap pagetable teardown. Normally, as already established by Martin, you should not actually ever encounter the sync_sched() call. Only under severe memory pressure, when the batch alloc in tlb_remove_table() fails is this ever an issue. And at the point where such allocations fail, performance typically isn't a concern anymore. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org