Subject: Re: [PATCH v3 4/5] slub: Only IPI CPUs that have per cpu obj to flush
From: Gilad Ben-Yossef
Date: Mon, 14 Nov 2011 15:57:13 +0200
To: Hillf Danton
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Frederic Weisbecker,
 Russell King, linux-mm@kvack.org, Christoph Lameter, Pekka Enberg
References: <1321179449-6675-1-git-send-email-gilad@benyossef.com>
 <1321179449-6675-5-git-send-email-gilad@benyossef.com>

On Mon, Nov 14, 2011 at 3:19 PM, Hillf Danton wrote:
> On Sun, Nov 13, 2011 at 10:57 PM, Gilad Ben-Yossef wrote:
...
>>> Perhaps, the technique of local_cpu_mask defined in kernel/sched_rt.c
>>> could be used to replace the above atomic allocation.
>>>
>>
>> Thank you for taking the time to review my patch :-)
>>
>> That is indeed the direction I went with in the previous iteration of
>> this patch, with the small change that, since the allocation only
>> actually occurs for CPUMASK_OFFSTACK=y, which by definition means
>> systems with lots and lots of CPUs, it is actually better to allocate
>> the cpumask per kmem_cache rather than per CPU: on systems where it
>> matters we are bound to have more CPUs (e.g. 4096) than kmem_caches
>> (~160). See https://lkml.org/lkml/2011/10/23/151.
>>
>> I then went ahead and further optimized the code to only incur the
>> memory overhead of allocating those cpumasks for CPUMASK_OFFSTACK=y
>> systems. See https://lkml.org/lkml/2011/10/23/152.
>>
>> As you can see from the discussion that evolved, there seems to be
>> agreement that the code complexity overhead involved is simply not
>> worth it for what is, unlike sched_rt, a rather esoteric case and one
>> where allocation failure is easily dealt with.
>>
> Even with the introduced overhead of allocation, IPIs could not go down
> as much as we wish, right?
>

My apologies, but I don't think I follow you - if processor A needs
processor B to do something, an IPI is the right thing to do. Let's call
those useful IPIs.

What I am trying to tackle is the places where processor B doesn't really
have anything to do and processor A is simply blindly sending IPIs to the
whole system. I call those useless IPIs.

I don't see a reason why *useless* IPIs can't go down to zero, or very
close to that. Useful IPIs are fine :-)

Thanks,
Gilad

--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
 -- Mike Galbraith, LKML
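
A minimal sketch of the kind of flush path this thread is about, under
the following assumptions: it lives in mm/slub.c (so flush_cpu_slab()
and struct kmem_cache_cpu's page field are in scope), and an
on_each_cpu_mask()-style helper is available as proposed elsewhere in
this series. flush_all_targeted() is a hypothetical name, not the
actual patch: build a cpumask of only the CPUs that currently hold a
per-cpu slab for the cache and IPI just those (the "useful" IPIs),
falling back to a global IPI if the GFP_ATOMIC cpumask allocation
fails - the allocation the quoted review comment is about.

#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
#include <linux/smp.h>
/* struct kmem_cache, struct kmem_cache_cpu and flush_cpu_slab() are
 * assumed to come from the surrounding mm/slub.c context. */

/*
 * Hypothetical sketch: flush only the CPUs that hold a per-cpu slab
 * for this cache, instead of IPI'ing every online CPU.
 */
static void flush_all_targeted(struct kmem_cache *s)
{
	cpumask_var_t cpus;
	int cpu;

	/*
	 * With CPUMASK_OFFSTACK=y this is a real (atomic) allocation;
	 * the thread above discusses tolerating that vs. preallocating
	 * one mask per kmem_cache.
	 */
	if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
		for_each_online_cpu(cpu) {
			struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

			if (c->page)	/* this CPU has a per-cpu slab to flush */
				cpumask_set_cpu(cpu, cpus);
		}
		/* "Useful" IPIs only: CPUs with something to do. */
		on_each_cpu_mask(cpus, flush_cpu_slab, s, 1);
		free_cpumask_var(cpus);
	} else {
		/* cpumask allocation failed: fall back to IPI'ing everyone. */
		on_each_cpu(flush_cpu_slab, s, 1);
	}
}

The alternative floated in the linked messages - preallocating one such
mask per kmem_cache, and only for CPUMASK_OFFSTACK=y builds - avoids the
atomic allocation entirely, but per the quoted discussion was judged not
worth the extra complexity, since an allocation failure here is easily
handled by the global-IPI fallback.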