From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 19 Nov 2004 22:23:41 -0800
From: William Lee Irwin III
Subject: Re: page fault scalability patch V11 [0/7]: overview
Message-ID: <20041120062341.GM2714@holomorphy.com>
References: <20041120020306.GA2714@holomorphy.com>
 <419EBBE0.4010303@yahoo.com.au>
 <20041120035510.GH2714@holomorphy.com>
 <419EC205.5030604@yahoo.com.au>
 <20041120042340.GJ2714@holomorphy.com>
 <419EC829.4040704@yahoo.com.au>
 <20041120053802.GL2714@holomorphy.com>
 <419EDB21.3070707@yahoo.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <419EDB21.3070707@yahoo.com.au>
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Nick Piggin
Cc: Linus Torvalds, Christoph Lameter, akpm@osdl.org,
 Benjamin Herrenschmidt, Hugh Dickins, linux-mm@kvack.org,
 linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
List-ID:

William Lee Irwin III wrote:
>> There isn't anything left to explain. So if there's a question, be
>> specific about it.

On Sat, Nov 20, 2004 at 04:50:25PM +1100, Nick Piggin wrote:
> Why am I very very wrong? Why won't touch_nmi_watchdog work from
> the read loop?
> And let's just be nice and try not to jump at the chance to point
> out when people are very very wrong, and keep count of the times
> they have been very very wrong. I'm trying to be constructive.

touch_nmi_watchdog() is only "protection" against local interrupt
disablement triggering the NMI oopser because alert_counter[]
increments are not atomic. Yet even supposing they were made so, the
net effect of "covering up" this gross deficiency is making the
user-observable problems it causes undiagnosable, as noted before.

William Lee Irwin III wrote:
>> This entire line of argument is bogus. A preexisting bug of a similar
>> nature is not grounds for deliberately introducing any bug.

On Sat, Nov 20, 2004 at 04:50:25PM +1100, Nick Piggin wrote:
> Sure, if that is a bug and someone is just about to fix it then
> yes you're right, we shouldn't introduce this. I didn't realise
> it was a bug. Sounds like it would be causing you lots of problems
> though - have you looked at how to fix it?

Kevin Marin was the first to report this issue to lkml. I had seen
instances of it in internal corporate bug reports, and it was one of
the motivators for the work I did on pidhashing (one of the causes of
the timeouts was worst cases in pid allocation). Manfred Spraul and I
wrote patches attempting to reduce read-side hold time in /proc/
algorithms, Ingo Molnar wrote patches to hierarchically subdivide the
/proc/ iterations, and Dipankar Sarma and Maneesh Soni wrote patches
to carry out the long iterations in /proc/ locklessly.

The last several of these affecting /proc/ have not gained acceptance,
though the work has not been halted in any sense, as this problem
recurs quite regularly. A considerable amount of sustained effort has
gone toward mitigating and resolving rwlock starvation.

Aggravating the rwlock starvation destabilizes, not pessimizes, and
performance is secondary to stability.

-- wli