From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nick Piggin Subject: Re: deadlock with latest xfs Date: Tue, 28 Oct 2008 19:56:35 +1100 References: <4900412A.2050802@sgi.com> <200810281702.17135.nickpiggin@yahoo.com.au> <20081028062524.GQ4985@disturbed> In-Reply-To: <20081028062524.GQ4985@disturbed> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200810281956.36199.nickpiggin@yahoo.com.au> Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Chinner Cc: Lachlan McIlroy , Christoph Hellwig , xfs-oss , linux-mm@kvack.org List-ID: On Tuesday 28 October 2008 17:25, Dave Chinner wrote: > On Tue, Oct 28, 2008 at 05:02:16PM +1100, Nick Piggin wrote: > > On Sunday 26 October 2008 13:50, Dave Chinner wrote: > > > [1] I don't see how any of the XFS changes we made make this easier to > > > hit. What I suspect is a VM regression w.r.t. memory reclaim because > > > this is the second problem since 2.6.26 that appears to be a result of > > > memory allocation failures in places that we've never, ever seen > > > failures before. > > > > > > The other new failure is this one: > > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=11805 > > > > > > which is an alloc_pages(GFP_KERNEL) failure.... > > > > > > mm-folk - care to weight in? > > > > order-0 alloc page GFP_KERNEL can fail sometimes. If it is called > > from reclaim or PF_MEMALLOC thread; if it is OOM-killed; fault > > injection. > > > > This is even the case for __GFP_NOFAIL allocations (which basically > > are buggy anyway). > > > > Not sure why it might have started happening, but I didn't see > > exactly which alloc_pages you are talking about? If it is via slab, > > then maybe some parameters have changed (eg. in SLUB) which is > > using higher order allocations. > > In fs/xfs/linux-2.6/xfs_buf.c::xfs_buf_get_noaddr(). It's doing a > single page allocation at a time. > > It may be that this failure is caused by an increase base memory > consumption of the kernel as this failure was reported in an lguest > and reproduced with a simple 'modprobe xfs ; mount /dev/xxx > /mnt/xfs' command. Maybe the lguest had very little memory available > to begin with and trying to allocate 2MB of pages for 8x256k log > buffers may have been too much for it... I suppose it could have been getting oom-killed, then failing the alloc, then oopsing on the way out, yes. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org