From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oa0-f77.google.com (mail-oa0-f77.google.com [209.85.219.77])
	by kanga.kvack.org (Postfix) with ESMTP id B3B856B0031
	for ; Thu, 21 Nov 2013 13:23:44 -0500 (EST)
Received: by mail-oa0-f77.google.com with SMTP id o6so2056oag.0
	for ; Thu, 21 Nov 2013 10:23:44 -0800 (PST)
Received: from psmtp.com ([74.125.245.105])
	by mx.google.com with SMTP id hk1si14566694pbb.221.2013.11.20.08.07.20
	for ; Wed, 20 Nov 2013 08:07:21 -0800 (PST)
Date: Wed, 20 Nov 2013 11:07:12 -0500
From: Johannes Weiner
Subject: Re: [patch] mm, vmscan: abort futile reclaim if we've been oom killed
Message-ID: <20131120160712.GF3556@cmpxchg.org>
References: <20131113152412.GH707@cmpxchg.org> <20131114000043.GK707@cmpxchg.org> <20131118164107.GC3556@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: David Rientjes
Cc: Andrew Morton, Mel Gorman, Rik van Riel, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Nov 18, 2013 at 05:17:31PM -0800, David Rientjes wrote:
> On Mon, 18 Nov 2013, Johannes Weiner wrote:
>
> > > Um, no, those processes are going through a repeated loop of direct
> > > reclaim, calling the oom killer, iterating the tasklist, finding an
> > > existing oom killed process that has yet to exit, and looping.  They
> > > wouldn't loop for too long if we can reduce the amount of time that it
> > > takes for that oom killed process to exit.
> >
> > I'm not talking about the big loop in the page allocator.  The victim
> > is going through the same loop.  This patch is about the victim being
> > in a pointless direct reclaim cycle when it could be exiting, all I'm
> > saying is that the other tasks doing direct reclaim at that moment
> > should also be quitting and retrying the allocation.
> >
>
> "All other tasks" would be defined as though sharing the same mempolicy
> context as the oom kill victim or the same set of cpuset mems, I'm not
> sure what type of method for determining reclaim eligibility you're
> proposing to avoid pointlessly spinning without making progress.  Until an
> alternative exists, my patch avoids the needless spinning and expedites
> the exit, so I'll ask that it be merged.

I laid this out in the second half of my email, which you apparently
did not read:

"If we have multi-second stalls in direct reclaim then it should be
fixed for all direct reclaimers.  The problem is not only OOM kill
victims getting stuck, it's every direct reclaimer being stuck trying
to do way too much work before retrying the allocation.

Kswapd checks the system state after every priority cycle.  Direct
reclaim should probably do the same and retry the allocation after
every priority cycle or every X pages scanned, where X is something
reasonable and not "up to every LRU page in the system"."

NAK to this incomplete drive-by fix.
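
To make the suggestion above concrete, here is a rough userspace sketch
of a direct reclaim loop that rechecks the system state after every
priority cycle and gives up once a scan budget is spent.  This is not
kernel code and not a patch: struct sim_state, allocation_can_succeed(),
shrink_at_priority(), SCAN_BUDGET and the "1 in 8 scanned pages gets
reclaimed" rate are all made up for illustration.  Only the starting
priority of 12 and the "scan roughly lru_size >> priority pages per
cycle" shape are borrowed from how vmscan works.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12       /* starting scan priority, as in the kernel */
#define SCAN_BUDGET  1024UL   /* hypothetical "X": pages scanned before bailing out */

/* Hypothetical stand-in for the state reclaim would look at. */
struct sim_state {
	unsigned long lru_pages;  /* pages sitting on the LRU */
	unsigned long free_pages; /* pages currently free */
	unsigned long watermark;  /* allocation succeeds once free_pages exceeds this */
};

/* Hypothetical check: would retrying the allocation succeed right now? */
static bool allocation_can_succeed(const struct sim_state *s)
{
	return s->free_pages > s->watermark;
}

/*
 * One priority cycle: scan lru_pages >> priority pages and pretend that
 * one in eight scanned pages can be reclaimed (an arbitrary assumption).
 */
static void shrink_at_priority(struct sim_state *s, int priority,
			       unsigned long *scanned)
{
	unsigned long nr = s->lru_pages >> priority;
	unsigned long reclaimed = nr / 8;

	*scanned += nr;
	s->lru_pages -= reclaimed;
	s->free_pages += reclaimed;
}

/*
 * Direct reclaim that rechecks the state after every priority cycle.
 * Returns the priority at which it stopped, or -1 if every priority
 * was tried.  In all cases the caller retries the allocation next.
 */
static int direct_reclaim(struct sim_state *s, unsigned long *scanned)
{
	int priority;

	assert(s && scanned);

	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
		shrink_at_priority(s, priority, scanned);

		/* Enough memory was freed: stop and retry the allocation. */
		if (allocation_can_succeed(s))
			break;

		/* Scan budget spent: stop and retry instead of scanning on. */
		if (*scanned >= SCAN_BUDGET)
			break;
	}
	return priority;
}
```

With a 1M-page LRU and a watermark that can never be met, this loop
stops after three cycles instead of walking down to priority 0 and
scanning every LRU page.  An OOM victim stuck in reclaim would get out
through the same exit as everybody else, which is the point of fixing
it for all direct reclaimers.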