From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-pg0-f69.google.com (mail-pg0-f69.google.com [74.125.83.69]) by kanga.kvack.org (Postfix) with ESMTP id 85AC5681021 for ; Fri, 17 Feb 2017 06:11:28 -0500 (EST)
Received: by mail-pg0-f69.google.com with SMTP id z67so58057111pgb.0 for ; Fri, 17 Feb 2017 03:11:28 -0800 (PST)
Received: from www262.sakura.ne.jp (www262.sakura.ne.jp. [2001:e42:101:1:202:181:97:72]) by mx.google.com with ESMTPS id o5si9944512pgk.410.2017.02.17.03.11.27 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 17 Feb 2017 03:11:27 -0800 (PST)
Subject: Re: [Bug 192981] New: page allocation stalls
References: <20170123135111.13ac3e47110de10a4bd503ef@linux-foundation.org>
 <8f450abd-4e05-92d3-2533-72b05fea2012@beget.ru>
 <20170215160538.GA62565@bfoster.bfoster>
 <20170215180859.GB62565@bfoster.bfoster>
 <07ee50bc-8220-dda8-07f9-369758603df9@beget.ru>
 <20170216172034.GC11750@bfoster.bfoster>
 <20170216222129.GB15349@dastard>
From: Tetsuo Handa
Message-ID: <077aa22b-7d84-c1cc-3ae6-1d67f762d291@I-love.SAKURA.ne.jp>
Date: Fri, 17 Feb 2017 20:11:09 +0900
MIME-Version: 1.0
In-Reply-To: <20170216222129.GB15349@dastard>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID: 
To: Dave Chinner, Brian Foster
Cc: Alexander Polakov, linux-mm@kvack.org, linux-xfs@vger.kernel.org, bugzilla-daemon@bugzilla.kernel.org

On 2017/02/17 7:21, Dave Chinner wrote:
> FWIW, the major problem with removing the blocking in inode reclaim
> is the ease with which you can then trigger the OOM killer from
> userspace. The high level memory reclaim algorithms break down when
> there are hundreds of direct reclaim processes hammering on reclaim
> and reclaim stops making progress because it's skipping dirty
> objects.
> Direct reclaim ends up insufficiently throttled, so rather
> than blocking it winds up reclaim priority and then declares OOM
> because reclaim runs out of retries before sufficient memory has
> been freed.
>
> That, right now, looks to be an unsolvable problem without a major
> rework of direct reclaim. I've pretty much given up on ever getting
> the unbound direct reclaim concurrency problem that is causing us
> these problems fixed, so we are left to handle it in the subsystem
> shrinkers as best we can. That leaves us with an unfortunate choice:
>
> 	a) throttle excessive concurrency in the shrinker to prevent
> 	   IO breakdown, thereby causing reclaim latency bubbles
> 	   under load but having a stable, reliable system; or
> 	b) optimise for minimal reclaim latency and risk userspace
> 	   memory demand triggering the OOM killer whenever there
> 	   are lots of dirty inodes in the system.
>
> Quite frankly, there's only one choice we can make in this
> situation: reliability is always more important than performance.

Is it possible to get rid of direct reclaim and instead let allocating
threads wait on a queue? I wished for such a change in the context of
__GFP_KILLABLE at
http://lkml.kernel.org/r/201702012049.BAG95379.VJFFOHMStLQFOO@I-love.SAKURA.ne.jp .

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
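The alternative asked about in this message, where allocating threads sleep on a queue while something else does the reclaiming, can be illustrated with a toy model. This is a userspace Python simulation under loudly stated assumptions, not kernel code: threads stand in for tasks, a counter stands in for free pages, and one "reclaim pass" simply adds a fixed batch of pages. Every name in it (`PagePool`, `alloc_page`, `reclaimer`, `run_model`, `RECLAIM_BATCH` and the other constants) is hypothetical; only `threading.Lock`, `threading.Condition` and `threading.Thread` are real library API. It does not model dirty inodes, shrinkers, the OOM killer, or __GFP_KILLABLE.

```python
import threading

# Hypothetical toy parameters, not kernel tunables.
NUM_ALLOCATORS = 8        # tasks that need memory
ALLOCS_PER_THREAD = 1000  # pages each task allocates
RECLAIM_BATCH = 32        # pages one toy reclaim pass "frees"


class PagePool:
    """Hypothetical free-page counter where allocators never reclaim.

    An allocator that finds the pool empty queues on `pages_freed` and
    kicks the single reclaimer thread, which is the only thread that
    ever adds pages, so reclaim concurrency is 1 by construction.
    """

    def __init__(self):
        self.lock = threading.Lock()
        # Both conditions share one lock, so every counter below is
        # only read or written while that lock is held.
        self.pages_freed = threading.Condition(self.lock)
        self.need_reclaim = threading.Condition(self.lock)
        self.free_pages = 0
        self.waiting = 0          # allocators currently queued
        self.allocated = 0
        self.reclaim_passes = 0
        self.done = False

    def alloc_page(self):
        with self.lock:
            while self.free_pages == 0:
                # No direct reclaim: wake the reclaimer, then sleep in line.
                self.waiting += 1
                self.need_reclaim.notify()
                self.pages_freed.wait()
                self.waiting -= 1
            self.free_pages -= 1
            self.allocated += 1

    def reclaimer(self):
        with self.lock:
            while True:
                # Sleep until someone is queued on an empty pool, or shutdown.
                while not self.done and (self.free_pages > 0 or self.waiting == 0):
                    self.need_reclaim.wait()
                if self.done:
                    return
                # Toy "reclaim": pretend one pass frees a fixed batch.
                self.free_pages += RECLAIM_BATCH
                self.reclaim_passes += 1
                self.pages_freed.notify_all()


def run_model():
    pool = PagePool()
    reclaim_thread = threading.Thread(target=pool.reclaimer)
    reclaim_thread.start()

    def task():
        for _ in range(ALLOCS_PER_THREAD):
            pool.alloc_page()

    tasks = [threading.Thread(target=task) for _ in range(NUM_ALLOCATORS)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()

    # All demand has been satisfied: tell the reclaimer to exit.
    with pool.lock:
        pool.done = True
        pool.need_reclaim.notify()
    reclaim_thread.join()
    return pool


pool = run_model()
print(f"allocated={pool.allocated} reclaim_passes={pool.reclaim_passes} "
      f"free_pages={pool.free_pages}")
```

Because only the reclaimer thread adds pages, reclaim concurrency stays at one no matter how many allocators pile up. The price is that every allocator hitting an empty pool sleeps until the next batch arrives, which is the latency-for-stability trade-off of option (a) in the quoted text.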