From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id 04A516B0032 for ; Tue, 17 Feb 2015 11:33:20 -0500 (EST) Received: by mail-wi0-f172.google.com with SMTP id l15so34835505wiw.5 for ; Tue, 17 Feb 2015 08:33:18 -0800 (PST) Received: from mail-wi0-x234.google.com (mail-wi0-x234.google.com. [2a00:1450:400c:c05::234]) by mx.google.com with ESMTPS id na14si26568545wic.105.2015.02.17.08.33.17 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 17 Feb 2015 08:33:17 -0800 (PST) Received: by mail-wi0-f180.google.com with SMTP id h11so35192135wiw.1 for ; Tue, 17 Feb 2015 08:33:16 -0800 (PST) Date: Tue, 17 Feb 2015 17:33:14 +0100 From: Michal Hocko Subject: Re: How to handle TIF_MEMDIE stalls? Message-ID: <20150217163314.GH32017@dhcp22.suse.cz> References: <20141220223504.GI15665@dastard> <201412211745.ECD69212.LQOFHtFOJMSOFV@I-love.SAKURA.ne.jp> <20141229181937.GE32618@dhcp22.suse.cz> <201412301542.JEC35987.FFJFOOQtHLSMVO@I-love.SAKURA.ne.jp> <20141230112158.GA15546@dhcp22.suse.cz> <201502162023.GGE26089.tJOOFQMFFHLOVS@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201502162023.GGE26089.tJOOFQMFFHLOVS@I-love.SAKURA.ne.jp> Sender: owner-linux-mm@kvack.org List-ID: To: Tetsuo Handa Cc: david@fromorbit.com, dchinner@redhat.com, linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com, akpm@linux-foundation.org, mgorman@suse.de, hannes@cmpxchg.org, torvalds@linux-foundation.org On Mon 16-02-15 20:23:16, Tetsuo Handa wrote: [...] > (1) Make several locks killable. > > On Linux 3.19, running below command line as an unprivileged user > on a system with 4 CPUs / 2GB RAM / no swap can make the system unusable. > > $ for i in `seq 1 100`; do dd if=/dev/zero of=/tmp/file bs=104857600 count=100 & done > [...] > This is because the OOM killer happily tries to kill a process which is > blocked at unkillable mutex_lock(). If locks shown above were killable, > we can reduce the possibility of getting stuck. > > I didn't check whether it has livelocked or not. But too slow to wait is > not acceptable. Well, you are beating your machine to death so you can hardly get any time guarantee. It would be nice to have a better feedback mechanism to know when to back off and fail the allocation attempt which might be blocking OOM victim to pass away. This is extremely tricky because we shouldn't be too eager to fail just because of a sudden memory pressure. > Oh, why every thread trying to allocate memory has to repeat > the loop that might defer somebody who can make progress if CPU time was > given? I guess you are talking about direct reclaim and the whole priority loop? Well, this is what I was talking above. Sometimes we really have to go down to low priorities and basically scan the world in order to find something reclaimable. If we bail out too early we might see pre mature allocation failures and which could lead to reduced QoS. > I wish only somebody like kswapd repeats the loop on behalf of all > threads waiting at memory allocation slowpath... This is the case when the kswapd is _able_ to cope with the memory pressure. [...] > (3) Replace kmalloc() with kmalloc_nofail() and kmalloc_noretry(). > > Currently small allocations are implicitly treated like __GFP_NOFAIL > unless TIF_MEMDIE is set. But silently changing small allocations like > __GFP_NORETRY will cause obscure bugs. If TIF_MEMDIE timeout is implemented, > we will no longer worry about unkillable tasks which is retrying forever at > memory allocation; instead we kill more OOM victims and satisfy the request. I think this is a bad approach. GFP_KERNEL != __GFP_NORETRY and we should treat it like that. Killing more victims is a bad solution because it doesn't guarantee any progress (just look at your example of hundreds processes with large RSS hammering the same file - you would have to kill all of them at once). Besides that any timeout solution is prone to unexpected delays due to reasons which are not related to the allocation latency. > Therefore, we could introduce kmalloc_nofail(size, gfp) which does > kmalloc(size, gfp | __GFP_NOFAIL) (i.e. invoke the OOM killer) and > kmalloc_noretry(size, gfp) which does kmalloc(size, gfp | __GFP_NORETRY) > (i.e. do not invoke the OOM killer), and switch from kmalloc() to either > kmalloc_noretry() or kmalloc_nofail(). This sounds like a major overkill. We already have gfp flags for that. What would this buy us? > Those who are doing smaller than > PAGE_SIZE bytes allocations would wish to switch from kmalloc() to > kmalloc_nofail() and eliminate untested memory allocation failure paths. nofail allocations should be discouraged and used only if any other measure would fail. > Those who are well prepared for memory allocation failures would wish to > switch from kmalloc() to kmalloc_noretry(). Eventually, kmalloc() which is > implicitly treating small allocations like __GFP_NOFAIL and invoking the > OOM killer will be abolished. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org