From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172])
	by kanga.kvack.org (Postfix) with ESMTP id 570566B0038
	for ; Sun, 1 Mar 2015 15:44:23 -0500 (EST)
Received: by widem10 with SMTP id em10so10102070wid.0
	for ; Sun, 01 Mar 2015 12:44:22 -0800 (PST)
Received: from gum.cmpxchg.org (gum.cmpxchg.org. [85.214.110.215])
	by mx.google.com with ESMTPS id bq15si15331542wib.47.2015.03.01.12.44.21
	for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Sun, 01 Mar 2015 12:44:21 -0800 (PST)
Date: Sun, 1 Mar 2015 15:44:12 -0500
From: Johannes Weiner
Subject: Re: How to handle TIF_MEMDIE stalls?
Message-ID: <20150301204412.GA8497@phnom.home.cmpxchg.org>
References: <20150219102431.GA15569@phnom.home.cmpxchg.org>
	<20150219225217.GY12722@dastard>
	<20150221235227.GA25079@phnom.home.cmpxchg.org>
	<20150223004521.GK12722@dastard>
	<20150228162943.GA17989@phnom.home.cmpxchg.org>
	<20150228164158.GE5404@thunk.org>
	<20150228221558.GA23028@phnom.home.cmpxchg.org>
	<20150301134322.GA3287@thunk.org>
	<20150301161506.GA1854@phnom.home.cmpxchg.org>
	<20150301193635.GB3287@thunk.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150301193635.GB3287@thunk.org>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Theodore Ts'o
Cc: Dave Chinner, Tetsuo Handa, mhocko@suse.cz, dchinner@redhat.com,
	linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com,
	akpm@linux-foundation.org, mgorman@suse.de,
	torvalds@linux-foundation.org, xfs@oss.sgi.com

On Sun, Mar 01, 2015 at 02:36:35PM -0500, Theodore Ts'o wrote:
> On Sun, Mar 01, 2015 at 11:15:06AM -0500, Johannes Weiner wrote:
> >
> > We had these lockups in cgroups with just a handful of threads, which
> > all got stuck in the allocator and there was nobody left to volunteer
> > unreclaimable memory.
> > When this was being addressed, we knew that the
> > same can theoretically happen on the system-level but weren't aware of
> > any reports.  Well now, here we are.
>
> I think the "few threads in a small" cgroup problem is a little
> difference, because in those cases very often the global system has
> enough memory, and there is always the possibility that we might relax
> the memory cgroup guarantees a little in order to allow forward
> progress.

That's exactly how we fixed it.  __GFP_NOFAIL allocations are allowed
to simply bypass the cgroup memory limits when reclaim within the
group fails to make room for the allocation.

I'm just mentioning that because the global case doesn't have the same
out, but is susceptible to the same deadlock situation when there are
no other threads volunteering pages.  If your machines are loaded with
hundreds or thousands of threads, chances are good that a thread stuck
in the allocator will be bailed out by the other threads in the system
(or that you run into CPU limits first), but if you have only a
handful of memory-intensive tasks, this might not be the case.  The
cgroup problem was closer to that second scenario, where few threads
split all available memory between them.
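
To make the cgroup bypass described above concrete, here is a toy
model in Python.  It is a sketch, not kernel code: the Group class,
try_charge() and the GFP_NOFAIL value are hypothetical names made up
for illustration, and the model simply lets usage exceed the limit
without saying anything about how the kernel accounts for such a
charge.

```python
from dataclasses import dataclass

GFP_NOFAIL = 0x1  # hypothetical flag value, meaningful only in this model


@dataclass
class Group:
    """Toy stand-in for a memory cgroup; all counters are in pages."""
    limit: int
    usage: int = 0
    reclaimable: int = 0  # pages that reclaim inside the group could free

    def reclaim(self, want: int) -> int:
        """Free up to `want` reclaimable pages; return how many were freed."""
        freed = min(want, self.reclaimable)
        self.reclaimable -= freed
        self.usage -= freed
        return freed


def try_charge(group: Group, pages: int, flags: int = 0) -> bool:
    """Charge `pages` to `group`; return False if the charge is refused."""
    excess = group.usage + pages - group.limit
    if excess > 0:
        # Over the limit: try to make room inside the group first.
        excess -= group.reclaim(excess)
    if excess > 0 and not flags & GFP_NOFAIL:
        return False  # ordinary allocation: the limit holds
    # Either the charge fits, or it is NOFAIL and bypasses the limit.
    group.usage += pages
    return True
```

With a full group and nothing left to reclaim, try_charge(g, 1) is
refused while try_charge(g, 1, GFP_NOFAIL) goes through and pushes
usage past the limit.  The global case has no counterpart to that last
branch: there is no larger pool to borrow from.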