Date: Fri, 9 Sep 2016 16:00:21 +0200
From: Michal Hocko
Subject: Re: [RFC 1/4] mm, oom: do not rely on TIF_MEMDIE for memory reserves access
Message-ID: <20160909140020.GN4844@dhcp22.suse.cz>
References: <1472723464-22866-1-git-send-email-mhocko@kernel.org>
 <1472723464-22866-2-git-send-email-mhocko@kernel.org>
 <201609041049.GIF51522.FOHLOJVSFOFMtQ@I-love.SAKURA.ne.jp>
In-Reply-To: <201609041049.GIF51522.FOHLOJVSFOFMtQ@I-love.SAKURA.ne.jp>
To: Tetsuo Handa
Cc: linux-mm@kvack.org, rientjes@google.com, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org

On Sun 04-09-16 10:49:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > @@ -3309,6 +3318,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  	return alloc_flags;
> >  }
> >
> > +static bool oom_reserves_allowed(struct task_struct *tsk)
> > +{
> > +	if (!tsk_is_oom_victim(tsk))
> > +		return false;
> > +
> > +	/*
> > +	 * !MMU doesn't have oom reaper so we shouldn't risk the memory reserves
> > +	 * depletion and shouldn't give access to memory reserves passed the
> > +	 * exit_mm
> > +	 */
> > +	if (!IS_ENABLED(CONFIG_MMU) && !tsk->mm)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
>
> Are you aware that you are trying to make !MMU kernel's allocations not only
> after returning exit_mm() but also from __mmput() from mmput() from exit_mm()
> fail without allowing access to memory reserves?

Do we allocate from that path in !MMU and would that be more broken than
with the current code which clears TIF_MEMDIE after mmput even when
__mmput is not called (aka somebody is holding a reference to mm - e.g. a
proc file)?

> The comment says only after returning exit_mm(), but this change is
> not.

I can see that the comment is not ideal. Any suggestion how to make it
better?
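To make the disagreement about the comment concrete, here is a minimal
userspace sketch of the check in the quoted hunk. It is only a model: struct
mock_task, struct mock_mm and the config_mmu parameter are hypothetical
stand-ins for task_struct, mm_struct and IS_ENABLED(CONFIG_MMU), and the
oom_victim field stands in for tsk_is_oom_victim(). It assumes, as the
objection above implies, that tsk->mm is already NULL while __mmput() runs, so
on !MMU the reserves are refused from inside exit_mm() as well as after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mock_mm {
	int unused;
};

struct mock_task {
	bool oom_victim;     /* models tsk_is_oom_victim(tsk) */
	struct mock_mm *mm;  /* models tsk->mm; NULL once it has been cleared */
};

/* config_mmu models IS_ENABLED(CONFIG_MMU) as a runtime flag. */
static bool mock_oom_reserves_allowed(const struct mock_task *tsk,
				      bool config_mmu)
{
	if (!tsk->oom_victim)
		return false;

	/* No oom reaper on !MMU: refuse reserves once tsk->mm is gone. */
	if (!config_mmu && !tsk->mm)
		return false;

	return true;
}

/* Walks one victim through the states discussed in the thread. */
static void mock_example_checks(void)
{
	struct mock_mm mm = { 0 };
	struct mock_task victim = { true, &mm };

	/* Victim that still owns its mm: allowed on both configurations. */
	assert(mock_oom_reserves_allowed(&victim, true));
	assert(mock_oom_reserves_allowed(&victim, false));

	/* mm cleared: still allowed with MMU, refused on !MMU. */
	victim.mm = NULL;
	assert(mock_oom_reserves_allowed(&victim, true));
	assert(!mock_oom_reserves_allowed(&victim, false));
}
```

In this model the condition depends only on whether tsk->mm is NULL, not on
whether exit_mm() has returned, which is the gap between the comment and the
code that is being pointed out.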
> > @@ -3558,8 +3593,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		goto nopage;
> >  	}
> >
> > -	/* Avoid allocations with no watermarks from looping endlessly */
> > -	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > +	/* Avoid allocations for oom victims from looping endlessly */
> > +	if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> >  		goto nopage;
>
> This change increases possibility of giving up without trying ALLOC_OOM
> (more allocation failure messages), for currently only one thread which
> remotely got TIF_MEMDIE when it was between gfp_to_alloc_flags() and
> test_thread_flag(TIF_MEMDIE) will give up without trying ALLOC_NO_WATERMARKS
> while all threads which remotely got current->signal->oom_mm when they were
> between gfp_to_alloc_flags() and test_thread_flag(TIF_MEMDIE) will give up
> without trying ALLOC_OOM. I think we should make sure that ALLOC_OOM is
> tried (by using a variable which remembers whether
> get_page_from_freelist(ALLOC_OOM) was tried).

Technically speaking you are right but I am not really sure that this
matters all that much. This code has always been racy. If we ever
consider the race harmful we can reorganize the allocation slow path in
a way that guarantees at least one allocation attempt with ALLOC_OOM. I
am just not sure it is necessary right now. If this ever shows up as a
problem we would see a flood of allocation failures followed by the OOM
report, so it would be quite easy to notice.

> We are currently allowing TIF_MEMDIE threads try ALLOC_NO_WATERMARKS for
> once and give up without invoking the OOM killer. This change makes
> current->signal->oom_mm threads try ALLOC_OOM for once and give up without
> invoking the OOM killer. This means that allocations for cleanly cleaning
> up by oom victims might fail prematurely, but we don't want to scatter
> around __GFP_NOFAIL. Since there are reasonable chances of the parallel
> memory freeing, we don't need to give up without invoking the OOM killer
> again. I think that
>
> -	/* Avoid allocations with no watermarks from looping endlessly */
> -	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> +#ifndef CONFIG_MMU
> +	/* Avoid allocations for oom victims from looping endlessly */
> +	if (tsk_is_oom_victim(current) && !(gfp_mask & __GFP_NOFAIL))
> +		goto nopage;
> +#endif
>
> is possible.

I would prefer not to spread MMU ifdefs all over the place.

-- 
Michal Hocko
SUSE Labs
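
For reference, the suggestion above of remembering whether ALLOC_OOM was tried
could be modelled roughly as follows. This is a userspace sketch under stated
assumptions, not a proposed patch: struct mock_pass, enum mock_outcome and
mock_slowpath_pass() are hypothetical names, and one pass stands in for one
trip through the slow path, with the victim status sampled once where the
alloc flags are computed and once at the later bail-out check.

```c
#include <assert.h>
#include <stdbool.h>

enum mock_outcome {
	MOCK_GOT_PAGE,  /* allocation succeeded from the OOM reserves */
	MOCK_NOPAGE,    /* victim, and ALLOC_OOM was already tried: give up */
	MOCK_RETRY,     /* victim, but ALLOC_OOM not tried yet: go around again */
	MOCK_CONTINUE   /* not a victim: carry on with the normal slow path */
};

/* Hypothetical snapshot of one pass through the slow path. */
struct mock_pass {
	bool victim_at_flags;   /* victim when the alloc flags were computed */
	bool victim_at_check;   /* victim at the later bail-out check */
	bool reserve_has_page;  /* would an ALLOC_OOM attempt succeed? */
};

static enum mock_outcome mock_slowpath_pass(const struct mock_pass *p,
					    bool *oom_tried)
{
	/* Flags computed while already a victim include the ALLOC_OOM attempt. */
	if (p->victim_at_flags) {
		*oom_tried = true;
		if (p->reserve_has_page)
			return MOCK_GOT_PAGE;
	}

	if (!p->victim_at_check)
		return MOCK_CONTINUE;

	/* Bail out only when the reserves were really attempted. */
	return *oom_tried ? MOCK_NOPAGE : MOCK_RETRY;
}

/* The race: the task becomes a victim between the two sample points. */
static void mock_race_example(void)
{
	bool oom_tried = false;
	struct mock_pass first = { false, true, true };
	struct mock_pass second = { true, true, true };

	/* First pass: no ALLOC_OOM attempt yet, so retry instead of failing. */
	assert(mock_slowpath_pass(&first, &oom_tried) == MOCK_RETRY);
	/* Second pass: flags now include ALLOC_OOM and the attempt succeeds. */
	assert(mock_slowpath_pass(&second, &oom_tried) == MOCK_GOT_PAGE);
}
```

Without the oom_tried variable, the first pass in mock_race_example() would
end in MOCK_NOPAGE, which corresponds to the premature failure described
above.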