From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f70.google.com (mail-pl0-f70.google.com [209.85.160.70]) by kanga.kvack.org (Postfix) with ESMTP id 3274E6B0003 for ; Fri, 25 May 2018 03:26:42 -0400 (EDT) Received: by mail-pl0-f70.google.com with SMTP id d4-v6so2534607plr.17 for ; Fri, 25 May 2018 00:26:42 -0700 (PDT) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id p33-v6si22941226pld.318.2018.05.25.00.26.38 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 25 May 2018 00:26:39 -0700 (PDT) Date: Fri, 25 May 2018 09:26:36 +0200 From: Michal Hocko Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes Message-ID: <20180525072636.GE11881@dhcp22.suse.cz> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: Tetsuo Handa , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Thu 24-05-18 14:22:53, David Rientjes wrote: > The oom reaper ensures forward progress by setting MMF_OOM_SKIP itself if > it cannot reap an mm. This can happen for a variety of reasons, > including: > > - the inability to grab mm->mmap_sem in a sufficient amount of time, > > - when the mm has blockable mmu notifiers that could cause the oom reaper > to stall indefinitely, > > but we can also add a third when the oom reaper can "reap" an mm but doing > so is unlikely to free any amount of memory: > > - when the mm's memory is fully mlocked. > > When all memory is mlocked, the oom reaper will not be able to free any > substantial amount of memory. It sets MMF_OOM_SKIP before the victim can > unmap and free its memory in exit_mmap() and subsequent oom victims are > chosen unnecessarily. This is trivial to reproduce if all eligible > processes on the system have mlocked their memory: the oom killer calls > panic() even though forward progress can be made. > > This is the same issue where the exit path sets MMF_OOM_SKIP before > unmapping memory and additional processes can be chosen unnecessarily > because the oom killer is racing with exit_mmap(). > > We can't simply defer setting MMF_OOM_SKIP, however, because if there is > a true oom livelock in progress, it never gets set and no additional > killing is possible. > > To fix this, this patch introduces a per-mm reaping timeout, initially set > at 10s. It requires that the oom reaper's list becomes a properly linked > list so that other mm's may be reaped while waiting for an mm's timeout to > expire. No timeouts please! The proper way to handle this problem is to simply teach the oom reaper to handle mlocked areas. -- Michal Hocko SUSE Labs