linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrea Arcangeli <andrea@novell.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Nick Piggin <piggin@cyberone.com.au>,
	Rik van Riel <riel@redhat.com>,
	Martin MOKREJ? <mmokrejs@ribosome.natur.cuni.cz>,
	tglx@linutronix.de
Subject: Re: [PATCH] fix spurious OOM kills
Date: Thu, 11 Nov 2004 11:56:14 -0200	[thread overview]
Message-ID: <20041111135614.GB16349@logos.cnet> (raw)
In-Reply-To: <20041111165050.GA5822@x30.random>

On Thu, Nov 11, 2004 at 05:50:51PM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 11, 2004 at 10:38:50AM -0200, Marcelo Tosatti wrote:
> > 
> > Hi!
> > 
> > On Thu, Nov 11, 2004 at 04:42:38PM +0100, Andrea Arcangeli wrote:
> > > On Thu, Nov 11, 2004 at 09:29:22AM -0200, Marcelo Tosatti wrote:
> > > > Hi,
> > > > 
> > > > This is an improved version of OOM-kill-from-kswapd patch.
> > > > 
> > > > I believe triggering the OOM killer from task reclaim context 
> > > > is broken because the chances that it happens increases as the amount
> > > > of tasks inside reclaim increases - and that approach ignores efforts 
> > > > being done by kswapd, who is the main entity responsible for
> > > > freeing pages.
> > > > 
> > > > There have been a few problems pointed out by others (Andrea, Nick) on the 
> > > > last patch - this one solves them.
> > > 
> > > I disagree about the design of killing anything from kswapd. kswapd is
> > > an async helper like pdflush and it has no knowledge on the caller (it
> > > cannot know if the caller is ok with the memory currently available in
> > > the freelists, before triggering the oom). 
> > 
> > If zone_dma / zone_normal are below pages_min no caller is "OK with
> > memory currently available" except GFP_ATOMIC/realtime callers.
> 
> If the GFP_DMA zone is filled, and nobody allocates with GFP_DMA,
> nothing should be killed and everything should run fine, how can you
> get this right from kswapd? 

It does get it right. It only triggers OOM killer if _both_ 
GFP_DMA / GFP_KERNEL are full _and_ there is a task failing 
to allocate/free memory.

I think you missed the "task_looping_oom" variable in the patch, which is
set as soon as a task is unable to allocate/free memory. This variable
is set where the code used to call the OOM killer.

> > > I'm just about to move the
> > > oom killing away from vmscan.c to page_alloc.c which is basically the
> > > opposite of moving the oom invocation from the task context to kswapd.
> > > page_alloc.c in the task context is the only one who can know if
> > > something has to be killed, vmscan.c cannot know. vmscan.c can only know
> > > if something is still freeable, but if something isn't freeable it
> > > doesn't mean that we've to kill anything 
> > 
> > Well Andrea, its not about "if something isnt freeable", its about
> > "the VM is unable to make progress reclaiming pages". 
> 
> "VM is unable to reclaim pages" == "nothing is freeable"

OK, correct, silly me. I noted the gaffe after sending the email. 

But still, the main idea is valid here.

I'll say this again just in case: If ZONE_DMA and ZONE_NORMAL reclaiming 
efforts are in vain, and there is task which is looping on try_to_free_pages() 
unable to succeed, _and_ both DMA/normal are below pages_min, the OOM 
killer will be triggered.

(should be pages_min + higher protection).

> > > (for example if a task exited
> > > or some dma or normal-zone or highmem memory was released by another
> > > task while we were paging waiting for I/O). 
> > 
> > My last patch checks for pages_min before OOM killing, have you read it?
> 
> checking pages_min isn't correct anyways, the lowmem_reserve must taken
> into account or you may not kill tasks when you should really kill
> tasks.

Indeed - this can be improved. 

> Plus you're checking for all zones, but kswapd cannot know that it
> doesn't matter if the zone dma is under pages_min, as far as there's no
> GFP_DMA.

You mean boxes with no DMA zone?

If the normal zone is below pages_min+protection, then GFP_KERNEL allocations 
will fallback and eat from DMA zone.

I dont get you? 

> > > Every allocation is different and page_alloc.c is the only one who 
> > > knows what has to be done for every single allocation.
> > 
> > OK, what do you propose? Its the third time I ask you this and got no 
> > concrete answer yet. 
> 
> I want to move it to page_alloc.c (and up to the caller) and not in
> kswapd, I mention this a few times.
> 
> > Sure, allocators should receive -ENOMEM whenever possible, but this 
> > is not the issue here.
> 
> it is the issue, because only the context of the task can choose if to
> return -ENOMEM or to invoke the oom killer and try again. 

If the task chooses to return -ENOMEM it wont set "task_looping_oom" flag. 

OK - you are right to say that "only the context of the task can choose
to return -ENOMEM or invoke the oom killer". 

This allocator-context-only information is passed to kswapd via
 "task_looping_oom".

> > Triggering OOM killer on __alloc_pages() failure ? 
> 
> yes, ideally I'd put the oom killer _outside_ alloc_pages, but just
> moving it into alloc_pages should make things better than they are right
> now in vmscan.c.
> 
> > Show us the code, please :) 
> 
> I'm supposedly listening to a meeting right now, then I've a bad kernel
> crash to debug with random mem corruption that I just managed to
> reproduce deterministcally inside uml by emulating numa inside uml and
> I'll be busy until next week at the very least. So I doubt I'll be able
> to write any oom-related code until next week, sorry.

Good luck!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-11-11 13:56 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-11 11:29 Marcelo Tosatti
2004-11-11 15:42 ` Andrea Arcangeli
2004-11-11 12:38   ` Marcelo Tosatti
2004-11-11 16:50     ` Andrea Arcangeli
2004-11-11 13:56       ` Marcelo Tosatti [this message]
2004-11-11 21:45         ` Andrea Arcangeli
2004-11-11 19:19           ` Marcelo Tosatti
2004-11-11 17:42       ` Martin J. Bligh
2004-11-11 21:50         ` Andrea Arcangeli
2004-11-12 11:13       ` fix for mpol mm corruption on tmpfs Andrea Arcangeli
2004-11-11 21:57 ` [PATCH] fix spurious OOM kills Chris Ross
2004-11-12 16:52   ` Chris Ross
2004-11-12 23:56     ` Nick Piggin
2004-11-13 23:37     ` Andrea Arcangeli
2004-11-14  9:44       ` Marcelo Tosatti
2004-11-14 10:02         ` Marcelo Tosatti
2004-11-14 17:11           ` Andrea Arcangeli
2004-11-14 17:03         ` Andrea Arcangeli
2004-11-14 18:16           ` Martin J. Bligh
2004-11-14 18:27             ` Andrea Arcangeli
2004-11-14 20:21           ` Marcelo Tosatti
2004-11-16 16:30             ` Chris Ross
2004-11-17  9:08               ` Chris Ross
2004-11-17  9:23                 ` Andrew Morton
2004-11-17  6:06                   ` Marcelo Tosatti
2004-11-17  6:08                     ` Marcelo Tosatti
2004-11-17  6:38                       ` Marcelo Tosatti
2004-11-17 11:04                         ` Chris Ross
2004-11-17 10:26                       ` Andrew Morton
2004-11-17 10:50                       ` Chris Ross
2004-11-17  7:09                         ` Marcelo Tosatti
2004-11-17 11:49                           ` Chris Ross
2004-11-17 12:09                           ` Rik van Riel
2004-11-17 13:12                   ` Chris Ross
     [not found]                   ` <419CD8C1.4030506@ribosome.natur.cuni.cz>
2004-11-18 21:16                     ` Andrew Morton
     [not found]                       ` <419D25B5.1060504@ribosome.natur.cuni.cz>
     [not found]                         ` <419D2987.8010305@cyberone.com.au>
2004-11-19  0:03                           ` Martin MOKREJŠ
2004-11-19  0:08                             ` Andrew Morton
2004-11-19  8:09                               ` Marcelo Tosatti
2004-11-19 16:17                                 ` Thomas Gleixner
     [not found]                               ` <419E821F.7010601@ribosome.natur.cuni.cz>
2004-11-20 10:23                                 ` Thomas Gleixner
2004-11-20 10:45                                   ` Martin MOKREJŠ
2004-11-20 11:29                                   ` Martin MOKREJŠ
2004-11-20 13:29                                     ` Thomas Gleixner
2004-11-20 21:19                                       ` Martin MOKREJŠ
2004-11-21 11:53                                         ` Thomas Gleixner
2004-11-21 12:17                                           ` Martin MOKREJŠ
2004-11-21 13:57                                             ` Thomas Gleixner
2004-11-22 10:55                                               ` Thomas Gleixner
2004-11-23  7:41                                                 ` Martin MOKREJŠ
2004-11-23 10:27                                                   ` Thomas Gleixner
2004-11-24 15:52                                                     ` Martin MOKREJŠ
2004-11-24 16:36                                                       ` Thomas Gleixner
2004-12-14 16:04                                                     ` Martin MOKREJŠ
2004-12-14 17:38                                                       ` Andrea Arcangeli
2004-12-14 23:30                                                         ` Nick Piggin
2004-12-14 23:55                                                           ` Andrea Arcangeli
2004-12-15  0:16                                                             ` Thomas Gleixner
2004-12-15  0:37                                                               ` Andrea Arcangeli
2004-12-15  0:48                                                                 ` Thomas Gleixner
2004-11-21 19:01                   ` Chris Ross
2004-11-22 12:15                     ` Chris Ross
2004-11-22  8:35                       ` Marcelo Tosatti
2004-11-16  8:37           ` Chris Ross
2004-11-17  3:45   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041111135614.GB16349@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=andrea@novell.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mmokrejs@ribosome.natur.cuni.cz \
    --cc=piggin@cyberone.com.au \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox