From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrea Arcangeli <andrea@novell.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Nick Piggin <piggin@cyberone.com.au>,
Rik van Riel <riel@redhat.com>,
Martin MOKREJŠ <mmokrejs@ribosome.natur.cuni.cz>,
tglx@linutronix.de
Subject: Re: [PATCH] fix spurious OOM kills
Date: Thu, 11 Nov 2004 11:56:14 -0200
Message-ID: <20041111135614.GB16349@logos.cnet>
In-Reply-To: <20041111165050.GA5822@x30.random>

On Thu, Nov 11, 2004 at 05:50:51PM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 11, 2004 at 10:38:50AM -0200, Marcelo Tosatti wrote:
> >
> > Hi!
> >
> > On Thu, Nov 11, 2004 at 04:42:38PM +0100, Andrea Arcangeli wrote:
> > > On Thu, Nov 11, 2004 at 09:29:22AM -0200, Marcelo Tosatti wrote:
> > > > Hi,
> > > >
> > > > This is an improved version of OOM-kill-from-kswapd patch.
> > > >
> > > > I believe triggering the OOM killer from task reclaim context
> > > > is broken because the chance that it happens increases as the number
> > > > of tasks inside reclaim increases - and that approach ignores efforts
> > > > being done by kswapd, which is the main entity responsible for
> > > > freeing pages.
> > > >
> > > > There have been a few problems pointed out by others (Andrea, Nick) on the
> > > > last patch - this one solves them.
> > >
> > > I disagree about the design of killing anything from kswapd. kswapd is
> > > an async helper like pdflush and it has no knowledge of the caller (it
> > > cannot know if the caller is ok with the memory currently available in
> > > the freelists, before triggering the oom).
> >
> > If zone_dma / zone_normal are below pages_min no caller is "OK with
> > memory currently available" except GFP_ATOMIC/realtime callers.
>
> If the GFP_DMA zone is filled, and nobody allocates with GFP_DMA,
> nothing should be killed and everything should run fine, how can you
> get this right from kswapd?

It does get it right. It only triggers the OOM killer if _both_
the DMA and normal zones (GFP_DMA / GFP_KERNEL) are full _and_ there
is a task failing to allocate/free memory.

I think you missed the "task_looping_oom" variable in the patch, which is
set as soon as a task is unable to allocate/free memory. This variable
is set where the code used to call the OOM killer.

> > > I'm just about to move the
> > > oom killing away from vmscan.c to page_alloc.c which is basically the
> > > opposite of moving the oom invocation from the task context to kswapd.
> > > page_alloc.c in the task context is the only one who can know if
> > > something has to be killed, vmscan.c cannot know. vmscan.c can only know
> > > if something is still freeable, but if something isn't freeable it
> > > doesn't mean that we've to kill anything
> >
> > Well Andrea, it's not about "if something isn't freeable", it's about
> > "the VM is unable to make progress reclaiming pages".
>
> "VM is unable to reclaim pages" == "nothing is freeable"

OK, correct, silly me. I noticed the gaffe after sending the email.

But still, the main idea is valid here.

I'll say this again just in case: if ZONE_DMA and ZONE_NORMAL reclaiming
efforts are in vain, and there is a task which is looping on try_to_free_pages()
unable to succeed, _and_ both DMA/normal are below pages_min, the OOM
killer will be triggered.

(The check should really be against pages_min + higher protection.)

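To make the three conditions concrete, here is a minimal userspace sketch of the check, not the code from the patch. The struct zone_model type, its protection field, and the kswapd_should_oom_kill() helper are hypothetical names made up for illustration; only "task_looping_oom" and pages_min come from this thread.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's per-zone state. */
struct zone_model {
	unsigned long free_pages;	/* pages currently on the free lists */
	unsigned long pages_min;	/* minimum watermark */
	unsigned long protection;	/* extra reserve on top of pages_min */
};

/*
 * Models the "task_looping_oom" variable from the patch: set by the
 * allocation path when a task cannot allocate or free memory.
 */
static bool task_looping_oom;

/* A zone counts as exhausted when it is below pages_min + protection. */
static bool zone_below_min(const struct zone_model *z)
{
	return z->free_pages < z->pages_min + z->protection;
}

/*
 * kswapd-side decision (assumption: reclaim progress is reported by the
 * caller). Kill only if reclaim is in vain, a task is stuck looping,
 * and both the DMA and normal zones are below their watermarks.
 */
static bool kswapd_should_oom_kill(const struct zone_model *dma,
				   const struct zone_model *normal,
				   bool reclaim_made_progress)
{
	if (reclaim_made_progress)
		return false;
	if (!task_looping_oom)
		return false;
	return zone_below_min(dma) && zone_below_min(normal);
}
```

With this shape, a filled DMA zone alone never triggers a kill: the normal zone must also be exhausted and some task must have reported that it is stuck.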
> > > (for example if a task exited
> > > or some dma or normal-zone or highmem memory was released by another
> > > task while we were paging waiting for I/O).
> >
> > My last patch checks for pages_min before OOM killing, have you read it?
>
> checking pages_min isn't correct anyway, the lowmem_reserve must be taken
> into account or you may not kill tasks when you should really kill
> tasks.

Indeed - this can be improved.

> Plus you're checking for all zones, but kswapd cannot know that it
> doesn't matter if the zone dma is under pages_min, as long as there's no
> GFP_DMA.

You mean boxes with no DMA zone?

If the normal zone is below pages_min+protection, then GFP_KERNEL allocations
will fall back and eat from the DMA zone.

I don't follow what you mean.

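The fallback behaviour referred to above can be sketched roughly as follows. This is a simplified userspace model, not the kernel's __alloc_pages(); struct fallback_zone, its fields, and pick_zone() are hypothetical names, and the watermark test is assumed to be a plain comparison against pages_min + protection.

```c
#include <stddef.h>

/* Hypothetical, simplified zone description. */
struct fallback_zone {
	const char *name;
	unsigned long free_pages;	/* pages currently free */
	unsigned long pages_min;	/* minimum watermark */
	unsigned long protection;	/* reserve held back from this request type */
};

/*
 * Walk the zonelist from the preferred zone to the fallback zones and
 * return the first one that is still above its watermark, or NULL if
 * every zone in the list is exhausted.
 */
static const struct fallback_zone *
pick_zone(const struct fallback_zone *const *zonelist, size_t nr_zones)
{
	for (size_t i = 0; i < nr_zones; i++) {
		const struct fallback_zone *z = zonelist[i];

		if (z->free_pages > z->pages_min + z->protection)
			return z;
	}
	return NULL;
}
```

For a GFP_KERNEL-style request the list would be { normal, dma }, so once the normal zone drops below its watermark the request is served from the DMA zone, which is why a full DMA zone matters even when nobody passes GFP_DMA.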
> > > Every allocation is different and page_alloc.c is the only one who
> > > knows what has to be done for every single allocation.
> >
> > OK, what do you propose? It's the third time I have asked you this and
> > I have got no concrete answer yet.
>
> I want to move it to page_alloc.c (and up to the caller) and not in
> kswapd, I mentioned this a few times.
>
> > Sure, allocators should receive -ENOMEM whenever possible, but this
> > is not the issue here.
>
> it is the issue, because only the context of the task can choose whether to
> return -ENOMEM or to invoke the oom killer and try again.

If the task chooses to return -ENOMEM it won't set the "task_looping_oom" flag.

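As a rough illustration of that rule, here is a userspace sketch of the allocator side. It is not the patch itself: ALLOC_MAY_FAIL, alloc_slow_path(), and the -EAGAIN "caller retries" return value are assumptions made up for the example; only the "task_looping_oom" name comes from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: the caller can cope with an allocation failure. */
#define ALLOC_MAY_FAIL	0x1u

/* Models the "task_looping_oom" variable that kswapd later inspects. */
static bool task_looping_oom;

/*
 * Slow path of a failed allocation attempt (assumption: the caller
 * tells us whether its reclaim pass freed anything).
 */
static int alloc_slow_path(unsigned int flags, bool reclaim_freed_pages)
{
	if (reclaim_freed_pages)
		return 0;		/* progress was made, allocation can proceed */

	if (flags & ALLOC_MAY_FAIL)
		return -ENOMEM;		/* fail the request, flag stays clear */

	/* Must-not-fail caller that is stuck: tell kswapd about it. */
	task_looping_oom = true;
	return -EAGAIN;			/* caller loops and tries again */
}
```

The point of the sketch is that the choice between failing and retrying stays in the task's context; kswapd only ever sees the resulting flag.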
OK - you are right to say that "only the context of the task can choose
to return -ENOMEM or invoke the oom killer".

This allocator-context-only information is passed to kswapd via
"task_looping_oom".

> > Triggering OOM killer on __alloc_pages() failure ?
>
> yes, ideally I'd put the oom killer _outside_ alloc_pages, but just
> moving it into alloc_pages should make things better than they are right
> now in vmscan.c.
>
> > Show us the code, please :)
>
> I'm supposedly listening to a meeting right now, then I've a bad kernel
> crash to debug with random mem corruption that I just managed to
> reproduce deterministically inside uml by emulating numa inside uml and
> I'll be busy until next week at the very least. So I doubt I'll be able
> to write any oom-related code until next week, sorry.

Good luck!