Re: [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Andrea Arcangeli <andrea@novell.com>
To: Nikita Danilov <nikita@clusterfs.com>
Cc: Nick Piggin <piggin@cyberone.com.au>,
	Jesse Barnes <jbarnes@sgi.com>,
	Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
	Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage
Date: Sun, 7 Nov 2004 02:16:38 +0100	[thread overview]
Message-ID: <20041107011638.GJ3851@dualathlon.random> (raw)
In-Reply-To: <16781.9482.821680.375843@gargle.gargle.HOWL>

On Sat, Nov 06, 2004 at 10:24:58PM +0300, Nikita Danilov wrote:
> This means breaking all layering and passing mempool pointer all the way
> down to the lowest layer allocators (like bio and drivers). The only

bio and drivers already have their own mempools.  the blkdev layer is
guaranteed to succeed and it can only try GFP_NOIO allocations (if those
fails it'll fallback in the reserved mempool).

> practical way to do this, is to put mempool pointer into current
> task_struct. At which point it's no different from having per-thread
> list of pages that __alloc_pages() looks into before falling back to
> per-cpu page-sets and buddy. _Except_ in the latter case, reservation is
> handled transparently in __alloc_pages() and code shouldn't be adjusted
> to check for mempool in zillion of places.

that's sure reasonable to avoid changing lots of code.

> I think you are confusing "file system" and "ext2". I definitely know
> from experience that with some file system types, system can be oommed
> without any significant user-level allocation activity. Now, one can say
> that either such file-systems are broken, or Linux MM lacks support for
> features (like reservation) they need.

the latter is true, I agree.

>  > that's the PF_MEMALLOC path. A reservation already exists, or it would
>  > never work since 2.2. PF_MEMALLOC and the min/2 watermark are meant to
>  > allow writepage to allocate ram. however the amount reserved is limited,
> 
> low-mem watermark is mostly useless in the face of direct reclaim, when
> unbounded number of threads enter try_to_free_pages() and call
> ->writepage() simultaneously.

agreed.

>  > so it's not perfect. The only way to make it perfect I believe is to
>  > reserve the stuff inside the fs with mempools as described above.
> 
> I don't see what advantages mempools have over page reservation handled
> directly by page allocator, like in
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9-rc4/2.6.9-rc4-mm1/broken-out/reiser4-perthread-pages.patch

guess what, that patch is running in my kernel right now. However I
believe this approch is very wasteful. I agree it'll work right, but
you're wasting loads of ram and you're as well less efficient.

The efficient fix for your problem, is to have a global pool, protected
by a global semaphore (definitely not per-thread), so that when you hit
oom (and when you hit true oom the last thing you can care about is
paralleism or the scalability on such a global semaphore), the VM will
trasparently take the semaphore and start using the pool. This will
still require you to mark the start and end of your critical section
like this:

reiserf4_writepage()
{
	enable_reserved_pages_pool();

	find_or_create_page()
	journal something
	getblk
	biowhatever

	disable_reserved_pages_pool();
}

disable_reserved_pages_pool has to check a per-thread flag that the VM
will set if it has used the reserved pool and taken the semaphore, but
by that time the I/O can be guaranteed to complete and the memory will
be guaranteed to be unlocked eventually when the bio I/O completes. So
you can freely alloc_pages to refill the pool inside
disable_reserved_pages_pool and then drop the semaphore.
enable_reserved_pages_pool is only needed to set a per-thread flag to
tell the VM it's allowed to fallback in the global pool by blocking in
the global semaphore if the box is oom (instead of returning NULL).

in disable_reserved_pages_pool you'll also have to clear the pre-thread
flag before calling alloc_pages again to avoid deadlock on the semaphore
if another oom condition happens of course.

then you need an create_reserved_pages_pool(nr_pages) while you mount the
fs, and destroy_unreserve_pages_pool(nr_pages) when you unmont it. where
many different users (i.e. different fs) will be allowed to reserve a
different size for the global pool. They all will share the same pool,
you've only need to track each user nr_pages to know which is the max
reservation you need.

That's still enterely transparent, it'll work in the thread context
thanks to the global semaphore, but it'll avoid the waste of ram where
every different task has to pin the ram into the task before starting
the writepage I/O.

I mean, I understand the only point of the perthread-pages patch is
deadlock avoidance during OOM. So you definitely don't need a per-thread
reservation, the global pool methods I described above should be more
than enough and they'll save ram and make your system faster as well.

I agree PF_MEMALLOC has nothing to do with this.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next prev parent reply	other threads:[~2004-11-07  1:16 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-05 20:01 Marcelo Tosatti
2004-11-05 23:32 ` Jesse Barnes
2004-11-05 23:47   ` Thomas Gleixner
2004-11-06  1:20   ` Andrea Arcangeli
2004-11-06  1:26     ` Nick Piggin
2004-11-06  1:36       ` Jesse Barnes
2004-11-06  1:50       ` Andrea Arcangeli
2004-11-06  9:47         ` Hugh Dickins
2004-11-06 10:53           ` Nick Piggin
2004-11-06 15:29             ` Andrea Arcangeli
2004-11-06 15:29           ` Andrea Arcangeli
2004-11-06 16:21             ` Hugh Dickins
2004-12-10  6:02               ` William Lee Irwin III
2004-11-06 11:37         ` Nikita Danilov
2004-11-06 15:32           ` Andrea Arcangeli
2004-11-06 16:54             ` Nikita Danilov
2004-11-06 17:44               ` Andrea Arcangeli
2004-11-06 19:24                 ` Nikita Danilov
2004-11-07  1:16                   ` Andrea Arcangeli [this message]
2004-11-06 10:11       ` Marcelo Tosatti
2004-11-06  1:55     ` Thomas Gleixner
2004-11-06 10:28       ` Marcelo Tosatti
2004-11-17 22:54       ` Werner Almesberger
2004-11-17 23:27         ` Chris Ross
2004-11-18  0:04           ` Werner Almesberger
2004-11-18  0:28             ` Chris Ross
2004-11-18  1:14               ` Werner Almesberger
2004-11-18  8:20                 ` Chris Ross
2004-11-18 10:01                   ` Werner Almesberger
2004-11-18 14:44                     ` Thomas Gleixner
2004-11-18 15:10                       ` Chris Friesen
2004-11-06 10:05     ` Marcelo Tosatti
2004-11-06 15:44       ` Andrea Arcangeli
2004-11-06 15:52         ` Arjan van de Ven
2004-11-06 17:09         ` Marcelo Tosatti
2004-11-07  0:48           ` Andrea Arcangeli
2004-11-07 11:21             ` Marcelo Tosatti
2004-11-06 12:53 ` [PATCH] Remove OOM killer Andries Brouwer
2004-11-06 10:41   ` Marcelo Tosatti
2004-11-07  9:26   ` Marko Macek
2004-11-08 16:27 ` [PATCH] Remove OOM killer from try_to_free_pages / all_unreclaimable braindamage Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041107011638.GJ3851@dualathlon.random \
    --to=andrea@novell.com \
    --cc=akpm@osdl.org \
    --cc=jbarnes@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=nikita@clusterfs.com \
    --cc=piggin@cyberone.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox