From: Nick Piggin <npiggin@suse.de>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <clameter@sgi.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, dkegel@google.com,
David Miller <davem@davemloft.net>,
Daniel Phillips <phillips@google.com>
Subject: Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
Date: Thu, 23 Aug 2007 05:02:38 +0200
Message-ID: <20070823030238.GD18788@wotan.suse.de>
In-Reply-To: <1187710167.6114.258.camel@twins>

On Tue, Aug 21, 2007 at 05:29:27PM +0200, Peter Zijlstra wrote:
> [ now with CCs ]
>
> On Tue, 2007-08-21 at 02:28 +0200, Nick Piggin wrote:
>
> > I do of course. It is one thing to have a real lock deadlock in
> > some core path, and another to have this memory deadlock in a
> > known-to-be-dodgy configuration (Linus said last year that he didn't
> > want to go out of our way to support this, right?)... But if you can
> > solve it without impacting fastpaths etc., then I don't see any
> > objection to it.
>
> That has been my intention, getting the problem solved without touching
> fast paths and with minimal changes to how things are currently done.
>
> > I don't mean for correctness, but for throughput. If you're doing a
> > lot of network operations right near the memory limit, then it could
> > be possible that these deadlock paths get triggered relatively often.
> > With Christoph's patches, I think it would tend to be less.
>
> Christoph's patches all rely on file backed memory being predominant.
> [ and to a certain degree fully ignore anonymous memory loads :-( ]

Yes.

> Whereas quite a few realistic loads strive to minimise these - I'll
> again fall back to my MPI cluster example: they would want to use so
> much anonymous memory to perform their calculations that only the hot
> paths of the code remain present in memory. In these scenarios 1 MB of
> text would already be a lot.

OK, I don't know exactly about MPI workloads. But I mean a few basic
things like the C and MPI libraries could already be quite big before
you even consider the application text (OK, it won't all be paged in).
Maybe it won't be enough, but I think some form of recursive reclaim
will be better than our current scheme. Even assuming your patches are
in the kernel, don't you think it is a good idea to _not_ have
potentially complex writeout from reclaim just default to using up the
memory reserves?
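
To illustrate the idea (just a sketch of the concept, not necessarily
how Christoph's RFC implements it; the helper name is made up):

#include <linux/gfp.h>
#include <linux/sched.h>

/*
 * Sketch: any allocation issued while reclaiming on behalf of a
 * PF_MEMALLOC task gets __GFP_NOMEMALLOC or'ed in, so the recursive
 * reclaim pass can free clean pages without itself dipping into the
 * emergency reserves it is meant to protect.
 */
static inline gfp_t recursive_reclaim_gfp(gfp_t gfp_mask)
{
	if (current->flags & PF_MEMALLOC)
		gfp_mask |= __GFP_NOMEMALLOC;
	return gfp_mask;
}

Restricted to clean pagecache like that, the recursive pass needs no
further memory to make progress, which is exactly why it shouldn't
default to the reserves.
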
> > > > How are your deadlock patches going anyway? AFAIK they are mostly a network
> > > > issue and I haven't been keeping up with them for a while.
> > >
> > > They really do rely on some VM interaction too, network does not have
> > > enough information to break out of the deadlock on its own.
> >
> > The thing I don't much like about your patches is the addition of more
> > of these global reserve type things in the allocators. They kind of
> > suck (not your code, just the concept of them in general -- ie. including
> > the PF_MEMALLOC reserve). I'd like to eventually reach a model where
> > reclaimable memory from a given subsystem is always backed by enough
> > resources to be able to reclaim it. What stopped you from going that
> > route with the network subsystem? (too much churn, or something
> > fundamental?)
>
> I'm wanting to keep the patches as non-intrusive as possible, exactly
> because some people consider this a fringe functionality. Doing as you
> say does sound like a noble goal, but would require massive overhauls.

But the code would end up better, wouldn't it? And it could be done
incrementally?

> Also, I'm not quite sure how this would apply to networking. It
> generally doesn't have much reclaimable memory sitting around, and it
> heavily relies on kmalloc so an alloc/free cycle accounting system would
> quickly involve a lot of the things I'm already doing.

It wouldn't use reclaimable memory as such, but it would have small
reserves for allocating the things required to get a response from
critical sockets. NBD, for example, would then also reserve enough
memory to guarantee it can clean at least one page. That's the way the
block layer has gone, which seems to work pretty well and I think is
much better than having it in the buddy allocator.
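
Roughly like this (an illustrative sketch only -- the pool size is
arbitrary and reusing struct nbd_request for the buffer size is my
guess, not code from the NBD driver):

#include <linux/init.h>
#include <linux/mempool.h>
#include <linux/nbd.h>

/* Enough pre-allocated request buffers to guarantee forward progress. */
#define NBD_MIN_REQS	4

static mempool_t *nbd_req_pool;

static int __init nbd_reserve_init(void)
{
	/*
	 * Pre-allocates NBD_MIN_REQS buffers up front; mempool_alloc()
	 * falls back to them when kmalloc() fails under pressure, so
	 * at least one dirty page can always be written out.
	 */
	nbd_req_pool = mempool_create_kmalloc_pool(NBD_MIN_REQS,
					sizeof(struct nbd_request));
	return nbd_req_pool ? 0 : -ENOMEM;
}

Writeout would then allocate with mempool_alloc(nbd_req_pool, GFP_NOIO)
and return buffers with mempool_free(), so the reserve refills as
requests complete -- the same pattern the block layer uses for bios.
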
> (also one advantage of keeping it all in the buddy allocator is that it
> can more easily form larger order pages)

I don't know that this is a big advantage; the amount of memory involved
should be pretty small. I mean, it is an advantage, but there are other
disadvantages (imagine the mess if every subsystem kept its own global
reserve in the allocator rather than using mempools etc). I don't see
why networking is fundamentally more deserving of its own pools in the
allocator than anybody else.