From: Christoph Lameter <clameter@sgi.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Linus Torvalds <torvalds@osdl.org>,
	Aucoin <aucoin@houston.rr.com>,
	'Nick Piggin' <nickpiggin@yahoo.com.au>,
	'Tim Schmielau' <tim@physik3.uni-rostock.de>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: la la la la ... swappiness
Date: Tue, 5 Dec 2006 15:20:45 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0612051507000.20570@schroedinger.engr.sgi.com>
In-Reply-To: <20061205133954.92082982.akpm@osdl.org>

On Tue, 5 Dec 2006, Andrew Morton wrote:

> > However, since we do not recognize 
> > that we are in a dirty overload situation we may not do synchronous 
> > writes but return without having reclaimed any memory
> 
> Return from what?  try_to_free_pages() or balance_dirty_pages()?

If we do not reach the dirty_ratio then we will not block but simply 
trigger writeouts.
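
To make the distinction concrete, here is a toy model of that decision. It is a sketch, not kernel code: the function name dirty_action() and the default ratios of 10 and 40 percent are assumptions chosen for illustration, and the real balance_dirty_pages() logic has more inputs than this.

```python
def dirty_action(nr_dirty, total_pages, background_ratio=10, dirty_ratio=40):
    """Toy model of the throttling decision; names and ratios are assumed."""
    background_thresh = total_pages * background_ratio // 100
    dirty_thresh = total_pages * dirty_ratio // 100
    if nr_dirty > dirty_thresh:
        # Above dirty_ratio: the writer blocks until writeout catches up.
        return "block"
    if nr_dirty > background_thresh:
        # Between the thresholds: writeout is triggered, the writer continues.
        return "writeout"
    return "none"


# With 1000 pages, 300 dirty pages only trigger writeout; 500 would block.
print(dirty_action(300, 1000))
print(dirty_action(500, 1000))
```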

try_to_free_pages() will trigger pdflush and we may wait 1/10th of a 
second in congestion_wait() and in throttle_vm_writeout() (well, not 
really, since we check global limits) but we will not block. I think what 
happens is that try_to_free_pages() (given sufficiently slow writeout) 
will at some point start to return 0 and thus we OOM.

> The behaviour of page reclaim is independent of the level of dirty memory
> and of the dirty-memory thresholds, as far as I recall...

You cannot easily free a dirty page. We can only trigger writeout.

> > Could we get to the inode from the reclaim path and just start writing out 
> > all dirty pages of the inode?
> 
> Yeah, maybe.  But of course the pages on the inode can be from any zone at
> all so the problem is that in some scenarios, we could write out tremendous
> numbers of pages from zones which don't need that writeout.

But we know that at least one page was in the correct zone. Writeout will 
be much faster if we can write a series of blocks in sequence via the inode.

> > It's continual on the nodes of the cpuset. Reclaim is constantly running 
> > and becomes very inefficient.
> 
> I think what you're saying is that we're not throttling in
> balance_dirty_pages().  So a large write() which is performed by a process
> inside your one-tenth-of-memory cpuset will just go and dirty all of the
> pages in that cpuset's nodes and things get all gummed up.

Correct.
 
> That can certainly happen, and I suppose we can make changes to
> balance_dirty_pages() to fix it (although it will have the
> we-wrote-lots-of-pages-we-didnt-need-to failure mode).

Right. In addition to checking the limits of the nodes in the current 
cpuset (which requires looping over all nodes and adding up the counters 
we need) I made some modifications to pass a set of nodes in the 
writeback_control structure. We can then check if there are sufficient 
pages of the inode within the nodes of the cpuset. But I am a bit 
concerned about performance.
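
The per-cpuset check can be sketched as a sum over per-node counters. This is only an illustration under stated assumptions: over_dirty_limit(), the dictionaries of per-node counts, the ten-node layout and the 40 percent ratio are all hypothetical and do not reflect the actual patch. It shows why a cpuset holding one tenth of memory can be almost entirely dirty while the global check still sees nothing wrong.

```python
def over_dirty_limit(node_dirty, node_pages, nodes, dirty_ratio=40):
    """Sum dirty and total pages over `nodes` and compare against the ratio."""
    dirty = sum(node_dirty[n] for n in nodes)
    pages = sum(node_pages[n] for n in nodes)
    return dirty * 100 > pages * dirty_ratio


# Hypothetical machine: 10 nodes of 1000 pages each, cpuset confined to node 0.
node_pages = {n: 1000 for n in range(10)}
node_dirty = {n: 0 for n in range(10)}
node_dirty[0] = 900  # a large write() dirtied most of the cpuset's memory

all_nodes = list(node_pages)
cpuset_nodes = [0]

# Global view: 900 of 10000 pages dirty, below the ratio, so no throttling.
print(over_dirty_limit(node_dirty, node_pages, all_nodes))
# Cpuset view: 900 of 1000 pages dirty, well above the ratio.
print(over_dirty_limit(node_dirty, node_pages, cpuset_nodes))
```

The loop over nodes is the cost I am worried about: it runs on every check, and it grows with the number of nodes in the cpuset.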

> But right now in 2.6.19 the machine should _not_ declare oom in this
> situation.  If it does, then we should fix that.  If it's only happening
> with NFS then yeah, OK, mumble, NFS still needs work.

We OOM only in some rare cases. Mostly it seems that the
machine just becomes extremely slow and the LRU locks become hot.

