Re: [RFC] split zonelist and use nodemask for page allocation [1/4]

From: Paul Jackson <pj@sgi.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-mm@kvack.org, clameter@sgi.com
Subject: Re: [RFC] split zonelist and use nodemask for page allocation [1/4]
Date: Thu, 20 Apr 2006 23:17:51 -0700
Message-ID: <20060420231751.f1068112.pj@sgi.com>
In-Reply-To: <20060421131147.81477c93.kamezawa.hiroyu@jp.fujitsu.com>

Interesting ... maybe ?

Doesn't this change the semantics of the kernel page allocator?

If I read correctly:

    The existing code scans the entire system's zonelists multiple times.
    First, it looks on all nodes in the system for easy memory, and if that
    fails, tries again, looking for less easy (lower threshold) memory.

    Your code takes one node at a time, in the alloc_pages_nodemask() loop,
    and calls __alloc_pages() for that node, which will exhaust that node
    before giving up.

In particular, the low memory failure cases, such as when the system
starts to swap on a node, or a task is forced to sleep waiting for memory,
or the out-of-memory killer is called, would seem to be quite different with
your patch.  This could cause some serious problems, I suspect.

Some of your other advantages from this change look nice, but I suspect
it would take a radical rewrite of __alloc_pages(), moving the multiple
scans at increasingly aggressive free memory settings up into your
__alloc_pages_nodemask() routine, and moving the cpuset_zone_allowed()
check from get_page_from_freelist() up as well.

This would be a major rewrite of mm/page_alloc.c, perhaps a very
interesting one, but I don't think it would be an easy one.

Or, just perhaps, the above change in semantics is a -good- one.  I'll
wager that my colleague Christoph will consider it such (I see he has
already heartily endorsed your patch.)  Essentially your patch would
seem to increase the locality of allocations -- beating one node to
death before considering the next.  Sometimes this will be a good
improvement.

And sometimes not.  In my ideal world, there would be a per-cpuset
option, perhaps just a boolean, choosing between the two choices of:
  1) look on all allowed nodes for easy memory, before reconsidering
       each allowed node for one of the last free pages, or
  2) beat all zones on one node hard, before going off-node.

I believe that the existing code does (1), and your patch does (2).
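
To make the difference between (1) and (2) concrete, here is a toy
sketch of the two loop orders.  It is not kernel code: the toy_node
structure, the watermark values, and the toy_alloc() helper are all
hypothetical, and the real __alloc_pages() paths do much more (reclaim,
sleeping, the OOM killer).  It only shows which node ends up serving
the page when the nesting of the "node" and "threshold" loops is
swapped, with a boolean standing in for the per-cpuset option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a node is just a count of free pages. */
struct toy_node {
	int free_pages;
};

/* Thresholds tried in order, from "easy" to "last pages" (made-up values). */
static const int toy_watermarks[] = { 8, 4, 0 };
#define TOY_NR_WATERMARKS (sizeof(toy_watermarks) / sizeof(toy_watermarks[0]))

/* Take one page from the node only if it has more than 'watermark' free. */
static bool toy_try_node(struct toy_node *n, int watermark)
{
	if (n->free_pages > watermark) {
		n->free_pages--;
		return true;
	}
	return false;
}

/*
 * Returns the index of the node that served the page, or -1 on failure.
 * node_first == false: choice (1), every node at the easy threshold first.
 * node_first == true:  choice (2), every threshold on one node first.
 */
static int toy_alloc(struct toy_node *nodes, size_t nr_nodes, bool node_first)
{
	assert(nodes != NULL);

	if (node_first) {
		for (size_t i = 0; i < nr_nodes; i++) {
			for (size_t w = 0; w < TOY_NR_WATERMARKS; w++) {
				if (toy_try_node(&nodes[i], toy_watermarks[w]))
					return (int)i;
			}
		}
	} else {
		for (size_t w = 0; w < TOY_NR_WATERMARKS; w++) {
			for (size_t i = 0; i < nr_nodes; i++) {
				if (toy_try_node(&nodes[i], toy_watermarks[w]))
					return (int)i;
			}
		}
	}
	return -1;
}
```

With two nodes holding 5 and 10 free pages, choice (1) skips the
nearly drained first node and takes the page from the second, while
choice (2) dips below the easy threshold on the first node and stays
local.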

In any event, the layering of yet another control loop on top of the
nested conditional fallback loops of loops we have now is a concern.
It is getting harder and harder for mere mortals to understand this.

Perhaps there are opportunities here for much more cleanup, though
that would not be easy.

My apologies for wasting your time if I misread this.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401
