Re: [RFC] split zonelist and use nodemask for page allocation [1/4]


From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Paul Jackson <pj@sgi.com>
Cc: linux-mm@kvack.org, clameter@sgi.com
Subject: Re: [RFC] split zonelist and use nodemask for page allocation [1/4]
Date: Fri, 21 Apr 2006 15:49:16 +0900
Message-ID: <20060421154916.f1c436d3.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20060420231751.f1068112.pj@sgi.com>

On Thu, 20 Apr 2006 23:17:51 -0700
Paul Jackson <pj@sgi.com> wrote:

> Interesting ... maybe ?
> 
> Doesn't this change the semantics of the kernel page allocator?
> 
> If I read correctly:
> 
>     The existing code scans the entire system's zonelists multiple times.
>     First, it looks on all nodes in the system for easy memory, and if that
>     fails, tries again, looking for less easy (lower threshold) memory.
> 
>     Your code takes one node at a time, in the alloc_pages_nodemask() loop,
>     and calls __alloc_pages() for that node, which will exhaust that node
>     before giving up.
> 

Ah... okay, get_page_from_freelist() scans several times in alloc_pages().
I should reconsider and rewrite the whole patch.
Thank you for pointing it out.
Maybe what I should do is not add a function that encapsulates alloc_pages(),
but modify get_page_from_freelist() to take a nodemask.
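
To make the difference in scan order concrete, here is a minimal user-space
sketch in C. It is a toy model, not kernel code and not the patch: struct
toy_node, toy_get_page(), alloc_threshold_first(), alloc_node_first(), the
watermark values, and the unsigned long bitmask standing in for a nodemask
are all hypothetical names chosen for illustration. alloc_threshold_first()
mimics the existing behaviour as described in the quoted text (all allowed
nodes at the easy threshold first, then lower thresholds), while
alloc_node_first() mimics what happens when a per-node loop wraps the
allocator.

```c
#include <stddef.h>

#define NR_NODES 4

/* Toy stand-in for a NUMA node: only tracks a free page count. */
struct toy_node {
	int free_pages;
};

/* Hypothetical watermarks, tried from easiest (highest) to hardest. */
static const int watermarks[] = { 8, 4, 0 };
#define NR_PASSES (sizeof(watermarks) / sizeof(watermarks[0]))

/*
 * Take one page from the first node that is set in 'nodemask' and has
 * more than 'mark' free pages. Returns the node id, or -1 on failure.
 * This plays the role of a freelist scan that accepts a nodemask.
 */
static int toy_get_page(struct toy_node *nodes, unsigned long nodemask,
			int mark)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		if (!(nodemask & (1UL << nid)))
			continue;	/* node not allowed */
		if (nodes[nid].free_pages > mark) {
			nodes[nid].free_pages--;
			return nid;
		}
	}
	return -1;
}

/* Order (1): watermark loop outside, all allowed nodes inside. */
static int alloc_threshold_first(struct toy_node *nodes,
				 unsigned long nodemask)
{
	for (size_t pass = 0; pass < NR_PASSES; pass++) {
		int nid = toy_get_page(nodes, nodemask, watermarks[pass]);

		if (nid >= 0)
			return nid;
	}
	return -1;
}

/* Order (2): node loop outside, every watermark tried per node. */
static int alloc_node_first(struct toy_node *nodes, unsigned long nodemask)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		if (!(nodemask & (1UL << nid)))
			continue;
		for (size_t pass = 0; pass < NR_PASSES; pass++) {
			if (toy_get_page(nodes, 1UL << nid,
					 watermarks[pass]) >= 0)
				return nid;
		}
	}
	return -1;
}
```

With free page counts {5, 20, 20, 20} and nodes 0-2 allowed (mask 0x7),
alloc_threshold_first() takes its page from node 1, because node 0 is below
the easy watermark, while alloc_node_first() digs into node 0's reserve
before looking at any other node.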


> In particular, the low memory failure cases, such as when the system
> starts to swap on a node, or a task is forced to sleep waiting for memory,
> or the out-of-memory killer is called, would seem to be quite different with
> your patch.  This could cause some serious problems, I suspect.
> 
Yes, serious.

> Some of your other advantages from this change look nice, but I suspect
> it would take a radical rewrite of __alloc_pages(), moving the multiple
> scans at increasingly aggressive free memory settings up into your
> __alloc_pages_nodemask() routine, and  moving the cpuset_zone_allowed()
> check from get_page_from_freelist() up as well.
> 
Yes, I think so too.

> This would be a major rewrite of mm/page_alloc.c, perhaps a very
> interesting one, but I don't think it would be an easy one.
> 

> Or, just perhaps, the above change in semantics is a -good- one.  I'll
> wager that my colleague Christoph will consider it such (I see he has
> already heartily endorsed your patch.)  Essentially your patch would
> seem to increase the locality of allocations -- beating one node to
> death before considering the next.  Sometimes this will be a good
> improvement.
> 
> And sometimes not.  In my ideal world, there would be a per-cpuset
> option, perhaps just a boolean, choosing between the two choices of:
>   1) look on all allowed nodes for easy memory, before reconsidering
>        each allowed node for one of the last free pages, or
>   2) beat all zones on one node hard, before going off-node.
> 
> I believe that the existing code does (1), and your patch does (2).
> 
> In any event, the layering of yet another control loop on top of the
> nested conditional fallback loops of loops we have now is a concern.
> It is getting harder and harder for mere mortals to understand this.
> 
> Perhaps there are opportunities here for much more cleanup, though
> that would not be easy.
> 
Yes, not easy.

> My apologies for wasting your time if I misread this.
> 
I think you are right.
Thank you. 

-Kame


