From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Paul Jackson <pj@sgi.com>
Cc: linux-mm@kvack.org, clameter@sgi.com
Subject: Re: [RFC] split zonelist and use nodemask for page allocation [1/4]
Date: Fri, 21 Apr 2006 15:49:16 +0900
Message-ID: <20060421154916.f1c436d3.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20060420231751.f1068112.pj@sgi.com>
On Thu, 20 Apr 2006 23:17:51 -0700
Paul Jackson <pj@sgi.com> wrote:
> Interesting ... maybe ?
>
> Doesn't this change the semantics of the kernel page allocator?
>
> If I read correctly:
>
> The existing code scans the entire system's zonelists multiple times.
> First, it looks on all nodes in the system for easy memory, and if that
> fails, tries again, looking for less easy (lower threshold) memory.
>
> Your code takes one node at a time, in the alloc_pages_nodemask() loop,
> and calls __alloc_pages() for that node, which will exhaust that node
> before giving up.
>
Ah... okay, __alloc_pages() scans the zonelist several times through
get_page_from_freelist(). I should reconsider this and rewrite the whole patch.
Thank you for pointing it out.
Maybe what I should do is not add a function that wraps __alloc_pages(),
but modify get_page_from_freelist() to take a nodemask.
> In particular, the low memory failure cases, such as when the system
> starts to swap on a node, or a task is forced to sleep waiting for memory,
> or the out-of-memory killer called, would seem to be quite different with
> your patch. This could cause some serious problems, I suspect.
>
Yes, serious.
> Some of your other advantages from this change look nice, but I suspect
> it would take a radical rewrite of __alloc_pages(), moving the multiple
> scans at increasingly aggressive free memory settings up into your
> __alloc_pages_nodemask() routine, and moving the cpuset_zone_allowed()
> check from get_page_from_freelist() up as well.
>
Yes, I think so too.
> This would be a major rewrite of mm/page_alloc.c, perhaps a very
> interesting one, but I don't think it would be an easy one.
>
> Or, just perhaps, the above change in semantics is a -good- one. I'll
> wager that my colleague Christoph will consider it such (I see he has
> already heartily endorsed your patch.) Essentially your patch would
> seem to increase the locality of allocations -- beating one node to
> death before considering the next. Sometimes this will be a good
> improvement.
>
> And sometimes not. In my ideal world, there would be a per-cpuset
> option, perhaps just a boolean, choosing between the two choices of:
> 1) look on all allowed nodes for easy memory, before reconsidering
> each allowed node for one of the last free pages, or
> 2) beat all zones on one node hard, before going off-node.
>
> I believe that the existing code does (1), and your patch does (2).
>
> In any event, the layering of yet another control loop on top of the
> nested conditional fallback loops of loops we have now is a concern.
> It is getting harder and harder for mere mortals to understand this.
>
> Perhaps there are opportunities here for much more cleanup, though
> that would not be easy.
>
Yes, not easy.
> My apologies for wasting your time if I misread this.
>
I think you are right.
Thank you.
-Kame