linux-mm.kvack.org archive mirror
* [PATCH] change zonelist order v5 [0/3]
@ 2007-05-08 11:14 KAMEZAWA Hiroyuki
From: KAMEZAWA Hiroyuki @ 2007-05-08 11:14 UTC
  To: LKML
  Cc: Linux-MM, Lee.Schermerhorn, Christoph Lameter, AKPM, Andi Kleen,
	jbarnes, kamezawa.hiroyu

Hi, this is version 5 of the zonelist-order-fix patch, against 2.6.21-mm1.
It works well in my ia64/NUMA environment.


ChangeLog V4 -> V5
- Split the documentation into a separate patch and rewrote it.
- More cleanups.
- Simplified the sysctl/boot option parameters.

ChangeLog V2 -> V4
- Added automatic configuration.
- Automatic configuration is now the default.
- Renamed relaxed_zone_order to numa_zonelist_order.
  You can specify the value "default", "zone", or "numa".
- Cleanups from Lee Schermerhorn.
- Split the patch into "base" and "autoconfiguration algorithm" parts.

ChangeLog V1 -> V2
- Renamed the sysctl to relaxed_zone_order.
- NORMAL->NORMAL->....->DMA->DMA->DMA order (new ordering) is now the default.
  NORMAL->DMA->NORMAL->DMA order (old ordering) is optional.
- Added a boot option to set relaxed_zone_order. ia64 is now supported.
- Added documentation.

Thanks to Lee Schermerhorn for his great help. Please ack or
give your Signed-off-by if it looks OK.

[patch set]
[1/3] ---- add zonelist selection logic.
[2/3] ---- add automatic configuration of zonelist order.
[3/3] ---- add documentation.

Any comments are welcome.

[Description]
This patch modifies the zonelist order on NUMA systems. It offers two
zonelist orders:
(TypeA) zones are ordered by node locality, then by zone type
(TypeB) zones are ordered by zone type, then by node locality

(TypeA) is called "Node Order" and (TypeB) is called "Zone Order".
The default zonelist order is determined automatically by the kernel.


Assume a 2-node NUMA system where Node(0) has ZONE_DMA and ZONE_NORMAL, and
Node(1) has only ZONE_NORMAL. In this case, the zonelist for GFP_KERNEL on
Node(0) will be:

In "Node Order",  Node(0)NORMAL -> Node(0)DMA -> Node(1)NORMAL
In "Zone Order",  Node(0)NORMAL -> Node(1)NORMAL -> Node(0)DMA

"Node Order" guarantees better locality: every local zone, including
Node(0)DMA, is tried before any remote zone. "Zone Order" instead places
ZONE_DMA at the tail of the zonelist. This makes the zonelist more robust
against OOM in ZONE_DMA, which tends to be small.

"Which is better?"
It depends on the system's environment and memory usage, I think.

[Case Study]
On my ia64 NUMA box (and others), only Node(0) has ZONE_DMA, 2GB of it.
Assume a machine with the following configuration:

Node 0:   12GB of memory   10GB NORMAL 2GB DMA
Node 1:   12GB of memory   12GB NORMAL
Node 2:   12GB of memory   12GB NORMAL

Start a process that uses 12GB of memory on Node(0). Memory usage will then be:
Node 0:   0/12 GB of memory is available. NORMAL: empty  DMA: empty
Node 1:  12/12 GB of memory is available. NORMAL: 12G
Node 2:  12/12 GB of memory is available. NORMAL: 12G

The notable point is that ZONE_DMA is exhausted before the ZONE_NORMAL of
the other nodes. This is the current kernel's behavior. It can easily
cause OOM if the system has a device that uses GFP_DMA.

With "Zone Order", this patch changes the result as follows:
Node 0:   2/12 GB of memory is available. NORMAL: empty  DMA: 2G
Node 1:  10/12 GB of memory is available. NORMAL: 10G
Node 2:  12/12 GB of memory is available. NORMAL: 12G

A user can say "Goodbye, OOM-Killer", but 2GB of the process's memory is
now allocated from off-node memory. It's a trade-off.

-Kame



Thread overview: 22+ messages
2007-05-08 11:14 [PATCH] change zonelist order v5 [0/3] KAMEZAWA Hiroyuki
2007-05-08 11:16 ` [PATCH] change zonelist order v5 [1/3] implements zonelist order selection KAMEZAWA Hiroyuki
2007-05-08 17:06   ` Lee Schermerhorn
2007-05-08 17:22     ` Christoph Lameter
2007-05-08 17:33       ` Lee Schermerhorn
2007-05-08 18:05         ` Christoph Lameter
2007-05-08 20:37           ` Lee Schermerhorn
2007-05-09  0:29             ` KAMEZAWA Hiroyuki
2007-05-09  0:58               ` Andrew Morton
2007-05-09  1:07                 ` Christoph Lameter
2007-05-09  1:20                 ` KAMEZAWA Hiroyuki
2007-05-09 13:55                   ` Lee Schermerhorn
2007-05-09  4:12                 ` KAMEZAWA Hiroyuki
2007-05-09  8:53                   ` Andy Whitcroft
2007-05-09  9:04                     ` KAMEZAWA Hiroyuki
2007-05-08 11:18 ` [PATCH] change zonelist order v5 [2/3] automatic configuration KAMEZAWA Hiroyuki
2007-05-08 17:07   ` Lee Schermerhorn
2007-05-08 11:19 ` [PATCH] change zonelist order v5 [3/3] documentation KAMEZAWA Hiroyuki
2007-05-08 17:08   ` Lee Schermerhorn
2007-05-09  0:23     ` KAMEZAWA Hiroyuki
2007-05-08 12:04 ` [PATCH] change zonelist order v5 [4/3] compile fix KAMEZAWA Hiroyuki
2007-05-08 16:14 ` [PATCH] change zonelist order v5 [0/3] Christoph Lameter
