Merged Zone / Node in order to do containers etc easily?

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Christoph Lameter <clameter@engr.sgi.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Subject: Merged Zone / Node in order to do containers etc easily?
Date: Mon, 5 Mar 2007 10:54:43 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0703051046510.6913@schroedinger.engr.sgi.com> (raw)

We have talked a bit in the last days about eventually getting rid of 
either nodes or zones.

If one would merge the nodes and the zones struct this would be possible. 
Actually the current kernel supports something like it if the 
following config options are not set

CONFIG_ZONE_DMA
CONFIG_ZONE_DMA32
CONFIG_HIGHMEM

In that case we only have a single zone per node but no support anymore 
for DMA zones or highmem. We save the bits in the page->flags that are 
usually used to identify the zone. For simplicities sake lets just call 
these node / zone entities "zone".

Let say we have also CONFIG_NUMA set. Then

A. We could add more "zones" via node hotplug.
B. We can identify the zones via a node number from user space and direct 
   allocations to a specfic "zone".
C. We can migrate memory between "zones"
D. We have an indication how favorably these "zones" are to be used given
   their SLIT distance.

Lets call these "zones" that were generated during bootup "base zones".

Now we need some additional functionality. In particular we want to be 
able to put some memory dynamically into containers and we need to find a 
replacement for the DMA zones.

Lets create a new type of zones called "derived zones". These are based on 
base zone. An arbitrary number of MAX_ORDER blocks can be moved to these 
and then they function like a regular "zone". They can be dynamically 
created and deleted via the node hotplug interfaces.

So if we create a new container then we create a new zone and extract a 
number of MAX_ORDER blocks from a base zone. The zone functions like a 
base zone for the time that it exists and thus we have all the usual 
accounting for the zone and do not need to add them separately. Reclaim 
will work as for base zones etc etc. (this only works if we have MAX_ORDER 
blocks available, thus we would need Mel's defrag patches). Applications 
can be restricted to a container or containers by the cpuset 
functionality. The build in process migration in cpusets can move 
applications. Processes can be manually moved through page migration.

If we need some DMA zones for a particular device then we can also create 
a new zone and extract pages in a certain range from the base zone. This 
could occur dynamically (but early during boot so that the low end pages 
in a zone have not been used yet) if we discover that devices exist that 
need restricted memory pools. Moreover these zones could be custom sized 
for the devices that are challenged in a particular way. For example we 
could dynamically create a pool for a 2GB pool for the strange SCSI device 
that can only reliable do DMA using a 31 bit address.

That leaves the HIGHMEM out cold so far but HIGHMEM is not needed on 64 
bit platforms as far as I can tell. Maybe HIGHMEM could also be some sort 
of derived zone with memory taken from the base zone used as the memmap 
and as bounce buffers etc?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

                 reply	other threads:[~2007-03-05 18:54 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0703051046510.6913@schroedinger.engr.sgi.com \
    --to=clameter@engr.sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox