From: frankeh@us.ibm.com
To: Kanoj Sarcar <kanoj@google.engr.sgi.com>
Cc: Andrea Arcangeli <andrea@suse.de>,
Mark_H_Johnson.RTS@raytheon.com, linux-mm@kvack.org,
riel@nl.linux.org, Linus Torvalds <torvalds@transmeta.com>,
pratap@us.ibm.com
Subject: Re: 2.3.x mem balancing
Date: Wed, 26 Apr 2000 15:06:09 -0400
Message-ID: <852568CD.00695B9D.00@D51MTA07.pok.ibm.com>
Kanoj, this is the issue I raised earlier on the board, but didn't get a
reply...
Yes, one NUMA machine here at IBM Research consists of a 4-node cluster of
4-way Xeon boxes.
When NUMA-d together, each memory controller simply relocates its own
node's memory to a designated 1 GB range and forwards other requests to
the appropriate nodes while maintaining cache coherence.
This of course leads to the situation that only the first node will have
DMA memory, given the 1 GB kernel limitation.
I used to have a software solution to this, namely rewriting the __pa
and __va macros to do some remapping, which would allow each node to
provide some kernel virtual DMA memory.
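To make the remapping idea concrete, here is a minimal sketch of how a
remapped __pa/__va pair could behave. It is not the actual patch: the layout
(four nodes, each owning a 1 GB physical range, with the kernel's 1 GB window
split into equal 256 MB slices, one per node) and the names va() and pa() are
assumptions for illustration only.

```python
# Hypothetical layout, for illustration only (not the actual patch).
GB = 1 << 30
PAGE_OFFSET = 0xC0000000      # start of the kernel's 1 GB direct-mapped window
NUM_NODES = 4                 # assumed: 4 nodes
NODE_SPAN = GB                # assumed: each node owns a 1 GB physical range
SLICE = GB // NUM_NODES       # part of the kernel window given to each node


def va(paddr: int) -> int:
    """Analogue of __va: physical -> kernel virtual, with per-node remapping."""
    node, offset = divmod(paddr, NODE_SPAN)
    if not 0 <= node < NUM_NODES or offset >= SLICE:
        raise ValueError("physical address is not kernel-mapped")
    return PAGE_OFFSET + node * SLICE + offset


def pa(vaddr: int) -> int:
    """Analogue of __pa: kernel virtual -> physical, the inverse of va()."""
    window_off = vaddr - PAGE_OFFSET
    if not 0 <= window_off < NUM_NODES * SLICE:
        raise ValueError("address is outside the kernel window")
    node, offset = divmod(window_off, SLICE)
    return node * NODE_SPAN + offset
```

With a mapping of this shape, the low part of every node's memory is
reachable through the kernel window, at the price of each node contributing
less than it would if the window covered node 0 alone.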
Now how do you believe the architectures (particularly x86-based NUMA
systems) will evolve?
Regarding some of the other messages about the zones:
With respect to NUMA allocation, I would still like to see what was
pointed out for IRIX, and which is also available on NUMAQ/Dynix for
instance: resource classes.
A resource class is a set of basic resources (CPUs and memory, i.e. nodes)
to which execution and allocation for user processes are restricted.
(a) We have a full CPU affinity patch, driven by a system call interface,
that restricts execution to a set of specified CPUs. Any takers?
(b) Kanoj and I made a first attempt (~2.3.48 timeframe) to restrict
allocation to certain nodes, but the swapping behavior never worked
properly, and with the constant changes under 2.3.99-preX I put this on
ice until the VM becomes somewhat more stable.
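As a rough illustration of the resource class idea, the sketch below models a
class as a set of CPUs plus a set of memory nodes, with one check for
execution and one for allocation. The names ResourceClass and class_for_nodes,
and the assumption that CPUs are numbered node by node with four CPUs per
node, are hypothetical and not taken from IRIX, Dynix, or either patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceClass:
    """Hypothetical resource class: the CPUs and nodes a process may use."""
    cpus: frozenset
    nodes: frozenset

    def may_run_on(self, cpu: int) -> bool:
        # Execution restriction, as in the CPU affinity case (a).
        return cpu in self.cpus

    def may_allocate_from(self, node: int) -> bool:
        # Allocation restriction, as in the node restriction case (b).
        return node in self.nodes


def class_for_nodes(nodes, cpus_per_node=4):
    """Build a class covering whole nodes.

    Assumes CPUs are numbered node by node (node 0 has CPUs 0..3, etc.).
    """
    nodes = frozenset(nodes)
    cpus = frozenset(
        n * cpus_per_node + c for n in nodes for c in range(cpus_per_node)
    )
    return ResourceClass(cpus=cpus, nodes=nodes)
```

For example, class_for_nodes([1, 2]) would confine a process to CPUs 4
through 11 and to memory on nodes 1 and 2.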
Again, I want to specify a set of nodes from which to allocate memory.
Given a node set specification, I would like to treat the zones of the
same class on all the specified nodes (e.g. ZONE_HIGH) as a single target
class. Only if it cannot allocate within that combined class on the
specified set of nodes should the allocator descend into the next lower
class.
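The fallback order described here can be sketched as a toy allocator. This is
only a model of the policy, not kernel code: the free_pages dictionary stands
in for the per-zone free lists, the zone names follow the ones used in this
message, and picking the node with the most free pages within a class is an
assumed tie-break that the proposal does not specify.

```python
# Zone classes from highest to lowest, named as in this message.
ZONE_CLASSES = ["ZONE_HIGH", "ZONE_NORMAL", "ZONE_DMA"]


def alloc_page(free_pages, node_set, highest="ZONE_HIGH"):
    """Return (node, zone) for one allocated page, or None if none is free.

    free_pages maps (node, zone) -> number of free pages; it is a toy
    stand-in for the per-zone free lists and is updated in place.
    """
    start = ZONE_CLASSES.index(highest)
    for zone in ZONE_CLASSES[start:]:
        # Treat this zone class on all nodes in the set as one target class.
        candidates = [
            n for n in sorted(node_set) if free_pages.get((n, zone), 0) > 0
        ]
        if candidates:
            # Assumed tie-break: prefer the node with the most free pages.
            node = max(candidates, key=lambda n: free_pages[(n, zone)])
            free_pages[(node, zone)] -= 1
            return node, zone
        # Nothing free in this class on any allowed node: descend one class.
    return None
```

The point of the ordering is that the allocator exhausts a zone class across
the whole node set before it touches a lower class on any node, and it never
looks at nodes outside the set.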
Still open in this spec, of course, is what will be affected by the memory
specification: only user pages, or pages that go to memory-mapped files
as well?
kanoj@google.engr.sgi.com (Kanoj Sarcar) on 04/26/2000 01:36:48 PM
To: andrea@suse.de (Andrea Arcangeli)
cc: Mark_H_Johnson.RTS@raytheon.com, linux-mm@kvack.org,
riel@nl.linux.org, torvalds@transmeta.com (Linus Torvalds)
Subject: Re: 2.3.x mem balancing
>
> On NUMA hardware you have only one zone per node since nobody uses ISA-DMA
> on such machines and you have PCI64 or you can use the PCI-DMA sg for
> PCI32. So on NUMA hardware you are going to have only one zone per node
> (at least this was the setup of the NUMA machine I was playing with). So
> you don't mind at all about classzone/zone. Classzone and zone are the
> same thing in such a setup, they both are the plain ZONE_DMA zone_t.
> Finished. Said that you don't care anymore about the changes of how the
> overlapped zones are handled since you don't have overlapped zones in
> first place.
Andrea, are you talking about the SGI Origin platform, or are you
using some other NUMA platform? In any case, the SGI platform in fact
does not support ISA-DMA, but unfortunately, I don't think just because
it has PCI mapping registers, you can assume that all memory is DMAable.
For us to be able to consider all memory as DMAable, before each DMA
operation starts, we need to have a pci-dma type hook to program the
mapping registers. As far as I know, such a hook is not used in all
drivers (in the 2.4 timeframe), so very unfortunately, I think we need
to keep the option open for each node to have more than just ZONE_DMA.
Finally, I am not sure how things will work, we are still busy trying
to get the Origin/Linux port going.
FWIW, I think the IBM/Sequent NUMA machines in fact have nodes that
have only non-DMAable memory.
>
> If you move the NUMA balancing and node selection into the higher layer
> as I was proposing, instead you can do clever things.
>
For an example and an (old) patch for this, look at
http://oss.sgi.com/projects/numa/download/numa.gen.42b
http://oss.sgi.com/projects/numa/download/numa.plat.42b
Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/