From mboxrd@z Thu Jan 1 00:00:00 1970
From: kanoj@google.engr.sgi.com (Kanoj Sarcar)
Message-Id: <200004261743.KAA16088@google.engr.sgi.com>
Subject: Re: 2.3.x mem balancing
Date: Wed, 26 Apr 2000 10:43:30 -0700 (PDT)
In-Reply-To: <852568CD.0057D4FC.00@raylex-gh01.eo.ray.com> from "Mark_H_Johnson.RTS@raytheon.com" at Apr 26, 2000 11:03:58 AM
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Mark_H_Johnson.RTS@raytheon.com
Cc: Andrea Arcangeli, linux-mm@kvack.org, riel@nl.linux.org, torvalds@transmeta.com
List-ID:

> In the context of "memory balancing" - all processors and all memory is NOT
> equal in a NUMA system. To get the best performance from the hardware, you
> prefer to put "all" of the memory for each process into a single memory unit -
> then run that process from a processor "near" that memory unit. This seemingly
> simple principle has a lot of problems behind it. What about...
> - shared read only memory (e.g., libraries) [to clone or not?]
> - shared read/write memory [how to schedule work to be done when load >> "local
> capacity"]
> - when memory is low, which pages should I remove?
> - when I start a new job, even when there is lots of free memory, where should
> I load the job?

The problem is that every app has different requirements and performs
best under different resource (cpu/memory) scheduling policies. IRIX
provides a tool called "dplace" that lets performance experts specify
which threads of a program should run on the cpus of which node, and
how different sections of the address space should have their pages
allocated (that is, on which nodes). Possible policies include
firsttouch, i.e., allocate the page on the node whose processor first
accesses that page; roundrobin, i.e., rotate the allocations across
all nodes; and so on.
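
To make the two policies named above concrete, here is a toy sketch in
Python. It is not how dplace or IRIX implements them; the function
names, the (cpu, page) access trace and the cpu_to_node map are all
made up for illustration, and real placement works on physical pages
inside the kernel rather than on dictionaries.

```python
def first_touch(accesses, cpu_to_node):
    """Place each page on the node of the cpu that touches it first.

    accesses:    list of (cpu, page) pairs in time order (hypothetical trace)
    cpu_to_node: dict mapping cpu id -> node id (hypothetical topology)
    """
    placement = {}
    for cpu, page in accesses:
        # Only the first access to a page decides where it lives.
        placement.setdefault(page, cpu_to_node[cpu])
    return placement


def round_robin(accesses, num_nodes):
    """Place pages on nodes 0, 1, ..., num_nodes - 1 in turn.

    Pages are assigned in order of first access; which cpu touched
    the page is ignored.
    """
    placement = {}
    next_node = 0
    for _cpu, page in accesses:
        if page not in placement:
            placement[page] = next_node
            next_node = (next_node + 1) % num_nodes
    return placement


if __name__ == "__main__":
    # Two nodes with two cpus each (assumed layout).
    cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
    trace = [(0, "A"), (2, "B"), (3, "A"), (2, "C"), (1, "D")]
    print(first_touch(trace, cpu_to_node))
    print(round_robin(trace, 2))
```

With firsttouch, a page follows whichever thread reaches it first, so
it only helps if that thread stays near the page afterwards; roundrobin
ignores locality and simply spreads the load evenly over the nodes.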

Linux is still some way from providing such flexible options,
especially since it is not yet even possible to pin a process to a cpu
or node. The page allocation strategies are, of course, much more work
to implement.

For global issues like "when memory is low, which pages should I
remove", the problem is a little more complex. Having a kswapd per
node is an option, although I think it is too early to decide that. I
am hoping we can get a multinode system up soon and investigate these
issues.

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/