From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.17.195.12])
	by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j04Hex9g082350
	for ; Tue, 4 Jan 2005 12:40:59 -0500
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])
	by westrelay03.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j04Hex7j367998
	for ; Tue, 4 Jan 2005 10:40:59 -0700
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1])
	by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j04HexLw018490
	for ; Tue, 4 Jan 2005 10:40:59 -0700
Subject: Re: process page migration
From: Dave Hansen
In-Reply-To: <41DAD2AF.80604@sgi.com>
References: <41D99743.5000601@sgi.com>
	<1104781061.25994.19.camel@localhost>
	<41D9A7DB.2020306@sgi.com>
	<20050104.234207.74734492.taka@valinux.co.jp>
	<41DAD2AF.80604@sgi.com>
Content-Type: text/plain
Date: Tue, 04 Jan 2005 09:40:56 -0800
Message-Id: <1104860456.7581.21.camel@localhost>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Ray Bryant
Cc: Hirokazu Takahashi, Marcelo Tosatti, linux-mm, Rick Lindsley,
	"Matthew C. Dobson [imap]"
List-ID:

On Tue, 2005-01-04 at 11:30 -0600, Ray Bryant wrote:
> My thinking on this was to update the mempolicy after page migration.
> This works for my purposes since my plan is to
>
> (1) suspend the process via SIGSTOP
> (2) update the mempolicy
> (3) migrate the process's pages
> (4) migrate the process to the new cpu via set_schedaffinity()
> (5) resume the process via SIGCONT
>
> These steps are to be performed via a user space program that implements
> the actual migration function; the (2)-(4) are just the system calls that
> implement this. This keeps some of the function (i.e. which processes to
> migrate) out of the kernel and allows the user some flexibility in what
> order operations are performed as well as other functions that may go
> along with this migration request.
> (The actual function we are trying
> to implement is to support >>job<< migration from one set of NUMA nodes to
> another, and a job may consist of several processes.)

We already have scheduler code which has some knowledge of when a
process is dragged from one node to another.  Combined with the
per-node RSS, could we make a decision about when a process needs to
have migration performed on its pages on a more automatic basis,
without the syscalls?

We could have a tunable for how aggressive this mechanism is, so that
the process wouldn't start running again on the more strict SGI
machines until a very large number of the pages are pulled over.
However, on machines where process latency is more of an issue, the
tunable could be set to a much less aggressive value.

This would give normal, somewhat less exotic, NUMA machines the
benefits of page migration without the need for the process owner to
do anything manually to them, while also making sure that we keep the
number of interfaces to the migration code to a relative minimum.

-- Dave
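
To make the tunable idea a bit more concrete, here is a rough sketch in plain C of the kind of check I have in mind. Everything in it is an assumption for illustration only: struct node_rss, local_pct(), may_resume() and the migrate_threshold_pct tunable are hypothetical names, not existing kernel interfaces, and real per-node RSS accounting would look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process accounting: resident pages on each NUMA node. */
struct node_rss {
	size_t nr_nodes;
	const unsigned long *pages;	/* pages[i] = resident pages on node i */
};

/* Percentage (0..100) of the process's resident pages that sit on 'node'. */
unsigned int local_pct(const struct node_rss *rss, size_t node)
{
	unsigned long long total = 0;
	size_t i;

	assert(node < rss->nr_nodes);

	for (i = 0; i < rss->nr_nodes; i++)
		total += rss->pages[i];

	if (total == 0)
		return 100;	/* nothing resident, so nothing left to pull over */

	return (unsigned int)((rss->pages[node] * 100ULL) / total);
}

/*
 * Decide whether a process that the scheduler moved to 'new_node' may
 * start running again.  migrate_threshold_pct is the hypothetical
 * tunable: a high value holds the process until most of its pages have
 * been pulled over, a low value favours process latency.
 */
bool may_resume(const struct node_rss *rss, size_t new_node,
		unsigned int migrate_threshold_pct)
{
	return local_pct(rss, new_node) >= migrate_threshold_pct;
}
```

With 900 pages still on the old node and 100 on the new one, a threshold of 90 would keep the process waiting, while a threshold of 5 would let it run right away and leave the rest of the pages to be migrated later.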