Subject: Re: [PATCH/RFC] AutoPage Migration - V0.1 - 0/8 Overview
From: Lee Schermerhorn
Reply-To: lee.schermerhorn@hp.com
In-Reply-To: <1142019195.5204.12.camel@localhost.localdomain>
References: <1142019195.5204.12.camel@localhost.localdomain>
Date: Tue, 14 Mar 2006 17:27:46 -0500
Message-Id: <1142375266.5235.105.camel@localhost.localdomain>
To: linux-mm

Some results for auto migration + migrate-on-fault using the McCalpin Stream benchmark.

Kernel: 2.6.16-rc6-git1 with the migrate-on-fault and automigration patches applied, in that order.

Platform: HP rx8620 16-cpu/4-node/32GB NUMA platform with 1.6GHz ia64 cpus.

I built the stream benchmark to run for 1000 loops [NTIMES defined as 1000], and at the end of each loop I printed the run# and the times for the copy, scale, add and triad loops. I compiled with icc [Intel compiler] 9.0, using OpenMP extensions to parallelize the loops.

I ran the benchmark twice: once with auto migration disabled [/sys/kernel/migration/sched_migrate_memory = 0] and once with it enabled [including migrate on fault, a.k.a. lazy migration]. After I started each run, I did a kernel build on the platform using: make mrproper; make oldconfig [using my regular .config]; make -j32 all.

I have posted graphs of the results and the raw data itself at:

	http://free.linux.hp.com/~lts/Patches/PageMigration/

The nomig+kbuild files [automigration disabled] show the benchmark starting with more or less "best case" times. By best case, I mean that if you run the benchmark over and over [with, say, NTIMES = 10], you'll see a number of different results, depending on how the OMP threads happen to fall relative to the data. I have made no effort to do any initial placement. The copy and scale times are ~0.052 seconds; the add and triad numbers are ~0.078 seconds. Then the kernel build disturbs the run for a while, during which time the scheduler migrates the stream/OMP threads to the least busy groups/queues. When the kernel build is done, the threads happen to be left in a rather suboptimal configuration. This is just a roll of the dice; they could have ended up in better shape.

The automig+kbuild files show the same scenario. However, the threads happened to start in a suboptimal configuration, similar to the end state of the nomig case. Again, I made no effort to place the threads at the start. As soon as the kernel build starts, migration kicks in. I believe that the first spike is from the "make mrproper". This disturbance is sufficient to cause the stream/OMP threads to migrate, at which time autopage migration causes the pages to be migrated to the node with the threads. Then the "make -j32 all" really messes up the benchmark, but when the build finishes, the job runs in the more or less best case configuration.

Some things to note: the peaks for the automigration case are higher [almost double] than those of the no-migration case, because the page migrations get included in the benchmark times, which are measured with nothing more than gettimeofday().
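For reference, here is a minimal sketch of how each iteration is timed. The array size and the copy/scale kernels shown are illustrative placeholders, not the exact STREAM source; the real run times all four kernels the same way. The point is just that each loop is bracketed by gettimeofday(), so any page migration triggered by a touch inside a timed loop shows up in that loop's time:

	#include <stdio.h>
	#include <sys/time.h>

	#define N	(2 * 1000 * 1000)	/* illustrative; real runs used larger arrays */
	#define NTIMES	1000

	static double a[N], b[N], c[N];

	static double seconds(void)
	{
		struct timeval tv;

		gettimeofday(&tv, NULL);
		return tv.tv_sec + tv.tv_usec * 1.0e-6;
	}

	int main(void)
	{
		double t_copy, t_scale;
		long i;
		int k;

		/* first-touch init: page placement follows wherever the OMP
		 * threads happen to be running; no explicit placement */
	#pragma omp parallel for
		for (i = 0; i < N; i++) {
			a[i] = 1.0;
			b[i] = 2.0;
			c[i] = 0.0;
		}

		for (k = 0; k < NTIMES; k++) {
			t_copy = seconds();
	#pragma omp parallel for
			for (i = 0; i < N; i++)		/* copy kernel */
				c[i] = a[i];
			t_copy = seconds() - t_copy;

			t_scale = seconds();
	#pragma omp parallel for
			for (i = 0; i < N; i++)		/* scale kernel */
				b[i] = 3.0 * c[i];
			t_scale = seconds() - t_scale;

			/* the add and triad kernels are timed the same way */
			printf("%d copy %.6f scale %.6f\n", k, t_copy, t_scale);
		}
		return 0;
	}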
Also, because I'm using lazy migration [sched_migrate_lazy enabled], when a task migrates to a new node, I only unmap the pages. Then they migrate, or not, as threads touch them. If the page is already on the node where the faulting thread is running, the fault just reinstates the pte for the page. If not, it migrates the page to the thread's node and installs the new pte.

Finally, as noted above, the first disturbance in the automig case caused some migrations that resulted in a better thread/memory configuration than what the program started with. To give an application control over this, it might be useful to provide an MPOL_NOOP policy to be used along with the MPOL_MF_MOVE flag and the MPOL_MF_LAZY flag that I implemented in the last of the migrate-on-fault patches. The 'NOOP would retain the existing policy for the specified range, but the 'MOVE+'LAZY would unmap the ptes of the range, pushing anon pages to the swap cache if necessary. This would allow the threads of an application to pull the pages local to their node on next touch. I will test this theory...

Lee
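P.S. To make the proposed interface concrete, here is roughly how I would expect an application to use it. This is a sketch only: mbind() and MPOL_MF_MOVE exist today, but MPOL_NOOP and MPOL_MF_LAZY are just the proposed additions from these patches, so the values defined below are placeholders, not real header constants.

	#include <stdio.h>
	#include <sys/mman.h>
	#include <numaif.h>		/* mbind(), MPOL_* [link with -lnuma] */

	#ifndef MPOL_NOOP
	#define MPOL_NOOP	4		/* placeholder: proposed "keep current policy" mode */
	#endif
	#ifndef MPOL_MF_LAZY
	#define MPOL_MF_LAZY	(1 << 3)	/* placeholder: proposed lazy-migrate flag */
	#endif

	/*
	 * Unmap [addr, addr+len) without changing its memory policy.
	 * Each thread that touches a page afterward would fault it to,
	 * or leave it on, that thread's node.
	 */
	static long mark_region_lazy(void *addr, unsigned long len)
	{
		return mbind(addr, len, MPOL_NOOP, NULL, 0,
			     MPOL_MF_MOVE | MPOL_MF_LAZY);
	}

	int main(void)
	{
		unsigned long len = 64UL << 20;		/* 64MB example region */
		void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (buf == MAP_FAILED)
			return 1;

		/* ... threads initialize and work on buf; the scheduler may
		 * move them to other nodes in the meantime ... */

		if (mark_region_lazy(buf, len))
			perror("mbind");

		/* from here on, the next touch by each thread pulls the
		 * page to that thread's node */

		munmap(buf, len);
		return 0;
	}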