From: Chegu Vinod
Date: Mon, 27 Jan 2014 16:35:30 -0800
Subject: Re: [PATCH v5 0/9] numa,sched,mm: pseudo-interleaving for automatic NUMA balancing
To: riel@redhat.com, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, peterz@infradead.org, mgorman@suse.de, mingo@redhat.com
Message-ID: <52E6FB52.3070001@hp.com>
In-Reply-To: <1390860228-21539-1-git-send-email-riel@redhat.com>

On 1/27/2014 2:03 PM, riel@redhat.com wrote:
> The current automatic NUMA balancing code base has issues with
> workloads that do not fit on one NUMA node. Page migration is
> slowed down, but memory distribution between the nodes where
> the workload runs is essentially random, often resulting in a
> suboptimal amount of memory bandwidth being available to the
> workload.
>
> In order to maximize performance of workloads that do not fit in one
> NUMA node, we want to satisfy the following criteria:
> 1) keep private memory local to each thread
> 2) avoid excessive NUMA migration of pages
> 3) distribute shared memory across the active nodes, to
>    maximize memory bandwidth available to the workload
>
> This patch series identifies the NUMA nodes on which the workload
> is actively running, and balances (somewhat lazily) the memory
> between those nodes, satisfying the criteria above.
>
> As usual, the series has had some performance testing, but it
> could always benefit from more testing, on other systems.
>
> Changes since v4:
> - remove some code that did not help performance
> - implement all the cleanups suggested by Mel Gorman
> - lots more testing, by Chegu Vinod and myself
> - rebase against -tip instead of -next, to make merging easier
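To make the three placement criteria above concrete, here is a rough sketch
of the per-fault placement decision the cover letter describes. This is
illustrative only, not the actual kernel code from the series; the structure
and helper names below are invented for the example, and the real decision
is made in the NUMA hinting fault path:

    /*
     * Illustrative sketch only -- NOT the kernel implementation.
     * All names here are made up for the example.
     */
    #include <stdbool.h>

    struct task_numa_state {
            unsigned long   active_nodes;   /* bitmask of nodes the workload runs on */
            int             local_node;     /* node of the CPU taking the fault */
    };

    /* Pick a destination node for a page on a NUMA hinting fault. */
    int pick_target_node(const struct task_numa_state *ts,
                         bool page_is_private, int page_node)
    {
            /* 1) keep private memory local to the faulting thread */
            if (page_is_private)
                    return ts->local_node;

            /*
             * 2) + 3) for shared memory, balance lazily: if the page
             * already sits on a node the workload actively uses, leave
             * it alone; otherwise pull it toward the faulting node,
             * which spreads the shared pages across the active nodes
             * over time without excessive migration.
             */
            if (ts->active_nodes & (1UL << page_node))
                    return page_node;

            return ts->local_node;
    }

The point is just that private pages follow their thread, while shared pages
only move when they sit outside the active node set, which is how the series
keeps migration down while still spreading memory bandwidth.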
Acked-by: Chegu Vinod

---

The following 1-, 2-, 4- and 8-socket wide results were collected on an
8-socket box; each reported throughput is an average of 4 runs.

I) Eight 1-socket wide instances (10 warehouse threads/instance)

   a) numactl pinning:
      throughput (bops): 350720, 355250, 350338, 345963,
                         344723, 347838, 347623, 347963
   b) automatic NUMA balancing (avg # page migrations: 10317611):
      throughput (bops): 319037, 319612, 314089, 317499,
                         320516, 314905, 315821, 320575
   c) no automatic NUMA balancing, no pinning:
      throughput (bops): 175433, 179470, 176262, 162551,
                         167874, 173196, 172001, 174332

II) Four 2-socket wide instances (20 warehouse threads/instance)

   a) numactl pinning:
      throughput (bops): 611391, 618464, 612350, 616826
   b) automatic NUMA balancing (avg # page migrations: 8643581):
      throughput (bops): 523053, 519375, 502800, 528880
   c) no automatic NUMA balancing, no pinning:
      throughput (bops): 334807, 330348, 306250, 309624

III) Two 4-socket wide instances (40 warehouse threads/instance)

   a) numactl pinning:
      throughput (bops): 946760, 949712
   b) automatic NUMA balancing (avg # page migrations: 5710932):
      throughput (bops): 861105, 879878
   c) no automatic NUMA balancing, no pinning:
      throughput (bops): 500527, 450884

IV) One 8-socket wide instance (80 warehouse threads/instance)

   a) numactl pinning:
      throughput (bops): 1199211
   b) automatic NUMA balancing (avg # page migrations: 3426618):
      throughput (bops): 1119524
   c) no automatic NUMA balancing, no pinning:
      throughput (bops): 789243
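For reference, the "numactl pinning" rows above bind each SPECjbb instance's
CPUs and memory to its own socket, while the other rows start the instances
unpinned. A minimal libnuma sketch of that style of binding is below; the
node number, file name and the choice to bind from inside the process are
assumptions for illustration only, since the actual runs drove the binding
from the numactl command line:

    /*
     * Rough in-process equivalent of running a workload under
     * "numactl --cpunodebind=0 --membind=0".  Node 0 is just an
     * example; each instance in the tests was bound to its own socket.
     * Build with:  gcc -o pin_one_node pin_one_node.c -lnuma
     */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            struct bitmask *nodes;
            int node = 0;   /* example node; one per instance in the tests */

            if (numa_available() < 0) {
                    fprintf(stderr, "NUMA is not available on this system\n");
                    return EXIT_FAILURE;
            }

            nodes = numa_allocate_nodemask();
            numa_bitmask_setbit(nodes, node);

            /*
             * Restrict both CPU placement and memory allocation to this
             * node, which is what the "numactl pinning" baseline does
             * per socket.
             */
            numa_bind(nodes);
            numa_bitmask_free(nodes);

            /* ... run or exec the benchmark workload here ... */
            return EXIT_SUCCESS;
    }

In case b) the same workload runs with no binding at all, and the patched
kernel's automatic balancing is expected to approach the pinned throughput,
which is roughly what the numbers above show.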
Thanks,
Vinod

> Changes since v3:
> - various code cleanups suggested by Mel Gorman (some in their own patches)
> - after some testing, switch back to the NUMA specific CPU use stats,
>   since that results in a 1% performance increase for two 8-warehouse
>   specjbb instances on a 4-node system, and reduced page migration across
>   the board
>
> Changes since v2:
> - dropped tracepoint (for now?)
> - implement obvious improvements suggested by Peter
> - use the scheduler maintained CPU use statistics, drop
>   the NUMA specific ones for now. We can add those later
>   if they turn out to be beneficial
>
> Changes since v1:
> - fix divide by zero found by Chegu Vinod
> - improve comment, as suggested by Peter Zijlstra
> - do stats calculations in task_numa_placement in local variables
>
> Some performance numbers, with two 40-warehouse specjbb instances
> on an 8-node system with 10 CPU cores per node, using a pre-cleanup
> version of these patches, courtesy of Chegu Vinod:
>
> numactl manual pinning
> spec1.txt: throughput = 755900.20 SPECjbb2005 bops
> spec2.txt: throughput = 754914.40 SPECjbb2005 bops
>
> NO-pinning results (automatic NUMA balancing, with patches)
> spec1.txt: throughput = 706439.84 SPECjbb2005 bops
> spec2.txt: throughput = 729347.75 SPECjbb2005 bops
>
> NO-pinning results (automatic NUMA balancing, without patches)
> spec1.txt: throughput = 667988.47 SPECjbb2005 bops
> spec2.txt: throughput = 638220.45 SPECjbb2005 bops
>
> No automatic NUMA balancing and NO-pinning results
> spec1.txt: throughput = 544120.97 SPECjbb2005 bops
> spec2.txt: throughput = 453553.41 SPECjbb2005 bops