Date: Mon, 28 Nov 2016 11:45:43 +0000
From: Mel Gorman
To: Vlastimil Babka
Cc: Andrew Morton, Christoph Lameter, Michal Hocko, Johannes Weiner,
 Linux-MM, Linux-Kernel
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3
Message-ID: <20161128114543.h5e3e7tbmq56eis6@techsingularity.net>
References: <20161127131954.10026-1-mgorman@techsingularity.net>
 <5621b386-ee65-0fa5-e217-334924412c7f@suse.cz>
In-Reply-To: <5621b386-ee65-0fa5-e217-334924412c7f@suse.cz>

On Mon, Nov 28, 2016 at 12:00:41PM +0100, Vlastimil Babka wrote:
> On 11/27/2016 02:19 PM, Mel Gorman wrote:
> >
> > 2-socket modern machine
> >                            4.9.0-rc5             4.9.0-rc5
> >                              vanilla             hopcpu-v3
> > Hmean    send-64         178.38 (  0.00%)      256.74 ( 43.93%)
> > Hmean    send-128        351.49 (  0.00%)      507.52 ( 44.39%)
> > Hmean    send-256        671.23 (  0.00%)     1004.19 ( 49.60%)
> > Hmean    send-1024      2663.60 (  0.00%)     3910.42 ( 46.81%)
> > Hmean    send-2048      5126.53 (  0.00%)     7562.13 ( 47.51%)
> > Hmean    send-3312      7949.99 (  0.00%)    11565.98 ( 45.48%)
> > Hmean    send-4096      9433.56 (  0.00%)    12929.67 ( 37.06%)
> > Hmean    send-8192     15940.64 (  0.00%)    21587.63 ( 35.43%)
> > Hmean    send-16384    26699.54 (  0.00%)    32013.79 ( 19.90%)
> > Hmean    recv-64         178.38 (  0.00%)      256.72 ( 43.92%)
> > Hmean    recv-128        351.49 (  0.00%)      507.47 ( 44.38%)
> > Hmean    recv-256        671.20 (  0.00%)     1003.95 ( 49.57%)
> > Hmean    recv-1024      2663.45 (  0.00%)     3909.70 ( 46.79%)
> > Hmean    recv-2048      5126.26 (  0.00%)     7560.67 ( 47.49%)
> > Hmean    recv-3312      7949.50 (  0.00%)    11564.63 ( 45.48%)
> > Hmean    recv-4096      9433.04 (  0.00%)    12927.48 ( 37.04%)
> > Hmean    recv-8192     15939.64 (  0.00%)    21584.59 ( 35.41%)
> > Hmean    recv-16384    26698.44 (  0.00%)    32009.77 ( 19.89%)
> >
> > 1-socket 6 year old machine
> >                            4.9.0-rc5             4.9.0-rc5
> >                              vanilla             hopcpu-v3
> > Hmean    send-64          87.47 (  0.00%)      127.14 ( 45.36%)
> > Hmean    send-128        174.36 (  0.00%)      256.42 ( 47.06%)
> > Hmean    send-256        347.52 (  0.00%)      509.41 ( 46.59%)
> > Hmean    send-1024      1363.03 (  0.00%)     1991.54 ( 46.11%)
> > Hmean    send-2048      2632.68 (  0.00%)     3759.51 ( 42.80%)
> > Hmean    send-3312      4123.19 (  0.00%)     5873.28 ( 42.45%)
> > Hmean    send-4096      5056.48 (  0.00%)     7072.81 ( 39.88%)
> > Hmean    send-8192      8784.22 (  0.00%)    12143.92 ( 38.25%)
> > Hmean    send-16384    15081.60 (  0.00%)    19812.71 ( 31.37%)
> > Hmean    recv-64          86.19 (  0.00%)      126.59 ( 46.87%)
> > Hmean    recv-128        173.93 (  0.00%)      255.21 ( 46.73%)
> > Hmean    recv-256        346.19 (  0.00%)      506.72 ( 46.37%)
> > Hmean    recv-1024      1358.28 (  0.00%)     1980.03 ( 45.77%)
> > Hmean    recv-2048      2623.45 (  0.00%)     3729.35 ( 42.15%)
> > Hmean    recv-3312      4108.63 (  0.00%)     5831.47 ( 41.93%)
> > Hmean    recv-4096      5037.25 (  0.00%)     7021.59 ( 39.39%)
> > Hmean    recv-8192      8762.32 (  0.00%)    12072.44 ( 37.78%)
> > Hmean    recv-16384    15042.36 (  0.00%)    19690.14 ( 30.90%)
>
> That looks way better than the "v1" RFC posting. Was it just because
> you stopped doing the "at first iteration, use migratetype as index",
> and initializing pindex to UINT_MAX hits so much quicker, or was there
> something more subtle that I missed? There was no changelog between
> "v1" and "v2".
>

The array is sized correctly, which avoids one useless check. The
order-0 lists are always drained first, so in some rare cases only the
fast paths are used. There was also a subtle correction in detecting
when all of one list should be drained. In combination, that happened
to boost performance a lot on the two machines I reported on. While 6
other machines were tested, not all of them saw such a dramatic boost,
and if those machines are rebooted and retested, the high performance
is not always consistent; it all depends on how often the fast paths
are used.

> > Signed-off-by: Mel Gorman
>
> Acked-by: Vlastimil Babka
>

Thanks.

-- 
Mel Gorman
SUSE Labs
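
To make the list layout under discussion easier to picture, a minimal
userspace C sketch follows. It is illustrative only and not the code
from the patch: it assumes the order-0 lists (one per migratetype) sit
ahead of one list per high order, and that draining walks the array
front to back so the order-0 fast-path slots empty first. All names
and sizes here (pcp_lists, list_index, drain_lists, NR_MIGRATETYPES,
MAX_PCP_ORDER) are hypothetical.

/*
 * Illustrative sketch only -- not the kernel code from the patch. It
 * models one per-cpu list per migratetype for order-0 pages, followed
 * by one list per high order, so the order-0 slots drain first.
 */
#include <stdio.h>

#define NR_MIGRATETYPES 3	/* e.g. unmovable, movable, reclaimable */
#define MAX_PCP_ORDER   3	/* highest order cached per cpu (assumed) */

/* Order-0 lists first, then one list per order 1..MAX_PCP_ORDER. */
#define NR_PCP_LISTS (NR_MIGRATETYPES + MAX_PCP_ORDER)

static int pcp_lists[NR_PCP_LISTS];	/* page counts stand in for real lists */

/* Exactly sized mapping of (order, migratetype) to a list index. */
static int list_index(unsigned int order, unsigned int migratetype)
{
	if (order == 0)
		return migratetype;		/* fast-path slots come first */
	return NR_MIGRATETYPES + (order - 1);	/* one slot per high order */
}

/* Drain the order-0 lists first simply by walking the array in order. */
static void drain_lists(int to_drain)
{
	for (unsigned int i = 0; i < NR_PCP_LISTS && to_drain > 0; i++) {
		int take = pcp_lists[i] < to_drain ? pcp_lists[i] : to_drain;

		if (!take)
			continue;
		pcp_lists[i] -= take;
		to_drain -= take;
		printf("drained %d pages from list %u\n", take, i);
	}
}

int main(void)
{
	pcp_lists[list_index(0, 1)] = 8;	/* order-0 pages, migratetype 1 */
	pcp_lists[list_index(2, 0)] = 4;	/* some order-2 pages */
	drain_lists(10);
	return 0;
}

With an exactly sized array, list_index() needs no overflow check, and
drain_lists() naturally empties the order-0 slots before touching the
high-order ones, which is the behaviour the reply above refers to.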