From: "Huang, Ying" <ying.huang@intel.com>
To: Mel Gorman
Cc: linux-mm@kvack.org, Arjan Van De Ven, Andrew Morton, Vlastimil Babka,
	David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko,
	Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 04/10] mm: restrict the pcp batch scale factor to avoid too long latency
References: <20230920061856.257597-1-ying.huang@intel.com>
	<20230920061856.257597-5-ying.huang@intel.com>
	<20231011125219.kuoluyuwxzva5q5w@techsingularity.net>
Date: Thu, 12 Oct 2023 20:15:42 +0800
In-Reply-To: <20231011125219.kuoluyuwxzva5q5w@techsingularity.net> (Mel Gorman's message of "Wed, 11 Oct 2023 13:52:19 +0100")
Message-ID: <878r88f34h.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Mel Gorman writes:

> On Wed, Sep 20, 2023 at 02:18:50PM +0800, Huang Ying wrote:
>> In the page allocator, the PCP (Per-CPU Pageset) is refilled and
>> drained in batches to increase page allocation throughput, reduce
>> page allocation/freeing latency per page, and reduce zone lock
>> contention. But too large a batch size causes too long a maximal
>> allocation/freeing latency, which may punish arbitrary users. So the
>> default batch size is chosen carefully (in zone_batchsize(); the
>> value is 63 for zones > 1GB) to avoid that.
>>
>> In commit 3b12e7e97938 ("mm/page_alloc: scale the number of pages
>> that are batch freed"), the batch size is scaled up when large
>> numbers of pages are freed, to improve page freeing performance and
>> reduce zone lock contention.
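FWIW, the doubling-with-cap behavior under discussion can be modeled in
userspace roughly as follows; this is a simplified illustration with
made-up names, not the actual kernel code:

    #include <stdio.h>

    /* Cap on the left shift, i.e. the "max batch scale factor". */
    #define BATCH_SCALE_MAX 5

    /*
     * Consecutive bulk frees double the effective batch size; capping
     * the shift bounds the worst-case time spent under the zone lock.
     */
    static int effective_batch(int base_batch, int free_factor)
    {
            if (free_factor > BATCH_SCALE_MAX)
                    free_factor = BATCH_SCALE_MAX;
            return base_batch << free_factor;
    }

    int main(void)
    {
            /* 63 is the default batch from zone_batchsize() for zones > 1GB. */
            for (int factor = 0; factor <= 7; factor++)
                    printf("factor=%d -> batch=%d\n",
                           factor, effective_batch(63, factor));
            return 0;       /* factor >= 5 saturates at 63 << 5 == 2016 */
    }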
>> A similar optimization can be used for allocating large numbers of
>> pages too.
>>
>> To find a suitable max batch scale factor (that is, the max effective
>> batch size), some tests and measurements were done on several
>> machines as follows.
>>
>> A set of debug patches was implemented as follows,
>>
>> - Set PCP high to be 2 * batch to reduce the effect of PCP high.
>>
>> - Disable free batch size scaling to get the raw performance.
>>
>> - The code run with the zone lock held is extracted from
>>   rmqueue_bulk() and free_pcppages_bulk() into 2 separate functions
>>   to make it easy to measure the function run time with the ftrace
>>   function_graph tracer.
>>
>> - The batch size is hard coded to be 63 (default), 127, 255, 511,
>>   1023, 2047, 4095.
>>
>> Then will-it-scale/page_fault1 is used to generate the page
>> allocation/freeing workload. The page allocation/freeing throughput
>> (page/s) is measured via will-it-scale. The page allocation/freeing
>> average latency (alloc/free latency avg, in us) and allocation/freeing
>> latency at the 99th percentile (alloc/free latency 99%, in us) are
>> measured with the ftrace function_graph tracer.
>>
>> The test results are as follows,
>>
>> Sapphire Rapids Server
>> ======================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    513633.4          2.33          3.57           2.67           6.83
>>   127    517616.7          4.35          6.65           4.22          13.03
>>   255    520822.8          8.29         13.32           7.52          25.24
>>   511    524122.0         15.79         23.42          14.02          49.35
>>  1023    525980.5         30.25         44.19          25.36          94.88
>>  2047    526793.6         59.39         84.50          45.22         140.81
>>
>> Ice Lake Server
>> ===============
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    620210.3          2.21          3.68           2.02           4.35
>>   127    627003.0          4.09          6.86           3.51           8.28
>>   255    630777.5          7.70         13.50           6.17          15.97
>>   511    633651.5         14.85         22.62          11.66          31.08
>>  1023    637071.1         28.55         42.02          20.81          54.36
>>  2047    638089.7         56.54         84.06          39.28          91.68
>>
>> Cascade Lake Server
>> ===================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    404706.7          3.29          5.03           3.53           4.75
>>   127    422475.2          6.12          9.09           6.36           8.76
>>   255    411522.2         11.68         16.97          10.90          16.39
>>   511    428124.1         22.54         31.28          19.86          32.25
>>  1023    414718.4         43.39         62.52          40.00          66.33
>>  2047    429848.7         86.64        120.34          71.14         106.08
>>
>> Comet Lake Desktop
>> ==================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63   795183.13          2.18          3.55           2.03           3.05
>>   127   803067.85          3.91          6.56           3.85           5.52
>>   255   812771.10          7.35         10.80           7.14          10.20
>>   511   817723.48         14.17         27.54          13.43          30.31
>>  1023   818870.19         27.72         40.10          27.89          46.28
>>
>> Coffee Lake Desktop
>> ===================
>> Batch  throughput  free latency  free latency  alloc latency  alloc latency
>>          page/s      avg / us      99% / us       avg / us       99% / us
>> -----  ----------  ------------  ------------  -------------  -------------
>>    63    510542.8          3.13          4.40           2.48           3.43
>>   127    514288.6          5.97          7.89           4.65           6.04
>>   255    516889.7         11.86         15.58           8.96          12.55
>>   511    519802.4         23.10         28.81          16.95          26.19
>>  1023    520802.7         45.30         52.51          33.19          45.95
>>  2047    519997.1         90.63        104.00          65.26          81.74
>>
>> From the above data, to restrict the allocation/freeing latency to be
>> less than 100 us in most cases, the max batch scale factor needs to
>> be less than or equal to 5.
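(For scale: with the default batch of 63, a max scale factor of 5 means
a worst-case effective batch of 63 << 5 = 2016 pages, i.e. roughly the
2047 rows above, where most of the measured latencies are still at or
below about 100 us.)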
>>
>> So, in this patch, the batch scale factor is restricted to be less
>> than or equal to 5.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>
> Acked-by: Mel Gorman
>
> However, it's worth noting that the time to free depends on the CPU,
> and while the CPUs you tested are reasonable, there are also slower
> CPUs out there, and I have at least one account that the time is
> excessive. While this patch is fine, there may be a patch on top that
> makes this runtime configurable, a Kconfig default, or both.

Sure. Will add a Kconfig option first in a follow-on patch.
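Roughly what such an option could look like -- a sketch only, with the
option name, range, and exact hook point tentative rather than final:

    config PCP_BATCH_SCALE_MAX
            int "Maximum scale factor of PCP (Per-CPU pageset) batch allocate/free"
            default 5
            range 0 6
            help
              The effective PCP batch size is scaled up by left-shifting
              the base batch; this caps the shift so that the worst-case
              time spent under the zone lock stays bounded.

and then capping the scaling where free_factor is bumped, e.g.,

    if (batch < max_nr_free && pcp->free_factor < CONFIG_PCP_BATCH_SCALE_MAX)
            pcp->free_factor++;

--
Best Regards,
Huang, Ying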