Date: Wed, 20 Sep 2023 09:41:18 -0700
From: Andrew Morton
To: Huang Ying
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven,
 Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner,
 Dave Hansen, Michal Hocko, Pavel Tatashin, Matthew Wilcox,
 Christoph Lameter
Subject: Re: [PATCH 00/10] mm: PCP high auto-tuning
Message-Id: <20230920094118.8b8f739125c6aede17c627e0@linux-foundation.org>
In-Reply-To: <20230920061856.257597-1-ying.huang@intel.com>
References: <20230920061856.257597-1-ying.huang@intel.com>

On Wed, 20 Sep 2023 14:18:46 +0800 Huang Ying wrote:

> The page
> allocation performance requirements of different workloads
> are often different. So, we need to tune the PCP (Per-CPU Pageset)
> high on each CPU automatically to optimize the page allocation
> performance.

Some of the performance changes here are downright scary.

I've never been very sure that percpu pages was very beneficial (and
hey, I invented the thing back in the Mesozoic era). But these numbers
make me think it's very important and we should have been paying more
attention.

> The list of patches in series is as follows,
>
>  1 mm, pcp: avoid to drain PCP when process exit
>  2 cacheinfo: calculate per-CPU data cache size
>  3 mm, pcp: reduce lock contention for draining high-order pages
>  4 mm: restrict the pcp batch scale factor to avoid too long latency
>  5 mm, page_alloc: scale the number of pages that are batch allocated
>  6 mm: add framework for PCP high auto-tuning
>  7 mm: tune PCP high automatically
>  8 mm, pcp: decrease PCP high if free pages < high watermark
>  9 mm, pcp: avoid to reduce PCP high unnecessarily
> 10 mm, pcp: reduce detecting time of consecutive high order page freeing
>
> Patch 1/2/3 optimize the PCP draining for consecutive high-order pages
> freeing.
>
> Patch 4/5 optimize batch freeing and allocating.
>
> Patch 6/7/8/9 implement and optimize a PCP high auto-tuning method.
>
> Patch 10 optimize the PCP draining for consecutive high order page
> freeing based on PCP high auto-tuning.
>
> The test results for patches with performance impact are as follows,
>
> kbuild
> ======
>
> On a 2-socket Intel server with 224 logical CPU, we tested kbuild on
> one socket with `make -j 112`.
>
>          build time  zone lock%  free_high  alloc_zone
>          ----------  ----------  ---------  ----------
> base          100.0        43.6      100.0       100.0
> patch1         96.6        40.3       49.2        95.2
> patch3         96.4        40.5       11.3        95.1
> patch5         96.1        37.9       13.3        96.8
> patch7         86.4         9.8        6.2        22.0
> patch9         85.9         9.4        4.8        16.3
> patch10        87.7        12.6       29.0        32.3

You're seriously saying that kbuild got 12% faster?
I see that [07/10] (autotuning) alone sped up kbuild by 10%?

Other thoughts:

- What if any facilities are provided to permit users/developers to
  monitor the operation of the autotuning algorithm?

- I'm not seeing any Documentation/ updates. Surely there are things
  we can tell users?

- This:

  : It's possible that PCP high auto-tuning doesn't work well for some
  : workloads. So, when PCP high is tuned by hand via the sysctl knob,
  : the auto-tuning will be disabled. The PCP high set by hand will be
  : used instead.

  Is it a bit hacky to disable autotuning when the user alters
  pcp-high? Would it be cleaner to have a separate on/off knob for
  autotuning?

  And how is the user to determine that "PCP high auto-tuning doesn't
  work well" for their workload?
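
For reference, the 12% and 10% figures questioned earlier can be checked
against the build-time column of the quoted kbuild table. The following is
only a rough sketch: the numbers are copied from that table, the names
build_time and pct_faster are made up for illustration, "faster" is read
as a reduction in normalized build time, and each row is assumed to be
cumulative (it includes all earlier patches), which the table labels
suggest but do not state.

```python
# Normalized kbuild build times copied from the quoted table (base = 100.0).
# Assumption: each row includes all patches up to and including the named one.
build_time = {
    "base": 100.0,
    "patch1": 96.6,
    "patch3": 96.4,
    "patch5": 96.1,
    "patch7": 86.4,
    "patch9": 85.9,
    "patch10": 87.7,
}


def pct_faster(before: float, after: float) -> float:
    """Reduction in build time from `before` to `after`, in percent of `before`."""
    return (before - after) / before * 100.0


# Whole series: base versus the last row of the table.
series = pct_faster(build_time["base"], build_time["patch10"])

# Patch 7 on its own: compared with the row just before it (patch5).
patch7_alone = pct_faster(build_time["patch5"], build_time["patch7"])

print(f"series:  {series:.1f}% less build time than base")
print(f"patch 7: {patch7_alone:.1f}% less build time than patch5")
```

Under those assumptions this gives about 12.3% for the whole series and
about 10.1% for patch 7 relative to patch5, consistent with the rounded
12% and 10% above. The same rows also show build time rising from 85.9 at
patch9 to 87.7 at patch10.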