From: "Huang, Ying"
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 1/2] mm: add framework for PCP high auto-tuning
References: <20230710065325.290366-1-ying.huang@intel.com> <20230710065325.290366-2-ying.huang@intel.com>
Date: Wed, 12 Jul 2023 15:45:58 +0800
In-Reply-To: (Michal Hocko's message of "Tue, 11 Jul 2023 13:07:17 +0200")
Message-ID: <87edldefnt.fsf@yhuang6-desk2.ccr.corp.intel.com>

Michal Hocko writes:

> On Mon 10-07-23 14:53:24, Huang Ying wrote:
>> The page allocation performance requirements of different workloads
>> are usually different. So, we often need to tune PCP (per-CPU
>> pageset) high to optimize the workload page allocation performance.
>> Now, we have a system wide sysctl knob (percpu_pagelist_high_fraction)
>> to tune PCP high by hand. But, it's hard to find out the best value
>> by hand. And one global configuration may not work best for the
>> different workloads that run on the same system. One solution to
>> these issues is to tune PCP high of each CPU automatically.
>>
>> This patch adds the framework for PCP high auto-tuning. With it,
>> pcp->high will be changed automatically by tuning algorithm at
>> runtime. Its default value (pcp->high_def) is the original PCP high
>> value calculated based on low watermark pages or
>> percpu_pagelist_high_fraction sysctl knob. To avoid putting too many
>> pages in PCP, the original limit of percpu_pagelist_high_fraction
>> sysctl knob, MIN_PERCPU_PAGELIST_HIGH_FRACTION, is used to calculate
>> the max PCP high value (pcp->high_max).
>
> It would have been very helpful to describe the basic entry points to
> the auto-tuning. AFAICS the central place of the tuning is tune_pcp_high
> which is called from the freeing path. Why? Is this really a good place
> considering this is a hot path? What about the allocation path? Isn't
> that a good spot to watch for the allocation demand?

Yes. The main entry point to the auto-tuning is tune_pcp_high().
It is called from the freeing path because pcp->high is only used when
pages are freed. It's possible to call it from the allocation path
instead. The drawback is that pcp->high may be updated a little later
in some situations, for example, when many pages are freed but none are
allocated for quite a long time. But I don't think this is a serious
problem.

> Also this framework seems to be enabled by default. Is this really
> desirable? What about workloads tuning the pcp batch size manually?
> Shouldn't they override any auto-tuning?

In the current implementation, pcp->high is tuned between the original
PCP high (the default, or the value tuned manually) and the max PCP high
(derived from MIN_PERCPU_PAGELIST_HIGH_FRACTION). So a manually tuned
high value is respected to some degree. Do you think it's better to
disable auto-tuning if PCP high is tuned manually?

Best Regards,
Huang, Ying
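
The bounds discussed in this thread (pcp->high kept between pcp->high_def and pcp->high_max) can be illustrated with a minimal userspace sketch. It is hypothetical and not the patch's code: the struct, the model_* helper names, and the simplified high_max formula (zone managed pages divided by MIN_PERCPU_PAGELIST_HIGH_FRACTION, then split evenly across CPUs) are assumptions. Only the constant's value, 8, mirrors the kernel's minimum for the percpu_pagelist_high_fraction sysctl. The tuning heuristic itself is left out; the sketch only clamps whatever value such a heuristic might propose.

```c
#include <assert.h>

/*
 * Hypothetical userspace model, not the patch's code: the auto-tuned
 * value stays within [high_def, high_max]. The struct, helper names and
 * the simplified high_max formula are assumptions for illustration.
 */
#define MIN_PERCPU_PAGELIST_HIGH_FRACTION 8

struct pcp_model {
	unsigned long high;     /* current, auto-tuned value */
	unsigned long high_def; /* default or manually tuned value (lower bound) */
	unsigned long high_max; /* upper bound */
};

/* Assumed simplification: zone pages / minimum fraction, split across CPUs. */
static unsigned long model_high_max(unsigned long managed_pages,
				    unsigned int nr_cpus)
{
	return managed_pages / MIN_PERCPU_PAGELIST_HIGH_FRACTION / nr_cpus;
}

/* Start at the default; never let the upper bound fall below it. */
static void model_pcp_init(struct pcp_model *pcp, unsigned long high_def,
			   unsigned long managed_pages, unsigned int nr_cpus)
{
	unsigned long max;

	assert(nr_cpus > 0);
	max = model_high_max(managed_pages, nr_cpus);
	pcp->high_def = high_def;
	pcp->high_max = max > high_def ? max : high_def;
	pcp->high = high_def;
	assert(pcp->high_def <= pcp->high_max);
}

/* Clamp a value proposed by some tuning heuristic into the allowed range. */
static void model_tune_pcp_high(struct pcp_model *pcp, unsigned long proposed)
{
	if (proposed < pcp->high_def)
		proposed = pcp->high_def;
	else if (proposed > pcp->high_max)
		proposed = pcp->high_max;
	pcp->high = proposed;
}
```

In this model, with 1048576 managed pages and 16 CPUs, high_max works out to 8192 pages: a proposed value of 20000 is clamped to 8192, while a proposed value of 50 is raised back to a high_def of 100. A manually tuned high_def therefore acts as a floor, which is the sense in which a manual setting is "respected to some degree".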