From: "Huang, Ying" <ying.huang@intel.com>
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
References: <20230511065607.37407-1-ying.huang@intel.com>
Date: Fri, 12 May 2023 10:55:21 +0800
In-Reply-To: (Michal Hocko's message of "Thu, 11 May 2023 17:05:51 +0200")
Message-ID: <87r0rm8die.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Michal,

Thanks for the comments!

Michal Hocko writes:

> On Thu 11-05-23 14:56:01, Huang Ying wrote:
>> The patchset is based on upstream v6.3.
>>
>> More and more cores are put in one physical CPU (usually also one
>> NUMA node).  In 2023, a high-end server CPU has 56, 64, or more
>> cores, and even more cores per physical CPU are planned for future
>> CPUs.  Meanwhile, in most cases all cores in one physical CPU
>> contend for page allocation on one zone.  This causes heavy zone
>> lock contention in some workloads, and the situation will only get
>> worse in the future.
>>
>> For example, on a 2-socket Intel server machine with 224 logical
>> CPUs, if the kernel is built with `make -j224`, zone lock contention
>> can consume up to about 12.7% of CPU cycles.
>>
>> To improve the scalability of page allocation, this series generally
>> creates one zone instance for about each 256 GB of memory of a zone
>> type.  That is, one large zone type is split into multiple zone
>> instances.  Different logical CPUs then prefer different zone
>> instances based on the logical CPU number, so the total number of
>> logical CPUs contending on one zone is reduced and scalability
>> improves.
>
> It is not really clear to me why you need a new zone for all this
> rather than partition free lists internally within the zone?
> Essentially to increase the current two level system to 3: per cpu
> caches, per cpu arenas and global fallback.

Sorry, I didn't get your idea here.  What is a per-CPU arena?  What is
the difference between it and the per-CPU caches (PCP)?
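For reference, the zone-splitting idea quoted above boils down to a
per-CPU preference among the zone instances of a zone type.  A minimal
sketch of that mapping (illustration only, not the actual code in this
series; the names preferred_zone_instance and nr_zone_instances are
made up):

  /*
   * Map a logical CPU to its preferred zone instance.  With
   * nr_zone_instances instances per zone type, only about
   * nr_cpus / nr_zone_instances CPUs contend on any one zone
   * lock instead of all of them.
   */
  static inline int preferred_zone_instance(int cpu, int nr_zone_instances)
  {
          return cpu % nr_zone_instances;
  }

Allocation would start from the preferred instance and, as I read the
"prefer" above, fall back to the other instances of the same zone type
when it runs low on memory, much like the existing zonelist fallback.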
> I am also missing some information why PCP cache tuning is not
> sufficient.

PCP does improve page allocation scalability greatly!  But it doesn't
help much for workloads that allocate pages on one CPU and free them
on different CPUs.

PCP tuning can improve page allocation scalability greatly for a given
workload.  But it's not trivial to find the best tuning parameters for
various workloads and workload phases (workloads may have different
loads and memory requirements at different times).  And we may run
different workloads on the different logical CPUs of the system, which
also makes it hard to find a PCP tuning that is good globally.  It
would be better to find a solution that improves page allocation
scalability out of the box, or automatically.  Do you agree?

Best Regards,
Huang, Ying