Date: Thu, 11 May 2023 11:30:09 +0100
From: Jonathan Cameron
To: Huang Ying
CC: Arjan Van De Ven, Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
Message-ID: <20230511113009.00004821@Huawei.com>
In-Reply-To: <20230511065607.37407-1-ying.huang@intel.com>
References: <20230511065607.37407-1-ying.huang@intel.com>
Organization: Huawei Technologies Research and Development (UK) Ltd.

On Thu, 11 May 2023 14:56:01 +0800 Huang Ying wrote:

> The patchset is based on upstream v6.3.
>
> More and more cores are put in one physical CPU (usually one NUMA node
> too). In 2023, one high-end server CPU has 56, 64, or more cores.
> Even more cores per physical CPU are planned for future CPUs.
> Meanwhile, all cores in one physical CPU contend for page allocation
> from one zone in most cases. This causes heavy zone lock contention in
> some workloads, and the situation will only get worse in the future.
>
> For example, on a 2-socket Intel server machine with 224 logical
> CPUs, if the kernel is built with `make -j224`, the zone lock
> contention cycles% can reach about 12.7%.
>
> To improve the scalability of page allocation, this series generally
> creates one zone instance for roughly every 256 GB of memory of a zone
> type. That is, one large zone type is split into multiple zone
> instances. Different logical CPUs then prefer different zone instances
> based on the logical CPU number, so the number of logical CPUs
> contending on any one zone is reduced and scalability improves.
>
> With the series, the zone lock contention cycles% drops to less than
> 1.6% in the above kbuild test case when 4 zone instances are created
> for ZONE_NORMAL.
>
> We also tested the series with will-it-scale/page_fault1 with 16
> processes. With the optimization, the benchmark score increases by up
> to 18.2% and the zone lock contention drops from 13.01% to 0.56%.
>
> Another way to create multiple zone instances for a zone type is to
> base the count on the total number of logical CPUs. We chose memory
> size because it is easier to implement. In most cases, the more cores
> there are, the larger the memory size is. And on systems with larger
> memory, the performance requirements on the page allocator are usually
> higher.
>
> Best Regards,
> Huang, Ying
>

Hi,

Interesting idea.
I'm curious, though, whether this can suffer from imbalance problems where,
due to uneven allocations from particular CPUs, you end up with all page
faults happening in one zone and the original contention problem coming
back. Or am I missing some process that will correct that imbalance?

Jonathan
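A minimal Python sketch of the zone-splitting scheme described in the quoted cover letter, under stated assumptions. Each zone type gets max(1, size // 256 GB) instances (the cover letter only says "about 256 GB"). A CPU's preferred instance is its logical CPU number modulo the instance count (the cover letter only says the preference is based on the CPU number). The helper names `nr_zone_instances`, `preferred_instance` and `contenders_per_instance`, and the 1 TB zone size, are hypothetical. Fallback to other instances is not modelled. The skewed case illustrates the concern raised in the reply: with a static CPU-to-instance preference, if only CPUs that share a preferred instance are allocating, all lock contention lands on that one instance.

```python
from collections import Counter

# Assumption: roughly one zone instance per 256 GB of a zone type, rounded down.
GIB = 1 << 30
ZONE_INSTANCE_BYTES = 256 * GIB


def nr_zone_instances(zone_bytes: int) -> int:
    """Hypothetical helper: number of instances a zone is split into (at least 1)."""
    return max(1, zone_bytes // ZONE_INSTANCE_BYTES)


def preferred_instance(cpu: int, nr_instances: int) -> int:
    """Hypothetical policy: the cover letter only says the choice depends on the
    logical CPU number; a simple modulo mapping is assumed here."""
    return cpu % nr_instances


def contenders_per_instance(allocating_cpus, nr_instances: int) -> list:
    """Count how many allocating CPUs would take each instance's zone lock."""
    counts = Counter(preferred_instance(cpu, nr_instances) for cpu in allocating_cpus)
    return [counts[i] for i in range(nr_instances)]


nr = nr_zone_instances(1024 * GIB)  # hypothetical 1 TB ZONE_NORMAL
balanced = contenders_per_instance(range(224), nr)  # all 224 logical CPUs allocating
skewed = contenders_per_instance(range(0, 224, nr), nr)  # only CPUs preferring instance 0
print(nr, balanced, skewed)  # 4 [56, 56, 56, 56] [56, 0, 0, 0]
```

In the balanced case each instance lock sees a quarter of the CPUs. In the skewed case the 56 allocating CPUs all share one lock, which is the imbalance the question asks about. Whether the real series corrects this (for example by falling back or rebalancing) is not shown here.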