Date: Thu, 11 May 2023 07:23:51 -0700
Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
To: Huang Ying, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton,
 Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner,
 Dave Hansen, Michal Hocko, Pavel Tatashin, Matthew Wilcox
References: <20230511065607.37407-1-ying.huang@intel.com>
From: Dave Hansen
In-Reply-To: <20230511065607.37407-1-ying.huang@intel.com>

On 5/10/23 23:56, Huang Ying wrote:
> To improve the scalability of the page allocation, in this series, we
> will create one zone instance for each about 256 GB memory of a zone
> type generally. That is, one large zone type will be split into
> multiple zone instances.

A few anecdotes for why I think _some_ people will like this:

Some Intel hardware has a "RAM" caching mechanism. It either caches
DRAM in High-Bandwidth Memory or Persistent Memory in DRAM. This cache
is direct-mapped and can have lots of collisions. One way to prevent
collisions is to chop up the physical memory into cache-sized zones and
let users choose to allocate from one zone. That fixes the conflicts.

Some other Intel hardware has ways to chop a NUMA node representing a
single socket into slices. Usually one slice gets a memory controller
and its closest cores. Intel calls these approaches Cluster on Die or
Sub-NUMA Clustering and users can select them from the BIOS.

In both of these cases, users have reported scalability improvements.
We've gone as far as to suggest the socket-splitting options to folks
today who are hitting zone scalability issues on that hardware.

That said, those _same_ users sometimes come back and say something
along the lines of: "So... we've got this app that allocates a big hunk
of memory. It's going slower than before." They're filling up one of
the chopped-up zones, hitting _some_ kind of undesirable reclaim
behavior and they want their humpty-dumpty zones put back together
again ... without hurting scalability. Some people will never be happy.
:)

Anyway, _if_ you do this, you might also consider being able to
dynamically adjust a CPU's zonelists somehow. That would relieve
pressure on one zone for those uneven allocations. That wasn't an
option in the two cases above because users had ulterior motives for
sticking inside a single zone. But, in your case, the zones really do
have equivalent performance.
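
To make the direct-mapped cache anecdote concrete, here is a toy sketch
of why cache-sized zones avoid conflicts. It is not kernel code and does
not describe any specific hardware: the 16 GB cache size, the 64-byte
line size, the plain modulo indexing, and the helper names cache_slot(),
zone_index() and collides() are all assumptions made up for
illustration.

```python
# Toy model of a direct-mapped memory-side cache. Sizes and the modulo
# indexing are assumptions for illustration, not a description of any
# specific hardware.

CACHE_SIZE = 16 << 30   # hypothetical 16 GB cache
LINE_SIZE = 64          # hypothetical cache line size in bytes


def cache_slot(paddr: int) -> int:
    """Slot a physical address lands in, assuming plain modulo indexing."""
    return (paddr % CACHE_SIZE) // LINE_SIZE


def zone_index(paddr: int) -> int:
    """Index of the cache-sized chunk ("zone") containing paddr."""
    return paddr // CACHE_SIZE


def collides(a: int, b: int) -> bool:
    """True if two different lines compete for the same cache slot."""
    different_line = a // LINE_SIZE != b // LINE_SIZE
    return different_line and cache_slot(a) == cache_slot(b)


if __name__ == "__main__":
    a = 0x1000
    b = a + CACHE_SIZE   # same offset, next cache-sized zone
    c = a + 4096         # different offset, same zone as a
    print(zone_index(a), zone_index(b), collides(a, b))  # 0 1 True
    print(zone_index(a), zone_index(c), collides(a, c))  # 0 0 False
```

Under these assumptions, two addresses can only collide if they sit in
different cache-sized zones, so an allocation confined to a single zone
never conflicts with itself in the cache.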