From: "Huang, Ying"
To: Johannes Weiner
Cc: Michal Hocko, Gregory Price, linux-kernel@vger.kernel.org,
    linux-cxl@vger.kernel.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
    weixugc@google.com, apopple@nvidia.com, tim.c.chen@intel.com,
    dave.hansen@intel.com, shy828301@gmail.com,
    gregkh@linuxfoundation.org, rafael@kernel.org, Gregory Price
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
In-Reply-To: <20231031162216.GB3029315@cmpxchg.org> (Johannes Weiner's
    message of "Tue, 31 Oct 2023 12:22:16 -0400")
References: <20231031003810.4532-1-gregory.price@memverge.com>
    <20231031152142.GA3029315@cmpxchg.org>
    <20231031162216.GB3029315@cmpxchg.org>
Date: Wed, 01 Nov 2023 10:34:12 +0800
Message-ID: <87il6m6w2j.fsf@yhuang6-desk2.ccr.corp.intel.com>

Johannes Weiner writes:

> On Tue, Oct 31, 2023 at 04:56:27PM +0100, Michal Hocko wrote:
>> On Tue 31-10-23 11:21:42, Johannes Weiner wrote:
>> > On Tue, Oct 31, 2023 at 10:53:41AM +0100, Michal Hocko wrote:
>> > > On Mon 30-10-23 20:38:06, Gregory Price wrote:

[snip]

>>
>> > This hopefully also explains why it's a global setting. The usecase is
>> > different from conventional NUMA interleaving, which is used as a
>> > locality measure: spread shared data evenly between compute
>> > nodes. This one isn't about locality - the CXL tier doesn't have local
>> > compute. Instead, the optimal spread is based on hardware parameters,
>> > which is a global property rather than a per-workload one.
>>
>> Well, I am not convinced about that TBH. Sure it is probably a good fit
>> for this specific CXL usecase but it just doesn't fit into many others I
>> can think of - e.g. proportional use of those tiers based on the
>> workload - you get what you pay for.
>>
>> Is there any specific reason for not having a new interleave interface
>> which defines weights for the nodemask? Is this because the policy
>> itself is very dynamic or is this more driven by simplicity of use?
>
> A downside of *requiring* weights to be paired with the mempolicy is
> that it's then the application that would have to figure out the
> weights dynamically, instead of having a static host configuration.
> A policy of "I want to be spread for optimal bus bandwidth" translates
> between different hardware configurations, but optimal weights will
> vary depending on the type of machine a job runs on.
>
> That doesn't mean there couldn't be usecases for having weights as
> policy as well in other scenarios, like you allude to above. It's just
> so far such usecases haven't really materialized or spelled out
> concretely. Maybe we just want both - a global default, and the
> ability to override it locally.

I think that this is a good idea. A system-wide configuration with a
reasonable default makes life much easier for applications. If more
control is needed, some kind of workload-specific configuration can be
added. And, instead of adding another memory policy, a per-cgroup
configuration may be easier to use. The per-workload weights may need
to be adjusted when we deploy different combinations of workloads on
the system.

Another question is whether the weight should be per-memory-tier or
per-node. In this patchset, the weight is defined per source-target
node pair. That is, the weights form a matrix instead of a vector.
IIUC, this is used to control cross-socket memory access in addition to
per-memory-type memory access. Do you think the added complexity is
necessary?

> Could you elaborate on the 'get what you pay for' usecase you
> mentioned?

--
Best Regards,
Huang, Ying
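
P.S. To make the vector versus matrix question concrete, here is a
minimal sketch in Python. It is not the patchset's implementation: the
function names, the three-node topology, and all weight values are
assumptions made up for illustration. It only shows that a vector gives
every source node the same spread, while a matrix gives each source node
its own row of weights.

```python
from typing import List, Sequence

# Hypothetical topology (an assumption, not taken from the patchset):
# nodes 0 and 1 are DRAM on two sockets, node 2 is a CXL memory node.


def weighted_interleave_node(weights: Sequence[int], n: int) -> int:
    """Return the target node for the n-th page under weighted round-robin.

    A node with weight w receives w consecutive pages out of every
    sum(weights) pages.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    slot = n % total
    for node, weight in enumerate(weights):
        if slot < weight:
            return node
        slot -= weight
    raise AssertionError("unreachable: slot is always below total")


def placement(weights: Sequence[int], pages: int) -> List[int]:
    """Target nodes for the first `pages` allocations."""
    return [weighted_interleave_node(weights, n) for n in range(pages)]


# Vector form: one weight per target node, shared by every source node.
node_weights = [3, 3, 1]

# Matrix form: weight_matrix[source][target]. Each source node has its
# own row, which is what allows cross-socket traffic to be limited.
weight_matrix = [
    [4, 1, 1],  # tasks running on node 0 favor their local DRAM
    [1, 4, 1],  # tasks running on node 1 favor their local DRAM
]

if __name__ == "__main__":
    print(placement(node_weights, 7))      # same spread for every source
    print(placement(weight_matrix[0], 6))  # spread seen from node 0
    print(placement(weight_matrix[1], 6))  # spread seen from node 1
```

With the vector, a task on either socket gets the same 3:3:1 spread.
With the matrix, the same task gets a 4:1:1 or 1:4:1 spread depending on
where it runs, at the cost of N*N values to configure instead of N.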