From: "Huang, Ying"
To: Gregory Price
Cc: Gregory Price
Subject: Re: [PATCH 1/3] mm/mempolicy: implement the sysfs-based weighted_interleave interface
In-Reply-To: (Gregory Price's message of "Wed, 17 Jan 2024 12:46:03 -0500")
References: <20240112210834.8035-1-gregory.price@memverge.com> <20240112210834.8035-2-gregory.price@memverge.com> <87le8r1dzr.fsf@yhuang6-desk2.ccr.corp.intel.com> <87o7dkzbsv.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Thu, 18 Jan 2024 12:37:36 +0800
Message-ID: <87bk9jz27j.fsf@yhuang6-desk2.ccr.corp.intel.com>

Gregory Price writes:

> On Wed, Jan 17, 2024 at 02:58:08PM +0800, Huang, Ying wrote:
>> Gregory Price writes:
>>
>> > We haven't had the discussion on how/when this should happen yet,
>> > though, and there's some research to be done. (i.e. when should DRAM
>> > weights be set? should the entire table be reweighted on hotplug? etc)
>>
>> Before that, I'm OK to remove default_iw_table and use hard coded "1" as
>> default weight for now.
>>
>
> Can't quite do that. default_iw_table is a static structure because we
> need a reliable default structure not subject to module initialization
> failure. Otherwise we can end up in a situation where iw_table is NULL
> during some allocation path if the sysfs structure fails to set up fully.

As the first, simplest implementation, we can avoid default_iw_table[]
entirely, because the default weight is a constant (1).

> There's no good reason to fail allocations just because sysfs failed to
> initialize for some reason. I'll leave default_iw_table with a size
> of MAX_NUMNODES for now (nr_node_ids is set up at runtime per your
> reference to `setup_nr_node_ids` below, so we can't use it for this).

We allocate memory during module initialization all over the place in
the kernel, and I don't think it will cause any issue in practice. It
just needs some additional checking for "default_iw_table == NULL".

Also, we cannot simply make it static, because we need RCU to keep it
consistent. Otherwise, it may be changed while it is being read.

>> >
>> >> u8 __rcu *iw_table;
>> >>
>> >> Then, we only need to allocate nr_node_ids elements now.
>> >>
>> >
>> > We need nr_possible_nodes to handle hotplug correctly.
>>
>> nr_node_ids >= num_possible_nodes(). It's larger than any possible node
>> ID.
>>
>
> nr_node_ids gets set up at runtime, while the default_iw_table needs
> to be a static structure (see above). I can make default_iw_table
> MAX_NUMNODES and subsequent allocations of iw_table be nr_node_ids,
> but that makes iw_table a different size at any given time.
>
> This *will* break if "true hotplug" ever shows up and possible_nodes !=
> MAX_NUMNODES. But I can write it up if it's a sticking point for you.

I don't think it is an issue for "true hotplug", because we can set
nr_node_ids = MAX_NUMNODES even if something called "true hotplug"
shows up.

> Ultimately we're squabbling over, at most, about ~3kb of memory, just
> keep that in mind. (I guess if you spawn 3000 threads and each tries a
> concurrent write to sysfs/node1, you'd eat 3MB view briefly, but that
> is a truly degenerate case and I can think of more degenerate things).

It's not just about wasted memory; it's about a proper API too.

>>
>> When "true node hotplug" becomes reality, we can make nr_node_ids ==
>> MAX_NUMNODES. So, it's safe to use it. Please take a look at
>> setup_nr_node_ids().
>>

--
Best Regards,
Huang, Ying
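A minimal user-space sketch of the scheme argued for in this thread, under stated assumptions: C11 atomics stand in for kernel RCU, and every sketch_* name plus SKETCH_MAX_NUMNODES is a hypothetical stand-in, not code from the patch. The table is sized as the highest possible node ID plus one (in the spirit of setup_nr_node_ids()). There is no static default table: readers fall back to a hard-coded weight of 1 while no table is published, and the writer copies, updates, and publishes a new table instead of editing the live one in place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's MAX_NUMNODES. */
#define SKETCH_MAX_NUMNODES 64

/* Hypothetical stand-in for nr_node_ids: highest possible node ID + 1. */
static int sketch_nr_node_ids;

/* Plays the role of "u8 __rcu *iw_table". NULL means no table has been
 * allocated (e.g. sysfs setup failed), so every node gets weight 1. */
static _Atomic(uint8_t *) sketch_iw_table = NULL;

/* Size the table from a possible-node map, in the spirit of
 * setup_nr_node_ids(): the result is >= the number of possible nodes
 * and larger than any possible node ID. */
static int sketch_setup_nr_node_ids(const int possible[SKETCH_MAX_NUMNODES])
{
    int highest = -1;
    for (int nid = 0; nid < SKETCH_MAX_NUMNODES; nid++)
        if (possible[nid])
            highest = nid;
    sketch_nr_node_ids = highest + 1;
    return sketch_nr_node_ids;
}

/* Reader side: take one snapshot of the pointer (like rcu_dereference())
 * and fall back to the hard-coded default when no table exists. */
static uint8_t sketch_get_weight(int nid)
{
    assert(nid >= 0 && nid < sketch_nr_node_ids);
    uint8_t *table = atomic_load_explicit(&sketch_iw_table,
                                          memory_order_acquire);
    return table ? table[nid] : 1;
}

/* Writer side: copy, update, publish (like rcu_assign_pointer()).
 * Assumes a single writer; the kernel would serialize writers with a lock.
 * Returns 0 on success, -1 if allocation fails (old table left in place). */
static int sketch_set_weight(int nid, uint8_t weight)
{
    assert(nid >= 0 && nid < sketch_nr_node_ids);
    uint8_t *old = atomic_load_explicit(&sketch_iw_table,
                                        memory_order_acquire);
    uint8_t *new_table = malloc((size_t)sketch_nr_node_ids);
    if (!new_table)
        return -1;
    if (old)
        memcpy(new_table, old, (size_t)sketch_nr_node_ids);
    else
        memset(new_table, 1, (size_t)sketch_nr_node_ids); /* default 1 */
    new_table[nid] = weight;
    atomic_store_explicit(&sketch_iw_table, new_table, memory_order_release);
    /* The kernel would call synchronize_rcu() here before freeing "old";
     * this single-threaded sketch has no concurrent readers. */
    free(old);
    return 0;
}
```

With possible nodes 0, 1 and 5, the computed size is 6: more than the three possible nodes and larger than any possible node ID. Marking every node possible makes it equal to SKETCH_MAX_NUMNODES, which corresponds to the nr_node_ids = MAX_NUMNODES case mentioned for "true hotplug". A real implementation would additionally need a writer lock and an RCU grace period before freeing the old table.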