From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A356EC47422 for ; Wed, 17 Jan 2024 07:00:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1D0AF6B00C8; Wed, 17 Jan 2024 02:00:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1803F6B00CD; Wed, 17 Jan 2024 02:00:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 06FE56B00CE; Wed, 17 Jan 2024 02:00:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id EA7096B00C8 for ; Wed, 17 Jan 2024 02:00:17 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id A9C0080AE4 for ; Wed, 17 Jan 2024 07:00:17 +0000 (UTC) X-FDA: 81687904074.21.3E613F8 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by imf05.hostedemail.com (Postfix) with ESMTP id 22E0A100028 for ; Wed, 17 Jan 2024 07:00:14 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=UDROp582; spf=pass (imf05.hostedemail.com: domain of ying.huang@intel.com designates 192.198.163.11 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1705474815; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fZfVwjqJqMquaAY/TDQoTKzF7uq1RMadziLin2psumc=; b=UqEoOuRS8uvn0Xahm4Xyyrgv1SoZrVTzeX/3/i2zIvN0bmgmtd7XI0yoCeL0Do/DuPEEji GqpneRqMPafEErDTcvxJxVOpOGvWN1PWSh+L2MD5ki/qPDYZ8vj2KzPryJ0eqa6gmkDx9C uVwZ4L3UgbWgFtczhR4+CstrSledaXE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1705474815; a=rsa-sha256; cv=none; b=jchuMXz1/aTlHl07t80WwgZZqvjCf9JCNLH3meluIssDwqmD53buxoK91wLERDO31Fiw0K ss9TeXkqp+YMTBG6ww6PxJUhiC6q91K72+8EkDq3AfCUt5YSYnwhsm0YoeHdWVj2Jgvx3M Szr8X96wQzOUFdCmZSfMgev2CEd7nMs= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=UDROp582; spf=pass (imf05.hostedemail.com: domain of ying.huang@intel.com designates 192.198.163.11 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1705474815; x=1737010815; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version; bh=k7OAtGL+2nxEw5zXMvTzAkCupl6TNovUqU9pBszDoZw=; b=UDROp582wpAbxMAg2P5D9Qii7Wmp79L6hHtL7peraryRpZEGxmTAIQGd FJ5aXaU1PkQScIP5MssM0W0wuF/aMBxEz+h59C/jNYXMi86Qdt9H2qXwY 7vxbizwCS9Yuwh5UAc6lckf1beCStIchibVPdIKvIL+UobxwzQOkGS97Q H9ZdzkQDYmffN0Drjkr3I13sAX3LV7uqW/KJDSg2NFKVp9CADVEvU/ZeG 13zGI136S1HpfWGOnpw4W7Yr+HMbJDX2W/D0HPW1aI8qDQ950H5gMGXW7 cNZLOt/5l3HVeCFqY5TeA1D4PAfOYw7sl7XwvREF2mEQYkBbVoIHYGdw7 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10955"; a="6805785" X-IronPort-AV: E=Sophos;i="6.05,200,1701158400"; d="scan'208";a="6805785" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jan 2024 23:00:13 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10955"; a="957456948" X-IronPort-AV: E=Sophos;i="6.05,200,1701158400"; d="scan'208";a="957456948" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jan 2024 23:00:06 -0800 From: "Huang, Ying" To: Gregory Price Cc: Gregory Price , , , , , , , , , , , , , , , , , , , , Subject: Re: [PATCH 1/3] mm/mempolicy: implement the sysfs-based weighted_interleave interface In-Reply-To: (Gregory Price's message of "Wed, 17 Jan 2024 00:24:41 -0500") References: <20240112210834.8035-1-gregory.price@memverge.com> <20240112210834.8035-2-gregory.price@memverge.com> <87le8r1dzr.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Wed, 17 Jan 2024 14:58:08 +0800 Message-ID: <87o7dkzbsv.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Queue-Id: 22E0A100028 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: 11n4mouzzhm6f9ksowjqamsatgcnmnth X-HE-Tag: 1705474814-84313 X-HE-Meta: U2FsdGVkX188PBHzUiunsR5PRld9+AxQIZFN7zT0CTiIIppwMWCDBhf4psuTcpFIhEOXjkgaNnTDrW9iwWmYZ/64HAVxauqUyV37FlRr1EbK23GYYHkdZ0ypZ+pLhuD88e25TFdPwwVmWdUtPA4rYuI/0cbm4YDC2Pya8drNAjN/TI5LCoMLyCuHRHGy2v7WP6X5fbT3JU5ABIUF9chUiYDBACyAOgI4/eLg13fxVoyn5laBosrxKWuLgTvVd/CnIqa4crERqlnZq5+0/7zV8Wa69Cg2O6f9VObTsfKB+W3/PqQXPEamFddrHXd2von4BZ75nksdYoGoJIzvzxhwOYbtiZmEJb2AFZWZKcC6qrZbZKIRMxeoRmJ0q7gQw2lIwv3KE7q1LmbhC3ExdN7XPd23LzTBYrazFj2GPKoxyY52ppBJOkp+i5JZwaQ+uYTq2AuQ394wUoz6Km6PXJbRrNvQTQ1YZBhHJ3TGgEx9e+/c5QMmVxRYHRODtimmOl/SjY8NWg04tTpmg6Xwvq7lspMEJpEKZxBY5OUmPxTtZxlMdh2GwW/rl/td+cj3l0Hb+H0Xaf0yJKyZ2laSjwymatZRn4XztzQ0KxPOJe1LtomQE6wtFecK71c09b4TlMBnYbWbRPWw/BKZ4OWrdnvO8YcbyE9rDrKj9M2goWADQOwfBlKZM8b9dxCX4n26b0ua/V+ZywUct/dwm05zEkmfihq5WS28tN7EHhIBrdDXv5EcbSyw4ICQWJEJo3vl4FM8d6SwmD9z5tDqSZ/1knM4p6J6uc6lzAVbp71sVLzNLvl0b0L/nv8x4Y9HJ363ZQyokVEOx6UJJSNUvvxuJKbdRyMBHRh6MRVLQkWYXSKEndk0oLpPAmeRTW7UysCN8SFIhtQBXtU/6TN1jQOz1kGiVJKGw1RcHDO1F+J1aA3TWO2B3f4gXnq85TyqKTmTcMdRW3LGr24FD6G2WlyWOo2 3VLZS/yl Ge3K+ejgqypq9nI3bynQ8TBKZShhEc1PUXHRqhTXeNdCzFlAYIxrYvyQmQZGcIRks7xtCGUNW2hkMN9NKZlE69gjvajrMPBWtaCAH7D57Ue2hXOnp2prorZMP5q+h1UgydzRA5ZAzfNPwtoqx9XOX4PRzimJkcn30YELFXnIbHtL88Jw/bBsnTafL6WaeusXHkWIfST1JV0rz3DDP/6bnUJEs1qyzpS0yvAXt4EZj+zhF5tXkT3ZqzyDUGyx0JatfLtL0uI00LMNEMlQuRhak+1EYG31uTxSTMceiLhvhyWkEZCeoAtQIin4Aez7GFS2fbjv9nYixCl/h8fLPBb1buDVkO55BBMUjbqoE X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Gregory Price writes: > On Mon, Jan 15, 2024 at 11:18:00AM +0800, Huang, Ying wrote: >> Gregory Price writes: >> >> > +static struct iw_table default_iw_table; >> > +/* >> > + * iw_table is the sysfs-set interleave weight table, a value of 0 >> > + * denotes that the default_iw_table value should be used. >> > + * >> > + * iw_table is RCU protected >> > + */ >> > +static struct iw_table __rcu *iw_table; >> > +static DEFINE_MUTEX(iw_table_mtx); >> >> I greped "mtx" in kernel/*.c and mm/*.c and found nothing. To following >> the existing coding convention, better to name this as iw_table_mutex or >> iw_table_lock? >> > > ack. > >> And, I think this is used to protect both iw_table and default_iw_table? >> If so, it deserves some comments. >> > > Right now default_iw_table cannot be updated, and so it is neither > protected nor requires protection. > > I planned to add the protection comment in the next patch series, which > would implement the kernel-side interface for updating the default > weights during boot/hotplug. > > We haven't had the discussion on how/when this should happen yet, > though, and there's some research to be done. (i.e. when should DRAM > weights be set? should the entire table be reweighted on hotplug? etc) Before that, I'm OK to remove default_iw_table and use hard coded "1" as default weight for now. >> > +static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr, >> > + const char *buf, size_t count) >> > +{ >> > + struct iw_node_attr *node_attr; >> > + struct iw_table __rcu *new; >> > + struct iw_table __rcu *old; >> > + u8 weight = 0; >> > + >> > + node_attr = container_of(attr, struct iw_node_attr, kobj_attr); >> > + if (count == 0 || sysfs_streq(buf, "")) >> > + weight = 0; >> > + else if (kstrtou8(buf, 0, &weight)) >> > + return -EINVAL; >> > + >> > + new = kmalloc(sizeof(*new), GFP_KERNEL); >> > + if (!new) >> > + return -ENOMEM; >> > + >> > + mutex_lock(&iw_table_mtx); >> > + old = rcu_dereference_protected(iw_table, >> > + lockdep_is_held(&iw_table_mtx)); >> > + /* If value is 0, revert to default weight */ >> > + weight = weight ? weight : default_iw_table.weights[node_attr->nid]; >> >> If we change the default weight in default_iw_table.weights[], how do we >> identify whether the weight has been customized by users via sysfs? So, >> I suggest to use 0 in iw_table for not-customized weight. >> >> And if so, we need to use RCU to access default_iw_table too. >> > > Dumb simplification on my part, I'll walk this back and add the > > if (!weight) weight = default_iw_table[node] > > logic back into the allocator paths accordinly. > >> > + memcpy(&new->weights, &old->weights, sizeof(new->weights)); >> > + new->weights[node_attr->nid] = weight; >> > + rcu_assign_pointer(iw_table, new); >> > + mutex_unlock(&iw_table_mtx); >> > + kfree_rcu(old, rcu); >> >> synchronize_rcu() should be OK here. It's fast enough in this cold >> path. This make it good to define iw_table as >> > I'll take a look. > >> u8 __rcu *iw_table; >> >> Then, we only need to allocate nr_node_ids elements now. >> > > We need nr_possible_nodes to handle hotplug correctly. nr_node_ids >= num_possible_nodes(). It's larger than any possible node ID. > I decided to simplify this down to MAX_NUMNODES *juuuuuust in case* > "true node hotplug" ever becomes a reality. If that happens, then > only allocating space for possible nodes creates a much bigger > headache on hotplug. > > For the sake of that simplification, it seemed better to just eat the > 1KB. If you really want me to do that, I will, but the MAX_NUMNODES > choice was an explicitly defensive choice. When "true node hotplug" becomes reality, we can make nr_node_ids == MAX_NUMNODES. So, it's safe to use it. Please take a look at setup_nr_node_ids(). >> > +static int __init mempolicy_sysfs_init(void) >> > +{ >> > + /* >> > + * if sysfs is not enabled MPOL_WEIGHTED_INTERLEAVE defaults to >> > + * MPOL_INTERLEAVE behavior, but is still defined separately to >> > + * allow task-local weighted interleave and system-defaults to >> > + * operate as intended. >> > + * >> > + * In this scenario iw_table cannot (presently) change, so >> > + * there's no need to set up RCU / cleanup code. >> > + */ >> > + memset(&default_iw_table.weights, 1, sizeof(default_iw_table)); >> >> This depends on sizeof(default_iw_table.weights[0]) == 1, I think it's >> better to use explicit loop here to make the code more robust a little. >> > > oh hm, you're right. rookie mistake on my part. > -- Best Regards, Huang, Ying