From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 228B8C47422 for ; Mon, 29 Jan 2024 08:19:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9FC1A6B007B; Mon, 29 Jan 2024 03:19:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9AC8A6B007D; Mon, 29 Jan 2024 03:19:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 84D246B007E; Mon, 29 Jan 2024 03:19:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 75CA96B007B for ; Mon, 29 Jan 2024 03:19:54 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 1D4A6A077D for ; Mon, 29 Jan 2024 08:19:54 +0000 (UTC) X-FDA: 81731650308.23.A11579C Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by imf10.hostedemail.com (Postfix) with ESMTP id 95A67C0018 for ; Mon, 29 Jan 2024 08:19:51 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JIhpEHYY; spf=pass (imf10.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706516392; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=k5RQAfUeiDflGgpixTGvGXW1odzSlq+Kk+O917hM2gY=; b=anWJU5mp8Tnd7y2jBYhjkBpevvZPGQwfcaDV/pdf2GSCSkZRAeag+SYztfJMpFsbAwmctA cUsjDo6Ksdmp8EegxWy+HqhmcHNDBv8OtJ5gI176OICRCcBVgkdi9ksFTO1FL4O1kB3jh+ VCSgjWDVC8kq2G/RMVzSaIaOAShe9lI= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JIhpEHYY; spf=pass (imf10.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706516392; a=rsa-sha256; cv=none; b=44/t8pt5cyC5V+OBIQ8wnbf7GNfPpZDLlRMVqzLXgCUBs0MClPm6Ghc1WuM36jCf6tFmUA 2VxxUh3skOUvG9VULeRlUhLGFNiGw0euaexsRaIqiUhRCy2iTgl6LH9o/u78OoO1WGYk9H zZqeswAYCDzzDAfHAXfLYLe2DU2+gvc= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706516391; x=1738052391; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version; bh=q9yLJZ27jvLejI0DqUZQHKwFIPnOnNb3BDVtGNN+2V8=; b=JIhpEHYYMvTxRTFQDg0GX6eNZ8ZrelCTytzl6lu3MLSj8do4/hyeMKdr Pxr61zTtMSIsq6v8Ng5HsbrffwCcCKBWp5xxrRQr4ApC1xndMde+HBIYu UcjWBmn+tDgGOQQNyY1A4JYZe39PIvX7c76rGpS97oGoMjBfyUHyv/Yie puqXR4o8//dVHWt4KTttEn4MVPMcQDpO5WUUOx+CHi4H82k3hw9uXG1hb cXC4HRGKBOrcEv8bpVq//efrh+O6Bjtzk6uWpJNbrOg4xZRtJskdvH2Hn MVsMdx/90lTxckoJwhLIlyUYUAFTGCS7axdUQxvy9e07n8i3KHfJHrD3H w==; X-IronPort-AV: E=McAfee;i="6600,9927,10967"; a="406609299" X-IronPort-AV: E=Sophos;i="6.05,226,1701158400"; d="scan'208";a="406609299" Received: from fmviesa001.fm.intel.com ([10.60.135.141]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 00:19:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,226,1701158400"; d="scan'208";a="29701150" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 00:19:43 -0800 From: "Huang, Ying" To: Gregory Price Cc: Gregory Price , , , , , , , , , , , , , , , , , , , , Subject: Re: [PATCH v3 4/4] mm/mempolicy: change cur_il_weight to atomic and carry the node with it In-Reply-To: (Gregory Price's message of "Fri, 26 Jan 2024 11:38:01 -0500") References: <20240125184345.47074-1-gregory.price@memverge.com> <20240125184345.47074-5-gregory.price@memverge.com> <87sf2klez8.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Mon, 29 Jan 2024 16:17:46 +0800 Message-ID: <877cjsk0yd.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=ascii X-Rspamd-Queue-Id: 95A67C0018 X-Rspam-User: X-Stat-Signature: qtzrq45b38yhs68psfk6qr4tzms6mogd X-Rspamd-Server: rspam01 X-HE-Tag: 1706516391-942832 X-HE-Meta: U2FsdGVkX19KKsNKV63wa8aPf8RvCBThxz6j5A15YX8MQ6pK2in6WmdgpgtdH2hNqhipuF2Ew9XUdTnuQMzz88e3QuYAwn3y3G1cUIc3F/na3wV8ldMiIy5kSXDDw8j0pm2SJkN+JMHqs8QtA1R1HqmM5u3kaZWiBnYwD0JmBME5qtHmkCCgPqmFtVqUbnP8/oD4EEFmRaUCDSbH0QHqbZ4XUE6Ly0SXdDJiTbAYkw7+Cr9YF5uRCY6KWKJxLy6HsCxRk8Iysr1O7XhxrDl57m7x9fNyM1fmNFVDmf+un/T9Nadu3ZmSRp6OC51nszlLGMO66jkzfj68kpu4xqIkn2W1km03VYF5iJykQKLQTkDMqiy6p1yymG9VHqyFzkl+x2P2V5itaypcP3bu1s0XMZzOIvJdIA5kqiuCFOv9sujq443kvzmMBH4jYxQOMfeWns1b6uQsXzfhluGvy26VzU2TWm86N/F/EOP1jIoc+k3dkIwL/hku6d1mtlB/GJv/gW6+zXxAHIRvdgnt3z6nWnGerSBBahxTWIXsch3d5sgCAc6lJxtrVcaCUQ+Ke0nKSOMyro6knTUkl9OHYEoT3RsdLeEDr8iOtHZMyIov4FBmNnwOirH6dyUW5Z1tVon2KYd8nusG96en6PKHXRhPd8fL+uQRjJiAaqcYxzfVPLO1y8DALnf1EpRQVfR4kT/l1yGl0e/KYeFrEVp7BfvYk3SyRg7BQRIFHfpMvvACU0nzMybkloRCWx6c74tDQ2frszlxbVcaaciBm4Tcy7L/tvsQXHZciSL1mBtOGNm+5PrEb5iDrDH4nnonXTSIhiEOGMyarW9l6PLPPXmK5BVSCr/CTUIVd1vGIO7P8hZVCTyetDxgguRvxDO79UBEssfh4qR4Uq8FMZodrsTsaEUIxd4kK/LAbDe84sIzOHO6sKEI3gZOEWLfl6Au9s+4q+FpyO1a6ibdOtIBf0YGRiB CQGXaj0J 1dDjXmFFqSI9k7xrC4BcM9xGGleX6eZzmO2KC9nXCTjeUQYInNuoux6yMHaYPznzd20RRHkjiOvfCL7aoVvfZWBM+xpQt4Yy6sjNFeLwrnnxfe0En9FaFfxttXJYFK+NRHOStUciD0PMQwHFcVFC7MByDgHl/yQAfjTrfxcTEEXbBcigkNTCNPJItau4u+Kug89bLLsWgz0yEA4+Gnw5eayB9Z2/ucexW8+7j6XmySbyVn+vuzO6z59pU7DXUoLw4xv1REYsT/M8b7SLBTg9rPHmDdhsp9+0B3zm5DZc+iWI8PheT+OsCVMokkLrltEA0rrpinGNp5IY1uWDEGUwIPf9KBwg+4FsS8yZW X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Gregory Price writes: > On Fri, Jan 26, 2024 at 03:40:27PM +0800, Huang, Ying wrote: >> Gregory Price writes: >> >> > Two special observations: >> > - if the weight is non-zero, cur_il_weight must *always* have a >> > valid node number, e.g. it cannot be NUMA_NO_NODE (-1). >> >> IIUC, we don't need that, "MAX_NUMNODES-1" is used instead. >> > > Correct, I just thought it pertinent to call this out explicitly since > I'm stealing the top byte, but the node value has traditionally been a > full integer. > > This may be relevant should anyone try to carry, a random node value > into this field. For example, if someone tried to copy policy->home_node > into cur_il_weight for whatever reason. > > It's worth breaking out a function to defend against this - plus to hide > the bit operations directly as you recommend below. > >> > /* Weighted interleave settings */ >> > - u8 cur_il_weight; >> > + atomic_t cur_il_weight; >> >> If we use this field for node and weight, why not change the field name? >> For example, cur_wil_node_weight. >> > > ack. > >> > + if (cweight & 0xFF) >> > + *policy = cweight >> 8; >> >> Please define some helper functions or macros instead of operate on bits >> directly. >> > > ack. > >> > else >> > *policy = next_node_in(current->il_prev, >> > pol->nodes); >> >> If we record current node in pol->cur_il_weight, why do we still need >> curren->il_prev. Can we only use pol->cur_il_weight? And if so, we can >> even make current->il_prev a union. >> > > I just realized that there's a problem here for shared memory policies. > > from weighted_interleave_nodes, I do this: > > cur_weight = atomic_read(&policy->cur_il_weight); > ... > weight--; > ... > atomic_set(&policy->cur_il_weight, cur_weight); > > On a shared memory policy, this is a race condition. > > > I don't think we can combine il_prev and cur_wil_node_weight because > the task policy may be different than the current policy. > > i.e. it's totally valid to do the following: > > 1) set_mempolicy(MPOL_INTERLEAVE) > 2) mbind(..., MPOL_WEIGHTED_INTERLEAVE) > > Using current->il_prev between these two policies, is just plain incorrect, > so I will need to rethink this, and the existing code will need to be > updated such that weighted_interleave does not use current->il_prev. IIUC, weighted_interleave_nodes() is only used for mempolicy of tasks (set_mempolicy()), as in the following code. + *nid = (ilx == NO_INTERLEAVE_INDEX) ? + weighted_interleave_nodes(pol) : + weighted_interleave_nid(pol, ilx); But, in contrast, it's bad to put task-local "current weight" in mempolicy. So, I think that it's better to move cur_il_weight to task_struct. And maybe combine it with current->il_prev. -- Best Regards, Huang, Ying