From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, corbet@lwn.net,
	akpm@linux-foundation.org, gregory.price@memverge.com, honggyu.kim@sk.com,
	rakie.kim@sk.com, hyeongtak.ji@sk.com, mhocko@kernel.org,
	vtavarespetr@micron.com, jgroves@micron.com, ravis.opensrc@micron.com,
	sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
	seungjun.ha@samsung.com, hannes@cmpxchg.org, dan.j.williams@intel.com
Subject: Re: [PATCH v5 4/4] mm/mempolicy: protect task interleave functions with tsk->mems_allowed_seq
In-Reply-To: <20240202170238.90004-5-gregory.price@memverge.com> (Gregory Price's message of "Fri, 2 Feb 2024 12:02:38 -0500")
References: <20240202170238.90004-1-gregory.price@memverge.com>
	<20240202170238.90004-5-gregory.price@memverge.com>
Date: Mon, 05 Feb 2024 13:48:44 +0800
Message-ID: <87r0hr31hf.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Gregory Price writes:

> In the event of rebind, pol->nodemask can change at the same time as an
> allocation occurs.  We can detect this with tsk->mems_allowed_seq and
> prevent a miscount or an allocation failure from occurring.
>
> The same thing happens in the allocators to detect failure, but this
> can prevent spurious failures in a much smaller critical section.
>
> Suggested-by: "Huang, Ying"
> Signed-off-by: Gregory Price
> ---
>  mm/mempolicy.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index d8cc3a577986..ed0d5d2d456a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1878,11 +1878,17 @@ bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
>
>  static unsigned int weighted_interleave_nodes(struct mempolicy *policy)
>  {
> -	unsigned int node = current->il_prev;
> -
> -	if (!current->il_weight || !node_isset(node, policy->nodes)) {
> +	unsigned int node;
> +	unsigned int cpuset_mems_cookie;
> +
> +retry:
> +	/* to prevent miscount use tsk->mems_allowed_seq to detect rebind */
> +	cpuset_mems_cookie = read_mems_allowed_begin();
> +	node = current->il_prev;
> +	if (!node || !node_isset(node, policy->nodes)) {
	     ~~~~~

!current->il_weight ?

--
Best Regards,
Huang, Ying

>  		node = next_node_in(node, policy->nodes);
> -		/* can only happen if nodemask is being rebound */
> +		if (read_mems_allowed_retry(cpuset_mems_cookie))
> +			goto retry;
>  		if (node == MAX_NUMNODES)
>  			return node;
>  		current->il_prev = node;
> @@ -1896,8 +1902,14 @@ static unsigned int weighted_interleave_nodes(struct mempolicy *policy)
>  static unsigned int interleave_nodes(struct mempolicy *policy)
>  {
>  	unsigned int nid;
> +	unsigned int cpuset_mems_cookie;
> +
> +	/* to prevent miscount, use tsk->mems_allowed_seq to detect rebind */
> +	do {
> +		cpuset_mems_cookie = read_mems_allowed_begin();
> +		nid = next_node_in(current->il_prev, policy->nodes);
> +	} while (read_mems_allowed_retry(cpuset_mems_cookie));
>
> -	nid = next_node_in(current->il_prev, policy->nodes);
>  	if (nid < MAX_NUMNODES)
>  		current->il_prev = nid;
>  	return nid;
> @@ -2374,6 +2386,7 @@ static unsigned long alloc_pages_bulk_array_weighted_interleave(gfp_t gfp,
>  						struct page **page_array)
>  {
>  	struct task_struct *me = current;
> +	unsigned int cpuset_mems_cookie;
>  	unsigned long total_allocated = 0;
>  	unsigned long nr_allocated = 0;
>  	unsigned long rounds;
> @@ -2391,7 +2404,13 @@ static unsigned long alloc_pages_bulk_array_weighted_interleave(gfp_t gfp,
>  	if (!nr_pages)
>  		return 0;
>
> -	nnodes = read_once_policy_nodemask(pol, &nodes);
> +	/* read the nodes onto the stack, retry if done during rebind */
> +	do {
> +		cpuset_mems_cookie = read_mems_allowed_begin();
> +		nnodes = read_once_policy_nodemask(pol, &nodes);
> +	} while (read_mems_allowed_retry(cpuset_mems_cookie));
> +
> +	/* if the nodemask has become invalid, we cannot do anything */
>  	if (!nnodes)
>  		return 0;