Date: Mon, 21 Apr 2025 17:16:00 -0700
From: Shakeel Butt
To: Waiman Long
Cc: Gregory Price, linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernel-team@meta.com, hannes@cmpxchg.org,
 mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev,
 tj@kernel.org, mkoutny@suse.com, akpm@linux-foundation.org
Subject: Re: [PATCH v3 2/2] vmscan,cgroup: apply mems_effective to reclaim
References: <20250419053824.1601470-1-gourry@gourry.net>
 <20250419053824.1601470-3-gourry@gourry.net>
 <7dtp6v5evpz5sdevwrexhwcdtl5enczssvuepkib2oiaexk3oo@ranij7pskrhe>
 <3478a69d-b4e9-4561-a09a-d64397ced130@redhat.com>
In-Reply-To: <3478a69d-b4e9-4561-a09a-d64397ced130@redhat.com>

On Mon, Apr 21, 2025 at 08:10:41PM -0400, Waiman Long wrote:
> On 4/21/25 7:15 PM, Shakeel Butt wrote:
> > On Mon, Apr 21, 2025 at 06:59:20PM -0400, Gregory Price wrote:
> > > On Mon, Apr 21, 2025 at 10:39:58AM -0700, Shakeel Butt wrote:
> > > > On Sat, Apr 19, 2025 at 08:14:29PM -0400, Waiman Long wrote:
> > > > > On 4/19/25 2:48 PM, Shakeel Butt wrote:
> > > > > > On Sat, Apr 19, 2025 at 01:38:24AM -0400, Gregory Price wrote:
> > > > > > > +bool cpuset_node_allowed(struct cgroup *cgroup, int nid)
> > > > > > > +{
> > > > > > > +	struct cgroup_subsys_state *css;
> > > > > > > +	unsigned long flags;
> > > > > > > +	struct cpuset *cs;
> > > > > > > +	bool allowed;
> > > > > > > +
> > > > > > > +	css = cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys);
> > > > > > > +	if (!css)
> > > > > > > +		return true;
> > > > > > > +
> > > > > > > +	cs = container_of(css, struct cpuset, css);
> > > > > > > +	spin_lock_irqsave(&callback_lock, flags);
> > > > > > Do we really need callback_lock here? We are not modifying and I am
> > > > > > wondering if simple rcu read lock is enough here (similar to
> > > > > > update_nodemasks_hier() where parent's effective_mems is accessed within
> > > > > > rcu read lock).
> > > > > The callback_lock is required to ensure the stability of the effective_mems
> > > > > which may be in the process of being changed if not taken.
> > > > Stability in what sense? effective_mems will not get freed under us
> > > > here or is there a chance for corrupted read here? node_isset() and
> > > > nodes_empty() seems atomic. What's the worst that can happen without
> > > > callback_lock?
> > > Fairly sure nodes_empty is not atomic, it's a bitmap search.
> > For bitmaps smaller than 64 bits, it seems atomic and MAX_NUMNODES seems
> > smaller than 64 in all the archs.
>
> RHEL sets MAX_NUMNODES to 1024 for x86_64. So it is not really atomic for
> some distros. In reality, it is rare to have a system with more than 64
> nodes (nr_node_ids <= 64). So it can be considered atomic in most cases.

Thanks for the explanation.

> >
> >
> > Anyways I am hoping that we can avoid taking a global lock in reclaim
> > path which will become a source of contention for memory pressure
> > situations.
>
> It is a valid concern. I will not oppose to checking effective_mems without
> taking the callback_lock, but we will have to take rcu_read_lock to make
> sure that the cpuset structure won't go away and clearly document that this
> is an exceptional case as it is against our usual rule and the check may be
> incorrect in some rare cases.

Oh, this function is doing cgroup_get_e_css(cgroup, &cpuset_cgrp_subsys),
i.e. taking a reference on the cpuset, so with rcu_read_lock we can avoid
taking that reference as well.
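
To make the atomicity point above concrete, here is a minimal userspace
sketch. It is not the kernel implementation: mask_words(),
model_node_isset() and model_nodes_empty() are hypothetical names that only
model the access pattern of node_isset() and nodes_empty() on a bitmap of
unsigned long words. It assumes any bits beyond nbits in the last word are
zero.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG (an assumption here). */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold nbits bits. */
static size_t mask_words(size_t nbits)
{
	return (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/*
 * Models a node_isset()-style test: it reads exactly one word,
 * no matter how large the mask is.
 */
static bool model_node_isset(const unsigned long *mask, size_t nid)
{
	return (mask[nid / BITS_PER_LONG] >> (nid % BITS_PER_LONG)) & 1UL;
}

/*
 * Models a nodes_empty()-style check: it scans every word, so once the
 * mask spans more than one word the scan is a sequence of separate loads
 * and can observe a mix of old and new words during a concurrent update.
 */
static bool model_nodes_empty(const unsigned long *mask, size_t nbits)
{
	size_t i, words = mask_words(nbits);

	for (i = 0; i < words; i++)
		if (mask[i])
			return false;
	return true;
}
```

With 64-bit longs, a 64-node mask fits in one word, so both helpers touch a
single word. At MAX_NUMNODES = 1024 the mask is 16 words: the single-bit
test still reads one word, while the emptiness scan reads up to 16.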