Date: Fri, 10 Nov 2023 16:24:34 -0500
From: Gregory Price
To: Michal Hocko
Cc: Gregory Price, linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    ying.huang@intel.com, akpm@linux-foundation.org, tj@kernel.org,
    lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
References: <20231109002517.106829-1-gregory.price@memverge.com>

On Fri, Nov 10, 2023 at 10:05:57AM +0100, Michal Hocko wrote:
> On Thu 09-11-23 11:34:01, Gregory Price wrote:
> [...]
> > Anyway, summarizing: After a bit of reading, this does seem to map
> > better to the "accounting consumption" subsystem than the "constrain"
> > subsystem. However, if you think it's better suited for cpuset, I'm
> > happy to push in that direction.
>
> Maybe others see it differently but I stick with my previous position.
> Memcg is not a great fit for reasons already mentioned - most notably
> that the controller doesn't control the allocation but accounting what
> has been already allocated. Cpusets on the other hand constrains the
> allocations and that is exactly what you want to achieve.
> --
> Michal Hocko
> SUSE Labs

Digging in a bit, placing it in cpusets has locking requirements that
concern me. Maybe I'm being a bit over-cautious, so if none of this
matters, then I'll go ahead and swap the code over to cpusets.
Otherwise, just more food for thought in cpusets vs memcg.

cpuset.c states that even for read-only access, we have to acquire the
(global) callback lock:

https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L391
 * There are two global locks guarding cpuset structures - cpuset_mutex and
 * callback_lock. We also require taking task_lock() when dereferencing a
 * task's cpuset pointer. See "The task_lock() exception", at the end of this
 * comment.

Examples:

cpuset_node_allowed:
https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L4780
    spin_lock_irqsave(&callback_lock, flags);
    rcu_read_lock();
    cs = nearest_hardwall_ancestor(task_cs(current)); <-- walks parents
    allowed = node_isset(node, cs->mems_allowed);
    rcu_read_unlock();
    spin_unlock_irqrestore(&callback_lock, flags);

cpuset_mems_allowed:
https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L4679
    spin_lock_irqsave(&callback_lock, flags);
    rcu_read_lock();
    guarantee_online_mems(task_cs(tsk), &mask); <-- walks parents
    rcu_read_unlock();
    spin_unlock_irqrestore(&callback_lock, flags);

It seems apparent that any form of parent walk in cpusets will require
acquiring &callback_lock. This does not appear to be true of memcg.

Implementing an inheritance structure similar to the one described in
this patch set would therefore mean acquiring the callback lock during
node selection. So if we want this in cpuset, we're going to eat that
lock acquisition, despite not really needing it.

I was not intending to do checks against cpusets.mems_allowed when
acquiring weights, as this is already enforced between cpusets and
mempolicy on hotplug and mask changes, as well as in the allocators via
read_mems_allowed_begin/retry.

This is why I said this was *not* a constraining feature. Additionally,
if the node selected by mpol is exhausted, the allocator will simply
acquire memory from another (allowed) node, disregarding the weights
entirely (which is the correct / expected behavior). That's another
example of this being more of a suggestion than a constraint.

So I'm contending here that putting it in cpusets is overkill. But if
it likewise doesn't fit in memcg, is it insane to suggest that maybe we
should consider adding cgroup.mpol, and maybe consider migrating
features from mempolicy.c into cgroups (while keeping mpol the way it
is)?

~Gregory
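
To make the "suggestion, not constraint" point concrete, here is a minimal userspace sketch of the behavior described above. It is not code from the patch set or from mempolicy.c: struct wi_state, wi_next_node(), wi_alloc_page() and MAX_NODES are hypothetical names made up for illustration, and the "allowed" array is only a stand-in for a mems_allowed mask. The weights pick a preferred node, and the allocation step falls back to any other allowed node when the preferred one is exhausted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical; kept small for illustration */

/* Hypothetical per-task state; not the kernel's struct mempolicy. */
struct wi_state {
	unsigned char weight[MAX_NODES];     /* per-node weight, 0 = never preferred */
	unsigned long free_pages[MAX_NODES]; /* simulated free memory per node */
	bool allowed[MAX_NODES];             /* stand-in for a mems_allowed mask */
	int cur_node;                        /* node currently being interleaved on */
	unsigned char cur_left;              /* picks left on cur_node this round */
};

/*
 * Pick the next preferred node according to the weights.
 * Returns -1 if no allowed node has a non-zero weight.
 */
static int wi_next_node(struct wi_state *s)
{
	assert(s != NULL);

	/* At most MAX_NODES advances plus one final pick. */
	for (int tries = 0; tries <= MAX_NODES; tries++) {
		if (s->cur_left > 0 && s->allowed[s->cur_node]) {
			s->cur_left--;
			return s->cur_node;
		}
		/* Current node used up its share: move on and reload. */
		s->cur_node = (s->cur_node + 1) % MAX_NODES;
		s->cur_left = s->allowed[s->cur_node] ?
			      s->weight[s->cur_node] : 0;
	}
	return -1;
}

/*
 * "Allocate" one page. The weighted choice is only a preference:
 * if the preferred node has no free pages, take the page from any
 * other allowed node, ignoring the weights.
 * Returns the node the page came from, or -1 on failure.
 */
static int wi_alloc_page(struct wi_state *s)
{
	int preferred = wi_next_node(s);

	if (preferred < 0)
		return -1;

	for (int i = 0; i < MAX_NODES; i++) {
		int nid = (preferred + i) % MAX_NODES;

		if (s->allowed[nid] && s->free_pages[nid] > 0) {
			s->free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}
```

With weights 3:1 on nodes 0 and 1 and only two free pages on node 0, the first two calls to wi_alloc_page() return node 0 and every later call lands on node 1, even when the weights still prefer node 0. Note that nothing in the selection path needs a parent walk or a global lock in this sketch; where the weights are stored is exactly the cpusets vs memcg question.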