Date: Fri, 10 Nov 2023 17:05:50 -1000
From: "tj@kernel.org"
To: Gregory Price
Cc: John Groves, Gregory Price, "linux-kernel@vger.kernel.org",
    "linux-cxl@vger.kernel.org", "linux-mm@kvack.org",
    "cgroups@vger.kernel.org", "linux-doc@vger.kernel.org",
    "ying.huang@intel.com", "akpm@linux-foundation.org",
    "mhocko@kernel.org", "lizefan.x@bytedance.com", "hannes@cmpxchg.org",
    "corbet@lwn.net", "roman.gushchin@linux.dev", "shakeelb@google.com",
    "muchun.song@linux.dev", "jgroves@micron.com"
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
References: <20231109002517.106829-1-gregory.price@memverge.com>
    <0100018bb64636ef-9daaf0c0-813c-4209-94e4-96ba6854f554-000000@email.amazonses.com>

Hello, Gregory.

On Fri, Nov 10, 2023 at 05:29:25PM -0500, Gregory Price wrote:
> I did originally implement it this way, but note that it will either
> require some creative extension of set_mempolicy or even set_mempolicy2
> as proposed here:
>
> https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@memverge.com/
>
> One of the problems to consider is task migration. If a task is
> migrated from one socket to another, for example by being moved to a new
> cgroup with a different cpuset - the weights might be completely nonsensical
> for the new allowed topology.
>
> Unfortunately mpol has no way of being changed from outside the task
> itself once it's applied, other than changing its nodemasks via cpusets.

Maybe it's time to add one?

> So one concrete use case: kubernetes might like to change cpusets or move
> tasks from one cgroup to another, or a vm might be migrated from one set
> of nodes to another (technically not mutually exclusive here). Some
> memory policy settings (like weights) may no longer apply when this
> happens, so it would be preferable to have a way to change them.

Neither covers all use cases. As you noted in your mempolicy message, if
the application wants finer grained control, the cgroup interface isn't
great. In general, any changes which are dynamically initiated by the
application itself aren't a great fit for cgroup.

I'm generally pretty wary of adding non-resource group configuration
interfaces, especially when they don't have a counterpart in the regular
per-process/thread API, for a few reasons:

1. The reason why people sometimes try to add those through cgroup is
   that it seems easier to add new features that way, which may be true
   to some degree, but shortcuts often aren't very conducive to long
   term maintainability.

2. As noted above, just having cgroup often excludes a significant
   portion of use cases. Not all systems enable cgroups, and
   programmatic accesses from target processes / threads are
   coarse-grained and can be really awkward.

3. Cgroup can be convenient when a group config change is necessary.
   However, we really don't want to keep adding kernel interfaces just
   for changing configs for a group of threads. For config changes which
   aren't high frequency, having userspace iterate the member processes
   and apply the changes where possible is usually good enough; this
   usually involves looping until no new process is found. If the
   looping is problematic, cgroup freezer can be used to stop all member
   threads and provide atomicity too.

Thanks.

-- 
tejun
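
The userspace loop described in point 3 could be sketched roughly as
below. This is only an illustration under assumptions: it assumes a
cgroup v2 hierarchy where member PIDs are listed in "cgroup.procs", and
the names apply_to_cgroup, read_member_pids, apply and max_passes are
hypothetical. The apply callback stands in for whatever per-process
interface would actually change the setting; as the thread notes, no
such interface exists today for mempolicy.

```python
import os
from typing import Callable, Set


def read_member_pids(cgroup_dir: str) -> Set[int]:
    """Return the PIDs currently listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return {int(line) for line in f if line.strip()}


def apply_to_cgroup(cgroup_dir: str,
                    apply: Callable[[int], None],
                    max_passes: int = 100) -> Set[int]:
    """Call apply(pid) on every member, re-scanning until no new PID shows up.

    `apply` is a hypothetical callback, not an existing kernel API.
    Returns the set of PIDs that were visited.
    """
    done: Set[int] = set()
    for _ in range(max_passes):
        new = read_member_pids(cgroup_dir) - done
        if not new:
            # A full pass found nothing new: membership has settled.
            return done
        for pid in sorted(new):
            try:
                apply(pid)
            except (ProcessLookupError, PermissionError):
                # "Where possible": the process exited or cannot be changed.
                pass
            done.add(pid)
    raise RuntimeError("cgroup membership did not settle")
```

Processes forked while a pass is running are picked up by the next
pass, which is why the loop only stops after a scan that finds nothing
new. If that race matters, the group could be frozen around the loop
(on cgroup v2, via the "cgroup.freeze" file), at the cost of pausing
the workload.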