Date: Mon, 21 Apr 2025 21:34:43 +0000
From: Roman Gushchin
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
    Yosry Ahmed, Tejun Heo, Michal Koutný, Greg Thelen,
    linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org, Meta kernel team
Subject: Re: [PATCH v2] memcg: introduce non-blocking limit setting option
References: <20250419183545.1982187-1-shakeel.butt@linux.dev>
In-Reply-To: <20250419183545.1982187-1-shakeel.butt@linux.dev>

On Sat, Apr 19, 2025 at 11:35:45AM -0700, Shakeel Butt wrote:
> Setting the max and high limits can trigger synchronous reclaim and/or
> oom-kill if the usage is higher than the given limit. This behavior is
> fine for newly created cgroups but it can cause issues for the node
> controller while setting limits for existing cgroups.
>
> In our production multi-tenant and overcommitted environment, we are
> seeing priority inversion when the node controller dynamically adjusts
> the limits of running jobs of different priorities. Based on the system
> situation, the node controller may reduce the limits of lower priority
> jobs and increase the limits of higher priority jobs. However, we are
> seeing the node controller getting stuck for long periods of time while
> reclaiming from lower priority jobs while setting their limits, and it
> also spends a lot of its own CPU doing so.
>
> One of the workarounds we are trying is to fork a new process which sets
> the limit of the lower priority job along with setting an alarm to get
> itself killed if it gets stuck in the reclaim for the lower priority job.
> However, we are finding it very unreliable and costly. Either we need a
> good enough time buffer for the alarm to be delivered after setting the
> limit and potentially spend a lot of CPU in the reclaim, or be unreliable
> in setting the limit for much shorter but cheaper (less reclaim) alarms.
>
> Let's introduce a new limit setting option which does not trigger
> reclaim and/or oom-kill and let the processes in the target cgroup
> trigger reclaim and/or throttling and/or oom-kill in their next charge
> request. This will make the node controller in a multi-tenant
> overcommitted environment much more reliable.
>
> Signed-off-by: Shakeel Butt
> ---
> Changes since v1:
> - Instead of new interfaces use O_NONBLOCK flag (Greg, Roman & Tejun)
>
>  Documentation/admin-guide/cgroup-v2.rst | 14 ++++++++++++++
>  mm/memcontrol.c                         | 10 ++++++++--
>  2 files changed, 22 insertions(+), 2 deletions(-)

Acked-by: Roman Gushchin

Re stable backports: can you please share some details about the problem
users are facing? Which kernel are they using?

Thanks!
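
For readers following the thread, here is roughly how a node controller
might use the option described in the quoted changelog. This is a minimal
sketch, not code from the patch. The helper name set_limit_nonblocking()
and the cgroup path in the usage note are hypothetical. It assumes the
interface chosen in v2, where opening the limit file (presumably
memory.max or memory.high) with O_NONBLOCK makes the write skip
synchronous reclaim and oom-kill.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` (e.g. "1073741824" or "max") to a
 * cgroup v2 limit file opened with O_NONBLOCK.
 *
 * Assumption: with the quoted patch applied, O_NONBLOCK asks the kernel
 * to record the new limit without reclaiming or oom-killing in the
 * writer's context. On kernels without the patch the flag is expected
 * to have no effect on these files, so the write may still block.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int set_limit_nonblocking(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len);
    int saved_errno = errno;    /* close() may clobber errno */
    close(fd);

    if (written < 0) {
        errno = saved_errno;
        return -1;
    }
    if ((size_t)written != len) {
        errno = EIO;            /* treat a short write as an error */
        return -1;
    }
    return 0;
}
```

A controller could then call, for example,
set_limit_nonblocking("/sys/fs/cgroup/low-prio-job/memory.max", "1073741824")
in its own context instead of forking a helper process guarded by an
alarm. The path is illustrative only.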