From: Waiman Long
Date: Thu, 30 Jan 2025 12:07:31 -0500
Subject: Re: [RFC PATCH] mm, memcg: introduce memory.high.throttle
To: Johannes Weiner, Waiman Long
Cc: Yosry Ahmed, Tejun Heo, Michal Koutný, Jonathan Corbet, Michal Hocko,
 Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-doc@vger.kernel.org, Peter Hunt
References: <20250129191204.368199-1-longman@redhat.com>
 <366fd30f-033d-48d6-92b4-ac67c44d0d9b@redhat.com>
 <20250130163904.GB1283@cmpxchg.org>
In-Reply-To: <20250130163904.GB1283@cmpxchg.org>

On 1/30/25 11:39 AM, Johannes Weiner wrote:
> On Thu, Jan 30, 2025 at 09:52:29AM -0500, Waiman Long wrote:
>> On 1/29/25 3:10 PM, Yosry Ahmed wrote:
>>> On Wed, Jan 29, 2025 at 02:12:04PM -0500, Waiman Long wrote:
>>>> Since commit 0e4b01df8659 ("mm, memcg: throttle allocators when failing
>>>> reclaim over memory.high"), the amount of allocator throttling had
>>>> increased substantially. As a result, it could be difficult for a
>>>> misbehaving application that consumes an increasing amount of memory
>>>> to be OOM-killed if memory.high is set. Instead, the application may
>>>> just be crawling along holding close to the allowed memory.high memory
>>>> for the current memory cgroup for a very long time especially those
>>>> that do a lot of memcg charging and uncharging operations.
>>>>
>>>> This behavior makes the upstream Kubernetes community hesitate to
>>>> use memory.high. Instead, they use only memory.max for memory control
>>>> similar to what is being done for cgroup v1 [1].
>>>>
>>>> To allow better control of the amount of throttling and hence the
>>>> speed that a misbehaving task can be OOM killed, a new single-value
>>>> memory.high.throttle control file is now added. The allowable range
>>>> is 0-32. By default, it has a value of 0 which means maximum throttling
>>>> like before. Any non-zero positive value represents the corresponding
>>>> power of 2 reduction of throttling and makes OOM kills easier to happen.
>>>>
>>>> System administrators can now use this parameter to determine how easy
>>>> they want OOM kills to happen for applications that tend to consume
>>>> a lot of memory without the need to run a special userspace memory
>>>> management tool to monitor memory consumption when memory.high is set.
>>>>
>>>> Below are the test results of a simple program showing how different
>>>> values of memory.high.throttle can affect its run time (in secs) until
>>>> it gets OOM killed. This test program allocates pages from kernel
>>>> continuously. There are some run-to-run variations and the results
>>>> are just one possible set of samples.
>>>>
>>>>   # systemd-run -p MemoryHigh=10M -p MemoryMax=20M -p MemorySwapMax=10M \
>>>>         --wait -t timeout 300 /tmp/mmap-oom
>>>>
>>>>   memory.high.throttle    service runtime
>>>>   --------------------    ---------------
>>>>             0                 120.521
>>>>             1                 103.376
>>>>             2                  85.881
>>>>             3                  69.698
>>>>             4                  42.668
>>>>             5                  45.782
>>>>             6                  22.179
>>>>             7                   9.909
>>>>             8                   5.347
>>>>             9                   3.100
>>>>            10                   1.757
>>>>            11                   1.084
>>>>            12                   0.919
>>>>            13                   0.650
>>>>            14                   0.650
>>>>            15                   0.655
>>>>
>>>> [1] https://docs.google.com/document/d/1mY0MTT34P-Eyv5G1t_Pqs4OWyIH-cg9caRKWmqYlSbI/edit?tab=t.0
>>>>
>>>> Signed-off-by: Waiman Long
>>>> ---
>>>>  Documentation/admin-guide/cgroup-v2.rst | 16 ++++++++--
>>>>  include/linux/memcontrol.h              |  2 ++
>>>>  mm/memcontrol.c                         | 41 +++++++++++++++++++++++++
>>>>  3 files changed, 57 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>>> index cb1b4e759b7e..df9410ad8b3b 100644
>>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>>> @@ -1291,8 +1291,20 @@ PAGE_SIZE multiple when read back.
>>>>      Going over the high limit never invokes the OOM killer and
>>>>      under extreme conditions the limit may be breached. The high
>>>>      limit should be used in scenarios where an external process
>>>> -    monitors the limited cgroup to alleviate heavy reclaim
>>>> -    pressure.
>>>> +    monitors the limited cgroup to alleviate heavy reclaim pressure
>>>> +    unless a high enough value is set in "memory.high.throttle".
>>>> +
>>>> +  memory.high.throttle
>>>> +    A read-write single value file which exists on non-root
>>>> +    cgroups. The default is 0.
>>>> +
>>>> +    Memory usage throttle control. This value controls the amount
>>>> +    of throttling that will be applied when memory consumption
>>>> +    exceeds the "memory.high" limit. The larger the value is,
>>>> +    the smaller the amount of throttling will be and the easier an
>>>> +    offending application may get OOM killed.
>>> memory.high is supposed to never invoke the OOM killer (see above). It's
>>> unclear to me if you are referring to OOM kills from the kernel or
>>> userspace in the commit message. If the latter, I think it shouldn't be
>>> in kernel docs.
>> I am sorry for not being clear. What I meant is that if an application
>> is consuming more memory than what can be recovered by memory reclaim,
>> it will reach memory.max faster, if set, and get OOM killed. Will
>> clarify that in the next version.
> You're not really supposed to use max and high in conjunction. One is
> for kernel OOM killing, the other for userspace OOM killing. That's
> what the documentation that you edited is trying to explain.
>
> What's the usecase you have in mind?

It is news to me that high and max are not supposed to be used
together. One problem with v1 is that once the limit is reached, if
memory reclaim cannot recover enough memory in time, the task will be
OOM killed. I always thought that by setting high to a bit below max,
say 90%, early memory reclaim would reduce the chance of OOM kills.
There are certainly others who think the same way.

So the use case here is to reduce the chance of OOM kills without
letting badly misbehaving tasks hold on to useful memory for too long.

Cheers,
Longman
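
For illustration only, the two knobs discussed in this thread can be
sketched in a few lines of Python. This is a rough sketch under stated
assumptions, not code from the patch: the helper names high_from_max()
and reduced_penalty() are hypothetical, the 4096-byte page size is
assumed, and the right shift is just one reading of "power of 2
reduction of throttling"; the actual kernel arithmetic may differ.

```python
# Illustrative sketch only; these helpers do not exist in the kernel or systemd.

PAGE_SIZE = 4096  # assumed page size; limits read back as PAGE_SIZE multiples


def high_from_max(max_bytes: int, percent: int = 90) -> int:
    """Return a memory.high value at `percent` of memory.max, rounded down to a page."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    high = max_bytes * percent // 100
    return high - (high % PAGE_SIZE)


def reduced_penalty(penalty: int, throttle: int) -> int:
    """Scale a throttling penalty down by 2**throttle (0 keeps full throttling)."""
    if not 0 <= throttle <= 32:
        raise ValueError("memory.high.throttle is described as accepting 0-32")
    return penalty >> throttle


if __name__ == "__main__":
    max_bytes = 20 * 1024 * 1024         # MemoryMax=20M, as in the quoted test
    print(high_from_max(max_bytes))      # memory.high at 90% of memory.max
    print(reduced_penalty(1000, 3))      # a penalty of 1000 units cut by 2**3
```

With the quoted test's MemoryMax=20M, the first helper gives a
memory.high of 18874368 bytes (18M), and a throttle value of 3 cuts a
hypothetical penalty of 1000 units to 125.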