From: Waiman Long
Date: Thu, 30 Jan 2025 12:41:19 -0500
Subject: Re: [RFC PATCH] mm, memcg: introduce memory.high.throttle
To: Shakeel Butt, Waiman Long
Cc: Roman Gushchin, Michal Hocko, Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet, Muchun Song, Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, Peter Hunt
Message-ID: <8785134d-3012-42c1-a67c-b64862d89fc5@redhat.com>
References: <20250129191204.368199-1-longman@redhat.com> <211b394b-3b9a-4872-8c07-b185386487d3@redhat.com>

On 1/30/25 12:32 PM, Shakeel Butt wrote:
> On Thu, Jan 30, 2025 at 12:19:38PM -0500, Waiman Long wrote:
>> On 1/30/25 12:05 PM, Roman Gushchin wrote:
>>> On Thu, Jan 30, 2025 at 10:05:34AM -0500, Waiman Long wrote:
>>>> On 1/30/25 3:15 AM, Michal Hocko wrote:
>>>>> On Wed 29-01-25 14:12:04, Waiman Long wrote:
>>>>>> Since commit 0e4b01df8659 ("mm, memcg: throttle allocators
>>>>>> when failing reclaim over memory.high"), the amount of allocator
>>>>>> throttling has increased substantially. As a result, it can be
>>>>>> difficult for a misbehaving application that consumes an increasing
>>>>>> amount of memory to get OOM-killed if memory.high is set. Instead, the
>>>>>> application may just crawl along, holding close to memory.high worth
>>>>>> of memory in its memory cgroup for a very long time, especially if it
>>>>>> does a lot of memcg charging and uncharging operations.
>>>>>>
>>>>>> This behavior makes the upstream Kubernetes community hesitate to
>>>>>> use memory.high. Instead, they use only memory.max for memory control,
>>>>>> similar to what is being done for cgroup v1 [1].
>>>>> Why is this a problem for them?
>>>> My understanding is that a misbehaving container will hold up to
>>>> memory.high worth of memory for a long time instead of getting OOM
>>>> killed sooner, so that the memory could be used more productively
>>>> elsewhere.
>>>>>> To allow better control of the amount of throttling, and hence of
>>>>>> how quickly a misbehaving task can be OOM killed, a new single-value
>>>>>> memory.high.throttle control file is now added. The allowable range
>>>>>> is 0-32. By default, it has a value of 0, which means maximum
>>>>>> throttling as before. Any non-zero positive value represents the
>>>>>> corresponding power-of-2 reduction of throttling and makes OOM kills
>>>>>> more likely to happen.
>>>>> I do not like the interface, to be honest. It exposes an implementation
>>>>> detail and casts it into a user API. If we ever need to change the way
>>>>> the throttling is implemented, this will stand in the way because
>>>>> there will be applications depending on a behavior they were carefully
>>>>> tuned to.
>>>>>
>>>>> It is also not entirely clear how this is supposed to be used in
>>>>> practice. How do people know what value they should use?
>>>> Yes, I agree that a user may need to do some trial runs to find a proper
>>>> value.
>>>> Perhaps a simpler binary interface of "off" and "on" may be easier to
>>>> understand and use.
>>>>>> System administrators can now use this parameter to determine how easily
>>>>>> they want OOM kills to happen for applications that tend to consume
>>>>>> a lot of memory, without the need to run a special userspace memory
>>>>>> management tool to monitor memory consumption when memory.high is set.
>>>>> Why can't they achieve the same with the existing events/metrics we
>>>>> already provide? Most notably PSI, which is properly accounted when
>>>>> a task is throttled due to memory.high throttling.
>>>> That will require the use of a userspace management agent that looks for
>>>> these stalling conditions and makes the kill, if necessary. There are
>>>> certainly users out there who want some of the benefits of memory.high,
>>>> like early memory reclaim, without the trouble of handling these kinds of
>>>> stalling conditions.
>>> So you basically want to force the workload into some sort of proactive
>>> reclaim, but without an artificial slowdown?
> I wouldn't call it proactive reclaim, as reclaim will happen
> synchronously in the allocating thread.
>
>>> It makes some sense to me, but
>>> 1) I don't know if it deserves a new API, because it can be implemented
>>> relatively easily in userspace by a daemon which monitors cgroup usage
>>> and reclaims memory if necessary. No kernel changes are needed.
>>> 2) If a new API is introduced, I think it's better to introduce a new
>>> limit, e.g. memory.target, keeping memory.high semantics intact.
>> Yes, you are right about that. Introducing a new "memory.target" without
>> disturbing the existing "memory.high" semantics will work for me too.
>>
> So, what happens if reclaim cannot reduce usage below memory.target?
> Infinite reclaim cycles or just give up?

Just give up in this case. It is used mainly to reduce the chance of
reaching memory.max and triggering an OOM kill.

Cheers,
Longman
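For illustration, here is a minimal, hypothetical C model of the two behaviors discussed in this thread. It is not kernel code: scaled_penalty(), reclaim_to_target() and fake_reclaim() are invented names, the throttling delay is treated as a plain integer, and the reclaimer is a toy. It assumes that a memory.high.throttle value of N simply divides the throttling delay by 2^N (as the RFC's "power of 2 reduction" wording suggests), and that memory.target reclaim stops as soon as a pass makes no progress, as described in the reply above.

```c
#include <stdint.h>

/* Hypothetical: scale a throttling delay by the proposed
 * memory.high.throttle value. 0 keeps the full delay;
 * a value of N divides it by 2^N. */
static uint64_t scaled_penalty(uint64_t penalty, unsigned int throttle)
{
	if (throttle > 32)
		throttle = 32;	/* clamp to the proposed 0-32 range */
	return penalty >> throttle;	/* safe: shift < 64 bits */
}

/* Hypothetical reclaim callback: try to free 'want' units,
 * return how many units were actually freed. */
typedef uint64_t (*reclaim_fn)(uint64_t want, void *ctx);

/* Hypothetical model of the "just give up" memory.target semantics:
 * keep reclaiming while usage is above the target, but stop as soon
 * as a pass frees nothing instead of looping forever. */
static uint64_t reclaim_to_target(uint64_t usage, uint64_t target,
				  reclaim_fn reclaim, void *ctx)
{
	while (usage > target) {
		uint64_t freed = reclaim(usage - target, ctx);

		if (freed == 0)
			break;	/* no progress: give up */
		/* guard against a callback over-reporting */
		usage -= freed > usage ? usage : freed;
	}
	return usage;
}

/* Toy reclaimer for demonstration: frees at most 32 units per call
 * from a finite pool of reclaimable memory pointed to by ctx. */
static uint64_t fake_reclaim(uint64_t want, void *ctx)
{
	uint64_t *reclaimable = ctx;
	uint64_t n = want < 32 ? want : 32;

	if (n > *reclaimable)
		n = *reclaimable;
	*reclaimable -= n;
	return n;
}
```

With a throttle value of 3, a penalty of 1024 units drops to 128; with a value of 32 it effectively disappears. Stopping on a zero-progress pass is what avoids the infinite reclaim cycles asked about above: usage simply stays above the target and memory.max remains the hard backstop.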