From: jingxiang zeng
Date: Wed, 16 Apr 2025 16:29:13 +0800
Subject: Re: [External] Re: [RFC 2/5] memcontrol: add boot option to enable memsw account on dfl
To: Michal Koutný
Cc: Zhongkun He, Shakeel Butt, Johannes Weiner, Roman Gushchin, Jingxiang Zeng, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, mhocko@kernel.org, muchun.song@linux.dev, kasong@tencent.com

On Sat, 12 Apr 2025 at 00:57, Michal Koutný wrote:
>
> On Thu, Apr 03, 2025 at 05:16:45PM +0800, jingxiang zeng wrote:
> > > We
encountered an issue, which is also a real use case. With memory offloading,
> > > we can move some cold pages to swap. Suppose an application's peak memory
> > > usage at certain times is 10GB, while at other times, it exists in a
> > > combination of
> > > memory and swap. If we set limits on memory or swap separately, it would lack
> > > flexibility: sometimes it needs 1GB memory + 9GB swap, sometimes 5GB
> > > memory + 5GB swap, or even 10GB memory + 0GB swap. Therefore, we strongly
> > > hope to use the mem+swap charging method in cgroupv2
>
> App's peak need determines memory.max=10G.
> The apparent flexibility is dependency on how many competitors the app
> has. It can run 5GB memory + 5GB swap with some competition or 1GB
> memory + 9 GB with different competition (more memory demanding).
> If you want to prevent a faulty app from eating up all of swap for itself
> (like it's possible with memsw), you may define some memory.swap.max.
> (There's no unique correspondence between this and original memsw value
> since the cost of mem<->swap is variable.)
> >
> > Yes, in the container scenario, if swap is enabled on the server and
> > the customer's container requires 10GB of memory, we only need to set
> > memory.memsw.limit_in_bytes=10GB, and the kernel can automatically
> > swap out part of the business container's memory to swap according to
> > the server's memory pressure, and it can be fully guaranteed that the
> > customer's container will not use more memory because swap is enabled
> > on the server.
>
> This made me consider various causes of the pressure:
>
> - global pressure
>   - it doesn't change memcg's total consumption (memsw.usage=const)
>   - memsw limit does nothing
> - self-memcg pressure
>   - new allocations against own limit and memsw.usage hits memsw.limit
>   - memsw.limit prevents new allocations that would extend swap
>     - achievable with memory.swap.max=0
> - ancestral pressure
>   - when sibling needs to allocate but limit is on ancestor
>   - similar to global pressure (memsw.usage=const), self memsw.limit
>     does nothing
>
> - or there is no outer pressure but you want to prevent new allocations
>   when something has been swapped out already
>   - swapped out amount is a debt
>   - memsw.limit behavior is suboptimal until the debt needs to be
>     repaid
>     - repay is when someone else needs the swap space
>
> The above is a free flow of thoughts but I'd condense such conversions:
> - memory.max := memory.memsw.limit_in_bytes
> - memory.swap.max := anything between 0 and memory.memsw.limit_in_bytes
>
> Did I fail to capture some mode where memsw limits were superior?
>

Hi Michal,

The memsw counter is mainly useful in proactive memory offload
scenarios.

For example, suppose the container's current memory usage is:

memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB

In theory, the memory.reclaim proactive reclaim interface can move
anywhere from 0GB to 9GB of that usage out to swap, so:

memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]

With proactive memory offload, the amount of memory that can be
reclaimed is determined by the container's PSI and other indicators,
so it is difficult to choose an accurate memory.swap.max value:

memory.swap.current = [0GB, 9GB]
memory.swap.max = ?

The memory freed by swapping out can then be used to run system
components or additional workloads:
memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]
memory.swap.current = [0GB, 9GB]

Because proactive offload to swap lowers memory.usage_in_bytes, it
causes additional problems, such as:
1. If the container has a memory leak or abnormal incoming network
   traffic, OOM may fail to trigger or may trigger late;
2. In the oversold scenario, if the container's memory requirement is
   10GB, the container's memory+swap should use at most 10GB.

In the above scenario, the memsw counter is very useful:

memory.limit_in_bytes = 10GB
memory.usage_in_bytes = 9GB - [0GB, 9GB]
memory.memsw.limit_in_bytes = 10GB
memory.memsw.usage_in_bytes = 9GB

Thanks.

> Thanks,
> Michal
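
For reference, the accounting in the example above can be sketched as a
toy model. This is only an illustration of the arithmetic, not kernel
code: the Counters class and the offload() and headroom() helpers are
hypothetical names invented here, and the model assumes that offloading
moves bytes from memory to swap one for one.

```python
from dataclasses import dataclass
from typing import Optional

GB = 1 << 30


@dataclass
class Counters:
    """Hypothetical per-container counters (illustration only)."""
    mem_usage: int   # bytes charged as resident memory
    swap_usage: int  # bytes currently swapped out

    @property
    def memsw_usage(self) -> int:
        # Combined memory+swap usage, as a memsw-style counter would see it.
        return self.mem_usage + self.swap_usage


def offload(c: Counters, nbytes: int) -> Counters:
    """Model proactive reclaim moving nbytes from memory to swap."""
    nbytes = min(nbytes, c.mem_usage)
    return Counters(c.mem_usage - nbytes, c.swap_usage + nbytes)


def headroom(c: Counters, mem_limit: int,
             memsw_limit: Optional[int] = None) -> int:
    """Bytes the container can still allocate before hitting a limit."""
    room = mem_limit - c.mem_usage
    if memsw_limit is not None:
        # With a combined limit, swapped-out bytes still count.
        room = min(room, memsw_limit - c.memsw_usage)
    return room


if __name__ == "__main__":
    start = Counters(mem_usage=9 * GB, swap_usage=0)
    for swapped_gb in (0, 5, 9):
        after = offload(start, swapped_gb * GB)
        print(swapped_gb,
              headroom(after, 10 * GB) // GB,
              headroom(after, 10 * GB, memsw_limit=10 * GB) // GB)
```

In this model, the memory-only headroom grows as more is offloaded,
while the headroom under a combined 10GB limit stays at 1GB, which is
the behavior described in points 1 and 2 above.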