From: Lance Yang
Date: Thu, 1 Aug 2024 19:40:10 +0800
Subject: Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
To: "Vlastimil Babka (SUSE)"
Cc: akpm@linux-foundation.org, 21cnbao@gmail.com, ryan.roberts@arm.com,
 david@redhat.com, shy828301@gmail.com, ziy@nvidia.com,
 libang.li@antgroup.com, baolin.wang@linux.alibaba.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Cgroups
References: <20240801045430.48694-1-ioworker0@gmail.com>
 <2527d5a4-de1f-4c93-b7ee-fdd6fbe2a6f0@kernel.org>
In-Reply-To: <2527d5a4-de1f-4c93-b7ee-fdd6fbe2a6f0@kernel.org>

Hi Vlastimil,

Thanks a lot for paying attention!

On Thu, Aug 1, 2024 at 6:35 PM Vlastimil Babka (SUSE) wrote:
>
> On 8/1/24 06:54, Lance Yang wrote:
> > Hi all,
> >
> > It's possible to encounter an OOM error if both parent and child cgroups are
> > configured such that memory.min and memory.max are set to the same values, as
> > is practice in Kubernetes.
>
> Is it a practice in Kubernetes since forever or a recent one? Did it work
> differently before?

memory.min is only applied when the Kubernetes memory QoS feature gate is
enabled, and that gate is disabled by default.

>
> > Hmm... I'm not sure that whether this behavior is a bug or an expected aspect of
> > the kernel design.
>
> Hmm I'm not a memcg expert, so I cc'd some.
>
> > To reproduce the bug, we can follow these command-based steps:
> >
> > 1. Check Kernel Version and OS release:
> >
> > ```
> > $ uname -r
> > 6.10.0-rc5+
>
> Were older kernels behaving the same?

I tested on another machine, running an older kernel, and it behaved the
same way.

# uname -r
5.14.0-427.24.1.el9_4.x86_64
# cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
...

>
> Anyway memory.min documentations says "Hard memory protection. If the memory
> usage of a cgroup is within its effective min boundary, the cgroup’s memory
> won’t be reclaimed under any conditions. If there is no unprotected
> reclaimable memory available, OOM killer is invoked."
>
> So to my non-expert opinion this behavior seems valid. if you set min to the
> same value as max and then reach the max, you effectively don't allow any
> reclaim, so the memcg OOM kill is the only option AFAICS?

I completely agree that this behavior seems valid ;)

However, if the child cgroup doesn't exist and we add a process directly to
the 'test' cgroup, then attempt to create a large file (2GB) using dd, we
don't encounter an OOM error; everything works as expected.

Hmm... I'm a bit confused about that.

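For reference, the parent-only case described above could be set up roughly
as follows. This is only a sketch under assumptions: the function name
`setup_parent_only` and its two arguments (cgroup root and byte limit) are
hypothetical helpers for illustration, and it assumes a cgroup v2 mount where
the memory controller can be enabled from the root's cgroup.subtree_control.

```shell
#!/bin/sh
# Hypothetical helper: create a single 'test' cgroup (no child cgroup)
# with memory.min set to the same value as memory.max.
#   $1: cgroup v2 root (assumed default: /sys/fs/cgroup)
#   $2: limit in bytes (default: 1073741824, the value used in the report)
setup_parent_only() {
    spo_root=${1:-/sys/fs/cgroup}
    spo_limit=${2:-1073741824}

    mkdir -p "$spo_root/test" || return 1
    # Enable the memory controller for children of the root cgroup.
    echo "+memory" > "$spo_root/cgroup.subtree_control" || return 1
    # Same value for the hard limit and the hard protection.
    echo "$spo_limit" > "$spo_root/test/memory.max" || return 1
    echo "$spo_limit" > "$spo_root/test/memory.min" || return 1
}

# Intended use, as root on a cgroup v2 host (not executed here):
#   setup_parent_only /sys/fs/cgroup 1073741824
#   echo $$ > /sys/fs/cgroup/test/cgroup.procs
#   dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
```

The helper only writes the settings; moving the shell into the cgroup and
running dd are left as explicit manual steps so the result can be compared
directly against the parent-plus-child steps in the quoted report.
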
Thanks,
Lance

>
> > $ cat /etc/os-release
> > PRETTY_NAME="Ubuntu 24.04 LTS"
> > NAME="Ubuntu"
> > VERSION_ID="24.04"
> > VERSION="24.04 LTS (Noble Numbat)"
> > VERSION_CODENAME=noble
> > ID=ubuntu
> > ID_LIKE=debian
> > HOME_URL=""
> > SUPPORT_URL=""
> > BUG_REPORT_URL=""
> > PRIVACY_POLICY_URL=""
> > UBUNTU_CODENAME=noble
> > LOGO=ubuntu-logo
> >
> > ```
> >
> > 2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
> >
> > ```
> > $ cd /sys/fs/cgroup/
> > $ stat -fc %T /sys/fs/cgroup
> > cgroup2fs
> > $ mkdir test
> > $ echo "+memory" > cgroup.subtree_control
> > $ mkdir test/test-child
> > $ echo 1073741824 > memory.max
> > $ echo 1073741824 > memory.min
> > $ cat memory.max
> > 1073741824
> > $ cat memory.min
> > 1073741824
> > $ cat memory.low
> > 0
> > $ cat memory.high
> > max
> > ```
> >
> > 3. Set up and check memory settings in the child cgroup:
> >
> > ```
> > $ cd test-child
> > $ echo 1073741824 > memory.max
> > $ echo 1073741824 > memory.min
> > $ cat memory.max
> > 1073741824
> > $ cat memory.min
> > 1073741824
> > $ cat memory.low
> > 0
> > $ cat memory.high
> > max
> > ```
> >
> > 4. Add process to the child cgroup and verify:
> >
> > ```
> > $ echo $$ > cgroup.procs
> > $ cat cgroup.procs
> > 1131
> > 1320
> > $ ps -ef|grep 1131
> > root        1131    1014  0 10:45 pts/0    00:00:00 -bash
> > root        1321    1131 99 11:06 pts/0    00:00:00 ps -ef
> > root        1322    1131  0 11:06 pts/0    00:00:00 grep --color=auto 1131
> > ```
> >
> > 5. Attempt to create a large file using dd and observe the process being killed:
> >
> > ```
> > $ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> > Killed
> > ```
> >
> > 6. Check kernel messages related to the OOM event:
> >
> > ```
> > $ dmesg
> > ...
> > [ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
> > [ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
> > ```
> >
> > 7. Reduce the `memory.min` setting in the child cgroup and attempt the same large file creation, and then this issue is resolved.
> >
> > ```
> > # echo 107374182 > memory.min
> > # dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> > 200+0 records in
> > 200+0 records out
> > 2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
> > ```
> >
> > Thanks,
> > Lance
> >
>
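
As a rough illustration of the numbers in steps 5 and 7 above, the snippet
below computes how much of a cgroup's usage sits above its memory.min once
usage has reached memory.max. It is a deliberately simplified model, not the
kernel's effective-min calculation (which also accounts for the parent's
protection and sibling usage), and `reclaimable_bytes` is a hypothetical
name introduced only for this sketch.

```shell
#!/bin/sh
# Simplified model (assumption): memory above memory.min is the only part
# that reclaim may touch; anything at or below memory.min is protected.
#   $1: current usage in bytes
#   $2: memory.min in bytes
reclaimable_bytes() {
    rb_usage=$1
    rb_min=$2
    if [ "$rb_usage" -gt "$rb_min" ]; then
        echo $((rb_usage - rb_min))
    else
        echo 0
    fi
}

# Usage has hit memory.max (1073741824) in both cases.
reclaimable_bytes 1073741824 1073741824   # min == max: prints 0
reclaimable_bytes 1073741824 107374182    # min lowered as in step 7: prints 966367642
```

Under this model, setting memory.min equal to memory.max leaves nothing
reclaimable at the limit, which matches the reasoning quoted from the
documentation; it does not by itself explain why the parent-only setup
behaves differently.
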