From: Yosry Ahmed
Date: Fri, 30 Aug 2024 00:13:47 -0700
Subject: Re: The percpu memory used by memcg cannot be cleared
To: liujing
Cc: akpm, linux-mm, linux-kernel, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song
In-Reply-To: <20240830112239957689310@cmss.chinamobile.com>

On Thu, Aug 29, 2024 at 8:22 PM liujing wrote:
>
> hello, linux boss
>
> I found a problem in the process of using linux memcg. When I turned
> swap off, the memcg memory I created with the following script could
> not be deleted with echo 0 > memory.force_empty, as explained below.

(Adding memcg maintainers in case they are interested)

It's not a problem, it's the way the Linux kernel currently behaves in
terms of handling deleted memcgs that are still referenced in the kernel
(i.e. offline/dying/zombie memcgs).

>
> ----------------------------------------------------------------------
> step1: swapoff -a
>
>
> step2: use this script to create memcg
>
> #!/bin/bash
> mkdir -p /tmp/test
> for i in `seq 2000`
> do
>     sudo mkdir -p /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}
>     sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}/tasks
>     sudo echo 'data' > /tmp/test/test${i}

Assuming /tmp is a tmpfs mount, here you created 2000 child memcgs and
allocated one tmpfs page in each of them. So each of those child memcgs
is charged for one page of memory, and each charge holds a reference to
the respective memcg.

>     sudo echo $$ > /sys/fs/cgroup/memory/user.slice/user-0.slice/tasks
>     sudo rmdir /sys/fs/cgroup/memory/user.slice/user-0.slice/test${i}

Then you deleted those memcgs, but the kernel cannot free them yet
because the tmpfs memory you allocated above is still charged to them.

> done
>
>
> step3: view /proc/cgroup and /proc/meminfo files
>
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name  hierarchy  num_cgroups  enabled
> cpuset        10         1            1
> cpu           4          1            1
> cpuacct       4          1            1
> blkio         13         1            1
> memory        14         2009         1

Here you can see the cgroups you deleted still exist in the kernel.

> devices       6          94           1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           600576 kB

The percpu memory you observe here is likely the per-CPU metadata that
the kernel uses to keep track of each memcg. Since the memcgs are not
freed, the metadata is not freed either.

>
>
> step4: when I use "echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty",
> I find the num_cgroups of memory and percpu have no changed

Yes, because at this point there is no swap, so the tmpfs memory charged
to the deleted memcgs cannot be reclaimed or freed, and the references
held by those charges cannot be dropped.

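As a rough way to see how many memcgs are in this deleted-but-still-referenced
state, one can compare the num_cgroups value reported for the memory
controller with the number of cgroup directories still visible in the
hierarchy. The sketch below assumes cgroup v1 with the memory controller
mounted at /sys/fs/cgroup/memory, as in the report; the function names are
made up for illustration and the result is only an approximation.

```shell
#!/bin/bash
# Sketch: estimate how many deleted-but-still-referenced memcgs exist.
# Assumes cgroup v1 with the memory controller mounted at
# /sys/fs/cgroup/memory (as in the report). Function names are illustrative.

# Print the num_cgroups column for the memory controller from a file in
# /proc/cgroups format (columns: subsys_name hierarchy num_cgroups enabled).
memcg_num_cgroups() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Count the cgroup directories still visible under a hierarchy root.
# The root directory itself is included in the count.
visible_cgroups() {
    find "${1:-/sys/fs/cgroup/memory}" -type d | wc -l
}

# The difference approximates the number of offline (zombie) memcgs.
# Usage: zombie_memcgs [cgroups_file] [hierarchy_root]
zombie_memcgs() {
    echo $(( $(memcg_num_cgroups "$1") - $(visible_cgroups "$2") ))
}
```

Running zombie_memcgs with no arguments on a live system uses the default
paths; a result that stays large after the rmdir loop would be consistent
with the explanation above.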
>
> [root@localhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name  hierarchy  num_cgroups  enabled
> cpuset        10         1            1
> cpu           4          1            1
> cpuacct       4          1            1
> blkio         13         1            1
> memory        14         2039         1
> devices       6          87           1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           600576 kB
>
>
> step 5: when I use swapon -a to open swap, then
> echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty again
>
> [root@localhost ~]# swapon -a
> [root@localhost ~]# echo 0 > /sys/fs/cgroup/memory/user.slice/user-0.slice/memory.force_empty

When you turn swap back on and try to reclaim memory from the cgroups
again, the kernel is able to reclaim the tmpfs memory by swapping it
out. The kernel is smart enough at this point to charge the swap slots
not to the deleted cgroups, but to their living/online parent. At this
point, the tmpfs memory is uncharged and freed, and the refs to the
deleted cgroups are dropped. Now they can be freed by the kernel.

>
>
> step 6: view /proc/cgroup and /proc/meminfo files, I found the
> num_cgroups of memory and percpu have been reduced.
> [root@localhost ~]# cat /proc/cgroups
> #subsys_name  hierarchy  num_cgroups  enabled
> cpuset        10         1            1
> cpu           4          1            1
> cpuacct       4          1            1
> blkio         13         1            1
> memory        14         185          1
> devices       6          87           1
> freezer       9          1            1
>
> [root@localhost ~]# cat /proc/meminfo | grep Percpu
> Percpu:           120832 kB

Now the memcgs are freed, and their associated per-CPU metadata is also
freed.

> ----------------------------------------------------------------------
>
>
> Therefore, I want to know why swap affects memcg memory reclamation,
> echo 0 > memory.force_empty this interface should force the memory
> used by the cgroup to be reclaimed.
> I want to know why, I look forward to hearing back from the community.

I hope it's now clear why the per-CPU memory cannot be freed when you
use memory.force_empty on the parent memcg. The per-CPU memory is the
metadata of the deleted memcgs, and those cannot be freed until the
memory charged to them is freed. Freeing that memory needs swap, because
it is tmpfs memory rather than memory backed by a regular file.
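
As a rough sanity check on the numbers in the report, the drop in Percpu
can be divided by the drop in num_cgroups to estimate the per-CPU metadata
cost of one memcg. This is only a back-of-the-envelope sketch: it assumes
the whole Percpu decrease comes from the freed memcgs, and the helper name
is made up for illustration.

```shell
#!/bin/bash
# Sketch: estimate per-memcg percpu cost from before/after measurements.
# Assumes the entire Percpu drop is due to the freed memcgs (an assumption,
# not something the kernel reports directly). Helper name is illustrative.
#
# Usage: per_memcg_percpu_kb PERCPU_BEFORE_KB PERCPU_AFTER_KB \
#                            CGROUPS_BEFORE CGROUPS_AFTER
per_memcg_percpu_kb() {
    local freed_kb=$(( $1 - $2 ))
    local freed_cgroups=$(( $3 - $4 ))
    if [ "$freed_cgroups" -le 0 ]; then
        echo "no memcgs were freed" >&2
        return 1
    fi
    # Integer division: result is truncated to whole kB.
    echo $(( freed_kb / freed_cgroups ))
}

# Values taken from step 4 and step 6 of the report.
per_memcg_percpu_kb 600576 120832 2039 185   # prints 258
```

So on that machine each zombie memcg was pinning on the order of 258 kB
of per-CPU memory. The exact figure will differ between systems, since
per-CPU allocations scale with the number of CPUs.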