From: Waiman Long
Date: Tue, 25 Apr 2023 14:42:41 -0400
Subject: Re: [LSF/MM/BPF TOPIC] Reducing zombie memcgs
To: Yosry Ahmed, "T.J. Mercier"
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
 cgroups@vger.kernel.org, Tejun Heo, Shakeel Butt, Muchun Song,
 Johannes Weiner, Roman Gushchin, Alistair Popple, Jason Gunthorpe,
 Kalesh Singh, Yu Zhao, Matthew Wilcox, David Rientjes, Greg Thelen
Message-ID: <27e15be8-d0eb-ed32-a0ec-5ec9b59f1f27@redhat.com>

On 4/25/23 07:36, Yosry Ahmed wrote:
> +David Rientjes +Greg Thelen +Matthew Wilcox
>
> On Tue, Apr 11, 2023 at 4:48 PM Yosry Ahmed wrote:
>> On Tue, Apr 11, 2023 at 4:36 PM T.J. Mercier wrote:
>>> When a memcg is removed by userspace it gets offlined by the kernel.
>>> Offline memcgs are hidden from user space, but they still live in the
>>> kernel until their reference count drops to 0.
>>> New allocations cannot be charged to offline memcgs, but existing
>>> allocations charged to offline memcgs remain charged, and hold a
>>> reference to the memcg.
>>>
>>> As such, an offline memcg can remain in the kernel indefinitely,
>>> becoming a zombie memcg. The accumulation of a large number of zombie
>>> memcgs lead to increased system overhead (mainly percpu data in struct
>>> mem_cgroup). It also causes some kernel operations that scale with the
>>> number of memcgs to become less efficient (e.g. reclaim).
>>>
>>> There are currently out-of-tree solutions which attempt to
>>> periodically clean up zombie memcgs by reclaiming from them. However
>>> that is not effective for non-reclaimable memory, which it would be
>>> better to reparent or recharge to an online cgroup. There are also
>>> proposed changes that would benefit from recharging for shared
>>> resources like pinned pages, or DMA buffer pages.
>> I am very interested in attending this discussion, it's something that
>> I have been actively looking into -- specifically recharging pages of
>> offlined memcgs.
>>
>>> Suggested attendees:
>>> Yosry Ahmed
>>> Yu Zhao
>>> T.J. Mercier
>>> Tejun Heo
>>> Shakeel Butt
>>> Muchun Song
>>> Johannes Weiner
>>> Roman Gushchin
>>> Alistair Popple
>>> Jason Gunthorpe
>>> Kalesh Singh
> I was hoping I would bring a more complete idea to this thread, but
> here is what I have so far.
>
> The idea is to recharge the memory charged to memcgs when they are
> offlined. I like to think of the options we have to deal with memory
> charged to offline memcgs as a toolkit. This toolkit includes:
>
> (a) Evict memory.
>
> This is the simplest option, just evict the memory.
>
> For file-backed pages, this writes them back to their backing files,
> uncharging and freeing the page. The next access will read the page
> again and the faulting process’s memcg will be charged.
>
> For swap-backed pages (anon/shmem), this swaps them out.
> Swapping out a page charged to an offline memcg uncharges the page and
> charges the swap to its parent. The next access will swap in the page
> and the parent will be charged. This is effectively deferred recharging
> to the parent.
>
> Pros:
> - Simple.
>
> Cons:
> - Behavior is different for file-backed vs. swap-backed pages, for
>   swap-backed pages, the memory is recharged to the parent (aka
>   reparented), not charged to the "rightful" user.
> - Next access will incur higher latency, especially if the pages are active.
>
> (b) Direct recharge to the parent
>
> This can be done for any page and should be simple as the pages are
> already hierarchically charged to the parent.
>
> Pros:
> - Simple.
>
> Cons:
> - If a different memcg is using the memory, it will keep taxing the
>   parent indefinitely. Same not the "rightful" user argument.

Muchun had actually posted a patch to do this last year. See

https://lore.kernel.org/all/20220621125658.64935-10-songmuchun@bytedance.com/T/#me9dbbce85e2f3c4e5f34b97dbbdb5f79d77ce147

I am wondering if he is going to post an updated version of that or not.

Anyway, I am looking forward to learning about the result of this
discussion even though I am not a conference invitee.

Thanks,
Longman