Date: Tue, 1 Aug 2023 11:54:47 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Yosry Ahmed, Andrew Morton, Roman Gushchin, Shakeel Butt, Muchun Song,
    "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao, Luis Chamberlain,
    Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
References: <20230720070825.992023-1-yosryahmed@google.com>
    <20230720153515.GA1003248@cmpxchg.org>
In-Reply-To: <20230720153515.GA1003248@cmpxchg.org>

[Sorry for being late to this discussion]

On Thu 20-07-23 11:35:15, Johannes Weiner wrote:
[...]
> I'm super skeptical of this proposal.

Agreed.

> Recharging *might* be the most desirable semantics from a user pov,
> but only if it applies consistently to the whole memory footprint.
> There is no mention of slab allocations such as inodes, dentries,
> network buffers etc. which can be a significant part of a cgroup's
> footprint. These are currently reparented. I don't think doing one
> thing with half of the memory, and a totally different thing with the
> other half upon cgroup deletion is going to be acceptable semantics.
>
> It appears this also brings back the reliability issue that caused us
> to deprecate charge moving. The recharge path has trylocks, LRU
> isolation attempts, GFP_ATOMIC allocations. These introduce a variable
> error rate into the relocation process, which causes pages that should
> belong to the same domain to be scattered around all over the place.
> It also means that zombie pinning still exists, but it's now even more
> influenced by timing and race conditions, and so less predictable.
>
> There are two issues being conflated here:
>
> a) the problem of zombie cgroups, and
>
> b) who controls resources that outlive the control domain.
>
> For a), reparenting is still the most reasonable proposal. It's
> reliable for one, but it also fixes the problem fully within the
> established, user-facing semantics: resources that belong to a cgroup
> also hierarchically belong to all ancestral groups; if those resources
> outlive the last-level control domain, they continue to belong to the
> parents.
> This is how it works today, and this is how it continues to
> work with reparenting. The only difference is that those resources no
> longer pin a dead cgroup anymore, but instead are physically linked to
> the next online ancestor. Since dead cgroups have no effective control
> parameters anymore, this is semantically equivalent - it's just a more
> memory efficient implementation of the same exact thing.
>
> b) is a discussion totally separate from this. We can argue what we
> want this behavior to be, but I'd argue strongly that whatever we do
> here should apply to all resources managed by the controller equally.
>
> It could also be argued that if you don't want to lose control over a
> set of resources, then maybe don't delete their control domain while
> they are still alive and in use. For example, when restarting a
> workload, and the new instance is expected to have largely the same
> workingset, consider reusing the cgroup instead of making a new one.
>
> For the zombie problem, I think we should merge Muchun's patches
> ASAP. They've been proposed several times, they have Roman's reviews
> and acks, and they do not change user-facing semantics. There is no
> good reason not to merge them.

Yes, fully agreed on both points. The problem with zombies is real, but
reparenting should address a large part of it.

Ownership is a different problem. We have discussed that at LSFMM this
year, and in the past as well, I believe. What we probably need is a
concept of taking ownership of the memory (something like
madvise(MADV_OWN, range), or fadvise for fd-based resources). This would
allow the caller to take ownership of the said resource (i.e. its memcg
charge). I understand that would require some changes to existing
workloads. Whatever the interface will be, it has to be explicit;
otherwise we end up with unaccounted resources that sit around without
any actual owner, and with nondeterministic, timing-dependent hopping
between owners.
In other words, nobody should be able to drop responsibility for any
object while it is still consuming resources.

-- 
Michal Hocko
SUSE Labs
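
For illustration only, the equivalence argument quoted above (reparenting
leaves the hierarchical accounting unchanged) can be modelled with a toy
Python sketch. This is not kernel code: the Memcg class, its fields, and
offline_reparent() are hypothetical names invented for this example, and
the model assumes a group is only removed once it has no child groups.

```python
class Memcg:
    """Toy model of a memory cgroup (hypothetical, not the kernel's struct)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.local_charge = 0  # pages charged directly to this group
        self.online = True
        if parent is not None:
            parent.children.append(self)

    def charge(self, pages):
        """Charge pages directly to this group."""
        self.local_charge += pages

    def usage(self):
        """Hierarchical usage: own charges plus those of all descendants."""
        return self.local_charge + sum(c.usage() for c in self.children)

    def offline_reparent(self):
        """Remove this group and hand its remaining charges to the parent."""
        if self.parent is None:
            raise ValueError("the root group cannot be removed")
        if self.children:
            raise ValueError("remove child groups first")
        self.parent.local_charge += self.local_charge
        self.local_charge = 0
        self.parent.children.remove(self)
        self.online = False


# Usage: the ancestors see the same usage before and after the removal.
root = Memcg("root")
workload = Memcg("workload", root)
job = Memcg("job", workload)
workload.charge(20)
job.charge(100)

before = (workload.usage(), root.usage())
job.offline_reparent()
after = (workload.usage(), root.usage())
assert before == after
```

In this model the removed group ends up holding no charges, so nothing
keeps it pinned, while every ancestor reports the same usage as before.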