From: Yosry Ahmed
Date: Thu, 20 Jul 2023 17:07:14 -0700
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt,
    Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
    Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
References: <20230720070825.992023-1-yosryahmed@google.com>

On Thu, Jul 20, 2023 at 5:02 PM Roman Gushchin wrote:
>
> On Thu, Jul 20, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> > This patch series implements the proposal in LSF/MM/BPF 2023 conference
> > for reducing offline/zombie memcgs by memory recharging [1].
> > The main difference is that this series focuses on recharging and does
> > not include eviction of any memory charged to offline memcgs.
> >
> > Two methods of recharging are proposed:
> >
> > (a) Recharging of mapped folios.
> >
> > When a memcg is offlined, queue an asynchronous worker that will walk
> > the lruvec of the offline memcg and try to recharge any mapped folios to
> > the memcg of one of the processes mapping the folio. The main assumption
> > is that a process mapping the folio is the "rightful" owner of the
> > memory.
> >
> > Currently, this is only supported for evictable folios, as the
> > unevictable lru is imaginary and we cannot iterate the folios on it. A
> > separate proposal [2] was made to revive the unevictable lru, which
> > would allow recharging of unevictable folios.
> >
> > (b) Deferred recharging of folios.
> >
> > For folios that are unmapped, or mapped but we fail to recharge them
> > with (a), we rely on deferred recharging. Simply put, any time a folio
> > is accessed or dirtied by a userspace process, and that folio is charged
> > to an offline memcg, we will try to recharge it to the memcg of the
> > process accessing the folio. Again, we assume this process should be the
> > "rightful" owner of the memory. This is also done asynchronously to avoid
> > slowing down the data access path.
>
> Unfortunately I have to agree with Johannes, Tejun and others who are not
> big fans of this approach.
>
> Lazy recharging leads to an interesting phenomenon: the memory usage of a
> running workload may suddenly go up only because some other workload is
> terminated and now its memory is being recharged. I find it confusing. It
> also makes it hard to set up limits and/or guarantees.

This can happen today. If memcg A starts accessing some memory and gets
charged for it, and then memcg B also accesses it, memcg B will not be
charged for it.
If at a later point memcg A runs into reclaim and the memory is freed, and
memcg B then tries to access it, memcg B's usage will suddenly go up as
well, because some other workload experienced reclaim.

This is a very similar scenario, only instead of being reclaimed, the
memcg was offlined. As a matter of fact, it's common to try to free up a
memcg before removing it (by lowering the limit or using memory.reclaim).
In that case, the net result would be exactly the same -- with the
difference being that recharging will avoid freeing the memory and
faulting it back in.

>
> In general, I don't think we can handle shared memory well without getting
> rid of the "whoever allocates a page, pays the full price" policy and
> making shared ownership a fully supported concept. Of course, it's a huge
> amount of work and I believe the only way we can achieve it is to
> compromise on the granularity of the accounting. Whether the resulting
> system will be better in real life is hard to say in advance.
>
> Thanks!
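
To make the comparison above concrete, here is a toy Python model of the two paths (reclaim followed by a refault, versus offlining followed by deferred recharging). It is only a sketch under simplifying assumptions: one shared page, usage counted in pages, and recharging done synchronously on access. None of the names below (Memcg, Page, access, reclaim, scenario) come from the kernel or from this series; they are hypothetical and exist purely for illustration.

```python
# Toy model of first-touch memcg charging. This is NOT kernel code; every
# class and function here is a hypothetical stand-in for illustration.


class Memcg:
    def __init__(self, name):
        self.name = name
        self.usage = 0       # number of pages currently charged to this memcg
        self.online = True


class Page:
    def __init__(self):
        self.owner = None    # memcg charged for the page; None if not resident


def access(page, memcg, recharge_offline=False):
    """Model `memcg` touching `page`."""
    if page.owner is None:
        # Page is not resident: fault it in and charge the first toucher.
        page.owner = memcg
        memcg.usage += 1
    elif recharge_offline and not page.owner.online and page.owner is not memcg:
        # Deferred recharging: move the charge off the offline memcg.
        page.owner.usage -= 1
        page.owner = memcg
        memcg.usage += 1
    # Otherwise the page stays charged to whoever already owns it.


def reclaim(page):
    """Free the page and uncharge its current owner."""
    if page.owner is not None:
        page.owner.usage -= 1
        page.owner = None


def scenario(use_recharge):
    a, b, page = Memcg("A"), Memcg("B"), Page()
    access(page, a)              # A touches the page first and is charged
    access(page, b)              # B shares the page without being charged
    b_usage_while_shared = b.usage
    if use_recharge:
        a.online = False         # A is offlined; the page stays resident
    else:
        reclaim(page)            # A runs into reclaim; the page is freed
    access(page, b, recharge_offline=use_recharge)
    return b_usage_while_shared, a.usage, b.usage


print("reclaim + refault:", scenario(use_recharge=False))
print("offline + recharge:", scenario(use_recharge=True))
```

In this model both paths end in the same accounting state: B's usage goes from 0 to 1 and A's drops to 0. The only difference is that the recharge path never frees the page, so B does not pay for faulting it back in.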