From: Yosry Ahmed
Date: Thu, 20 Jul 2023 15:23:59 -0700
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
To: Tejun Heo
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song, "Matthew Wilcox (Oracle)", Zefan Li, Yu Zhao,
 Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
References: <20230720070825.992023-1-yosryahmed@google.com>
 <20230720153515.GA1003248@cmpxchg.org>

On Thu, Jul 20, 2023 at 3:12 PM Tejun Heo wrote:
>
> Hello,
>
> On Thu, Jul 20, 2023 at 02:34:16PM -0700, Yosry Ahmed wrote:
> > > Or just create a nesting layer so that there's a cgroup which represents the
> > > persistent resources and a nested cgroup instance inside representing the
> > > current instance.
> >
> > In practice it is not easy to know exactly which resources are shared
> > and used by which cgroups, especially in a large dynamic environment.
>
> Yeah, that only covers when resource persistence is confined in a known
> scope. That said, I have a hard time seeing how recharging once after cgroup
> destruction can be a solution for the situations you describe. What if A
> touches it once first, B constantly uses it but C only very occasionally and
> after A dies C ends up owning it due to timing. This is very much possible
> in a large dynamic environment but neither the initial or final situation is
> satisfactory.

That is indeed possible, but it would be more likely that the charge
is moved to B. As I said, it's not perfect, but it is an improvement
over what we have today. Even if C ends up owning it, it's better than
staying with the dead A.

>
> To solve the problems you're describing, you actually would have to
> guarantee that memory pages are charged to the current majority user (or
> maybe even spread across current active users). Maybe it can be argued that
> this is a step towards that but it's a very partial step and at least would
> need a technically viable direction that this development can follow.

Right, that would be a much larger effort (arguably memcg v3 ;) ).
This proposal is focused on the painful artifact of the
sharing/sticky resources problem: zombie memcgs. We can extend the
automatic charge movement semantics later to cover more cases or be
smarter, or ditch the existing charging semantics completely and start
over with sharing/stickiness in mind. Either way, that would be a
long-term effort. There is a problem that exists today, though, that
this proposal can ideally fix or at least improve.
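
Going back to the A/B/C example, here is a rough way to put numbers on
"more likely". This is only a sketch under assumptions that are not
stated anywhere in the thread: that once A is dead the charge moves to
whichever live cgroup touches the page next, and that B and C access it
as independent Poisson streams. The helper name and the rates are
hypothetical.

```python
from fractions import Fraction


def next_owner_odds(access_rates):
    """Chance that each live cgroup makes the next access to a page.

    Assumes independent Poisson access streams, in which case the chance
    of being first is rate / sum(rates). Rates must be integers here so
    the result stays an exact fraction.
    """
    total = sum(access_rates.values())
    return {name: Fraction(rate, total) for name, rate in access_rates.items()}


# Hypothetical accesses per hour after A has died: B is the constant
# user, C only touches the page occasionally.
odds = next_owner_odds({"B": 99, "C": 1})
for name, chance in odds.items():
    print(f"{name}: {float(chance):.2%}")
```

Under those assumptions C can still win the race, but only in proportion
to how rarely it touches the page.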
>
> On its own, AFAICS, I'm not sure the scope of problems it can actually solve
> is justifiably greater than what can be achieved with simple nesting.

In our use case nesting is not a viable option. As I said, in a large
fleet where many different workloads are dynamically scheduled on
different machines, there is no way of knowing which resources are
shared among which workloads, and even if we did know, the sharing
would not stay constant. That makes it very difficult to construct a
nested hierarchy that keeps the resources confined.

Keep in mind that the environment is dynamic: workloads are constantly
coming and going. Even if we find the perfect nesting to appropriately
scope resources, some rescheduling may render the hierarchy obsolete
and require us to start over.

>
> Thanks.
>
> --
> tejun