From: Roman Gushchin
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt,
    Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
    Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
Date: Thu, 20 Jul 2023 17:02:22 -0700
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>
References: <20230720070825.992023-1-yosryahmed@google.com>

On Thu, Jul 20, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> This patch series implements the proposal in LSF/MM/BPF 2023
> conference for reducing offline/zombie memcgs by memory recharging [1].
> The main difference is that this series focuses on recharging and does
> not include eviction of any memory charged to offline memcgs.
>
> Two methods of recharging are proposed:
>
> (a) Recharging of mapped folios.
>
> When a memcg is offlined, queue an asynchronous worker that will walk
> the lruvec of the offline memcg and try to recharge any mapped folios to
> the memcg of one of the processes mapping the folio. The main assumption
> is that a process mapping the folio is the "rightful" owner of the
> memory.
>
> Currently, this is only supported for evictable folios, as the
> unevictable lru is imaginary and we cannot iterate the folios on it. A
> separate proposal [2] was made to revive the unevictable lru, which
> would allow recharging of unevictable folios.
>
> (b) Deferred recharging of folios.
>
> For folios that are unmapped, or mapped but we fail to recharge them
> with (a), we rely on deferred recharging. Simply put, any time a folio
> is accessed or dirtied by a userspace process, and that folio is charged
> to an offline memcg, we will try to recharge it to the memcg of the
> process accessing the folio. Again, we assume this process should be the
> "rightful" owner of the memory. This is also done asynchronously to avoid
> slowing down the data access path.

Unfortunately I have to agree with Johannes, Tejun and others who are not
big fans of this approach.

Lazy recharging leads to an interesting phenomenon: the memory usage of a
running workload may suddenly go up only because some other workload has
terminated and its memory is now being recharged. I find it confusing.
It also makes it hard to set up limits and/or guarantees.

In general, I don't think we can handle shared memory well without getting
rid of the "whoever allocates a page, pays the full price" policy and
making shared ownership a fully supported concept.
Of course, it's a huge amount of work, and I believe the only way we can
achieve it is to compromise on the granularity of the accounting. Whether
the resulting system will be better in real life is hard to say in
advance.

Thanks!
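
The effect described above (a running workload's usage jumping once
another workload's memcg goes offline) can be illustrated with a toy
model. This is a minimal sketch, not kernel code: the Memcg and Folio
classes, the charge() and access() helpers, and the page counts are all
hypothetical names and numbers chosen for illustration, and the recharge
here happens synchronously, whereas the series does it asynchronously.

```python
from dataclasses import dataclass


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup)."""
    name: str
    online: bool = True
    charged_pages: int = 0


@dataclass
class Folio:
    """Toy single-page folio, charged to exactly one memcg at a time."""
    owner: Memcg


def charge(owner: Memcg) -> Folio:
    """Allocate a folio and charge it to the allocating memcg."""
    owner.charged_pages += 1
    return Folio(owner=owner)


def access(folio: Folio, accessor: Memcg) -> None:
    """Model of deferred recharging: the charge moves to the accessor
    only when the current owner is offline."""
    if not folio.owner.online and folio.owner is not accessor:
        folio.owner.charged_pages -= 1
        accessor.charged_pages += 1
        folio.owner = accessor


a = Memcg("A")
b = Memcg("B")

shared = [charge(a) for _ in range(1000)]    # A touched the shared data first
private_b = [charge(b) for _ in range(200)]  # B's own memory

for folio in shared:
    access(folio, b)         # A is still online, so nothing is recharged
usage_before = b.charged_pages

a.online = False             # workload A terminates, its memcg goes offline
for folio in shared:
    access(folio, b)         # exactly the same accesses as before
usage_after = b.charged_pages

print(usage_before, usage_after)  # prints "200 1200"
```

B's access pattern is identical in both loops, yet its charge grows from
200 to 1200 pages purely because A went away. A limit sized for B's
observed usage would be too small after that point.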