Date: Tue, 20 Aug 2024 17:21:48 -0700
From: Shakeel Butt
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Nhat Pham, Michal Hocko, Chengming Zhou, Muchun Song, Chris Li, Yosry Ahmed, "Huang, Ying", linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap, workingset: make anon shadow nodes memcg aware
References: <20240820092359.97782-1-ryncsn@gmail.com>
In-Reply-To: <20240820092359.97782-1-ryncsn@gmail.com>

On Tue, Aug 20, 2024 at 05:23:59PM GMT, Kairui Song wrote:
> From: Kairui Song
>
> Currently, the workingset (shadow) nodes of the swap cache are not
> accounted to their corresponding memory cgroup, instead, they are
> all accounted to the root cgroup. This leads to inaccurate accounting
> and ineffective reclaiming. One cgroup could swap out a large amount
> of memory, take up a large amount of memory with shadow nodes without
> being accounted.
>
> This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> convergence regression"), where page cache shadow nodes were incorrectly
> accounted. That was due to the accidental dropping of the accounting
> flag during the XArray conversion in commit a28334862993
> ("page cache: Finish XArray conversion").
>
> However, this fix has a different cause. Swap cache shadow nodes were
> never accounted even before the XArray conversion, since they did not
> exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> shadow entries"), which was years after the XArray conversion.
>
> It's worth noting that one anon shadow Xarray node may contain
> different entries from different cgroup, and it gets accounted at reclaim
> time, so it's arguable which cgroup it should be accounted to (as
> Shakeal Butt pointed out [1]). File pages may suffer similar issue
> but less common. Things like proactive memory reclaim could make thing
> more complex.
>
> So this commit still can't provide a 100% accurate accounting of anon
> shadows, but it covers the cases when one memory cgroup uses significant
> amount of swap, and in most cases memory pressure in one cgroup only
> suppose to reclaim this cgroup and children. Besides, this fix is clean and
> easy enough.
>
> Link: https://lore.kernel.org/all/7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2/ [1]
> Signed-off-by: Kairui Song
>

Is this a real issue? Have you seen production systems with a large
amount of memory occupied by anon shadow entries? This is still limited
by the amount of swap a cgroup is allowed to use.

The reason I am asking is that this solution is worse than the perceived
problem, at least to me. With this patch, the kernel will charge
unrelated cgroups for the memory of swap xarray nodes during global
reclaim and proactive reclaim.

You can reduce this weirdness by calling set_active_memcg() in
add_to_swap_cache() with the given folio's memcg, but you still have the
case of multiple unrelated folios and shadow entries of different cgroups
within the same node.

In the filesystem case, userspace can control which files are shared
between different cgroups and has more control over it. That is not the
case for swap space.

thanks,
Shakeel
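
The charging concern above can be illustrated with a small user-space toy
model. This is only a sketch under stated assumptions, not kernel code: it
assumes a node is charged once, when its first entry is stored, and that
each node covers 64 consecutive swap offsets (XA_CHUNK_SIZE with the default
XA_CHUNK_SHIFT). The functions charge_nodes() and mixed_nodes(), the cgroup
names, and the workload are all hypothetical.

```python
from collections import Counter

SLOTS_PER_NODE = 64  # assumed: XA_CHUNK_SIZE with the default XA_CHUNK_SHIFT of 6


def charge_nodes(swap_outs, use_folio_memcg):
    """Return a Counter of {cgroup: nodes charged} for a sequence of swap-outs.

    swap_outs is a list of (swap_offset, folio_cgroup, reclaimer_cgroup).
    Toy assumption: a node is charged once, when its first entry is stored.
    """
    charged = {}  # node index -> cgroup charged for that node
    for offset, folio_cg, reclaimer_cg in swap_outs:
        node = offset // SLOTS_PER_NODE
        if node not in charged:
            # use_folio_memcg=True models the set_active_memcg() suggestion;
            # False models charging whichever context drives the reclaim.
            charged[node] = folio_cg if use_folio_memcg else reclaimer_cg
    return Counter(charged.values())


def mixed_nodes(swap_outs):
    """Count nodes that hold entries from more than one cgroup."""
    owners = {}
    for offset, folio_cg, _ in swap_outs:
        owners.setdefault(offset // SLOTS_PER_NODE, set()).add(folio_cg)
    return sum(1 for cgroups in owners.values() if len(cgroups) > 1)


# Hypothetical workload: cgroups A and B alternate swap slots, and a
# proactive reclaimer running in cgroup "admin" drives every swap-out.
workload = [(off, "A" if off % 2 == 0 else "B", "admin") for off in range(256)]

by_reclaimer = charge_nodes(workload, use_folio_memcg=False)
by_folio = charge_nodes(workload, use_folio_memcg=True)
shared = mixed_nodes(workload)
```

In this model, charging by reclaim context bills every node to the
unrelated "admin" cgroup. Charging by the first folio's memcg bills every
node to A, although half of the entries in each node belong to B.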