From: Kairui Song
Date: Thu, 22 Aug 2024 01:35:29 +0800
Subject: Re: [PATCH] mm/swap, workingset: make anon shadow nodes memcg aware
To: Shakeel Butt
Cc: linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Nhat Pham, Michal Hocko, Chengming Zhou, Muchun Song, Chris Li, Yosry Ahmed, "Huang, Ying", LKML
References: <20240820092359.97782-1-ryncsn@gmail.com>

On Wed, Aug 21, 2024 at 08:22, Shakeel Butt wrote:
>
> On Tue, Aug 20, 2024 at 05:23:59PM GMT, Kairui Song wrote:
> > From: Kairui Song
> >
> > Currently, the workingset (shadow) nodes of the swap cache are not
> > accounted to their corresponding memory cgroup, instead, they are
> > all accounted to the root cgroup.
> > This leads to inaccurate accounting
> > and ineffective reclaiming. One cgroup could swap out a large amount
> > of memory, take up a large amount of memory with shadow nodes without
> > being accounted.
> >
> > This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> > convergence regression"), where page cache shadow nodes were incorrectly
> > accounted. That was due to the accidental dropping of the accounting
> > flag during the XArray conversion in commit a28334862993
> > ("page cache: Finish XArray conversion").
> >
> > However, this fix has a different cause. Swap cache shadow nodes were
> > never accounted even before the XArray conversion, since they did not
> > exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> > shadow entries"), which was years after the XArray conversion.
> >
> > It's worth noting that one anon shadow Xarray node may contain
> > different entries from different cgroup, and it gets accounted at reclaim
> > time, so it's arguable which cgroup it should be accounted to (as
> > Shakeal Butt pointed out [1]). File pages may suffer similar issue
> > but less common. Things like proactive memory reclaim could make thing
> > more complex.
> >
> > So this commit still can't provide a 100% accurate accounting of anon
> > shadows, but it covers the cases when one memory cgroup uses significant
> > amount of swap, and in most cases memory pressure in one cgroup only
> > suppose to reclaim this cgroup and children. Besides, this fix is clean and
> > easy enough.
> >
> > Link: https://lore.kernel.org/all/7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2/ [1]
> > Signed-off-by: Kairui Song
> >

Hi, Thanks for the comments.

> Is this a real issue? Have you seen systems in the production with
> large amount of memory occupied by anon shadow entries? This is still
> limited to the amount of swap a cgroup is allowed to use.

No, this patch is cherry-picked from a previous series. According to my
tests, it helps separate the shadows into the right cgroups, and
combined with the later patches it reduces list_lru lock contention by
a lot. It is indeed not very convincing on its own, so I hesitated to
send it alone.

> The reason I am asking is that this solution is worse than the perceived
> problem at least to me. With this patch, the kernel will be charging
> unrelated cgroups for the memory of swap xarray nodes during global
> reclaim and proactive reclaim.

Yes, this could be a problem. I didn't observe it happening frequently
in my tests, though: swap tends to cluster its allocations, and reclaim
tends to process pages in batches, so there is usually a fairly high
chance that shadows of pages from the same memcg end up on the same
node. It could become completely random when the swap device is
fragmented or reclaim is struggling, though.

> You can reduce this weirdness by using set_active_memcg() in
> add_to_swap_cache() using the given folio's memcg but still you have the
> case of multiple unrelated folios and shadow entries of different
> cgroups within the same node. For filesystem case, the userspace can
> control which files are shared between different cgroups and has more
> control on it. That is not the case for swap space.

Right, this fix is not perfect, and it's arguable whether the new
behaviour is better or worse than before. There is some ongoing work on
the swap side, so things may get fixed differently in the future, but
I'll also check whether this patch can be improved.
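
To make the node-sharing concern above concrete, here is a toy userspace
model, not kernel code. It assumes 64 slots per XArray node (the default
XA_CHUNK_SIZE), that slots are filled in index order, and that each node
is charged to the memcg of the first shadow entry stored in it. The real
charge target depends on the reclaim context, so this is only a sketch
under those assumptions; charge_nodes() is a hypothetical helper.

```python
from collections import Counter

# Assumption: default XA_CHUNK_SIZE (XA_CHUNK_SHIFT == 6).
SLOTS_PER_NODE = 64


def charge_nodes(slot_owners):
    """Toy model of per-node charging.

    slot_owners lists, in swap slot order, the memcg that owns the shadow
    entry in each slot. Each node is charged to the owner of its first
    slot (a simplification of "whoever triggers the node allocation").

    Returns (nodes charged per memcg, number of shadow entries that sit
    in a node charged to a different memcg).
    """
    charged = Counter()
    foreign = 0
    for start in range(0, len(slot_owners), SLOTS_PER_NODE):
        node = slot_owners[start:start + SLOTS_PER_NODE]
        payer = node[0]
        charged[payer] += 1
        foreign += sum(1 for owner in node if owner != payer)
    return charged, foreign


# Clustered allocation: each memcg's shadows fill whole nodes.
clustered = ["A"] * 128 + ["B"] * 128
# Fragmented allocation: slots of the two memcgs alternate.
fragmented = ["A", "B"] * 128

print(charge_nodes(clustered))
print(charge_nodes(fragmented))
```

In the clustered layout every node holds entries of a single memcg, so
the charge lands where it belongs. In the alternating layout all four
nodes are charged to A while half of their entries belong to B, which is
the mis-charging case discussed above.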