linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Kairui Song <kasong@tencent.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Waiman Long <longman@redhat.com>,
	 Shakeel Butt <shakeelb@google.com>,
	Nhat Pham <nphamcs@gmail.com>, Michal Hocko <mhocko@suse.com>,
	 Chengming Zhou <zhouchengming@bytedance.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	 Muchun Song <muchun.song@linux.dev>,
	Chris Li <chrisl@kernel.org>,
	 Yosry Ahmed <yosryahmed@google.com>,
	"Huang, Ying" <ying.huang@intel.com>
Subject: Re: [PATCH 1/7] mm/swap, workingset: make anon workingset nodes memcg aware
Date: Thu, 18 Jul 2024 18:34:37 -0700	[thread overview]
Message-ID: <7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2> (raw)
In-Reply-To: <20240624175313.47329-2-ryncsn@gmail.com>

On Tue, Jun 25, 2024 at 01:53:07AM GMT, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> Currently, the (shadow) nodes of the swap cache are not accounted to
> their corresponding memory cgroup, instead, they are all accounted
> to the root cgroup. This leads to inaccurate accounting and
> ineffective reclaiming.
> 
> This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> convergence regression"), where page cache shadow nodes were incorrectly
> accounted. That was due to the accidental dropping of the accounting
> flag during the XArray conversion in commit a28334862993
> ("page cache: Finish XArray conversion").
> 
> However, this fix has a different cause. Swap cache shadow nodes were
> never accounted even before the XArray conversion, since they did not
> exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> shadow entries"), which was years after the XArray conversion. Without
> shadow nodes, swap cache nodes can only use a very small amount of memory
> and so reclaiming is not very important.
> 
> But now with shadow nodes, if a cgroup swaps out a large amount of
> memory, it could take up a lot of memory.
> 
> This can be easily fixed by adding proper flags and LRU setters.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>

As Muchun said, please send this patch separately. However as I am
thinking more about this patch, I think it is incomplete and the full
solution may be much more involved and might not be worth it.

One of the differences between file page cache and swap page cache is
the context in which the underlying xarray node can be allocated. For
file pages, such allocations happen in the context of the process/cgroup
owning the file pages and thus the memcg of the current is used for
charging. However xarray node allocations happen when a page is added to
swap cache which often happen in reclaim and reclaim can happen in any
context i.e. kernel thread, unrelated process/cgroups e.t.c. So, we may
charge unrelated memcg for these nodes.

Now you may argue that we can use the memcg of the page which is being
swapped out but then you may have an xarray node containing pointers to
pages (or shadows) of different memcgs. Who should be charged?

BTW filesystem shared between multiple cgroups can face this issue as
well but users have more control on shared filesystem as compare to
shared swap address space.

Shakeel


  parent reply	other threads:[~2024-07-19  1:34 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-24 17:53 [PATCH 0/7] Split list_lru lock into per-cgroup scope Kairui Song
2024-06-24 17:53 ` [PATCH 1/7] mm/swap, workingset: make anon workingset nodes memcg aware Kairui Song
2024-07-17  3:25   ` Muchun Song
2024-07-18 11:33     ` Kairui Song
2024-07-19  1:34   ` Shakeel Butt [this message]
2024-06-24 17:53 ` [PATCH 2/7] mm/list_lru: don't pass unnecessary key parameters Kairui Song
2024-06-24 17:53 ` [PATCH 3/7] mm/list_lru: don't export list_lru_add Kairui Song
2024-07-17  3:12   ` Muchun Song
2024-06-24 17:53 ` [PATCH 4/7] mm/list_lru: code clean up for reparenting Kairui Song
2024-07-15  9:10   ` Muchun Song
2024-07-16  8:15     ` Kairui Song
2024-06-24 17:53 ` [PATCH 5/7] mm/list_lru: simplify reparenting and initial allocation Kairui Song
2024-07-17  3:04   ` Muchun Song
2024-07-18 11:49     ` Kairui Song
2024-07-19  2:45       ` Muchun Song
2024-06-24 17:53 ` [PATCH 6/7] mm/list_lru: split the lock to per-cgroup scope Kairui Song
2024-06-24 17:53 ` [PATCH 7/7] mm/list_lru: Simplify the list_lru walk callback function Kairui Song
2024-06-27 19:58   ` kernel test robot
2024-06-24 21:26 ` [PATCH 0/7] Split list_lru lock into per-cgroup scope Andrew Morton
2024-06-25  7:47   ` Kairui Song
2024-06-25 17:00   ` Shakeel Butt
2024-08-27 18:35 ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7gzevefivueqtebzvikzbucnrnpurmh3scmfuiuo2tnrs37xso@haj7gzepjur2 \
    --to=shakeel.butt@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=chrisl@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kasong@tencent.com \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    --cc=zhengqi.arch@bytedance.com \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox