linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kairui Song <ryncsn@gmail.com>
To: Chris Li <chrisl@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 "Huang, Ying" <ying.huang@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 05/24] mm/swap: move readahead policy checking into swapin_readahead
Date: Tue, 21 Nov 2023 16:32:45 +0800	[thread overview]
Message-ID: <CAMgjq7Cf1jQnnvWB-DCORhGBacbbnKa4Q_G-+3Z7Nwz-Ogxxzw@mail.gmail.com> (raw)
In-Reply-To: <CAF8kJuMoiGe3e98Lx0NWmb25vVx0s3SdKqR3yiiG2rQKk0ztNQ@mail.gmail.com>

Chris Li <chrisl@kernel.org> 于2023年11月21日周二 15:41写道:
>
> On Mon, Nov 20, 2023 at 10:35 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > Chris Li <chrisl@kernel.org> 于2023年11月21日周二 14:18写道:
> > >
> > > On Sun, Nov 19, 2023 at 11:48 AM Kairui Song <ryncsn@gmail.com> wrote:
> > > >
> > > > From: Kairui Song <kasong@tencent.com>
> > > >
> > > > This makes swapin_readahead a main entry for swapin pages,
> > > > prepare for optimizations in later commits.
> > > >
> > > > This also makes swapoff able to make use of readahead checking
> > > > based on entry. Swapping off a 10G ZRAM (lzo-rle) is faster:
> > > >
> > > > Before:
> > > > time swapoff /dev/zram0
> > > > real    0m12.337s
> > > > user    0m0.001s
> > > > sys     0m12.329s
> > > >
> > > > After:
> > > > time swapoff /dev/zram0
> > > > real    0m9.728s
> > > > user    0m0.001s
> > > > sys     0m9.719s
> > > >
> > > > And what's more, because now swapoff will also make use of no-readahead
> > > > swapin helper, this also fixed a bug for no-readahead case (eg. ZRAM):
> > > > when a process that swapped out some memory previously was moved to a new
> > > > cgroup, and the original cgroup is dead, swapoff the swap device will
> > > > make the swapped in pages accounted into the process doing the swapoff
> > > > instead of the new cgroup the process was moved to.
> > > >
> > > > This can be easily reproduced by:
> > > > - Setup a ramdisk (eg. ZRAM) swap.
> > > > - Create memory cgroup A, B and C.
> > > > - Spawn process P1 in cgroup A and make it swap out some pages.
> > > > - Move process P1 to memory cgroup B.
> > > > - Destroy cgroup A.
> > > > - Do a swapoff in cgroup C.
> > > > - Swapped in pages is accounted into cgroup C.
>
> In a strange way it makes sense to charge to C.
> Swap out == free up memory.
> Swap in == consume memory.
> C turn off swap, effectively this behavior will consume a lot of memory.
> C gets charged, so if the C is out of memory, it will punish C.
> C will not be able to continue swap in memory. The problem gets under control.

Yes, I think charging either C or B makes sense in their own way. To
me I think current behavior is kind of counter-intuitive.

Image if there are cgroup PC1, and its child cgroup CC1, CC2. If a process
swapped out some memory in CC1 then moved to CC2, and CC1 is dying.
On swapoff the charge will be moved out of PC1...

And swapoff often happens in some unlimited admin cgroup or some
cgroup for management agents.

If PC1 has a memory limit, the process in it can breach the limit easily,
we will see a process that never left PC1 having a much higher RSS
than PC1/CC1/CC2's limit.

And if there is a limit for the management agent cgroup, the agent
will be OOM instead of OOM in PC1.

Simply moving a process between the child cgroup of the same parent
cgroup won't cause a similar issue, things get weird when swapoff is
involved.

And actually with multiple layers of swap, it's less risky to swapoff
a device since other swap devices can catch over committed memory.

Oh, and there is one more case I forgot to cover in this series:
Moving a process is indeed something not happening very frequently,
but a process run in cgroup then exit, and leave some shmem swapped
out could be a common case.
Current behavior on swapoff will move these charges out of the
original parent cgroup too.

So maybe a more ideal solution for swapoff is: simply always charge a
dying cgroup parent cgroup?

Maybe a sysctl/cmdline could be introduced to control the behavior.


  reply	other threads:[~2023-11-21  8:33 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-19 19:47 [PATCH 00/24] Swapin path refactor for optimization and bugfix Kairui Song
2023-11-19 19:47 ` [PATCH 01/24] mm/swap: fix a potential undefined behavior issue Kairui Song
2023-11-19 20:55   ` Matthew Wilcox
2023-11-20  3:35     ` Chris Li
2023-11-20 11:14       ` Kairui Song
2023-11-20 17:34         ` Chris Li
2023-11-19 19:47 ` [PATCH 02/24] mm/swapfile.c: add back some comment Kairui Song
2023-11-19 19:47 ` [PATCH 03/24] mm/swap: move no readahead swapin code to a stand alone helper Kairui Song
2023-11-19 21:00   ` Matthew Wilcox
2023-11-20 11:14     ` Kairui Song
2023-11-20 14:55   ` Dan Carpenter
2023-11-21  5:34   ` Chris Li
2023-11-22 17:33     ` Kairui Song
2023-11-19 19:47 ` [PATCH 04/24] mm/swap: avoid setting page lock bit and doing extra unlock check Kairui Song
2023-11-20  4:17   ` Chris Li
2023-11-20 11:15     ` Kairui Song
2023-11-20 17:44       ` Chris Li
2023-11-22 17:32         ` Kairui Song
2023-11-22 20:57           ` Chris Li
2023-11-24  8:14             ` Kairui Song
2023-11-24  8:37               ` Christopher Li
2023-11-19 19:47 ` [PATCH 05/24] mm/swap: move readahead policy checking into swapin_readahead Kairui Song
2023-11-21  6:15   ` Chris Li
2023-11-21  6:35     ` Kairui Song
2023-11-21  7:41       ` Chris Li
2023-11-21  8:32         ` Kairui Song [this message]
2023-11-21 15:24           ` Chris Li
2023-11-19 19:47 ` [PATCH 06/24] swap: rework swapin_no_readahead arguments Kairui Song
2023-11-20  0:20   ` kernel test robot
2023-11-21  6:44   ` Chris Li
2023-11-23 10:51     ` Kairui Song
2023-11-19 19:47 ` [PATCH 07/24] mm/swap: move swap_count to header to be shared Kairui Song
2023-11-21  6:51   ` Chris Li
2023-11-21  7:03     ` Kairui Song
2023-11-19 19:47 ` [PATCH 08/24] mm/swap: check readahead policy per entry Kairui Song
2023-11-20  6:04   ` Huang, Ying
2023-11-20 11:17     ` Kairui Song
2023-11-21  1:10       ` Huang, Ying
2023-11-21  5:20         ` Chris Li
2023-11-21  5:13       ` Chris Li
2023-11-21  7:54   ` Chris Li
2023-11-23 10:52     ` Kairui Song
2023-11-19 19:47 ` [PATCH 09/24] mm/swap: inline __swap_count Kairui Song
2023-11-20  7:41   ` Huang, Ying
2023-11-21  8:02     ` Chris Li
2023-11-19 19:47 ` [PATCH 10/24] mm/swap: remove nr_rotate_swap and related code Kairui Song
2023-11-21 15:45   ` Chris Li
2023-11-19 19:47 ` [PATCH 11/24] mm/swap: also handle swapcache lookup in swapin_readahead Kairui Song
2023-11-20  0:47   ` kernel test robot
2023-11-21 16:06   ` Chris Li
2023-11-24  8:42     ` Kairui Song
2023-11-24  9:10       ` Chris Li
2023-11-19 19:47 ` [PATCH 12/24] mm/swap: simplify arguments for swap_cache_get_folio Kairui Song
2023-11-21 16:36   ` Chris Li
2023-11-19 19:47 ` [PATCH 13/24] swap: simplify swap_cache_get_folio Kairui Song
2023-11-21 16:50   ` Chris Li
2023-11-19 19:47 ` [PATCH 14/24] mm/swap: do shadow lookup as well when doing swap cache lookup Kairui Song
2023-11-21 16:55   ` Chris Li
2023-11-19 19:47 ` [PATCH 15/24] mm/swap: avoid an duplicated swap cache lookup for SYNCHRONOUS_IO device Kairui Song
2023-11-21 17:15   ` Chris Li
2023-11-22 18:08     ` Kairui Song
2023-11-19 19:47 ` [PATCH 16/24] mm/swap: reduce scope of get_swap_device in swapin path Kairui Song
2023-11-19 21:12   ` Matthew Wilcox
2023-11-20 11:14     ` Kairui Song
2023-11-21 17:25   ` Chris Li
2023-11-22  0:36   ` Huang, Ying
2023-11-23 11:13     ` Kairui Song
2023-11-24  0:40       ` Huang, Ying
2023-11-19 19:47 ` [PATCH 17/24] mm/swap: fix false error when swapoff race with swapin Kairui Song
2023-11-19 19:47 ` [PATCH 18/24] mm/swap: introduce a helper non fault swapin Kairui Song
2023-11-20  1:07   ` kernel test robot
2023-11-22  4:40   ` Chris Li
2023-11-28 11:22     ` Kairui Song
2023-12-13  2:22       ` Chris Li
2023-11-19 19:47 ` [PATCH 19/24] shmem, swap: refactor error check on OOM or race Kairui Song
2023-11-20  7:04   ` Chris Li
2023-11-20 11:17     ` Kairui Song
2023-11-19 19:47 ` [PATCH 20/24] swap: simplify and make swap_find_cache static Kairui Song
2023-11-22  5:01   ` Chris Li
2023-11-19 19:47 ` [PATCH 21/24] swap: make swapin_readahead result checking argument mandatory Kairui Song
2023-11-22  5:15   ` Chris Li
2023-11-24  8:14     ` Kairui Song
2023-11-19 19:47 ` [PATCH 22/24] swap: make swap_cluster_readahead static Kairui Song
2023-11-22  5:20   ` Chris Li
2023-11-19 19:47 ` [PATCH 23/24] swap: fix multiple swap leak when after cgroup migrate Kairui Song
2023-11-20  7:35   ` Huang, Ying
2023-11-20 11:17     ` Kairui Song
2023-11-22  5:34       ` Chris Li
2023-11-19 19:47 ` [PATCH 24/24] mm/swap: change swapin_readahead to swapin_page_fault Kairui Song
2023-11-20 19:09 ` [PATCH 00/24] Swapin path refactor for optimization and bugfix Yosry Ahmed
2023-11-20 20:22   ` Chris Li
2023-11-22  6:46     ` Kairui Song
2023-11-22  6:43   ` Kairui Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMgjq7Cf1jQnnvWB-DCORhGBacbbnKa4Q_G-+3Z7Nwz-Ogxxzw@mail.gmail.com \
    --to=ryncsn@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox