From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: linux-mm <linux-mm@kvack.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Hugh Dickins <hugh@veritas.com>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Subject: [RFC] memcg: handle swapcache leak
Date: Tue, 17 Mar 2009 13:57:02 +0900
Message-ID: <20090317135702.4222e62e.nishimura@mxp.nes.nec.co.jp>
Hi.
There are (at least) two types (described below) of swapcache leak in the current memcg.
By "swapcache leak" I mean a swapcache page for which:
a. the process that used the page has already exited (or
unmapped the page), and
b. the page is not linked to memcg's LRU because it is !PageCgroupUsed.
So only global page reclaim or swapoff can free these leaked swapcache pages.
This means memcg's memory pressure can use up all swap entries if
the memory size of the system is greater than that of swap, since global
reclaim may not run before swap fills up.
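To make the two conditions concrete, here is a minimal user-space sketch in C. It is not kernel code: struct page_model and its fields are hypothetical stand-ins for the real page state (PageSwapCache, the map count, and PageCgroupUsed), introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the page state discussed above. */
struct page_model {
	bool in_swapcache;	/* stands in for PageSwapCache(page) */
	int  mapcount;		/* number of processes still mapping the page */
	bool cgroup_used;	/* stands in for PageCgroupUsed(pc) */
};

/*
 * A swapcache page is "leaked" in the sense above when nobody maps it
 * any more (condition a) and it is !PageCgroupUsed, so it is not on any
 * memcg LRU (condition b).
 */
static bool is_leaked_swapcache(const struct page_model *p)
{
	assert(p->mapcount >= 0);
	return p->in_swapcache && p->mapcount == 0 && !p->cgroup_used;
}
```

A page that is still mapped, or that is still charged to a memcg, does not count as leaked under this definition.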
1. race between exit and swap-in
Assume processA is exiting and processB is doing swap-in.
If some pages of processA have been swapped out, processA calls free_swap_and_cache().
If, at the same time, processB is calling read_swap_cache_async() on
a swap entry *that is used by processA*, a race like the one below can happen.
              processA               |              processB
-------------------------------------+-------------------------------------
  (free_swap_and_cache())            |  (read_swap_cache_async())
                                     |    swap_duplicate()
                                     |    __set_page_locked()
                                     |    add_to_swap_cache()
    swap_entry_free() == 0           |
    find_get_page() -> found         |
    trylock_page() -> fail & return  |
                                     |    lru_cache_add_anon()
                                     |      doesn't link this page to memcg's
                                     |      LRU, because of !PageCgroupUsed.
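The interleaving above can be replayed deterministically in user space. The following is only a sketch under simplifying assumptions: struct swap_race_model and every function in it are hypothetical, swap reference counting is left out, and each helper merely mirrors the effect of the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one swap entry and its swapcache page. */
struct swap_race_model {
	bool page_locked;	/* page lock held by processB */
	bool in_swapcache;	/* page is in the swap cache */
	bool cgroup_used;	/* stands in for PageCgroupUsed */
	bool on_global_lru;
	bool on_memcg_lru;
	bool freed;		/* swapcache page has been released */
};

/* processB: swap_duplicate(), __set_page_locked(), add_to_swap_cache() */
static void b_add_to_swap_cache(struct swap_race_model *m)
{
	m->page_locked = true;
	m->in_swapcache = true;
}

/* processB: lru_cache_add_anon(); the page lock is dropped afterwards */
static void b_add_to_lru(struct swap_race_model *m)
{
	assert(m->in_swapcache);
	m->on_global_lru = true;
	/* Only charged (PageCgroupUsed) pages are linked to a memcg LRU. */
	m->on_memcg_lru = m->cgroup_used;
	m->page_locked = false;
}

/* processA: free_swap_and_cache() after dropping its swap reference */
static void a_free_swap_and_cache(struct swap_race_model *m)
{
	if (!m->in_swapcache)
		return;		/* find_get_page() found nothing */
	if (m->page_locked)
		return;		/* trylock_page() failed: give up */
	/* Lock acquired: the unused swapcache page can be released. */
	m->in_swapcache = false;
	m->on_global_lru = false;
	m->on_memcg_lru = false;
	m->freed = true;
}

/* Replays the order from the diagram; returns true if the page leaks. */
static bool replay_racy_order(void)
{
	struct swap_race_model m = { false };

	b_add_to_swap_cache(&m);
	a_free_swap_and_cache(&m);	/* runs while B holds the page lock */
	b_add_to_lru(&m);
	return m.in_swapcache && !m.on_memcg_lru && !m.freed;
}

/* Same steps, but processA runs only after processB has finished. */
static bool replay_serial_order(void)
{
	struct swap_race_model m = { false };

	b_add_to_swap_cache(&m);
	b_add_to_lru(&m);
	a_free_swap_and_cache(&m);
	return m.in_swapcache && !m.on_memcg_lru && !m.freed;
}
```

In this model only the racy order leaves a swapcache page that is on the global LRU but on no memcg LRU; in the serial order processA gets the page lock and releases the page.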
This type of leak can be avoided by setting /proc/sys/vm/page-cluster to 0.
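For context, page-cluster is a base-2 logarithm: swap-in reads up to 2^page-cluster consecutive pages in one attempt, so 0 means only the faulting page is read. That would presumably explain why the workaround helps: without readahead, processB has no reason to touch a swap entry that belongs only to processA. A minimal sketch of the arithmetic (the helper name swapin_window_pages() is hypothetical, not a kernel function):

```c
#include <assert.h>

/* Hypothetical helper: pages read per swap-in attempt for a page-cluster value. */
static unsigned long swapin_window_pages(unsigned int page_cluster)
{
	/* Keep the shift well-defined for this illustration. */
	assert(page_cluster < 8 * sizeof(unsigned long));
	return 1UL << page_cluster;
}
```

With a page-cluster value of 3 this gives 8 pages per attempt; with 0 it gives 1.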
Leaked swapcache pages of this type have been charged as swap,
so their swap entries hold a reference to the associated memcg
and the refcnt of the memcg has been incremented.
As a result, this memcg cannot be freed until global page reclaim
frees the swapcache page or swapoff is executed.
Actually, I saw a "struct mem_cgroup leak" (checked with "grep kmalloc-1024 /proc/slabinfo")
in my test, where I create a new directory, move all tasks to the new
directory, and remove the old directory under memcg's memory pressure.
This "struct mem_cgroup leak" didn't happen when
/proc/sys/vm/page-cluster was set to 0.
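The pinning effect can be sketched with a toy reference count. This is an assumption-laden illustration, not the memcg implementation: struct memcg_model and its helpers are hypothetical and model only the rule that a charged swap entry holds a reference on its memcg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mem_cgroup: only the reference count. */
struct memcg_model {
	int  refcnt;		/* references held by charged swap entries */
	bool dir_removed;	/* rmdir of the cgroup directory has happened */
};

/* A swap entry charged to the memcg takes a reference on it. */
static void charge_swap_entry(struct memcg_model *mc)
{
	mc->refcnt++;
}

/* Freeing the swap entry (global reclaim or swapoff) drops the reference. */
static void uncharge_swap_entry(struct memcg_model *mc)
{
	assert(mc->refcnt > 0);
	mc->refcnt--;
}

/* The structure can only be freed after rmdir and once nothing pins it. */
static bool memcg_can_be_freed(const struct memcg_model *mc)
{
	return mc->dir_removed && mc->refcnt == 0;
}
```

As long as one leaked swapcache page keeps its swap entry alive, memcg_can_be_freed() stays false even after the directory is removed, which matches the slab object that never goes away in the test described above.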
2. race between exit and swap-out
If page_remove_rmap() is called by the owner process on an anonymous
page (not in swapcache, so it is uncharged here) before shrink_page_list() adds
the page to swapcache, this page becomes a swapcache page with !PageCgroupUsed.
If this swapcache page is not freed by shrink_page_list(), it goes back
to the global LRU, but not to memcg's LRU, because the page is
!PageCgroupUsed.
This type of leak can be avoided by modifying shrink_page_list() like this:
===
@@ -775,6 +776,21 @@ activate_locked:
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
+		if (!scanning_global_lru(sc) && PageSwapCache(page)) {
+			struct page_cgroup *pc;
+
+			pc = lookup_page_cgroup(page);
+			/*
+			 * Used bit of swapcache is solid under page lock.
+			 */
+			if (unlikely(!PageCgroupUsed(pc)))
+				/*
+				 * This can happen if the page is unmapped by
+				 * the owner process before it is added to
+				 * swapcache.
+				 */
+				try_to_free_swap(page);
+		}
 		unlock_page(page);
 keep:
 		list_add(&page->lru, &ret_pages);
===
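The effect of the extra check can be illustrated with a small user-space model of the second race. Everything here is a hypothetical simplification: struct anon_page_model and its helpers are not kernel interfaces, the model assumes memcg (not global) reclaim, and it assumes try_to_free_swap() succeeds whenever it is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an anonymous page under memcg reclaim. */
struct anon_page_model {
	bool mapped;
	bool cgroup_used;	/* stands in for PageCgroupUsed */
	bool in_swapcache;
	bool on_memcg_lru;
};

/* Owner exits or unmaps: a page not in swapcache is uncharged here. */
static void owner_unmaps(struct anon_page_model *p)
{
	p->mapped = false;
	if (!p->in_swapcache) {
		p->cgroup_used = false;
		p->on_memcg_lru = false;
	}
}

/*
 * Reclaim adds the page to swapcache but, in this scenario, ends up on
 * the keep_locked path instead of freeing it. With 'patched' set, the
 * extra check from the diff drops the swapcache of an uncharged page.
 */
static void reclaim_keeps_page(struct anon_page_model *p, bool patched)
{
	p->in_swapcache = true;			/* added to swapcache */
	if (patched && !p->cgroup_used)
		p->in_swapcache = false;	/* try_to_free_swap() assumed to succeed */
	p->on_memcg_lru = p->cgroup_used;	/* putback: only charged pages */
}

/* Returns true when the page ends up as a leaked swapcache page. */
static bool exit_vs_swapout_leaks(bool patched)
{
	struct anon_page_model p = { true, true, false, true };

	owner_unmaps(&p);
	reclaim_keeps_page(&p, patched);
	assert(!p.mapped);
	return p.in_swapcache && !p.cgroup_used && !p.on_memcg_lru;
}
```

Without the check the model ends with an uncharged swapcache page that no memcg LRU tracks; with it, the swapcache is dropped before the page goes back to the LRU.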
I've confirmed that no leak happens in a simple swap in/out test with this patch for
shrink_page_list() applied and /proc/sys/vm/page-cluster set to 0.
(I think I should check page migration and rmdir too.)
I think the root cause of these problems is that !PageCgroupUsed pages are not linked
to any memcg's LRU.
So I'm now trying to implement a "dummy_memcg" to maintain !PageCgroupUsed pages.
Any comments or suggestions would be welcome.
Thanks,
Daisuke Nishimura.