From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Daisuke Nishimura <d-nishimura@mtf.biglobe.ne.jp>,
	linux-mm <linux-mm@kvack.org>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Hugh Dickins <hugh@veritas.com>
Subject: [PATCH] fix unused/stale swap cache handling on memcg  v3
Date: Fri, 20 Mar 2009 16:45:20 +0900
Message-ID: <20090320164520.f969907a.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <432ace3655a26d2d492a56303369a88a.squirrel@webmail-b.css.fujitsu.com>

I'll test this one over the weekend.
It may be much simpler than the previous versions. Thank you for all your help!

-Kame
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Nishimura reported that, in a racy case, a swap cache page is not freed even
if it will never be used again. To take advantage of the laziness of the LRU,
some racy pages are _intentionally_ left unfreed, and the kernel expects the
global LRU to reclaim them later.

With memcg, if memory usage is well controlled, the global LRU rarely runs,
so the "ok, it's busy, reclaim it later by the global LRU" logic results in
a leaked swp_entry. Nishimura found that this can cause OOM.

This patch tries to fix this by calling try_to_free_swap() against the
stale swap cache pages.

Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/swap.h |    2 ++
 mm/memcontrol.c      |   41 +++++++++++++++++++++++++++++++++++++++++
 mm/swapfile.c        |   23 ++++++++++++++++++-----
 mm/vmscan.c          |    9 +++++++++
 4 files changed, 70 insertions(+), 5 deletions(-)
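
For illustration only (this is not kernel code and not part of the patch):
a minimal userspace sketch of the idea described above, assuming a toy
page model. If the page lock cannot be taken on the fast path, the check
is queued and run later from a context that can wait for the lock. All
toy_* names are hypothetical, and the "worker" here is a plain function
call rather than a real workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct page; every field is illustrative only. */
struct toy_page {
	bool locked;		/* models the page lock */
	bool in_swapcache;	/* models PageSwapCache() */
	bool mapped;		/* models page_mapped() */
	int refcount;		/* models the page reference count */
};

/* Deferred request, similar in spirit to struct memcg_swap_validate. */
struct toy_work {
	struct toy_page *page;
	struct toy_work *next;
};

static struct toy_work *pending;	/* toy work queue (LIFO) */

static bool toy_trylock(struct toy_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void toy_unlock(struct toy_page *p)
{
	p->locked = false;
}

/* Models try_to_free_swap(): drop an unmapped page from the swap cache. */
static bool toy_try_to_free_swap(struct toy_page *p)
{
	if (!p->in_swapcache || p->mapped)
		return false;
	p->in_swapcache = false;
	return true;
}

/* Fast path: free now if the lock is available, otherwise defer. */
static void toy_free_swap_and_cache(struct toy_page *p)
{
	struct toy_work *w;

	if (toy_trylock(p)) {
		toy_try_to_free_swap(p);
		toy_unlock(p);
		return;
	}
	w = malloc(sizeof(*w));
	if (!w)
		return;		/* give up; something else must reclaim it */
	p->refcount++;		/* pin the page until the deferred check runs */
	w->page = p;
	w->next = pending;
	pending = w;
}

/* Deferred path: returns how many swap cache entries were freed. */
static int toy_run_pending(void)
{
	int freed = 0;

	/* Stop early if a page is still locked; its work stays queued. */
	while (pending && toy_trylock(pending->page)) {
		struct toy_work *w = pending;

		pending = w->next;
		if (toy_try_to_free_swap(w->page))
			freed++;
		toy_unlock(w->page);
		w->page->refcount--;	/* drop the pin taken when queuing */
		free(w);
	}
	return freed;
}

/* Returns 1 if the deferred path freed what the fast path had to skip. */
static int toy_demo(void)
{
	struct toy_page pg = { .locked = true, .in_swapcache = true,
			       .mapped = false, .refcount = 1 };
	bool skipped;

	toy_free_swap_and_cache(&pg);	/* lock is busy: work is queued */
	skipped = pg.in_swapcache;
	toy_unlock(&pg);		/* the other lock holder finishes */
	return skipped && toy_run_pending() == 1 &&
	       !pg.in_swapcache && pg.refcount == 1;
}
```

Calling toy_demo() from a main() walks through the busy-lock case. The
sketch pins the page only after the allocation succeeds, which is a
simplification compared with the get_page()/put_page() ordering in the
patch itself.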

Index: mmotm-2.6.29-Mar11/mm/memcontrol.c
===================================================================
--- mmotm-2.6.29-Mar11.orig/mm/memcontrol.c
+++ mmotm-2.6.29-Mar11/mm/memcontrol.c
@@ -1550,8 +1550,49 @@ void mem_cgroup_uncharge_swap(swp_entry_
 	}
 	rcu_read_unlock();
 }
+
 #endif
 
+/* Handle some racy cases. */
+struct memcg_swap_validate {
+	struct work_struct work;
+	struct page *page;
+};
+
+static void mem_cgroup_validate_swapcache_cb(struct work_struct *work)
+{
+	struct memcg_swap_validate *mywork;
+	struct page *page;
+
+	mywork = container_of(work, struct memcg_swap_validate, work);
+	page = mywork->page;
+	/* We can sleep on the lock here; check if the swap is still in use */
+	lock_page(page);
+	try_to_free_swap(page);
+	unlock_page(page);
+	put_page(page);
+	kfree(mywork);
+	return;
+}
+
+void mem_cgroup_validate_swapcache(struct page *page)
+{
+	struct memcg_swap_validate *work;
+	/*
+	 * Unfortunately, we cannot lock this page here, so schedule the
+	 * check to run again later.
+	 */
+	get_page(page);
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(&work->work, mem_cgroup_validate_swapcache_cb);
+		work->page = page;
+		schedule_work(&work->work);
+	} else /* If this small kmalloc() fails, the LRU will find this page */
+		put_page(page);
+	return;
+}
+
 /*
  * Before starting migration, account PAGE_SIZE to mem_cgroup that the old
  * page belongs to.
Index: mmotm-2.6.29-Mar11/mm/swapfile.c
===================================================================
--- mmotm-2.6.29-Mar11.orig/mm/swapfile.c
+++ mmotm-2.6.29-Mar11/mm/swapfile.c
@@ -578,6 +578,7 @@ int free_swap_and_cache(swp_entry_t entr
 {
 	struct swap_info_struct *p;
 	struct page *page = NULL;
+	struct page *check = NULL;
 
 	if (is_migration_entry(entry))
 		return 1;
@@ -586,9 +587,11 @@ int free_swap_and_cache(swp_entry_t entr
 	if (p) {
 		if (swap_entry_free(p, entry) == 1) {
 			page = find_get_page(&swapper_space, entry.val);
-			if (page && !trylock_page(page)) {
-				page_cache_release(page);
-				page = NULL;
+			if (page) {
+				if (!trylock_page(page)) {
+					check = page;
+					page = NULL;
+				}
 			}
 		}
 		spin_unlock(&swap_lock);
@@ -602,10 +605,20 @@ int free_swap_and_cache(swp_entry_t entr
 				(!page_mapped(page) || vm_swap_full())) {
 			delete_from_swap_cache(page);
 			SetPageDirty(page);
-		}
+		} else
+			check = page;
 		unlock_page(page);
-		page_cache_release(page);
+		if (!check)
+			page_cache_release(page);
 	}
+
+	if (check) {
+		/* Check the accounting of this page lazily. */
+		if (PageSwapCache(check) && !page_mapped(check))
+			mem_cgroup_validate_swapcache(check);
+		page_cache_release(check);
+	}
+
 	return p != NULL;
 }
 
Index: mmotm-2.6.29-Mar11/mm/vmscan.c
===================================================================
--- mmotm-2.6.29-Mar11.orig/mm/vmscan.c
+++ mmotm-2.6.29-Mar11/mm/vmscan.c
@@ -782,6 +782,15 @@ activate_locked:
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
+		/*
+		 * This can happen in a race between unmap and us. If a page
+		 * is added to the swap cache while it is being unmapped, the
+		 * page may reach here. Check again whether this page (swap)
+		 * is worth keeping.
+		 * (Does this need to be done only under memcg?)
+		 */
+		if (PageSwapCache(page) && !page_mapped(page))
+			try_to_free_swap(page);
 		unlock_page(page);
 keep:
 		list_add(&page->lru, &ret_pages);
Index: mmotm-2.6.29-Mar11/include/linux/swap.h
===================================================================
--- mmotm-2.6.29-Mar11.orig/include/linux/swap.h
+++ mmotm-2.6.29-Mar11/include/linux/swap.h
@@ -337,11 +337,13 @@ static inline void disable_swap_token(vo
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent);
+extern void mem_cgroup_validate_swapcache(struct page *page);
 #else
 static inline void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 {
 }
+static inline void mem_cgroup_validate_swapcache(struct page *page) {}
 #endif
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern void mem_cgroup_uncharge_swap(swp_entry_t ent);

