From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
mingo@elte.hu,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH][BUGFIX] memcg: fix for deadlock between lock_page_cgroup and mapping tree_lock
Date: Tue, 12 May 2009 17:13:56 +0900
Message-ID: <20090512171356.3d3a7554.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090512170007.ad7f5c7b.nishimura@mxp.nes.nec.co.jp>
On Tue, 12 May 2009 17:00:07 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> hmm, I see.
> cache_charge is outside of tree_lock, so moving uncharge would make sense.
> IMHO, we should make the period of spinlock as small as possible,
> and charge/uncharge of pagecache/swapcache is protected by page lock, not tree_lock.
>
How about this?
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
As Nishimura pointed out, mapping->tree_lock can be acquired from interrupt
context, so the following deadlock can occur.

Assume "A" is a page.

  CPU0:
      lock_page_cgroup(A)
      interrupted
          -> take mapping->tree_lock.
  CPU1:
      take mapping->tree_lock
          -> lock_page_cgroup(A)
This patch fixes the above deadlock by moving memcg's hooks out from under
mapping->tree_lock.
After this patch, lock_page_cgroup() is never called under mapping->tree_lock.
This makes Nishimura's first fix more fundamental and avoids adding a special case.
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
mm/filemap.c | 6 +++---
mm/swap_state.c | 2 +-
mm/truncate.c | 1 +
mm/vmscan.c | 2 ++
4 files changed, 7 insertions(+), 4 deletions(-)
Index: mmotm-2.6.30-May07/mm/filemap.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/filemap.c
+++ mmotm-2.6.30-May07/mm/filemap.c
@@ -121,7 +121,6 @@ void __remove_from_page_cache(struct pag
mapping->nrpages--;
__dec_zone_page_state(page, NR_FILE_PAGES);
BUG_ON(page_mapped(page));
- mem_cgroup_uncharge_cache_page(page);
/*
* Some filesystems seem to re-dirty the page even after
@@ -145,6 +144,7 @@ void remove_from_page_cache(struct page
spin_lock_irq(&mapping->tree_lock);
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
+ mem_cgroup_uncharge_cache_page(page);
}
static int sync_page(void *word)
@@ -476,13 +476,13 @@ int add_to_page_cache_locked(struct page
if (likely(!error)) {
mapping->nrpages++;
__inc_zone_page_state(page, NR_FILE_PAGES);
+ spin_unlock_irq(&mapping->tree_lock);
} else {
page->mapping = NULL;
+ spin_unlock_irq(&mapping->tree_lock);
mem_cgroup_uncharge_cache_page(page);
page_cache_release(page);
}
-
- spin_unlock_irq(&mapping->tree_lock);
radix_tree_preload_end();
} else
mem_cgroup_uncharge_cache_page(page);
Index: mmotm-2.6.30-May07/mm/swap_state.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/swap_state.c
+++ mmotm-2.6.30-May07/mm/swap_state.c
@@ -121,7 +121,6 @@ void __delete_from_swap_cache(struct pag
total_swapcache_pages--;
__dec_zone_page_state(page, NR_FILE_PAGES);
INC_CACHE_INFO(del_total);
- mem_cgroup_uncharge_swapcache(page, ent);
}
/**
@@ -191,6 +190,7 @@ void delete_from_swap_cache(struct page
__delete_from_swap_cache(page);
spin_unlock_irq(&swapper_space.tree_lock);
+ mem_cgroup_uncharge_swapcache(page, entry);
swap_free(entry);
page_cache_release(page);
}
Index: mmotm-2.6.30-May07/mm/truncate.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/truncate.c
+++ mmotm-2.6.30-May07/mm/truncate.c
@@ -359,6 +359,7 @@ invalidate_complete_page2(struct address
BUG_ON(page_has_private(page));
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
+ mem_cgroup_uncharge_cache_page(page);
page_cache_release(page); /* pagecache ref */
return 1;
failed:
Index: mmotm-2.6.30-May07/mm/vmscan.c
===================================================================
--- mmotm-2.6.30-May07.orig/mm/vmscan.c
+++ mmotm-2.6.30-May07/mm/vmscan.c
@@ -477,10 +477,12 @@ static int __remove_mapping(struct addre
swp_entry_t swap = { .val = page_private(page) };
__delete_from_swap_cache(page);
spin_unlock_irq(&mapping->tree_lock);
+ mem_cgroup_uncharge_swapcache(page, swap);
swap_free(swap);
} else {
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
+ mem_cgroup_uncharge_cache_page(page);
}
return 1;
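
For readers who want to see the ordering rule in isolation, here is a
minimal userspace sketch of it. It is not kernel code: every name in it
(model_mapping, model_page, model_uncharge and so on) is hypothetical and
only stands in for mapping->tree_lock, lock_page_cgroup() and the memcg
uncharge hooks. The locks are modelled as plain flags so that the rule
"no page-cgroup lock while tree_lock is held" can be checked with assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct address_space and its tree_lock. */
struct model_mapping {
	bool tree_lock_held;
	long nrpages;
};

/* Hypothetical stand-in for a page plus its page_cgroup state. */
struct model_page {
	bool cgroup_lock_held;
	bool charged;
	struct model_mapping *mapping;
};

static void model_tree_lock(struct model_mapping *m)
{
	assert(!m->tree_lock_held);
	m->tree_lock_held = true;
}

static void model_tree_unlock(struct model_mapping *m)
{
	assert(m->tree_lock_held);
	m->tree_lock_held = false;
}

/*
 * Models the uncharge hook. It takes the per-page cgroup lock, so it
 * asserts the rule stated in the changelog: tree_lock must not be held.
 */
static void model_uncharge(struct model_page *page, struct model_mapping *m)
{
	assert(!m->tree_lock_held);
	page->cgroup_lock_held = true;
	page->charged = false;
	page->cgroup_lock_held = false;
}

/* Models the locked helper: the caller must hold tree_lock. */
static void model_remove_locked(struct model_page *page)
{
	assert(page->mapping->tree_lock_held);
	page->mapping->nrpages--;
	page->mapping = NULL;
}

/* Models the wrapper after the patch: uncharge only after unlocking. */
static void model_remove_from_page_cache(struct model_page *page)
{
	struct model_mapping *m = page->mapping;

	model_tree_lock(m);
	model_remove_locked(page);
	model_tree_unlock(m);
	model_uncharge(page, m);
}
```

If model_uncharge() were called from model_remove_locked() instead, as the
hook was before this patch, the assert in model_uncharge() would fire. The
sketch only checks the ordering; it does not reproduce the interrupt race
itself.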