From: Hirokazu Takahashi <taka@valinux.co.jp>
To: kamezawa.hiroyu@jp.fujitsu.com, hugh@veritas.com
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
yamamoto@valinux.co.jp, riel@redhat.com
Subject: Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.
Date: Wed, 20 Feb 2008 15:27:53 +0900 (JST) [thread overview]
Message-ID: <20080220.152753.98212356.taka@valinux.co.jp> (raw)
In-Reply-To: <Pine.LNX.4.64.0802191449490.6254@blonde.site>
Hi,
> On Tue, 19 Feb 2008, KAMEZAWA Hiroyuki wrote:
> > I'd like to start from RFC.
> >
> > In following code
> > ==
> > lock_page_cgroup(page);
> > pc = page_get_page_cgroup(page);
> > unlock_page_cgroup(page);
> >
> > access 'pc' later..
> > == (See, page_cgroup_move_lists())
> >
> > There is a race because 'pc' is not a stable value without lock_page_cgroup().
> > (mem_cgroup_uncharge can free this 'pc').
> >
> > For example, page_cgroup_move_lists() access pc without lock.
> > There is a small race window, between page_cgroup_move_lists()
> > and mem_cgroup_uncharge(). At uncharge, page_cgroup struct is immedieately
> > freed but move_list can access it after taking lru_lock.
> > (*) mem_cgroup_uncharge_page() can be called without zone->lru lock.
> >
> > This is not good manner.
> > .....
> > There is no quick fix (maybe). Moreover, I hear some people around me said
> > current memcontrol.c codes are very complicated.
> > I agree ;( ..it's caued by my work.
> >
> > I'd like to fix problems in clean way.
> > (Note: current -rc2 codes works well under heavy pressure. but there
> > is possibility of race, I think.)
>
> Yes, yes, indeed, I've been working away on this too.
>
> Ever since the VM_BUG_ON(page_get_page_cgroup(page)) went into
> free_hot_cold_page (at my own prompting), I've been hitting it
> just very occasionally in my kernel build testing. Was unable
> to reproduce it over the New Year, but a week or two ago found
> one machine and config on which it is relatively reproducible,
> pretty sure to happen within 12 hours.
>
> And on Saturday evening at last identified the cause, exactly
> where you have: that unsafety in mem_cgroup_move_lists - which
> has the nice property of putting pages from the lru on to SLUB's
> freelist!
>
> Unlike the unsafeties of force_empty, this is liable to hit anyone
> running with MEM_CONT compiled in, they don't have to be consciously
> using mem_cgroups at all.
As for force_empty, though this may not be the main topic here,
mem_cgroup_force_empty_list() can be implemented simpler.
It is possible to make the function just call mem_cgroup_uncharge_page()
instead of releasing page_cgroups by itself. The tips is to call get_page()
before invoking mem_cgroup_uncharge_page() so the page won't be released
during this function.
Kamezawa-san, you may want look into the attached patch.
I think you will be free from the weired complexity here.
This code can be optimized but it will be enough since this function
isn't critical.
Thanks.
Signed-off-by: Hirokazu Takahashi <taka@vallinux.co.jp>
--- mm/memcontrol.c.ORG 2008-02-12 18:44:45.000000000 +0900
+++ mm/memcontrol.c 2008-02-20 14:23:38.000000000 +0900
@@ -837,7 +837,7 @@ mem_cgroup_force_empty_list(struct mem_c
{
struct page_cgroup *pc;
struct page *page;
- int count;
+ int count = FORCE_UNCHARGE_BATCH;
unsigned long flags;
struct list_head *list;
@@ -846,30 +846,21 @@ mem_cgroup_force_empty_list(struct mem_c
else
list = &mz->inactive_list;
- if (list_empty(list))
- return;
-retry:
- count = FORCE_UNCHARGE_BATCH;
spin_lock_irqsave(&mz->lru_lock, flags);
-
- while (--count && !list_empty(list)) {
+ while (!list_empty(list)) {
pc = list_entry(list->prev, struct page_cgroup, lru);
page = pc->page;
- /* Avoid race with charge */
- atomic_set(&pc->ref_cnt, 0);
- if (clear_page_cgroup(page, pc) == pc) {
- css_put(&mem->css);
- res_counter_uncharge(&mem->res, PAGE_SIZE);
- __mem_cgroup_remove_list(pc);
- kfree(pc);
- } else /* being uncharged ? ...do relax */
- break;
+ get_page(page);
+ spin_unlock_irqrestore(&mz->lru_lock, flags);
+ mem_cgroup_uncharge_page(page);
+ put_page(page);
+ if (--count <= 0) {
+ count = FORCE_UNCHARGE_BATCH;
+ cond_resched();
+ }
+ spin_lock_irqsave(&mz->lru_lock, flags);
}
spin_unlock_irqrestore(&mz->lru_lock, flags);
- if (!list_empty(list)) {
- cond_resched();
- goto retry;
- }
return;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-02-20 6:27 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-19 12:54 KAMEZAWA Hiroyuki
2008-02-19 15:40 ` Hugh Dickins
2008-02-20 1:03 ` KAMEZAWA Hiroyuki
2008-02-20 4:14 ` Hugh Dickins
2008-02-20 4:37 ` KAMEZAWA Hiroyuki
2008-02-20 4:39 ` Hugh Dickins
2008-02-20 4:41 ` Balbir Singh
2008-02-20 6:40 ` Balbir Singh
2008-02-20 7:23 ` KAMEZAWA Hiroyuki
2008-02-20 3:14 ` KAMEZAWA Hiroyuki
2008-02-20 3:37 ` YAMAMOTO Takashi
2008-02-20 4:13 ` KAMEZAWA Hiroyuki
2008-02-20 4:32 ` Hugh Dickins
2008-02-20 5:57 ` Balbir Singh
2008-02-20 9:58 ` Hirokazu Takahashi
2008-02-20 10:06 ` Paul Menage
2008-02-20 10:11 ` Balbir Singh
2008-02-20 10:18 ` Paul Menage
2008-02-20 10:55 ` Balbir Singh
2008-02-20 11:21 ` KAMEZAWA Hiroyuki
2008-02-20 11:18 ` Balbir Singh
2008-02-20 11:32 ` KAMEZAWA Hiroyuki
2008-02-20 11:34 ` Balbir Singh
2008-02-20 11:44 ` Paul Menage
2008-02-20 11:41 ` Paul Menage
2008-02-20 11:36 ` Balbir Singh
2008-02-20 11:55 ` Paul Menage
2008-02-21 2:49 ` Hirokazu Takahashi
2008-02-21 6:35 ` KAMEZAWA Hiroyuki
2008-02-21 9:07 ` Hirokazu Takahashi
2008-02-21 9:21 ` KAMEZAWA Hiroyuki
2008-02-21 9:28 ` Balbir Singh
2008-02-21 9:44 ` KAMEZAWA Hiroyuki
2008-02-22 3:31 ` [RFC] Block I/O Cgroup Hirokazu Takahashi
2008-02-22 5:05 ` KAMEZAWA Hiroyuki
2008-02-22 5:45 ` Hirokazu Takahashi
2008-02-21 9:25 ` [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races Balbir Singh
2008-02-20 6:27 ` Hirokazu Takahashi [this message]
2008-02-20 6:50 ` KAMEZAWA Hiroyuki
2008-02-20 8:32 ` Clean up force_empty (Was Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.) KAMEZAWA Hiroyuki
2008-02-20 10:07 ` Clean up force_empty Hirokazu Takahashi
2008-02-22 9:24 ` [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races Hugh Dickins
2008-02-22 10:07 ` KAMEZAWA Hiroyuki
2008-02-22 10:25 ` Hugh Dickins
2008-02-22 10:34 ` KAMEZAWA Hiroyuki
2008-02-22 10:50 ` Hirokazu Takahashi
2008-02-22 11:14 ` Balbir Singh
2008-02-22 12:00 ` Hugh Dickins
2008-02-22 12:28 ` Balbir Singh
2008-02-22 12:53 ` Hugh Dickins
2008-02-25 3:18 ` Balbir Singh
2008-02-19 15:54 ` kamezawa.hiroyu
2008-02-19 16:26 ` Hugh Dickins
2008-02-20 1:55 ` KAMEZAWA Hiroyuki
2008-02-20 2:05 ` YAMAMOTO Takashi
2008-02-20 2:15 ` KAMEZAWA Hiroyuki
2008-02-20 2:32 ` YAMAMOTO Takashi
2008-02-20 4:27 ` Hugh Dickins
2008-02-20 6:38 ` Balbir Singh
2008-02-20 11:00 ` Hugh Dickins
2008-02-20 11:32 ` Balbir Singh
2008-02-20 14:19 ` Hugh Dickins
2008-02-20 5:00 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080220.152753.98212356.taka@valinux.co.jp \
--to=taka@valinux.co.jp \
--cc=balbir@linux.vnet.ibm.com \
--cc=hugh@veritas.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox