From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: linux-mm <linux-mm@kvack.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Paul Menage <menage@google.com>, Li Zefan <lizf@cn.fujitsu.com>
Subject: Re: [PATCH 5/8] memcg: migrate charge of anon
Date: Thu, 17 Sep 2009 15:25:40 +0900 [thread overview]
Message-ID: <20090917152540.10b10028.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20090917145657.4ccc8d27.nishimura@mxp.nes.nec.co.jp>
On Thu, 17 Sep 2009 14:56:57 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > > + /*
> > > + * We should put back all pages which remain on "list".
> > > + * This means try_charge above has failed.
> > > + * Pages which have been moved to mc->list would be put back at
> > > + * clear_migrate_charge.
> > > + */
> > > + if (!list_empty(&list))
> > > + list_for_each_entry_safe(page, tmp, &list, lru) {
> > > + list_del(&page->lru);
> > > + putback_lru_page(page);
> > > + put_page(page);
> >
> > I wonder this put_page() is not necessary.
> >
> get_page_unless_zero(page) is called above.
> putback_lru_page() will only decrease the refcount got by isolate_lru_page().
>
ok. I misunderstood.
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static int migrate_charge_prepare(void)
> > > +{
> > > + int ret = 0;
> > > + struct mm_struct *mm;
> > > + struct vm_area_struct *vma;
> > > +
> > > + mm = get_task_mm(mc->tsk);
> > > + if (!mm)
> > > + return 0;
> > > +
> > > + down_read(&mm->mmap_sem);
> > > + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > + struct mm_walk migrate_charge_walk = {
> > > + .pmd_entry = migrate_charge_prepare_pte_range,
> > > + .mm = mm,
> > > + .private = vma,
> > > + };
> > > + if (signal_pending(current)) {
> > > + ret = -EINTR;
> > > + break;
> > > + }
> > > + if (is_vm_hugetlb_page(vma))
> > > + continue;
> > > + /* We migrate charge of private pages for now */
> > > + if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE))
> > > + continue;
> > > + if (mc->to->migrate_charge) {
> > > + ret = walk_page_range(vma->vm_start, vma->vm_end,
> > > + &migrate_charge_walk);
> > > + if (ret)
> > > + break;
> > > + }
> > > + }
> > > + up_read(&mm->mmap_sem);
> > > +
> > > + mmput(mm);
> >
> > Hmm, Does this means a thread which is moved can continue its work and
> > newly allocated pages will remain in old group ?
> >
> Yes.
> Pages allocated between can_attach() and cgroup_task_migrate() will be
> charged to old group.
> But, IIUC, we should release mmap_sem because attach() of cpuset tries to down_write
> mmap_sem(update_tasks_nodemask -> cpuset_change_nodemask -> mpol_rebind_mm).
>
agreed.
> >
> > > + return ret;
> > > +}
> > > +
> > > +static void mem_cgroup_clear_migrate_charge(void)
> > > +{
> > > + struct page *page, *tmp;
> > > +
> > > + VM_BUG_ON(!mc);
> > > +
> > > + if (!list_empty(&mc->list))
> >
> > I think list_for_each_entry_safe() handles empty case.
> >
>
> #define list_for_each_entry_safe(pos, n, head, member) \
> for (pos = list_entry((head)->next, typeof(*pos), member), \
> n = list_entry(pos->member.next, typeof(*pos), member); \
> &pos->member != (head); \
> pos = n, n = list_entry(n->member.next, typeof(*n), member))
>
> hmm, "pos = list_entry((head)->next, typeof(*pos), member)" points to proper pointer
> in list_empty(i.e. head->next == head) case ?
I think so.
> "mc->list" is struct list_head, not struct page.
>
yes.
pos = list_entry((mc->list)->next, struct page, lru);
here, mc->list->next == &mc->list if empty.
Then, pos is "struct page" but points to &mc->list - some bytes.
Then, pos->member == &mc->list == head.
But this point is a nitpick. plz do as you want.
if (!list_empty(mc->list)) show us mc->list can be empty.
> > > + mem_cgroup_clear_migrate_charge();
> > > }
> > >
> >
> > Okay, them, if other subsystem fails "can_attach()", migrated charges are not
> > moved back to original group. Right ?
> >
> We haven't migrated charges at can_attach() stage. It only does try_charge to
> the new group.
> If can_attach() of other subsystem fails, we only cancel the result of try_charge.
>
> >
> >
> > My biggest concern in this implementation is this "isolate" pages too much.
> > Could you modify the whole routine as...
> >
> > struct migrate_charge {
> > struct task_struct *tsk;
> > struct mem_cgroup *from;
> > struct mem_cgroup *to;
> > struct list_head list;
> > long charged;
> > long committed;
> > };
> > - mem_cgroup_can_migrate_charge() ....
> > count pages and do "charge", no page isolation.
> > and remember the number of charges as mc->charged++;
> > - mem_cgroup_migrate_charge()
> > - scan vmas/page table again. And isolate pages in fine grain.
> > (256 pages per scan..or some small number)
> > - migrate pages if success mc->committed++
> >
> > after move, uncharge (mc->charged - mc->commited)
> >
> hmm, I selected current implementation just to prevent parsing page table twice.
> But if you prefer the parse-again direction, I'll try it.
> It will fix most of the problems that current implementation has.
>
What I afraid of is a corner case, migrating "very big" process, and
INACITVE/ACTIVE lru of a zone goes down to be nearly empty, OOM.
If you look into other callers of isolate_lru_page(), the number of
pages isolated per iteration is limited to some value.
(Almost all callers use array for limiting/batching.)
Thanks,
-Kame
>
> Thanks,
> Daisuke Nishimura.
>
> > Maybe you can find better one. But isolating all pages of process at once is a
> > big hammber for vm, OOM.
> >
> > Thanks,
> > -Kame
> >
> >
> >
> > > static void mem_cgroup_move_task(struct cgroup_subsys *ss,
> > >
> >
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-09-17 6:27 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-17 2:23 [RFC][EXPERIMENTAL][PATCH 0/8] memcg: migrate charge at task move Daisuke Nishimura
2009-09-17 2:24 ` [PATCH 1/8] memcg: introduce mem_cgroup_cancel_charge() Daisuke Nishimura
2009-09-17 4:12 ` KAMEZAWA Hiroyuki
2009-09-17 2:24 ` [PATCH 2/8] memcg: cleanup mem_cgroup_move_parent() Daisuke Nishimura
2009-09-17 4:15 ` KAMEZAWA Hiroyuki
2009-09-17 2:25 ` [PATCH 3/8] cgroup: introduce cancel_attach() Daisuke Nishimura
2009-09-17 2:26 ` [PATCH 4/8] memcg: add interface to migrate charge Daisuke Nishimura
2009-09-17 4:20 ` KAMEZAWA Hiroyuki
2009-09-17 4:40 ` Daisuke Nishimura
2009-09-17 2:26 ` [PATCH 5/8] memcg: migrate charge of anon Daisuke Nishimura
2009-09-17 4:57 ` KAMEZAWA Hiroyuki
2009-09-17 5:56 ` Daisuke Nishimura
2009-09-17 6:25 ` KAMEZAWA Hiroyuki [this message]
2009-09-17 23:52 ` KOSAKI Motohiro
2009-09-17 2:27 ` [PATCH 6/8] memcg: migrate charge of shmem Daisuke Nishimura
2009-09-17 5:02 ` KAMEZAWA Hiroyuki
2009-09-17 2:28 ` [PATCH 7/8] memcg: migrate charge of swap Daisuke Nishimura
2009-09-17 5:25 ` KAMEZAWA Hiroyuki
2009-09-17 6:17 ` Daisuke Nishimura
2009-09-17 6:28 ` KAMEZAWA Hiroyuki
2009-09-17 2:29 ` [PATCH 8/8] memcg: avoid oom during charge migration Daisuke Nishimura
2009-09-17 7:01 ` [RFC][EXPERIMENTAL][PATCH 0/8] memcg: migrate charge at task move Daisuke Nishimura
2009-09-24 5:42 ` [RFC][PATCH 0/8] memcg: migrate charge at task move (24/Sep) Daisuke Nishimura
2009-09-24 5:43 ` [RFC][PATCH 1/8] cgroup: introduce cancel_attach() Daisuke Nishimura
2009-09-24 6:33 ` KAMEZAWA Hiroyuki
2009-09-24 23:39 ` Daisuke Nishimura
2009-09-24 5:44 ` [RFC][PATCH 2/8] memcg: introduce mem_cgroup_cancel_charge() Daisuke Nishimura
2009-09-24 5:46 ` [RFC][PATCH 3/8] memcg: cleanup mem_cgroup_move_parent() Daisuke Nishimura
2009-09-24 6:37 ` KAMEZAWA Hiroyuki
2009-09-24 6:54 ` Daisuke Nishimura
2009-09-24 5:47 ` [RFC][PATCH 4/8] memcg: add interface to migrate charge Daisuke Nishimura
2009-09-24 6:54 ` KAMEZAWA Hiroyuki
2009-09-24 23:39 ` Daisuke Nishimura
2009-09-24 5:48 ` [RFC][PATCH 5/8] memcg: migrate charge of mapped page Daisuke Nishimura
2009-09-24 7:22 ` KAMEZAWA Hiroyuki
2009-09-24 8:00 ` Daisuke Nishimura
2009-09-25 0:28 ` Daisuke Nishimura
2009-09-25 0:49 ` KAMEZAWA Hiroyuki
2009-09-24 5:49 ` [RFC][PATCH 6/8] memcg: avoid oom during charge migration Daisuke Nishimura
2009-09-24 7:34 ` KAMEZAWA Hiroyuki
2009-09-25 1:44 ` Daisuke Nishimura
2009-09-25 1:55 ` KAMEZAWA Hiroyuki
2009-09-25 4:51 ` Daisuke Nishimura
2009-09-25 5:36 ` Daisuke Nishimura
2009-09-25 5:52 ` KAMEZAWA Hiroyuki
2009-09-24 5:49 ` [RFC][PATCH 7/8] memcg: migrate charge of swap Daisuke Nishimura
2009-09-24 5:50 ` [RFC][PATCH 8/8] memcg: migrate charge of shmem swap Daisuke Nishimura
2009-09-24 7:41 ` KAMEZAWA Hiroyuki
2009-09-25 0:28 ` Daisuke Nishimura
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090917152540.10b10028.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=balbir@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=lizf@cn.fujitsu.com \
--cc=menage@google.com \
--cc=nishimura@mxp.nes.nec.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox