From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
minchan.kim@gmail.com, mingo@elte.hu
Subject: Re: [RFC mm][PATCH 2/5] percpu cached mm counter
Date: Fri, 11 Dec 2009 09:30:32 +0900 [thread overview]
Message-ID: <20091211093032.db7fdd91.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.0912101136110.5481@router.home>
Thank you for the review.
On Thu, 10 Dec 2009 11:51:24 -0600 (CST)
Christoph Lameter <cl@linux-foundation.org> wrote:
> On Thu, 10 Dec 2009, KAMEZAWA Hiroyuki wrote:
>
> > Now, mm's counter information is updated by atomic_long_xxx() functions if
> > USE_SPLIT_PTLOCKS is defined. This causes cache-miss when page faults happens
> > simultaneously in prural cpus. (Almost all process-shared objects is...)
>
> s/prural cpus/multiple cpus simultaneously/?
>
Ah, I see. I often make this mistake, sorry.
> > This patch implements per-cpu mm cache. This per-cpu cache is loosely
> > synchronized with mm's counter. Current design is..
>
> Some more explanation about the role of the per cpu data would be useful.
>
I see.
> For each cpu we keep a set of counters that can be incremented using per
> cpu operations. curr_mc points to the mm struct that is currently using
> the per cpu counters on a specific cpu?
>
Yes, precisely. The per-cpu curr_mmc.mm points to the mm_struct of the
current thread if a page fault has occurred since the last schedule().
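To make that concrete, the caching scheme can be modeled in plain userspace
C. This is a hypothetical, single-cpu sketch, not the patch itself: the real
code uses percpu_read()/percpu_write()/percpu_add() and atomic_long_t, and
`cur_mm` below stands in for current->mm:

```c
#include <assert.h>
#include <stddef.h>

enum { MM_ANONPAGES, MM_FILEPAGES, NR_MM_COUNTERS };

struct mm_struct {
	long counters[NR_MM_COUNTERS];	/* atomic_long_t in the kernel */
};

/* One instance per cpu in the kernel; a single copy in this model. */
struct pcp_mm_cache {
	struct mm_struct *mm;		/* owner since the last schedule() */
	long counters[NR_MM_COUNTERS];
} curr_mmc;

/* Slow path: today's behavior, an atomic add on the shared mm. */
static void add_mm_counter(struct mm_struct *mm, int member, long val)
{
	mm->counters[member] += val;
}

/* Fast path from the patch: cache hits avoid touching mm at all. */
static void add_mm_counter_fast(struct mm_struct *cur_mm,
				struct mm_struct *mm, int member, long val)
{
	if (curr_mmc.mm == mm)			/* fast path */
		curr_mmc.counters[member] += val;
	else if (mm == cur_mm) {		/* 1st fault in this period */
		curr_mmc.mm = mm;
		curr_mmc.counters[member] = val;
	} else					/* side path, e.g. get_user_pages() */
		add_mm_counter(mm, member, val);
}
```

Only the third branch still pays the cross-cpu atomic cost; faults against
current->mm accumulate in the per-cpu cache until schedule() flushes it.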
> > - prepare per-cpu object curr_mmc. curr_mmc containes pointer to mm and
> > array of counters.
> > - At page fault,
> > * if curr_mmc.mm != NULL, update curr_mmc.mm counter.
> > * if curr_mmc.mm == NULL, fill curr_mmc.mm = current->mm and account 1.
> > - At schedule()
> > * if curr_mm.mm != NULL, synchronize and invalidate cached information.
> > * if curr_mmc.mm == NULL, nothing to do.
>
> Sounds like a very good idea that could be expanded and used for other
> things like tracking the amount of memory used on a specific NUMA node in
> the future. Through that we may get to a schedule that can schedule with
> an awareness where the memory of a process is actually located.
>
Hmm, expanding it into per-node stats?
> > By this.
> > - no atomic ops, which tends to cache-miss, under page table lock.
> > - mm->counters are synchronized when schedule() is called.
> > - No bad thing to read-side.
> >
> > Concern:
> > - added cost to schedule().
>
> That is only a simple check right?
yes.
> Are we already touching that cacheline in schedule?
0000000000010040 l O .data.percpu 0000000000000050 vmstat_work
00000000000100a0 g O .data.percpu 0000000000000030 curr_mmc
00000000000100e0 l O .data.percpu 0000000000000030 vmap_block_queue
Hmm...not touched unless a page fault occurs.
> Or place that structure near other stuff touched by the scheduer?
>
I'll think about that.
> >
> > +#if USE_SPLIT_PTLOCKS
> > +
> > +DEFINE_PER_CPU(struct pcp_mm_cache, curr_mmc);
> > +
> > +void __sync_mm_counters(struct mm_struct *mm)
> > +{
> > + struct pcp_mm_cache *mmc = &per_cpu(curr_mmc, smp_processor_id());
> > + int i;
> > +
> > + for (i = 0; i < NR_MM_COUNTERS; i++) {
> > + if (mmc->counters[i] != 0) {
>
> Omit != 0?
>
> if you change mmc->curr_mc then there is no need to set mmc->counters[0]
> to zero right? add_mm_counter_fast will set the counter to 1 next?
>
Yes, I can omit that.
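The flush at schedule() can be sketched the same way. This is a
self-contained userspace model (re-declaring the minimal structures;
the kernel uses atomic_long_add() here). The "!= 0" test is dropped as
suggested, but the counters are still re-zeroed: members that never took
the first-fault percpu_write() in the next period would otherwise carry
stale deltas into a different mm:

```c
#include <assert.h>
#include <stddef.h>

enum { MM_ANONPAGES, MM_FILEPAGES, NR_MM_COUNTERS };

struct mm_struct {
	long counters[NR_MM_COUNTERS];	/* atomic_long_t in the kernel */
};

struct pcp_mm_cache {
	struct mm_struct *mm;
	long counters[NR_MM_COUNTERS];
} curr_mmc;

/* Fold cached deltas into mm, then invalidate the per-cpu cache. */
static void __sync_mm_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++) {
		mm->counters[i] += curr_mmc.counters[i]; /* atomic_long_add() */
		curr_mmc.counters[i] = 0;
	}
	curr_mmc.mm = NULL;	/* nothing cached until the next fault */
}
```

The caller (sync_mm_counters_atomic() in the patch) only invokes this when
curr_mmc.mm is non-NULL, so the common schedule() path is a single check.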
> > +static void add_mm_counter_fast(struct mm_struct *mm, int member, int val)
> > +{
> > + struct mm_struct *cached = percpu_read(curr_mmc.mm);
> > +
> > + if (likely(cached == mm)) { /* fast path */
> > + percpu_add(curr_mmc.counters[member], val);
> > + } else if (mm == current->mm) { /* 1st page fault in this period */
> > + percpu_write(curr_mmc.mm, mm);
> > + percpu_write(curr_mmc.counters[member], val);
> > + } else /* page fault via side-path context (get_user_pages()) */
> > + add_mm_counter(mm, member, val);
>
> So get_user pages will not be accellerated.
>
Yes, but I guess it's not a fast path. I'll mention that in the patch description.
> > Index: mmotm-2.6.32-Dec8/kernel/sched.c
> > ===================================================================
> > --- mmotm-2.6.32-Dec8.orig/kernel/sched.c
> > +++ mmotm-2.6.32-Dec8/kernel/sched.c
> > @@ -2858,6 +2858,7 @@ context_switch(struct rq *rq, struct tas
> > trace_sched_switch(rq, prev, next);
> > mm = next->mm;
> > oldmm = prev->active_mm;
> > +
> > /*
> > * For paravirt, this is coupled with an exit in switch_to to
> > * combine the page table reload and the switch backend into
>
> Extraneous new line.
>
will fix.
> > @@ -5477,6 +5478,11 @@ need_resched_nonpreemptible:
> >
> > if (sched_feat(HRTICK))
> > hrtick_clear(rq);
> > + /*
> > + * sync/invaldidate per-cpu cached mm related information
> > + * before taling rq->lock. (see include/linux/mm.h)
> > + */
> > + sync_mm_counters_atomic();
> >
> > spin_lock_irq(&rq->lock);
> > update_rq_clock(rq);
>
> Could the per cpu counter stuff be placed into rq to avoid
> touching another cacheline?
>
I will try and check how it can be done without annoying people.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .