linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	John Stultz <john.stultz@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH v4 2/2] ksm: provide support to use deferrable timers for scanner thread
Date: Mon, 8 Sep 2014 11:39:49 +0200	[thread overview]
Message-ID: <20140908093949.GZ6758@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <alpine.LSU.2.11.1409080023100.1610@eggly.anvils>

[-- Attachment #1: Type: text/plain, Size: 4221 bytes --]

On Mon, Sep 08, 2014 at 01:25:36AM -0700, Hugh Dickins wrote:
> Well, yes, but... how do we know when there is no more work to do?

Yeah, I figured that out _after_ I send that email..

> Thomas has given reason why KSM might simply fail to do its job if we
> rely on the deferrable timer.  So I've tried another approach, patch
> below; but I do not expect you to jump for joy at the sight of it!

Indeed :/

> I've tried to minimize the offensive KSM hook in context_switch().
> Why place it there, rather than do something near profile_tick() or
> account_process_tick()?  Because KSM is aware of mms not tasks, and
> context_switch() should have the next mm cachelines hot (if not, a
> slight regrouping in mm_struct should do it); whereas I can find
> no reference whatever to mm_struct in kernel/time, so hooking to
> KSM from there would drag in another few cachelines every tick.
> 
> (Another approach would be to set up KSM hint faulting, along the
> lines of NUMA hint faulting.  Not a path I'm keen to go down.)
> 
> I'm not thrilled with this patch, I think it's somewhat defective
> in several ways.  But maybe in practice it will prove good enough,
> and if so then I'd rather not waste effort on complicating it.
> 
> My own testing is not realistic, nor representative of real KSM users;
> and I have no idea what values of pages_to_scan and sleep_millisecs
> people really use (and those may make quite a difference to how
> well it works).
> 
> Chintan, even if the scheduler guys turn out to hate it, please would
> you give the patch below a try, to see how well it works in your
> environment, whether it seems to go better or worse than your own patch.
> 
> If it works well enough for you, maybe we can come up with ideas to
> make it more palatable.  I do think your issue is an important one
> to fix, one way or another.
> 
> Thanks,
> Hugh
> 
> [PATCH] ksm: avoid periodic wakeup while mergeable mms are quiet
> 
> Description yet to be written!
> 
> Reported-by: Chintan Pandya <cpandya@codeaurora.org>
> Not-Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> 
>  include/linux/ksm.h   |   14 +++++++++++
>  include/linux/sched.h |    1 
>  kernel/sched/core.c   |    9 ++++++-
>  mm/ksm.c              |   50 ++++++++++++++++++++++++++++------------
>  4 files changed, 58 insertions(+), 16 deletions(-)
> 
> --- 3.17-rc4/include/linux/ksm.h	2014-03-30 20:40:15.000000000 -0700
> +++ linux/include/linux/ksm.h	2014-09-07 11:54:41.528003316 -0700

> @@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st
>  {
>  }
>  
> +static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)

s/ksm_switch/__&/

> +{
> +	return NULL;
> +}
> +
>  static inline int PageKsm(struct page *page)
>  {
>  	return 0;

> --- 3.17-rc4/kernel/sched/core.c	2014-08-16 16:00:54.062189063 -0700
> +++ linux/kernel/sched/core.c	2014-09-07 11:54:41.528003316 -0700

> @@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas
>  	       struct task_struct *next)
>  {
>  	struct mm_struct *mm, *oldmm;
> +	wait_queue_head_t *wake_ksm = NULL;
>  
>  	prepare_task_switch(rq, prev, next);
>  
> @@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas
>  		next->active_mm = oldmm;
>  		atomic_inc(&oldmm->mm_count);
>  		enter_lazy_tlb(oldmm, next);
> -	} else
> +	} else {
>  		switch_mm(oldmm, mm, next);
> +		wake_ksm = ksm_switch(mm);

Is this the right mm? We've just switched the stack, so we're looking at
next->mm when we switched away from current. That might not exist
anymore.

> +	}
>  
>  	if (!prev->mm) {
>  		prev->active_mm = NULL;
> @@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas
>  	 * frame will be invalid.
>  	 */
>  	finish_task_switch(this_rq(), prev);
> +
> +	if (wake_ksm)
> +		wake_up_interruptible(wake_ksm);
>  }

Quite horrible for sure. I really hate seeing KSM cruft all the way down
here. Can't we create a new (timer) infrastructure that does the right
thing? Surely this isn't the only such case.

I know both RCU and some NOHZ_FULL muck already track when the system is
completely idle. This is yet another case of that.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

  reply	other threads:[~2014-09-08  9:39 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-20 12:10 [PATCH v4 1/2] timer: provide an api for deferrable timeout Chintan Pandya
2014-08-20 12:10 ` [PATCH v4 2/2] ksm: provide support to use deferrable timers for scanner thread Chintan Pandya
2014-08-28  6:02   ` Hugh Dickins
2014-09-03  9:58     ` Peter Zijlstra
2014-09-03 10:32       ` Thomas Gleixner
2014-09-08  8:25       ` Hugh Dickins
2014-09-08  9:39         ` Peter Zijlstra [this message]
2014-09-09 14:52           ` Chintan Pandya
2014-09-09 20:37             ` Hugh Dickins
2014-09-09 20:14           ` Hugh Dickins
2014-09-10  8:10             ` Peter Zijlstra
2014-09-11 12:27               ` Hugh Dickins
2014-09-10  8:27             ` Peter Zijlstra
2014-09-11 12:59               ` Hugh Dickins
2014-09-11  8:01             ` Chintan Pandya
2014-09-11 13:25               ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140908093949.GZ6758@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=cpandya@codeaurora.org \
    --cc=fweisbec@gmail.com \
    --cc=hughd@google.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox