linux-mm.kvack.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>, Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC PATCH 00/10] Improve numa scheduling by consolidating tasks
Date: Wed, 31 Jul 2013 17:09:23 +0200	[thread overview]
Message-ID: <20130731150923.GC3008@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20130730094650.GB28656@linux.vnet.ibm.com>

On Tue, Jul 30, 2013 at 03:16:50PM +0530, Srikar Dronamraju wrote:
> I am not against faults, and fault-based handling is very much needed.
> I have said that this approach is complementary to the numa faults that
> Mel is proposing.
> 
> Right now I think that if we can first get the tasks to consolidate on
> nodes and then use the numa faults to place them, we would have a very
> good solution.
> 
> Plain fault information actually causes confusion in a fair number of
> cases, especially if the initial set of pages is consolidated onto a
> few nodes. With plain fault information, "memory follows cpu" and "cpu
> follows memory" conflict with each other: memory wants to move to the
> nodes where the tasks are currently running, while the tasks plan to
> move to the nodes where their memory currently resides.

Since task weights are a completely random measure, the above story
fails to make any sense. If you can collate on an arbitrary number, why
can't you collate on faults?

The fact that the placement policies so far have not had inter-task
relations doesn't mean it's not possible.

> Also, most of the consolidation that I have proposed is pretty
> conservative and is mostly done at idle balance time. This would not
> affect the numa faulting in any way. When I run with my patches (along
> with some debug code), the consolidation happens pretty quickly.
> Once consolidation has happened, numa faults would be of immense value.

And also completely broken in various 'fun' ways. You're far too fond of
nr_running for one.

Also, afaict it never does anything if the machine is overloaded, since
we then never hit the !nr_running case in rebalance_domains().

> Here is how I am looking at the solution.
> 
> 1. Till the initial scan delay, allow tasks to consolidate

I would really want to not change regular balance behaviour for now;
you're also adding far too many atomic operations to the scheduler fast
path, and that's going to make people terribly unhappy.

> 2. After the first scan delay and until the next scan delay, account
>    numa faults and allow memory to move, but don't yet use numa faults
>    to drive scheduling decisions. Here too, tasks continue to
>    consolidate.
> 
> 	This will lead to tasks and memory moving to specific nodes,
> 	leading to consolidation.

This is just plain silly; once you have fault information you'd better
use it to move tasks towards where the memory is. Doing anything else
is, like I said, silly.

> 3. After the second scan delay, continue to account numa faults and
>    allow numa faults to drive scheduling decisions.
> 
> Whether at stage 3 we should use task weights, just numa faults, or
> some preference between the two is something that I am not clear about
> at this time. For now, I would think we need to factor in both of
> them.
> 
> I think this approach would mean tasks get consolidated, while the
> inter-process and inter-task relations that you are looking for also
> remain strong.
> 
> Is this an acceptable solution?

No, again, task weight is a completely random number unrelated to
anything we want to do. Furthermore, we simply cannot add mm-wide
atomics to the scheduler hot paths.



Thread overview: 24+ messages
2013-07-30  7:48 Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 01/10] sched: Introduce per node numa weights Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 02/10] sched: Use numa weights while migrating tasks Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 03/10] sched: Select a better task to pull across node using iterations Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 04/10] sched: Move active_load_balance_cpu_stop to a new helper function Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 05/10] sched: Extend idle balancing to look for consolidation of tasks Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 06/10] sched: Limit migrations from a node Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 07/10] sched: Pass hint to active balancer about the task to be chosen Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 08/10] sched: Prevent a task from migrating immediately after an active balance Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 09/10] sched: Choose a runqueue that has lesser local affinity tasks Srikar Dronamraju
2013-07-30  7:48 ` [RFC PATCH 10/10] x86, mm: Prevent gcc to re-read the pagetables Srikar Dronamraju
2013-07-30  8:17 ` [RFC PATCH 00/10] Improve numa scheduling by consolidating tasks Peter Zijlstra
2013-07-30  8:20   ` Peter Zijlstra
2013-07-30  9:03     ` Srikar Dronamraju
2013-07-30  9:10       ` Peter Zijlstra
2013-07-30  9:26         ` Peter Zijlstra
2013-07-30  9:46         ` Srikar Dronamraju
2013-07-31 15:09           ` Peter Zijlstra [this message]
2013-07-31 18:06             ` Srikar Dronamraju
2013-07-30  9:15     ` Srikar Dronamraju
2013-07-30  9:33       ` Peter Zijlstra
2013-07-31 17:35         ` Srikar Dronamraju
2013-07-31 13:33 ` Andrew Theurer
2013-07-31 15:43   ` Srikar Dronamraju
