Re: [PATCH RFC WIP] Process weights based scheduling for better consolidation

From: Peter Zijlstra <peterz@infradead.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>, Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC WIP] Process weights based scheduling for better consolidation
Date: Fri, 5 Jul 2013 12:16:54 +0200
Message-ID: <20130705101654.GL23916@twins.programming.kicks-ass.net>
In-Reply-To: <20130704180227.GA31348@linux.vnet.ibm.com>

On Thu, Jul 04, 2013 at 11:32:27PM +0530, Srikar Dronamraju wrote:
> Here is an approach to look at numa balanced scheduling from a non numa fault
> angle. This approach uses process weights instead of faults as a basis to
> move or bring tasks together.

That doesn't make any sense... how would weight be related to NUMA
placement?

What it appears to do is simply group tasks based on ->mm. And by
keeping them somewhat sticky to the same node it gets locality.

What about multi-process shared memory workloads? It's one of the things
I disliked about autonuma. It completely disregards the multi-process
scenario.

If you want to go without faults, you also won't migrate memory along,
and if you just happen to place your workload elsewhere you've no idea
where your memory is. If you have the faults, you might as well account
them to get a notion of where the memory is; it's nearly free at that
point anyway.
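
The accounting idea above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code:
struct numa_fault_model, model_account_fault(), model_preferred_node()
and MODEL_MAX_NODES are hypothetical names invented for illustration.
It counts hinting faults per node and treats the node with the most
faults as the place where the memory most likely lives.

```c
#include <assert.h>

#define MODEL_MAX_NODES 8	/* hypothetical upper bound for this sketch */

/* Hypothetical per-task fault statistics; not a kernel structure. */
struct numa_fault_model {
	unsigned long faults[MODEL_MAX_NODES];	/* hinting faults per node */
};

/* Record that 'pages' hinting faults hit memory residing on 'node'. */
static void model_account_fault(struct numa_fault_model *m, int node,
				unsigned long pages)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	m->faults[node] += pages;
}

/* Return the node with the most recorded faults, or -1 if none were seen. */
static int model_preferred_node(const struct numa_fault_model *m)
{
	unsigned long best = 0;
	int node, best_node = -1;

	for (node = 0; node < MODEL_MAX_NODES; node++) {
		if (m->faults[node] > best) {
			best = m->faults[node];
			best_node = node;
		}
	}
	return best_node;
}
```

Once the fault path exists, the bookkeeping is one array increment per
fault, which is the sense in which the accounting is nearly free.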

Load spikes/fluctuations can easily lead to transient task movement to
keep balance. If these movements are indeed transient you want to return
to where you came from; however, if they are not, you want the memory to
come to you.
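
One way to picture that choice is the user-space sketch below. It is an
illustration under assumptions, not anything from the patch:
model_decide(), the enum values, and in particular the max_transient
cut-off are hypothetical, and how to tell transient from permanent
displacement is exactly the open question.

```c
#include <assert.h>

enum model_action {
	MODEL_STAY,		/* already on the node holding the memory */
	MODEL_MOVE_TASK_BACK,	/* displacement looks transient */
	MODEL_MIGRATE_MEMORY	/* displacement looks permanent */
};

/*
 * mem_node:      node where most of the task's memory is believed to be
 * cur_node:      node the task is currently running on
 * periods_away:  consecutive sampling periods spent away from mem_node
 * max_transient: hypothetical cut-off between transient and permanent
 */
static enum model_action model_decide(int mem_node, int cur_node,
				      unsigned int periods_away,
				      unsigned int max_transient)
{
	assert(mem_node >= 0 && cur_node >= 0);

	if (cur_node == mem_node)
		return MODEL_STAY;
	if (periods_away <= max_transient)
		return MODEL_MOVE_TASK_BACK;
	return MODEL_MIGRATE_MEMORY;
}
```

Without some notion of where the memory is, mem_node is unknown and
neither branch can be taken sensibly.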

> +static void account_numa_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p)
> +{
> +	struct rq *rq = rq_of(cfs_rq);
> +	unsigned long task_load = 0;
> +	int curnode = cpu_to_node(cpu_of(rq));
> +#ifdef CONFIG_SCHED_AUTOGROUP
> +	struct sched_entity *se;
> +
> +	se = cfs_rq->tg->se[cpu_of(rq)];
> +	if (!se)
> +		return;
> +
> +	if (cfs_rq->load.weight) {
> +		task_load =  p->se.load.weight * se->load.weight;
> +		task_load /= cfs_rq->load.weight;
> +	} else {
> +		task_load = 0;
> +	}
> +#else
> +	task_load = p->se.load.weight;
> +#endif

This looks broken; didn't you want to use task_h_load() here? There's
nothing autogroup-specific about task_load. If anything you want to
handle the full cgroup hierarchy, which I think reduces to task_h_load()
here.
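
For readers unfamiliar with hierarchical load, here is a simplified
user-space model of the idea. It is a sketch under assumptions, not the
kernel's task_h_load() implementation: struct group_model and
model_task_h_load() are hypothetical names. The task's weight is scaled
at each level by the group's share of its parent's queue; the quoted
hunk appears to apply this scaling for one level only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one task group on one CPU; not a kernel structure. */
struct group_model {
	unsigned long se_weight;	/* group entity's weight in the parent queue */
	unsigned long rq_load;		/* sum of weights on the group's own queue */
	const struct group_model *parent; /* NULL if the parent is the root queue */
};

/*
 * Scale a task's weight up through every enclosing group so that the
 * result is its contribution as seen from the root queue. Pass NULL
 * for a task that sits directly on the root queue.
 */
static unsigned long model_task_h_load(unsigned long task_weight,
				       const struct group_model *grp)
{
	unsigned long long load = task_weight;

	for (; grp != NULL; grp = grp->parent) {
		if (grp->rq_load == 0)
			return 0;
		/* What enters a level cannot exceed that queue's total. */
		assert(load <= grp->rq_load);
		load = load * grp->se_weight / grp->rq_load;
	}
	return (unsigned long)load;
}
```

With a group of entity weight 1024 holding two tasks of weight 1024
each, every task contributes 1024 * 1024 / 2048 = 512 at the root.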

> +	p->task_load = 0;
> +	if (!task_load)
> +		return;
> +
> +	if (p->mm && p->mm->numa_weights) {
> +		p->mm->numa_weights[curnode] += task_load;
> +		p->mm->numa_weights[nr_node_ids] += task_load;
> +	}
> +
> +	if (p->nr_cpus_allowed != num_online_cpus())
> +		rq->pinned_load += task_load;
> +	p->task_load = task_load;
> +}
> +

> @@ -5529,6 +5769,76 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
>  		if (!balance)
>  			break;
>  	}
> +#ifdef CONFIG_NUMA_BALANCING
> +	if (!rq->nr_running) {

This would only work for underutilized systems...

> +	}
> +#endif


