From: Peter Zijlstra <peterz@infradead.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 07/13] sched: Split accounting of NUMA hinting faults that pass two-stage filter
Date: Fri, 5 Jul 2013 12:48:28 +0200	[thread overview]
Message-ID: <20130705104828.GO23916@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20130704193638.GP17812@cmpxchg.org>

On Thu, Jul 04, 2013 at 03:36:38PM -0400, Johannes Weiner wrote:

> I was going for the opposite conclusion: that it does not matter
> whether memory is accessed privately or in a shared fashion, because
> there is no obvious connection to its access frequency, not to me at
> least.  

There is a relation to access frequency; however, due to the low sample
rate (once every 100ms or so) we obviously miss all the high-frequency
accesses there.

> > I acknowledge it's a problem and basically I'm making a big assumption
> > that private-dominated workloads are going to be the common case. Threaded
> > application on UMA with heavy amounts of shared data (within cache lines)
> > already suck in terms of performance so I'm expecting programmers already
> > try and avoid this sort of sharing. Obviously we are at a page granularity
> > here so the assumption will depend entirely on alignments and buffer sizes
> > so it might still fall apart.
> 
> Don't basically all VM-based multithreaded programs have this usage
> pattern?  The whole runtime (text, heap) is shared between threads.
> If some thread-local memory spills over to another node, should the
> scheduler move this thread off node from a memory standpoint?  I don't
> think so at all.  I would expect it to always gravitate back towards
> this node with the VM on it, only get moved off for CPU load reasons,
> and get moved back as soon as the load situation permits.

All data being allocated on the same heap, and thus being shared in the
sense that every thread could access it, doesn't imply all threads will
indeed use all of it, even when TLS is not used.

For a concurrent program to reach any useful level of concurrency gain
you need data partitioning. Threads must work on different data sets,
otherwise they'd constantly be waiting on serialization -- which makes
your concurrency gain tank.
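
To make that concrete, a minimal userspace sketch (purely illustrative,
nothing from the patch set): the array below is a single shared object,
but each thread only ever touches its own contiguous slice, so in the
access sense it is private:

#include <pthread.h>
#include <stddef.h>

#define NTHREADS        4
#define NITEMS          (1UL << 20)

static double data[NITEMS];             /* one shared object */

static void *worker(void *arg)
{
        size_t id = (size_t)arg;
        size_t chunk = NITEMS / NTHREADS;
        size_t i;

        /* disjoint slice: no steady-state sharing between threads */
        for (i = id * chunk; i < (id + 1) * chunk; i++)
                data[i] = data[i] * 2.0 + 1.0;

        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        size_t i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        return 0;
}

The hinting faults for each slice should consistently hit the same
thread, and hence the same node, which is exactly the case the
private/shared split wants to reward.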

There are two main issues here:

Firstly, the question is whether there's much false sharing at page
granularity. Typically you want the compute time per data fragment to be
significantly higher than the demux + mux overhead, which favours larger
data units.
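
The page-granularity analogue of cache-line padding is the obvious
mitigation; a sketch, assuming per-thread hot state (the struct and
helpers below are made up for illustration, and the state is assumed to
fit in one page):

#include <stdlib.h>
#include <unistd.h>

struct worker_state {
        unsigned long counter;
        /* ... other per-thread hot fields ... */
};

/*
 * Give each thread's state its own page so that no two threads'
 * hot data can ever falsely share a page.
 */
static void *alloc_states(int nthreads)
{
        long page = sysconf(_SC_PAGESIZE);
        void *mem;

        if (posix_memalign(&mem, page, (size_t)page * nthreads))
                return NULL;
        return mem;
}

static struct worker_state *state_of(void *mem, int id)
{
        long page = sysconf(_SC_PAGESIZE);

        return (struct worker_state *)((char *)mem + (size_t)id * page);
}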

Secondly, you want your scan period to be at most half the compute time
per data fragment. Otherwise you run the risk of never observing the
data being local to that thread.
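
To put rough numbers on it, assuming the ~100ms scan interval mentioned
above: a thread that chews on one fragment for a second gets sampled
about ten times while it's there, so the fault statistics will clearly
show the thread<->page affinity; a thread that moves to a new fragment
every 50ms will usually see zero samples per fragment and the statistics
degrade to noise.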

So for optimal benefit you want to minimize the sharing of pages between
data fragments and make the compute time per data fragment as long as
possible. Luckily both are also goals for maximizing concurrency gain,
so we should be good there.

This should cover all 'traditional' concurrent stuff; most of the 'new'
concurrency stuff can be different though -- some of it was simply never
thought through/designed for concurrency and just hopes it works. Other
parts, most notably the multi-core concurrency stuff, assume the
demux+mux cost is _very_ low and therefore let the data fragment and its
associated compute time shrink to useless levels :/




Thread overview: 43+ messages
2013-07-03 14:21 [PATCH 0/13] Basic scheduler support for automatic NUMA balancing V2 Mel Gorman
2013-07-03 14:21 ` [PATCH 01/13] mm: numa: Document automatic NUMA balancing sysctls Mel Gorman
2013-07-03 14:21 ` [PATCH 02/13] sched: Track NUMA hinting faults on per-node basis Mel Gorman
2013-07-03 14:21 ` [PATCH 03/13] sched: Select a preferred node with the most numa hinting faults Mel Gorman
2013-07-03 14:21 ` [PATCH 04/13] sched: Update NUMA hinting faults once per scan Mel Gorman
2013-07-03 14:21 ` [PATCH 05/13] sched: Favour moving tasks towards the preferred node Mel Gorman
2013-07-03 14:21 ` [PATCH 06/13] sched: Reschedule task on preferred NUMA node once selected Mel Gorman
2013-07-04 12:26   ` Srikar Dronamraju
2013-07-04 13:29     ` Mel Gorman
2013-07-03 14:21 ` [PATCH 07/13] sched: Split accounting of NUMA hinting faults that pass two-stage filter Mel Gorman
2013-07-03 21:56   ` Johannes Weiner
2013-07-04  9:23     ` Mel Gorman
2013-07-04 14:24       ` Rik van Riel
2013-07-04 19:36       ` Johannes Weiner
2013-07-05  9:41         ` Mel Gorman
2013-07-05 10:48         ` Peter Zijlstra [this message]
2013-07-03 14:21 ` [PATCH 08/13] sched: Increase NUMA PTE scanning when a new preferred node is selected Mel Gorman
2013-07-03 14:21 ` [PATCH 09/13] sched: Favour moving tasks towards nodes that incurred more faults Mel Gorman
2013-07-03 18:27   ` Peter Zijlstra
2013-07-04  9:25     ` Mel Gorman
2013-07-03 14:21 ` [PATCH 10/13] sched: Set the scan rate proportional to the size of the task being scanned Mel Gorman
2013-07-03 14:21 ` [PATCH 11/13] sched: Check current->mm before allocating NUMA faults Mel Gorman
2013-07-03 15:33   ` Mel Gorman
2013-07-04 12:48   ` Srikar Dronamraju
2013-07-05 10:07     ` Mel Gorman
2013-07-03 14:21 ` [PATCH 12/13] mm: numa: Scan pages with elevated page_mapcount Mel Gorman
2013-07-03 18:35   ` Peter Zijlstra
2013-07-04  9:27     ` Mel Gorman
2013-07-03 18:41   ` Peter Zijlstra
2013-07-04  9:32     ` Mel Gorman
2013-07-03 18:42   ` Peter Zijlstra
2013-07-03 14:21 ` [PATCH 13/13] sched: Account for the number of preferred tasks running on a node when selecting a preferred node Mel Gorman
2013-07-03 18:32   ` Peter Zijlstra
2013-07-04  9:37     ` Mel Gorman
2013-07-04 13:07       ` Srikar Dronamraju
2013-07-04 13:54         ` Mel Gorman
2013-07-04 14:06           ` Peter Zijlstra
2013-07-04 14:40             ` Mel Gorman
2013-07-03 16:19 ` [PATCH 0/13] Basic scheduler support for automatic NUMA balancing V2 Mel Gorman
2013-07-03 16:26   ` Mel Gorman
2013-07-04 18:02 ` [PATCH RFC WIP] Process weights based scheduling for better consolidation Srikar Dronamraju
2013-07-05 10:16   ` Peter Zijlstra
2013-07-05 12:49     ` Srikar Dronamraju
