From: David Nellans <david@nellans.org>
To: Dave Hansen <dave@sr71.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"x86@kernel.org" <x86@kernel.org>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"riel@redhat.com" <riel@redhat.com>,
"mgorman@suse.de" <mgorman@suse.de>
Subject: Re: [PATCH 7/7] x86: mm: set TLB flush tunable to sane value (33)
Date: Wed, 02 Jul 2014 13:16:58 -0500 [thread overview]
Message-ID: <53B44C9A.9070808@nellans.org> (raw)
In-Reply-To: <20140701164856.3020D644@viggo.jf.intel.com>
On 07/01/2014 11:48 AM, Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> This has been run through Intel's LKP tests across a wide range
> of modern systems and workloads, and it wasn't shown to make a
> measurable performance difference, positive or negative.
>
> Now that we have some shiny new tracepoints, we can actually
> figure out what the heck is going on.
>
> During a kernel compile, 60% of the flush_tlb_mm_range() calls
> are for a single page. It breaks down like this (flush size in
> pages, share of all flushes, cumulative share, average cost):
>
> size percent percent<=
>   V     V        V
> GLOBAL: 2.20% 2.20% avg cycles: 2283
> 1: 56.92% 59.12% avg cycles: 1276
> 2: 13.78% 72.90% avg cycles: 1505
> 3: 8.26% 81.16% avg cycles: 1880
> 4: 7.41% 88.58% avg cycles: 2447
> 5: 1.73% 90.31% avg cycles: 2358
> 6: 1.32% 91.63% avg cycles: 2563
> 7: 1.14% 92.77% avg cycles: 2862
> 8: 0.62% 93.39% avg cycles: 3542
> 9: 0.08% 93.47% avg cycles: 3289
> 10: 0.43% 93.90% avg cycles: 3570
> 11: 0.20% 94.10% avg cycles: 3767
> 12: 0.08% 94.18% avg cycles: 3996
> 13: 0.03% 94.20% avg cycles: 4077
> 14: 0.02% 94.23% avg cycles: 4836
> 15: 0.04% 94.26% avg cycles: 5699
> 16: 0.06% 94.32% avg cycles: 5041
> 17: 0.57% 94.89% avg cycles: 5473
> 18: 0.02% 94.91% avg cycles: 5396
> 19: 0.03% 94.95% avg cycles: 5296
> 20: 0.02% 94.96% avg cycles: 6749
> 21: 0.18% 95.14% avg cycles: 6225
> 22: 0.01% 95.15% avg cycles: 6393
> 23: 0.01% 95.16% avg cycles: 6861
> 24: 0.12% 95.28% avg cycles: 6912
> 25: 0.05% 95.32% avg cycles: 7190
> 26: 0.01% 95.33% avg cycles: 7793
> 27: 0.01% 95.34% avg cycles: 7833
> 28: 0.01% 95.35% avg cycles: 8253
> 29: 0.08% 95.42% avg cycles: 8024
> 30: 0.03% 95.45% avg cycles: 9670
> 31: 0.01% 95.46% avg cycles: 8949
> 32: 0.01% 95.46% avg cycles: 9350
> 33: 3.11% 98.57% avg cycles: 8534
> 34: 0.02% 98.60% avg cycles: 10977
> 35: 0.02% 98.62% avg cycles: 11400
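
(A rough sketch of how a count-by-size breakdown like the one above
can be collected, assuming the tlb:tlb_flush tracepoint from patch 5/7
is enabled and that its output carries the flushed page count as a
"pages:<n>" field -- the field name is an assumption here. The avg
cycles column would additionally need timestamp deltas, which this
sketch does not attempt:)

#include <stdio.h>
#include <string.h>

int main(void)
{
	/*
	 * Read a snapshot of the ftrace buffer, taken after running the
	 * workload with the event enabled:
	 *   echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable
	 */
	FILE *f = fopen("/sys/kernel/debug/tracing/trace", "r");
	static unsigned long hist[513];
	char line[512];
	int i;

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		char *p = strstr(line, "pages:");
		long pages;

		if (!p || sscanf(p, "pages:%ld", &pages) != 1)
			continue;
		if (pages < 0 || pages > 512)
			pages = 512;	/* one bucket for full/huge flushes */
		hist[pages]++;
	}
	fclose(f);
	for (i = 0; i <= 512; i++)
		if (hist[i])
			printf("%3d: %lu flushes\n", i, hist[i]);
	return 0;
}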
>
> We get into diminishing returns pretty quickly. On pre-IvyBridge
> CPUs, we used to set the limit at 8 pages, and it was set at 128
> on IvyBridge. That 128 number looks pretty silly considering that
> less than 0.5% of the flushes are that large.
>
> The previous code tried to size this number based on the size of
> the TLB. Good idea, but it's error-prone, needs maintenance
> (which it didn't get up to now), and probably would not matter
> much in practice.
>
> Setting it to 33 means that we cover the mallopt
> M_TRIM_THRESHOLD, which is the most common size at which
> flushes are done.
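
(For reference, the knob in question, in a minimal userspace sketch:
128KiB is exactly 32 4KiB pages, which lines up with the 33-page spike
in the table above. The M_MMAP_THRESHOLD tweak is only there to keep
the test allocation on the heap instead of in its own mmap():)

#include <malloc.h>	/* mallopt(), M_TRIM_THRESHOLD: glibc-specific */
#include <stdlib.h>
#include <string.h>

int main(void)
{
	/* Blocks above M_MMAP_THRESHOLD are mmap()ed directly; raise it
	 * so the allocation below comes from the heap. */
	mallopt(M_MMAP_THRESHOLD, 1024 * 1024);
	/* glibc's default trim threshold: 128KiB == 32 x 4KiB pages.
	 * Free space at the top of the heap beyond this is returned to
	 * the kernel, and the unmapped range has to be flushed. */
	mallopt(M_TRIM_THRESHOLD, 128 * 1024);

	char *p = malloc(256 * 1024);
	if (!p)
		return 1;
	memset(p, 0, 256 * 1024);	/* fault the pages in */
	free(p);	/* heap shrinks past the threshold -> range flush */
	return 0;
}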
>
> That's the short version. Here's the long one for why I chose 33:
>
> 1. These numbers have a constant bias in the timestamps from the
> tracing. It probably accounts for a couple hundred cycles in each
> of these tests, but it should be fairly _even_ across all of them.
> The smallest delta between the tracepoints I have ever seen is
> 335 cycles. This is one reason the cycles/page cost goes down in
> general as the flushes get larger. The true cost is nearer to
> 100 cycles.
> 2. A full flush is more expensive than a single invlpg, but not
> by much (single-digit percentages).
> 3. A dtlb miss is 17.1ns (~45 cycles) and an itlb miss is 13.0ns
> (~34 cycles). At those rates, refilling the 512-entry dTLB takes
> ~22,000 cycles (the arithmetic is worked through after the
> counter output below).
> 4. 22,000 cycles is approximately the equivalent of doing 85
> invlpg operations. But the odds are that the TLB can actually
> be refilled faster than that, because TLB misses that are close
> in time also tend to hit in the same caches.
> 5. ~98% of flushes are <=33 pages. There are a lot of flushes of
> exactly 33 pages, probably because libc's M_TRIM_THRESHOLD is
> set to 128k (32 pages).
> 6. I've found no consistent data to support changing the IvyBridge
> vs. SandyBridge tunable by a factor of 16.
>
> I used the performance counters on this hardware (IvyBridge i5-3320M)
> to figure out the tlb miss costs:
>
> ocperf.py stat -e dtlb_load_misses.walk_duration,dtlb_load_misses.walk_completed,dtlb_store_misses.walk_duration,dtlb_store_misses.walk_completed,itlb_misses.walk_duration,itlb_misses.walk_completed,itlb.itlb_flush
>
> 7,720,030,970 dtlb_load_misses_walk_duration [57.13%]
> 169,856,353 dtlb_load_misses_walk_completed [57.15%]
> 708,832,859 dtlb_store_misses_walk_duration [57.17%]
> 19,346,823 dtlb_store_misses_walk_completed [57.17%]
> 2,779,687,402 itlb_misses_walk_duration [57.15%]
> 82,241,148 itlb_misses_walk_completed [57.13%]
> 770,717 itlb_itlb_flush [57.11%]
>
> These show that a dtlb miss is 17.1ns (~45 cycles) and an itlb
> miss is 13.0ns (~34 cycles). At those rates, refilling the
> 512-entry dTLB takes ~22,000 cycles. On a SandyBridge system with
> more cores and larger caches, the numbers are dtlb=13.4ns and
> itlb=9.5ns.
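
(Those per-miss figures fall straight out of the counters above:
walk_duration counts cycles, so cycles-per-miss is walk_duration /
walk_completed, with load and store walks combined for the dTLB. The
nanosecond figures assume the i5-3320M's 2.6GHz nominal clock, which
is an assumption on my part. A minimal check:)

#include <stdio.h>

int main(void)
{
	/* Raw counts from the ocperf run quoted above. */
	double dload_cyc  = 7720030970.0, dload_n  = 169856353.0;
	double dstore_cyc = 708832859.0,  dstore_n = 19346823.0;
	double iwalk_cyc  = 2779687402.0, iwalk_n  = 82241148.0;
	double ghz = 2.6;	/* assumed: i5-3320M nominal clock */

	/* Combine load+store walks for the overall dTLB miss cost. */
	double dtlb = (dload_cyc + dstore_cyc) / (dload_n + dstore_n);
	double itlb = iwalk_cyc / iwalk_n;

	printf("dtlb miss: %.1f cycles = %.1f ns\n", dtlb, dtlb / ghz);
	printf("itlb miss: %.1f cycles = %.1f ns\n", itlb, itlb / ghz);
	printf("512-entry dTLB refill: ~%.0f cycles\n", 512 * dtlb);
	return 0;
}

This prints ~44.6 cycles (17.1ns) for a dtlb miss, ~33.8 cycles
(13.0ns) for an itlb miss, and ~22,800 cycles for a full refill, in
line with the ~22,000 quoted. It also recovers point 4's equivalence:
22,000 cycles over 85 invlpg operations is roughly 260 cycles per
invlpg.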
Is the intuition here that invalidate-caused refills will almost always
be serviced from the L2 or better, since we've recently walked the page
tables to modify the page needing the flush and thus pre-warmed the
caches for any refill? Or is this an artifact of the flush/refill test
setup? Main memory latency even on IvyBridge is ~100 clocks, and worse
in previous generations, so to average a ~30-cycle refill you can
basically never miss in the L1, or maybe the L2, which seems
optimistic.