Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andi Kleen <andi@firstfloor.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>, H Peter Anvin <hpa@zytor.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5
Date: Wed, 10 Jun 2015 09:17:15 -0700	[thread overview]
Message-ID: <CA+55aFwVUkdaf0_rBk7uJHQjWXu+OcLTHc6FKuCn0Cb2Kvg9NA@mail.gmail.com> (raw)
In-Reply-To: <20150610131354.GO19417@two.firstfloor.org>

On Wed, Jun 10, 2015 at 6:13 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Assuming the page tables are cache-hot... And hot here does not mean
> L3 cache, but higher. But a memory intensive workload can easily
> violate that.

In practice, no.

You'll spend all your time on the actual real data cache misses, the
TLB misses won't be any more noticeable.

And if your access patters are even *remoptely* cache-friendly (ie
_not_ spending all your time just waiting for regular data cache
misses), then a radix-tree-like page table like Intel will have much
better locality in the page tables than in the actual data. So again,
the TLB misses won't be your big problem.

There may be pathological cases where you just look at one word per
page, but let's face it, we don't optimize for pathological or
unrealistic cases.

And the thing is, you need to look at the costs. Single-page
invalidation taking hundreds of cycles? Yeah, we definitely need to
take the downside of trying to be clever into account.

If the invalidation was really cheap, the rules might change. As it
is, I really don't think there is any question about this.

That's particularly true when the single-page invalidation approach
has lots of *software* overhead too - not just the complexity, but
even "obvious" costs feeding the list of pages to be invalidated
across CPU's. Think about it - there are cache misses there too, and
because we do those across CPU's those cache misses are *mandatory*.

So trying to avoid a few TLB misses by forcing mandatory cache misses
and extra complexity, and by doing lots of 200+ cycle operations?
Really? In what universe does that sound like a good idea?

Quite frankly, I can pretty much *guarantee* that you didn't actually
think about any real numbers, you've just been taught that fairy-tale
of "TLB misses are expensive". As if TLB entries were somehow sacred.

If somebody can show real numbers on a real workload, that's one
thing. But in the absence of those real numbers, we should go for the
simple and straightforward code, and not try to make up "but but but
it could be expensive" issues.

Trying to avoid IPI's by batching them up sounds like a very obvious
win. I absolutely believe that is worth it. I just don't believe it is
worth it trying to pass a lot of page detail data around. The number
of individual pages flushed should be *very* controlled.

Here's a suggestion: the absolute maximum number of TLB entries we
flush should be something we can describe in one single cacheline.
Strive to minimize the number of cachelines we have to transfer for
the IPI, instead of trying to minimize the number of TLB entries. So
maybe soemthing along the lines of "one word of flags (all or nothing,
or a range), or up to seven indivual addresses to be flushed". I
personally think even seven invlpg's is likely too much, but with the
"single cacheline" at least there's another basis for the limit too.

So anyway, I like the patch series. I just think that the final patch
- the one that actually saves the addreses, and limits things to
BATCH_TLBFLUSH_SIZE, should be limited.

                   Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-06-10 16:17 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-08 12:50 Mel Gorman
2015-06-08 12:50 ` [PATCH 1/3] x86, mm: Trace when an IPI is about to be sent Mel Gorman
2015-06-08 12:50 ` [PATCH 2/3] mm: Send one IPI per CPU to TLB flush multiple pages that were recently unmapped Mel Gorman
2015-06-08 22:38   ` Andrew Morton
2015-06-09 11:07     ` Mel Gorman
2015-06-08 12:50 ` [PATCH 3/3] mm: Defer flush of writable TLB entries Mel Gorman
2015-06-08 17:45 ` [PATCH 0/3] TLB flush multiple pages per IPI v5 Ingo Molnar
2015-06-08 18:21   ` Dave Hansen
2015-06-08 19:52     ` Ingo Molnar
2015-06-08 20:03       ` Ingo Molnar
2015-06-08 21:07       ` Dave Hansen
2015-06-08 21:50         ` Ingo Molnar
2015-06-09  8:47   ` Mel Gorman
2015-06-09 10:32     ` Ingo Molnar
2015-06-09 11:20       ` Mel Gorman
2015-06-09 12:43         ` Ingo Molnar
2015-06-09 13:05           ` Mel Gorman
2015-06-10  8:51             ` Ingo Molnar
2015-06-10  9:08               ` Ingo Molnar
2015-06-10 10:15                 ` Mel Gorman
2015-06-11 15:26                   ` Ingo Molnar
2015-06-10  9:19               ` Mel Gorman
2015-06-09 15:34           ` Dave Hansen
2015-06-09 16:49             ` Dave Hansen
2015-06-09 21:14               ` Dave Hansen
2015-06-09 21:54                 ` Linus Torvalds
2015-06-09 22:32                   ` Mel Gorman
2015-06-09 22:35                     ` Mel Gorman
2015-06-10 13:13                   ` Andi Kleen
2015-06-10 16:17                     ` Linus Torvalds [this message]
2015-06-10 16:42                       ` Linus Torvalds
2015-06-10 17:24                         ` Mel Gorman
2015-06-10 17:31                           ` Linus Torvalds
2015-06-10 18:08                         ` Josh Boyer
2015-06-10 17:07                       ` Mel Gorman
2015-06-21 20:22             ` Kirill A. Shutemov
2015-06-25 11:48               ` Ingo Molnar
2015-06-25 18:36                 ` Linus Torvalds
2015-06-25 19:15                   ` Vlastimil Babka
2015-06-25 22:04                     ` Linus Torvalds
2015-06-25 18:46                 ` Dave Hansen
2015-06-26  9:08                   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFwVUkdaf0_rBk7uJHQjWXu+OcLTHc6FKuCn0Cb2Kvg9NA@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=dave.hansen@intel.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan@kernel.org \
    --cc=mingo@kernel.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox