From: Andy Lutomirski <luto@amacapital.net>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andy Lutomirski <luto@kernel.org>,
Nadav Amit <nadav.amit@gmail.com>, Borislav Petkov <bp@alien8.de>,
Kees Cook <keescook@chromium.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: PCID review?
Date: Fri, 10 Feb 2017 14:07:19 -0800
Message-ID: <CALCETrWToSZZsXHyrXg+YRiyvjRtWd7J0Myvn_mjJJdJoCXr+w@mail.gmail.com>
In-Reply-To: <20170210215708.j54cawm23nepgimd@techsingularity.net>
On Fri, Feb 10, 2017 at 1:57 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Fri, Feb 10, 2017 at 08:44:04AM -0800, Andy Lutomirski wrote:
>> On Fri, Feb 10, 2017 at 3:01 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
>> > On Thu, Feb 09, 2017 at 06:46:57PM -0800, Andy Lutomirski wrote:
>> >> > try_to_unmap_flush then flushes the entire TLB as the cost of targeting
>> >> > a specific page to flush was so high (both maintaining the PFNs and the
>> >> > individual flush operations).
>> >>
>> >> I could just maybe make it possible to remotely poke a CPU to record
>> >> which mms need flushing, but the possible races there are a bit
>> >> terrifying.
>> >>
>> >
>> > The overhead is concerning. You may incur a remote cache miss accessing the
>> > data, which is costly, or you have to send an IPI, which is also severe. You
>> > could attempt to do what the scheduler does and modify the remote data
>> > directly if the CPUs share a cache and send an IPI otherwise, but you're
>> > looking at a lot of overhead either way.
>>
>> I think all of these approaches suck and I'll give up on this particular avenue.
>>
>
> Ok, probably for the best, albeit that's based on an inability to figure
> out how it could be done efficiently and a suspicion that, if it could be
> done, the scheduler would be doing it already.
>
FWIW, I am doing a bit of this. For remote CPUs that aren't currently
running a given mm, I just bump a per-mm generation count so that they
know to flush next time around in switch_mm(). I'll need to add a new
hook to the batched flush code to get this right, and I'll cc you on
that. Stay tuned.
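
Roughly, the shape of it is something like this (the names below --
mm_flush_state, tlb_gen, catch_up_tlb_gen -- are made up for illustration,
not what the eventual patch will use):

#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/percpu.h>
#include <asm/tlbflush.h>

/* Hypothetical per-mm state; in a real patch this would live in mm_struct. */
struct mm_flush_state {
	atomic64_t tlb_gen;	/* bumped whenever this mm needs a flush */
};

/*
 * Flushing side: for remote CPUs that aren't currently running the mm,
 * bump the generation instead of sending an IPI.
 */
static inline void note_deferred_flush(struct mm_flush_state *state)
{
	atomic64_inc(&state->tlb_gen);
}

/*
 * Per-CPU record of the generation we last flushed to for the loaded mm.
 * With PCID this would be tracked per ASID slot rather than per CPU.
 */
static DEFINE_PER_CPU(u64, flushed_gen);

/* switch_mm() side: catch up if the mm was flushed while we weren't running it. */
static inline void catch_up_tlb_gen(struct mm_flush_state *state)
{
	u64 gen = atomic64_read(&state->tlb_gen);

	if (this_cpu_read(flushed_gen) != gen) {
		local_flush_tlb();
		this_cpu_write(flushed_gen, gen);
	}
}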
> It's possible that covering all of this is overkill, but these are the
> avenues of concern I'd expect if I were working on ASID support.
Agreed.
>
> [1] I could be completely wrong; I'm basing this on how people have
> behaved in the past during TLB-flush-related discussions. They
> might have changed their minds.
We'll see. The main benchmark that I'm relying on (so far) is that
context switches get way faster, just ping-ponging back and forth. I
suspect that the TLB refill cost is only a small part.
>
> [2] This could be covered already in the specifications and other
> discussions. Again, I didn't actually look into what's truly new with
> the Intel ASID.
I suspect I could find out how many ASIDs there really are under NDA,
but even that would be challenging and only dubiously useful. For
now, I'm using a grand total of four ASIDs. :)
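
To give a feel for the shape of it, the per-CPU side is roughly a tiny
ASID cache, so switching back to a recently used mm can skip the flush.
This is only an illustrative sketch with invented names, not the real
code:

#include <linux/types.h>
#include <linux/mm_types.h>
#include <linux/percpu.h>

#define NR_DYN_ASIDS 4	/* the "grand total of four" above */

/*
 * Illustrative only: each CPU remembers which mm last owned each of its
 * few PCID values.  (The per-mm flush generation sketched earlier would
 * also be tracked per slot.)
 */
struct asid_slot {
	struct mm_struct *mm;	/* mm that last used this PCID on this CPU */
};

struct cpu_asid_state {
	struct asid_slot slots[NR_DYN_ASIDS];
	unsigned int next_victim;	/* round-robin eviction cursor */
};

static DEFINE_PER_CPU(struct cpu_asid_state, cpu_asid_state);

/* Pick a PCID for 'next' and say whether its TLB entries must be flushed. */
static unsigned int choose_asid(struct mm_struct *next, bool *need_flush)
{
	unsigned int i;

	for (i = 0; i < NR_DYN_ASIDS; i++) {
		if (this_cpu_read(cpu_asid_state.slots[i].mm) == next) {
			*need_flush = false;	/* old entries may still be good */
			return i;
		}
	}

	/* Not cached: evict a slot round-robin and flush that PCID on use. */
	i = this_cpu_read(cpu_asid_state.next_victim);
	this_cpu_write(cpu_asid_state.next_victim, (i + 1) % NR_DYN_ASIDS);
	this_cpu_write(cpu_asid_state.slots[i].mm, next);
	*need_flush = true;
	return i;
}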
>
>> > I recognise that you'll be trying to balance this against carefully
>> > isolated processes that do not want interference from unrelated processes
>> > doing a TLB flush, but it'll be hard to prove that it's worth it.
>> >
>> > It's almost certain that this will be Linus' primary concern
>> > given his contributions to similar conversations in the past
>> > (e.g. https://lkml.org/lkml/2015/6/25/666). It's also likely to be of
>> > major concern to Ingo (e.g. https://lkml.org/lkml/2015/6/9/276) as he had
>> > valid objections against clever flushing at the time the batching was
>> > introduced. Based on previous experience, I have my own concerns but I
>> > don't count as I'm highlighting them now :P
>>
>> I fully agree with those objections, but back then we didn't have the
>> capability to avoid a flush when switching mms.
>>
>
> True, so watch for questions about the odds that switching an mm will
> flush the TLB entries anyway due to replacement policies.
>
>> >
>> > The outcome of the TLB batch flushing discussion was that it was way
>> > cheaper to flush the full TLB and take the refill cost than to flush
>> > individual pages, which had the cost of tracking the PFNs and the cost of
>> > each individual page flush operation.
>> >
>> > The current code is basically "build a cpumask and flush the TLB for
>> > multiple entries". We're talking about complex tracking of mms with
>> > difficult locking, potential remote cache misses, potentially more IPIs, or
>> > alternatively doing allocations from reclaim context. It'll be difficult
>> > to prove that doing this in the name of per-ASID flushing is cheaper than
>> > just flushing the entire TLB, let alone universally a good idea.
>> >
>>
>> Maybe there's a middle ground. I could keep track of whether more
>> than one mm is targeted in a deferred flush and just flush everything
>> if so.
>
> That would work and sidesteps many of the state-tracking concerns. It
> might even be a good fit for use cases like "limited number of VMs on a
> machine" or "one major application that must be isolated and some admin
> processes with little CPU time or kthreads" because you don't want to get
> burned with the "only a microbenchmark sees any benefit" hammer[3].
>
>> As a future improvement, I or someone else could add:
>>
>> struct mm_struct *mms[16];
>> int num_mms;
>>
>> to struct tlbflush_unmap_batch. If num_mms > 16, then this just means
>> that we've given up on tracking them all and we do the global flush,
>> and, if not, we could teach the IPI handler to understand a list of
>> target mms.
>
> I *much* prefer a fallback to a full flush over kmallocing additional
> space. It's also something that could feasibly be switchable at runtime with
> a union of the cpumask and an array of mms, depending on the CPU capabilities,
> with static branches determining which is used to minimise overhead. That
> would have only minor overhead, and with a debugging patch it could allow
> switching between them at boot time for like-for-like comparisons on a range
> of workloads.
Sounds good. This means I need to make my code understand the concept
of a full flush, but that's manageable.
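
Concretely, I'm imagining something like this on the batching side
(sketch only -- the extra fields and the helper are hypothetical, not
the final patch): record at most one mm and give up on precise tracking
as soon as a second one shows up.

#include <linux/cpumask.h>
#include <linux/mm_types.h>

/* Hypothetical extension of the deferred-flush batch, not the real struct. */
struct tlbflush_unmap_batch_sketch {
	struct cpumask cpumask;		/* CPUs that may hold stale entries */
	struct mm_struct *mm;		/* the single mm tracked, if any */
	bool flush_all;			/* saw a second mm: flush everything */
};

/* Called for each page queued for a deferred flush. */
static void batch_note_mm(struct tlbflush_unmap_batch_sketch *batch,
			  struct mm_struct *mm)
{
	if (batch->flush_all)
		return;
	if (!batch->mm)
		batch->mm = mm;
	else if (batch->mm != mm)
		batch->flush_all = true;	/* give up on precise tracking */
}

When flush_all stays clear, the flush side can target just that mm's
ASID; otherwise it behaves exactly like today's full flush.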
>
> [3] Can you tell I've been burned a few times by the "only
> microbenchmarks care" feedback?
>
:)