From: Andy Lutomirski <luto@amacapital.net>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andy Lutomirski <luto@kernel.org>,
Nadav Amit <nadav.amit@gmail.com>, Borislav Petkov <bp@alien8.de>,
Kees Cook <keescook@chromium.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: PCID review?
Date: Fri, 10 Feb 2017 14:07:19 -0800
Message-ID: <CALCETrWToSZZsXHyrXg+YRiyvjRtWd7J0Myvn_mjJJdJoCXr+w@mail.gmail.com>
In-Reply-To: <20170210215708.j54cawm23nepgimd@techsingularity.net>
On Fri, Feb 10, 2017 at 1:57 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Fri, Feb 10, 2017 at 08:44:04AM -0800, Andy Lutomirski wrote:
>> On Fri, Feb 10, 2017 at 3:01 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
>> > On Thu, Feb 09, 2017 at 06:46:57PM -0800, Andy Lutomirski wrote:
>> >> > try_to_unmap_flush then flushes the entire TLB as the cost of targeting
>> >> > a specific page to flush was so high (both maintaining the PFNs and the
>> >> > individual flush operations).
>> >>
>> >> I could just maybe make it possible to remotely poke a CPU to record
>> >> which mms need flushing, but the possible races there are a bit
>> >> terrifying.
>> >>
>> >
>> > The overhead is concerning. You may incur a remote cache miss accessing the
>> > data, which is costly, or you have to send an IPI, which is also severe. You
>> > could attempt to do what the scheduler does and modify the remote data
>> > directly if the CPUs share a cache and send an IPI otherwise, but you're
>> > looking at a lot of overhead either way.
>>
>> I think all of these approaches suck and I'll give up on this particular avenue.
>>
>
> Ok, probably for the best, albeit that's based on an inability to figure
> out how it could be done efficiently and a suspicion that, if it could be
> done, the scheduler would be doing it already.
>
FWIW, I am doing a bit of this. For remote CPUs that aren't currently
running a given mm, I just bump a per-mm generation count so that they
know to flush next time around in switch_mm(). I'll need to add a new
hook to the batched flush code to get this right, and I'll cc you on
that. Stay tuned.
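
Roughly, the shape of it is something like this (the names below --
mm_flush_state, tlb_gen, catch_up_tlb_gen -- are made up for illustration,
not what the eventual patch will use):

#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/percpu.h>
#include <asm/tlbflush.h>

/* Hypothetical per-mm state; in a real patch this would live in mm_struct. */
struct mm_flush_state {
	atomic64_t tlb_gen;	/* bumped whenever this mm needs a flush */
};

/*
 * Flushing side: for remote CPUs that aren't currently running the mm,
 * bump the generation instead of sending an IPI.
 */
static inline void note_deferred_flush(struct mm_flush_state *state)
{
	atomic64_inc(&state->tlb_gen);
}

/*
 * Per-CPU record of the generation we last flushed to for the loaded mm.
 * With PCID this would be tracked per ASID slot rather than per CPU.
 */
static DEFINE_PER_CPU(u64, flushed_gen);

/* switch_mm() side: catch up if the mm was flushed while we weren't running it. */
static inline void catch_up_tlb_gen(struct mm_flush_state *state)
{
	u64 gen = atomic64_read(&state->tlb_gen);

	if (this_cpu_read(flushed_gen) != gen) {
		local_flush_tlb();
		this_cpu_write(flushed_gen, gen);
	}
}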
> It's possible that covering all of this is overkill, but these are the
> avenues of concern I'd expect if I were working on ASID support.
Agreed.
>
> [1] I could be completely wrong; I'm basing this on how people have
> behaved in the past during TLB-flush-related discussions. They
> might have changed their minds.
We'll see. The main benchmark that I'm relying on (so far) is that
context switches get way faster, just ping-ponging back and forth. I
suspect that the TLB refill cost is only a small part.
>
> [2] This could be covered already in the specifications and other
> discussions. Again, I didn't actually look into what's truly new with
> the Intel ASID.
I suspect I could find out how many ASIDs there really are under NDA,
but even that would be challenging and only dubiously useful. For
now, I'm using a grand total of four ASIDs. :)
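
To give a feel for the shape of it, the per-CPU side is roughly a tiny
ASID cache, so switching back to a recently used mm can skip the flush.
This is only an illustrative sketch with invented names, not the real
code:

#include <linux/types.h>
#include <linux/mm_types.h>
#include <linux/percpu.h>

#define NR_DYN_ASIDS 4	/* the "grand total of four" above */

/*
 * Illustrative only: each CPU remembers which mm last owned each of its
 * few PCID values.  (The per-mm flush generation sketched earlier would
 * also be tracked per slot.)
 */
struct asid_slot {
	struct mm_struct *mm;	/* mm that last used this PCID on this CPU */
};

struct cpu_asid_state {
	struct asid_slot slots[NR_DYN_ASIDS];
	unsigned int next_victim;	/* round-robin eviction cursor */
};

static DEFINE_PER_CPU(struct cpu_asid_state, cpu_asid_state);

/* Pick a PCID for 'next' and say whether its TLB entries must be flushed. */
static unsigned int choose_asid(struct mm_struct *next, bool *need_flush)
{
	unsigned int i;

	for (i = 0; i < NR_DYN_ASIDS; i++) {
		if (this_cpu_read(cpu_asid_state.slots[i].mm) == next) {
			*need_flush = false;	/* old entries may still be good */
			return i;
		}
	}

	/* Not cached: evict a slot round-robin and flush that PCID on use. */
	i = this_cpu_read(cpu_asid_state.next_victim);
	this_cpu_write(cpu_asid_state.next_victim, (i + 1) % NR_DYN_ASIDS);
	this_cpu_write(cpu_asid_state.slots[i].mm, next);
	*need_flush = true;
	return i;
}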
>
>> > I recognise that you'll be trying to balance this against carefully
>> > isolated processes that do not want interference from unrelated processes
>> > doing a TLB flush, but it'll be hard to prove that it's worth it.
>> >
>> > It's almost certain that this will be Linus' primary concern
>> > given his contributions to similar conversations in the past
>> > (e.g. https://lkml.org/lkml/2015/6/25/666). It's also likely to be of
>> > major concern to Ingo (e.g. https://lkml.org/lkml/2015/6/9/276) as he had
>> > valid objections against clever flushing at the time the batching was
>> > introduced. Based on previous experience, I have my own concerns but I
>> > don't count as I'm highlighting them now :P
>>
>> I fully agree with those objections, but back then we didn't have the
>> capability to avoid a flush when switching mms.
>>
>
> True, so watch for questions about the odds that switching an mm will
> flush the TLB entries anyway due to replacement policies.
>
>> >
>> > The outcome of the TLB batch flushing discussion was that it was way
>> > cheaper to flush the full TLB and take the refill cost than to flush
>> > individual pages, which had the cost of tracking the PFNs and the cost of
>> > each individual page flush operation.
>> >
>> > The current code is basically "build a cpumask and flush the TLB for
>> > multiple entries". We're talking about complex tracking of mms with
>> > difficult locking, potential remote cache misses, potentially more IPIs, or
>> > alternatively doing allocations from reclaim context. It'll be difficult
>> > to prove that doing this in the name of per-ASID flushing is cheaper than
>> > just flushing the entire TLB, let alone universally a good idea.
>> >
>>
>> Maybe there's a middle ground. I could keep track of whether more
>> than one mm is targeted in a deferred flush and just flush everything
>> if so.
>
> That would work and sidesteps many of the state-tracking concerns. It
> might even be a good fit for use cases like "limited number of VMs on a
> machine" or "one major application that must be isolated and some admin
> processes with little CPU time or kthreads" because you don't want to get
> burned with the "only a microbenchmark sees any benefit" hammer[3].
>
>> As a future improvement, I or someone else could add:
>>
>> struct mm_struct *mms[16];
>> int num_mms;
>>
>> to struct tlbflush_unmap_batch. If num_mms > 16, then this just means
>> that we've given up on tracking them all and we do the global flush,
>> and, if not, we could teach the IPI handler to understand a list of
>> target mms.
>
> I *much* prefer a fallback to a full flush over kmallocing additional
> space. It's also something that could feasibly be switchable at runtime with
> a union of the cpumask and an array of mms, depending on the CPU capabilities,
> with static branches determining which is used to minimise overhead. That
> would have only minor overhead, and with a debugging patch it could allow
> switching between them at boot time for like-for-like comparisons on a range
> of workloads.
Sounds good. This means I need to make my code understand the concept
of a full flush, but that's manageable.
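
Concretely, I'm imagining something like this on the batching side
(sketch only -- the extra fields and the helper are hypothetical, not
the final patch): record at most one mm and give up on precise tracking
as soon as a second one shows up.

#include <linux/cpumask.h>
#include <linux/mm_types.h>

/* Hypothetical extension of the deferred-flush batch, not the real struct. */
struct tlbflush_unmap_batch_sketch {
	struct cpumask cpumask;		/* CPUs that may hold stale entries */
	struct mm_struct *mm;		/* the single mm tracked, if any */
	bool flush_all;			/* saw a second mm: flush everything */
};

/* Called for each page queued for a deferred flush. */
static void batch_note_mm(struct tlbflush_unmap_batch_sketch *batch,
			  struct mm_struct *mm)
{
	if (batch->flush_all)
		return;
	if (!batch->mm)
		batch->mm = mm;
	else if (batch->mm != mm)
		batch->flush_all = true;	/* give up on precise tracking */
}

When flush_all stays clear, the flush side can target just that mm's
ASID; otherwise it behaves exactly like today's full flush.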
>
> [3] Can you tell I've been burned a few times by the "only
> microbenchmarks care" feedback?
>
:)