From: Hugh Dickins <hugh@veritas.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-mm@kvack.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Nick Piggin <npiggin@suse.de>,
"David S. Miller" <davem@davemloft.net>,
Zach Amsden <zach@vmware.com>,
Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: tlb_gather_mmu() and semantics of "fullmm"
Date: Thu, 26 Mar 2009 14:08:17 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0903261232060.27412@blonde.anvils>
In-Reply-To: <1238043674.25062.823.camel@pasglop>
On Thu, 26 Mar 2009, Benjamin Herrenschmidt wrote:
>
> I'd like to clarify something about the semantics of the "full_mm_flush"
> argument of tlb_gather_mmu().
>
> The reason is that it can either mean:
>
> - All the mappings for that mm are being flushed
>
> or
>
> - The above +plus+ the mm is dead and has no remaining user. IE, we
> can relax some of the rules because we know the mappings cannot be
> accessed concurrently, and thus the PTEs cannot be reloaded into the
> TLB.
No remaining user in the sense of no longer connected to any user task,
but may still be active_mm on some cpus.
>
> If it means the latter (which it does in practice today, since we only
> call it from exit_mmap(), unless I missed an important detail), then I
> could implement some optimisations in my own arch code, but more
Yes, I'm pretty sure you can assume the latter. The whole point
of the "full mm" stuff (would have better been named "exit mm") is
to allow optimizations, and I don't see what optimization there is to
be made from knowing you're going the whole length of the mm; whereas
optimizations can be made if you know nothing can happen in parallel.
Cc'ed DaveM who introduced it for sparc64, and Zach and Jeremy
who have delved there, in case they wish to disagree.
> importantly, I believe we might also be able to optimize the generic
> (and x86) code to avoid flushing the TLB when the batch of pages fills
> up, before freeing the pages.
I'd be surprised if there are still such optimizations to be made:
maybe a whole different strategy could be more efficient, but I'd be
surprised if there's really a superfluous TLB flush to be tweaked away.
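For concreteness, the batching scheme under discussion can be modelled as a
standalone toy. The struct and function names mirror the mmu_gather API, but
the logic is a simplification of mine, and the skip-on-fullmm branch is Ben's
proposed optimisation, not existing behaviour:

```c
#include <assert.h>
#include <string.h>

/* Toy model of the mmu_gather batching; BATCH is arbitrary here. */
#define BATCH 8

struct mmu_gather {
	int fullmm;      /* set when tearing down a dead mm (exit_mmap) */
	int nr;          /* pages batched so far */
	int tlb_flushes; /* how many TLB flushes were issued */
};

static void tlb_gather_mmu(struct mmu_gather *tlb, int full_mm_flush)
{
	memset(tlb, 0, sizeof(*tlb));
	tlb->fullmm = full_mm_flush;
}

static void tlb_flush(struct mmu_gather *tlb)
{
	tlb->tlb_flushes++;
}

static void tlb_remove_page(struct mmu_gather *tlb)
{
	if (++tlb->nr < BATCH)
		return;
	/*
	 * Batch full.  The hypothetical optimisation: skip the
	 * per-batch flush when fullmm says no CPU can reload these
	 * PTEs, and rely on the single flush at tlb_finish_mmu().
	 */
	if (!tlb->fullmm)
		tlb_flush(tlb);
	tlb->nr = 0;     /* the batched pages would be freed here */
}

static void tlb_finish_mmu(struct mmu_gather *tlb)
{
	tlb_flush(tlb);  /* one final flush always */
}
```

With a batch size of 8, tearing down 32 pages costs five flushes in the
!fullmm case but only the single final flush when fullmm is set.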
Although it looks as if there's a TLB flush at the end of every batch,
isn't that deceptive (on x86 anyway)? I'm thinking that the first
flush_tlb_mm() will end up calling leave_mm(), and the subsequent
ones do nothing because the cpu_vm_mask is then empty.
Hmm, but the cpu which is actually doing the flush_tlb_mm() calls
leave_mm() without considering cpu_vm_mask: won't we get repeated
unnecessary load_cr3(swapper_pg_dir)s from that?
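That lazy-TLB behaviour can be sketched with another toy: single CPU, no SMP
IPIs, cpu_vm_mask as a plain bitmask. The names are borrowed from the real
leave_mm()/flush_tlb_mm() pair but the details are simplified, so treat it as
an illustration only:

```c
#include <assert.h>

/* Toy mm: just the mask of CPUs that may hold TLB entries for it. */
struct mm {
	unsigned long cpu_vm_mask;
};

static int cr3_loads; /* counts load_cr3(swapper_pg_dir) calls */

static void leave_mm(struct mm *mm, int cpu)
{
	mm->cpu_vm_mask &= ~(1UL << cpu);
	cr3_loads++; /* load_cr3(swapper_pg_dir) */
}

static void flush_tlb_mm_local(struct mm *mm, int cpu)
{
	/*
	 * The local CPU calls leave_mm() without first checking
	 * cpu_vm_mask, so every batch flush reloads %cr3 ...
	 */
	leave_mm(mm, cpu);
	/*
	 * ... while remote CPUs whose mask bit is already clear
	 * would skip the IPI and do nothing (not modelled here).
	 */
}
```

The first call empties cpu_vm_mask, so for remote CPUs the subsequent batch
flushes are free; but the flushing CPU itself keeps reloading %cr3 every time.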
>
> That would have the side effect of speeding up exit of large processes
> by limiting the number of tlb flushes they do. Since the TLB would need
> to be flushed only once at the end for archs that may carry more than
> one context in their TLB, and possibly not at all on x86 since it
> doesn't and the context isn't active any more.
It's tempting to think that even that one TLB flush is one too many,
given that the next user task to run on any cpu will have to load %cr3
for its own address space.
But I think that leaves a danger from speculative TLB loads by kernel
threads, after the pagetables of the original mm have got freed and
reused for something else: I think they would at least need to remain
good pagetables until the last cpu's TLB has been flushed.
>
> Or am I missing something ?
I suspect so, but please don't take my word for it: you've
probably put more thought into asking than I have in answering.
Hugh
Thread overview: 14+ messages
2009-03-26 5:01 Benjamin Herrenschmidt
2009-03-26 14:08 ` Hugh Dickins [this message]
2009-03-26 16:38 ` Linus Torvalds
2009-03-26 23:13 ` Benjamin Herrenschmidt
2009-03-26 17:21 ` Jeremy Fitzhardinge
2009-03-26 20:39 ` David Miller
2009-03-26 22:33 ` Benjamin Herrenschmidt
2009-03-27 5:04 ` David Miller
2009-03-27 5:38 ` Benjamin Herrenschmidt
2009-03-27 5:44 ` David Miller
2009-03-27 5:54 ` Benjamin Herrenschmidt
2009-03-27 5:57 ` David Miller
2009-03-27 6:10 ` Benjamin Herrenschmidt
2009-03-27 8:05 ` David Miller