From: Hugh Dickins <hugh@veritas.com>
To: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Cc: Andi Kleen <ak@suse.de>,
"Chen, Kenneth W" <kenneth.w.chen@intel.com>,
Christoph Lameter <clameter@sgi.com>,
linux-ia64@vger.kernel.org, linux-mm@kvack.org,
Zoltan.Menyhart@free.fr
Subject: Re: RFC: RCU protected page table walking
Date: Thu, 4 May 2006 16:27:42 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0605041611340.13830@blonde.wat.veritas.com>
In-Reply-To: <445A0784.2090803@bull.net>

On Thu, 4 May 2006, Zoltan Menyhart wrote:
> Andi Kleen wrote:
>
> > > >We don't free the pages until the other CPUs have been flushed
> > > >synchronously.
> > >
> > >Do you mean the TLB entries mapping the leaf pages?
> > >If yes, then I agree with you about them.
> > >Yet I speak about the directory pages. Let's take an example:
> >
> > x86 uses this for the directory pages too (well for PMD/PUD - PGD never
> > goes away until final exit).
>
> The i386 branch:
>
> tlb_remove_page():
> // assuming !tlb_fast_mode(tlb)
> tlb_flush_mmu():
> tlb_flush():
> flush_tlb_mm():
> __flush_tlb();
> free_pages_and_swap_cache();
>
> __flush_tlb():
> "movl %%cr3, %0;"
> "movl %0, %%cr3;" # flush TLB
>
> Do I understand correctly that it purges the local TLBs only?

__flush_tlb() purges the local TLBs only; but when you found the i386
or x86_64 flush_tlb_mm() calling __flush_tlb() above, you were looking
at the #ifndef CONFIG_SMP block of include/asm/tlbflush.h. Go over to
arch/{i386,x86_64}/kernel/smp.c to see what the CONFIG_SMP flush_tlb_mm()
does.

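To make the difference concrete, here is a toy user-space model in C.
It is only a sketch, not the kernel code: every toy_* name, the NR_CPUS
value and the per-CPU "stale" flag are made up for illustration, and
cpu_vm_mask merely mimics the per-mm CPU mask that, as far as I recall,
the SMP path consults. The loop in the SMP variant stands in for sending
an IPI to each other CPU and waiting for it to flush.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Toy model: one "TLB may hold stale entries" flag per CPU. */
static bool tlb_stale[NR_CPUS];

/* Hypothetical stand-in for the mm: bit n set if CPU n has used it. */
struct toy_mm {
	unsigned long cpu_vm_mask;
};

/* Models __flush_tlb(): the cr3 reload affects the executing CPU only. */
static void toy_local_flush(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	tlb_stale[cpu] = false;
}

/* Mark every CPU that has used the mm as holding stale entries. */
static void toy_make_stale(const struct toy_mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm->cpu_vm_mask & (1UL << cpu))
			tlb_stale[cpu] = true;
}

/* !CONFIG_SMP shape: a local flush is all there is to do. */
static void toy_flush_tlb_mm_up(const struct toy_mm *mm, int this_cpu)
{
	(void)mm;
	toy_local_flush(this_cpu);
}

/*
 * CONFIG_SMP shape: flush locally, then make every other CPU in the
 * mask flush as well before returning (assumption: the real code does
 * this with an IPI and waits for the targets to acknowledge).
 */
static void toy_flush_tlb_mm_smp(const struct toy_mm *mm, int this_cpu)
{
	toy_local_flush(this_cpu);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != this_cpu && (mm->cpu_vm_mask & (1UL << cpu)))
			toy_local_flush(cpu);	/* stands in for the IPI handler */
}

/* Count the CPUs whose TLB may still hold stale entries. */
static int toy_count_stale(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (tlb_stale[cpu])
			n++;
	return n;
}
```

With an mm used on CPUs 0-2, the UP variant called on CPU 0 leaves
CPUs 1 and 2 stale, whereas the SMP variant returns only once none are.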
> > Actually x86-64 didn't
> > fully at some point and it resulted in a nasty to track down bug.
> > But it was fixed then. I really went all over this with a very fine
> > comb back then and I'm pretty sure it's correct now :)
>
> Can you please indicate how the page table walking of the other
> CPUs is "aborted"?

I cannot answer for other architectures: you need to ask the specialist
list of each architecture for its answer (or hope that a specialist in
each is already reading this thread on linux-mm). What's certain is
that the issue is _supposed_ to be already covered safely on all arches,
hence the care which has gone into include/asm-generic/tlb.h etc. But
you may be right that some architectures get it wrong: I cannot tell.

I've CC'ed Ken Chen and linux-ia64 (as Christoph intended to), since
that's your first concern; but I'm reluctant to CC lots of different
architecture lists together myself.

Hugh

> > > >After the flush the other CPUs don't walk pages anymore.
>
> Can you please point me where it is documented that the HW walkers
> abort on a TLB flush / purge?
>
> Yet I did verify that it is not (always) the case for the RISC-s.
>
> E.g. arch/ia64/kernel/ivt.S:
>
> ENTRY(vhpt_miss)
> ...
> // r17 = pmd_offset(pud, addr)
> // -->
> (p7) ld8 r20=[r17] // get *pmd (may be 0)
>
> Assume we have reached the point indicated by "// -->":
> we have got a valid address for the next level.
> Assume "free_pgtables()" frees these PMD / PTE pages.
> The eventual TLB flushes do not do anything to the "ld8"
> going to be executed.
>
> Can you explain please why you think that walking the
>
> rx = ... -> pgd[i] -> pud[j] -> pmd[k] -> pte[l]
>
> chain is safe in this condition, too?
>
> Another example in arch/ppc/kernel/head_44x.S:
>
> /* Data TLB Error Interrupt */
> START_EXCEPTION(DataTLBError)
> ...
> // r11 -> PGD or PTE page, r12 = index * sizeof(void *)
> // -->
> lwzx r11, r12, r11 /* Get pgd/pmd entry */
>
> > >Can you explain please why they do not?
> >
> > Because the PGD/PMD/PUD has been rewritten and they won't be able
> > to find the old pages anymore.
>
> As in the two examples above, the walkers have already picked up
> references to the next levels, and these references were valid
> at that moment.
>
> > They also don't have it in their
> > TLBs because that has been flushed.
>
> Are you sure this is true for the RISC-s, too?
> Even if an architecture does not play with TLB-s before really
> finding a valid PTE?
>
> > >There is a possibility that walking has already been started, but it has
> > >not been completed yet, when "free_pgtables()" runs.
> >
> > Yes, that is why we delay the freeing of the pages to prevent anything
> > going wrong.
>
> Can you explain please why the already-started walks, which do not
> care for the TLB flushes, can be safe?
>
> > What do you mean with "physical mode"?
>
> Not using any TLB entry (or any HW supported address translation stuff)
> to translate the data addresses before they go out of the CPU.
>
> > >is insensitive to any TLB purges, therefore these purges do not make sure
> > >that there is no other CPU just
> > >in the middle of page table walking.
>
> > A TLB Flush stops all MMU activity - or rather waits for it to finish.
>
> This is what I am trying to say: not on all architectures.
>
> Thanks,
>
> Zoltan