From: Mark Nutter <mnutter@us.ibm.com>
Subject: Re: ppc64/cell: local TLB flush with active SPEs
Date: Wed, 12 Oct 2005 17:09:26 -0500
To: Arnd Bergmann
Cc: linux-mm@kvack.org, Ulrich Weigand, Paul Mackerras, Max Aguilar, linuxppc64-dev@ozlabs.org, Michael Day
In-Reply-To: <200510122003.59701.arnd@arndb.de>
List-Id: linux-mm.kvack.org
For reference, the 2.6.3 bring-up kernel always issued global TLBIE.  This was a hack, and we very much wanted to improve performance if possible, particularly for the vast majority of PPC applications out there that don't use SPEs.

As long as we are thinking about a proper solution, the whole mm->cpu_vm_mask thing is broken, at least as a selector for local vs. global TLBIE.  The problem, as I see it, is that memory regions can be shared among processes (via mmap/shmat), with each task bound to a different processor.  If we are to continue using a cpumask as the selector for TLBIE, then we really need a vma->cpu_vma_mask.
 
---
Mark Nutter
STI Design Center / IBM
email: mnutter@us.ibm.com
voice: 512-838-1612
fax: 512-838-1927
11400 Burnet Road
Mail Stop 906/3003B
Austin, TX 78758



Arnd Bergmann <arnd@arndb.de>

10/12/2005 01:03 PM

       
        To:        linuxppc64-dev@ozlabs.org, linux-mm@kvack.org
        cc:        Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Mark Nutter/Austin/IBM@IBMUS, Michael Day/Austin/IBM@IBMUS, Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
        Subject:        ppc64/cell: local TLB flush with active SPEs



I'm looking for a clean solution to detect the need for global
TLB flush when an mm_struct is only used on one logical PowerPC
CPU (PPE) and also mapped with the memory flow controller of an
SPE on the Cell CPU.

Normally, we set bits in mm_struct:cpu_vm_mask for each CPU that
accesses the mm and then do global flushes instead of local flushes
when CPUs other than the currently running one are marked as used
in that mask. When an SPE does DMA to that mm, it also gets local
TLB entries that are only flushed with a global tlbie broadcast.

The current hack is to always set cpu_vm_mask to all bits set
when we map an mm into an SPE to ensure receiving the broadcast,
but that is obviously not how it's meant to be used. In particular,
it doesn't work in UP configurations where the cpumask contains
only one bit.

One solution that might be better could be to introduce a new special
flag in addition to cpu_vm_mask for this purpose. We already have
a bit field in mm_struct for dumpable, so adding another bit there
at least does not waste space for other platforms, and it's likely
to be in the same cache line as cpu_vm_mask. However, I'm reluctant
to add more bit fields to such a prominent place, because it might
encourage other people to add more bit fields or to think that they
are accepted coding practice.

Another idea would be to add a new field to mm_context_t, so it stays
in the architecture specific code. Again, adding an int here does
not waste space because there is currently padding in that place on
ppc64.

Or maybe there is a completely different solution.

Suggestions?

                Arnd <><

_______________________________________________
Linuxppc64-dev mailing list
Linuxppc64-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc64-dev