* [BUGFIX]{PATCH] flush icache on ia64 take2
@ 2007-07-06 2:29 KAMEZAWA Hiroyuki
2007-07-06 9:41 ` Zoltan Menyhart
2007-07-19 6:56 ` KAMEZAWA Hiroyuki
0 siblings, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-06 2:29 UTC (permalink / raw)
To: linux-ia64
Cc: LKML, linux-mm, tony.luck, nickpiggin, mike, Zoltan.Menyhart,
dmosberger, GOTO
This is a patch for fixing icache flush race in ia64(Montecito) by implementing
flush_icache_page() at el.
Changelog:
- updated against 2.6.22-rc7 (previous one was against 2.6.21)
- removed hugetlbe's lazy_mmu_prot_update().
- rewrote patch description.
- removed patch against mprotect() if flushes cache.
Brief Description:
In current kernel, ia64 flushes executable page's icache by
lazy_mmu_prot_update() after set_pte(). But multi-threaded programs can access
text page if pte says "a page is present". Then, there is a race condition.
This patch guarantees cache is flushed before set_pte() call.
Basic Information:
Montecito, new ia64 processor, has separated L2 I-cache and L2 D-cache,
and I-cache and D-cache is not guaranteed to be consistent in automatic way.
L1 cache is also separated but L1 D-cache is write-through.
Montecito has separated L2 cache and Mixed L3 cache. And...L2 D-cache is
*write back*. (See http://download.intel.com/design/Itanium2/manuals/
30806501.pdf section 2.3.3)
What we found and my understanding:
In our environment, I found SIGILL occurs when...
* A multi threaded program is executed. (a program is a HPC program and uses
automatic parallelizaiton.)
* Above program is on NFS.
* SIGILL happes only when a file is newly loaded into the page-cache.
* Where failure occurs is not fixed. SIGILL comes at random instruction point.
* This didn't happen before Montecito.
>From Itanium2 document, we can see....
* Any invalidation of a cache-line will invalidate an I-cache.
* I-cache and D-cache is not coherent.
* L1D-cache to L2 cache is *write-through*.
* L2D-cache to L3-mixed cache is *write-back*.
And we don't see this problem before Montecito. Big difference between old ones
and Montecito is
* old cpus has Mixed L2 cache.
* Montecito has separated L2I-cache and L2D-cache.
Following is my understanding. I'd like to hear ia64 specialist's opinion.
Assume CPU(A) and CPU(B).
1. CPU(A) causes a page fault in text and calls do_no_page().
2. CPU(B) executes NFS's ops and fill page with received RPC result(from NFS).
and do SetPageUptodate().
3. CPU(A) continues a page fault operation and calls set_pte().
4. CPU(B)'s another thread executes a text in the page which CPU(A) mapped.
5-a. CPU(A) calls lazy_mmu_prot_update() and flush icache (this is slow).
5-b. CPU(B) continues execution and cause SIGILL by an instruction which
CPU(A) haven't flushed yet.
In stage 2. , CPU(B) loads all text data into L2-Dcache and L3-mixed cache.
And write them.
But data which can be accessed by CPU(B)'s L2-Icache is in L3-mixed cache.
Then, If write back from L2-Dcache to L3-mixed cache is delayed, L2-Icache
of CPU(B) will fetch wrong data.
Note: CPU(A) will fetch fetch correct instruction in above case.
Usual file systems uses DMA and it purges cache. But NFS uses copy-by-cpu.
Anyway, there is SIGILL problem in NFS/ia64 and icache flush can fix
SIGILL problem (in our HPC team test.)
calling lazy_mmu_prot_update() before set_pte() can fix this. But
it seems strange way. Then, flush_icache_page() is implemented.
Note1: icache flush is called only when VM_EXEC flag is on and
PG_arch_1 is not set.
Note2: description in Devid Mosberger's "ia64 linux kernel" pp204 says
"linux taditionally maps the memory stack and memory allocated by
the brk() with executable permission turned on...."
But this is changed now. anon/stack is not mapped as executable usually.
It depens on READ_IMPLIES_EXEC personality. So checking VM_EXEC is enough.
What this patch does:
1. remove all lazy_mmu_prot_update()...which is used by only ia64.
2. implements flush_cache_page()/flush_icache_page() for ia64.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
arch/ia64/mm/init.c | 7 +------
include/asm-generic/pgtable.h | 4 ----
include/asm-ia64/cacheflush.h | 24 ++++++++++++++++++++++--
include/asm-ia64/pgtable.h | 9 ---------
mm/fremap.c | 1 -
mm/hugetlb.c | 5 +++--
mm/memory.c | 13 ++++++-------
mm/migrate.c | 6 +++++-
mm/mprotect.c | 1 -
mm/rmap.c | 1 -
10 files changed, 37 insertions(+), 34 deletions(-)
Index: devel-2.6.22-rc7/include/asm-ia64/cacheflush.h
===================================================================
--- devel-2.6.22-rc7.orig/include/asm-ia64/cacheflush.h
+++ devel-2.6.22-rc7/include/asm-ia64/cacheflush.h
@@ -10,18 +10,38 @@
#include <asm/bitops.h>
#include <asm/page.h>
+#include <linux/mm.h>
/*
* Cache flushing routines. This is the kind of stuff that can be very expensive, so try
* to avoid them whenever possible.
*/
+extern void __flush_icache_page_ia64(struct page *page);
#define flush_cache_all() do { } while (0)
#define flush_cache_mm(mm) do { } while (0)
#define flush_cache_dup_mm(mm) do { } while (0)
#define flush_cache_range(vma, start, end) do { } while (0)
-#define flush_cache_page(vma, vmaddr, pfn) do { } while (0)
-#define flush_icache_page(vma,page) do { } while (0)
+
+static inline void
+flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
+ unsigned long pfn)
+{
+ if (vma->vm_flags & VM_EXEC) {
+ struct page *page;
+ if (!pfn_valid(pfn))
+ return;
+ page = pfn_to_page(pfn);
+ __flush_icache_page_ia64(page);
+ }
+}
+
+static inline void
+flush_icache_page(struct vm_area_struct *vma,struct page *page) {
+ if (vma->vm_flags & VM_EXEC)
+ __flush_icache_page_ia64(page);
+}
+
#define flush_cache_vmap(start, end) do { } while (0)
#define flush_cache_vunmap(start, end) do { } while (0)
Index: devel-2.6.22-rc7/arch/ia64/mm/init.c
===================================================================
--- devel-2.6.22-rc7.orig/arch/ia64/mm/init.c
+++ devel-2.6.22-rc7/arch/ia64/mm/init.c
@@ -54,16 +54,11 @@ struct page *zero_page_memmap_ptr; /* ma
EXPORT_SYMBOL(zero_page_memmap_ptr);
void
-lazy_mmu_prot_update (pte_t pte)
+__flush_icache_page_ia64 (struct page *page)
{
unsigned long addr;
- struct page *page;
unsigned long order;
- if (!pte_exec(pte))
- return; /* not an executable page... */
-
- page = pte_page(pte);
addr = (unsigned long) page_address(page);
if (test_bit(PG_arch_1, &page->flags))
Index: devel-2.6.22-rc7/include/asm-ia64/pgtable.h
===================================================================
--- devel-2.6.22-rc7.orig/include/asm-ia64/pgtable.h
+++ devel-2.6.22-rc7/include/asm-ia64/pgtable.h
@@ -151,7 +151,6 @@
#include <linux/sched.h> /* for mm_struct */
#include <asm/bitops.h>
-#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/processor.h>
@@ -502,13 +501,6 @@ extern struct page *zero_page_memmap_ptr
#define HUGETLB_PGDIR_MASK (~(HUGETLB_PGDIR_SIZE-1))
#endif
-/*
- * IA-64 doesn't have any external MMU info: the page tables contain all the necessary
- * information. However, we use this routine to take care of any (delayed) i-cache
- * flushing that may be necessary.
- */
-extern void lazy_mmu_prot_update (pte_t pte);
-
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
/*
* Update PTEP with ENTRY, which is guaranteed to be a less
@@ -596,7 +588,6 @@ extern void lazy_mmu_prot_update (pte_t
#define __HAVE_ARCH_PTEP_SET_WRPROTECT
#define __HAVE_ARCH_PTE_SAME
#define __HAVE_ARCH_PGD_OFFSET_GATE
-#define __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
#ifndef CONFIG_PGTABLE_4
#include <asm-generic/pgtable-nopud.h>
Index: devel-2.6.22-rc7/include/asm-generic/pgtable.h
===================================================================
--- devel-2.6.22-rc7.orig/include/asm-generic/pgtable.h
+++ devel-2.6.22-rc7/include/asm-generic/pgtable.h
@@ -168,10 +168,6 @@ static inline void ptep_set_wrprotect(st
#define pgd_offset_gate(mm, addr) pgd_offset(mm, addr)
#endif
-#ifndef __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
-#define lazy_mmu_prot_update(pte) do { } while (0)
-#endif
-
#ifndef __HAVE_ARCH_MOVE_PTE
#define move_pte(pte, prot, old_addr, new_addr) (pte)
#endif
Index: devel-2.6.22-rc7/mm/memory.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/memory.c
+++ devel-2.6.22-rc7/mm/memory.c
@@ -1693,7 +1693,6 @@ static int do_wp_page(struct mm_struct *
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
if (ptep_set_access_flags(vma, address, page_table, entry,1)) {
update_mmu_cache(vma, address, entry);
- lazy_mmu_prot_update(entry);
}
ret |= VM_FAULT_WRITE;
goto unlock;
@@ -1735,7 +1734,6 @@ gotten:
flush_cache_page(vma, address, pte_pfn(orig_pte));
entry = mk_pte(new_page, vma->vm_page_prot);
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
- lazy_mmu_prot_update(entry);
/*
* Clear the pte entry and flush it first, before updating the
* pte with the new entry. This will avoid a race condition
@@ -2200,7 +2198,6 @@ static int do_swap_page(struct mm_struct
/* No need to invalidate - it was non-present before */
update_mmu_cache(vma, address, pte);
- lazy_mmu_prot_update(pte);
unlock:
pte_unmap_unlock(page_table, ptl);
out:
@@ -2257,12 +2254,16 @@ static int do_anonymous_page(struct mm_s
inc_mm_counter(mm, file_rss);
page_add_file_rmap(page);
}
-
+ /*
+ * new page is zero-filled, but we have to guarantee icache-dcache
+ * synchronization before setting pte on some processor.
+ */
+ if (write_access)
+ flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, entry);
/* No need to invalidate - it was non-present before */
update_mmu_cache(vma, address, entry);
- lazy_mmu_prot_update(entry);
unlock:
pte_unmap_unlock(page_table, ptl);
return VM_FAULT_MINOR;
@@ -2407,7 +2408,6 @@ retry:
/* no need to invalidate: a not-present page shouldn't be cached */
update_mmu_cache(vma, address, entry);
- lazy_mmu_prot_update(entry);
unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
@@ -2563,7 +2563,6 @@ static inline int handle_pte_fault(struc
entry = pte_mkyoung(entry);
if (ptep_set_access_flags(vma, address, pte, entry, write_access)) {
update_mmu_cache(vma, address, entry);
- lazy_mmu_prot_update(entry);
} else {
/*
* This is needed only for protection faults but the arch code
Index: devel-2.6.22-rc7/mm/migrate.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/migrate.c
+++ devel-2.6.22-rc7/mm/migrate.c
@@ -172,6 +172,11 @@ static void remove_migration_pte(struct
pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
if (is_write_migration_entry(entry))
pte = pte_mkwrite(pte);
+ /*
+ * If the processor doesn't guarantee icache-dicache synchronization.
+ * We need to flush icache before set_pte.
+ */
+ flush_icache_page(vma, new);
set_pte_at(mm, addr, ptep, pte);
if (PageAnon(new))
@@ -181,7 +186,6 @@ static void remove_migration_pte(struct
/* No need to invalidate - it was non-present before */
update_mmu_cache(vma, addr, pte);
- lazy_mmu_prot_update(pte);
out:
pte_unmap_unlock(ptep, ptl);
Index: devel-2.6.22-rc7/mm/mprotect.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/mprotect.c
+++ devel-2.6.22-rc7/mm/mprotect.c
@@ -53,7 +53,6 @@ static void change_pte_range(struct mm_s
if (dirty_accountable && pte_dirty(ptent))
ptent = pte_mkwrite(ptent);
set_pte_at(mm, addr, pte, ptent);
- lazy_mmu_prot_update(ptent);
#ifdef CONFIG_MIGRATION
} else if (!pte_file(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
Index: devel-2.6.22-rc7/mm/rmap.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/rmap.c
+++ devel-2.6.22-rc7/mm/rmap.c
@@ -436,7 +436,6 @@ static int page_mkclean_one(struct page
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
- lazy_mmu_prot_update(entry);
ret = 1;
}
Index: devel-2.6.22-rc7/mm/fremap.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/fremap.c
+++ devel-2.6.22-rc7/mm/fremap.c
@@ -83,7 +83,6 @@ int install_page(struct mm_struct *mm, s
set_pte_at(mm, addr, pte, pte_val);
page_add_file_rmap(page);
update_mmu_cache(vma, addr, pte_val);
- lazy_mmu_prot_update(pte_val);
err = 0;
unlock:
pte_unmap_unlock(pte, ptl);
Index: devel-2.6.22-rc7/mm/hugetlb.c
===================================================================
--- devel-2.6.22-rc7.orig/mm/hugetlb.c
+++ devel-2.6.22-rc7/mm/hugetlb.c
@@ -328,7 +328,6 @@ static void set_huge_ptep_writable(struc
entry = pte_mkwrite(pte_mkdirty(*ptep));
if (ptep_set_access_flags(vma, address, ptep, entry, 1)) {
update_mmu_cache(vma, address, entry);
- lazy_mmu_prot_update(entry);
}
}
@@ -445,6 +444,7 @@ static int hugetlb_cow(struct mm_struct
* and just make the page writable */
avoidcopy = (page_count(old_page) == 1);
if (avoidcopy) {
+ /* cache flush is not necessary in this case */
set_huge_ptep_writable(vma, address, ptep);
return VM_FAULT_MINOR;
}
@@ -463,6 +463,8 @@ static int hugetlb_cow(struct mm_struct
ptep = huge_pte_offset(mm, address & HPAGE_MASK);
if (likely(pte_same(*ptep, pte))) {
+ /* We have to confirm new_page's cache coherency if VM_EXEC */
+ flush_icache_page(vma,new_page);
/* Break COW */
set_huge_pte_at(mm, address, ptep,
make_huge_pte(vma, new_page, 1));
@@ -681,7 +683,6 @@ void hugetlb_change_protection(struct vm
pte = huge_ptep_get_and_clear(mm, address, ptep);
pte = pte_mkhuge(pte_modify(pte, newprot));
set_huge_pte_at(mm, address, ptep, pte);
- lazy_mmu_prot_update(pte);
}
}
spin_unlock(&mm->page_table_lock);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-06 2:29 [BUGFIX]{PATCH] flush icache on ia64 take2 KAMEZAWA Hiroyuki
@ 2007-07-06 9:41 ` Zoltan Menyhart
2007-07-06 9:50 ` KAMEZAWA Hiroyuki
2007-07-19 6:56 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-06 9:41 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, LKML, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, GOTO
KAMEZAWA Hiroyuki wrote:
> Note1: icache flush is called only when VM_EXEC flag is on and
> PG_arch_1 is not set.
If you have not got the page in the cache, then the new page will
be allocated with PG_arch_1 bit off.
You are going to flush pages which are read by HW DMA, i.e. the L2I
of Montecito does not keep old lines for those pages anyway.
...->a_ops->readpage() of "L2I safe" file systems should set PG_arch_1
if the CPU is ia64 and it has got separate L2I.
On the other hand, arch. independent file systems should not play with
PG_arch_1.
The base kernel should export a macro for the file systems...
Thanks,
Zoltan Menyhart
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-06 9:41 ` Zoltan Menyhart
@ 2007-07-06 9:50 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-06 9:50 UTC (permalink / raw)
To: Zoltan Menyhart
Cc: linux-ia64, LKML, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, GOTO
On Fri, 06 Jul 2007 11:41:38 +0200
Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
> KAMEZAWA Hiroyuki wrote:
>
> > Note1: icache flush is called only when VM_EXEC flag is on and
> > PG_arch_1 is not set.
>
> If you have not got the page in the cache, then the new page will
> be allocated with PG_arch_1 bit off.
> You are going to flush pages which are read by HW DMA, i.e. the L2I
> of Montecito does not keep old lines for those pages anyway.
>
yes, it's my concern.
> ...->a_ops->readpage() of "L2I safe" file systems should set PG_arch_1
> if the CPU is ia64 and it has got separate L2I.
>
> On the other hand, arch. independent file systems should not play with
> PG_arch_1.
> The base kernel should export a macro for the file systems...
>
Hm, but it should be discussed in another thread... I think.
>From memory management view, flush_icache before set_pte() is sane thing.
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-06 2:29 [BUGFIX]{PATCH] flush icache on ia64 take2 KAMEZAWA Hiroyuki
2007-07-06 9:41 ` Zoltan Menyhart
@ 2007-07-19 6:56 ` KAMEZAWA Hiroyuki
2007-07-19 12:05 ` Zoltan Menyhart
2007-07-19 12:15 ` Zoltan Menyhart
1 sibling, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-19 6:56 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, LKML, linux-mm, tony.luck, nickpiggin, mike,
Zoltan.Menyhart, dmosberger, GOTO
On Fri, 6 Jul 2007 11:29:01 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> This is a patch for fixing icache flush race in ia64(Montecito) by implementing
> flush_icache_page() at el.
>
> Changelog:
> - updated against 2.6.22-rc7 (previous one was against 2.6.21)
> - removed hugetlbe's lazy_mmu_prot_update().
> - rewrote patch description.
> - removed patch against mprotect() if flushes cache.
>
Then, what should I do more for fixing this SIGILL problem ?
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 6:56 ` KAMEZAWA Hiroyuki
@ 2007-07-19 12:05 ` Zoltan Menyhart
2007-07-19 13:01 ` KAMEZAWA Hiroyuki
2007-07-19 12:15 ` Zoltan Menyhart
1 sibling, 1 reply; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-19 12:05 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, LKML, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, GOTO
KAMEZAWA Hiroyuki wrote:
> Then, what should I do more for fixing this SIGILL problem ?
>
> -Kame
I can think of a relatively cheap solution:
New pages are allocated with the bit PG_arch_1 off, see
page_cache_read() ... prep_new_page(), i.e. the I-cache is
not coherent with the D-cache.
page_cache_read() should add a macro, say:
ARCH_PREP_PAGE_BEFORE_READ(page);
before actually calling mapping->a_ops->readpage(file, page).
This macro can be for ia64 something like:
do { if (CPU_has_split_L2_I_cache) set_bit(PG_arch_1, &page->flags); }
and empty for the the architectures non concerned.
The file systems which are identified not to use HW tools to
avoid split I-cache incoherency, e.g. nfs_readpage(), are required
to add a macro, say:
ARCH_CHECK_READ_PAGE_COHERENCY(page);
This macro can be for ia64:
do { if (CPU_has_split_L2_I_cache) clear_bit(PG_arch_1, &page->flags); }
and empty for the the architectures non concerned.
Back to do_no_page():
if the new PTE includes the exec bit and PG_arch_1 is set,
the page has to be flushed from the I-cache before the PTE is
made globally visible.
File systems using local block devices with DMA are considered
to be safe, with the exceptions of the bounce buffers.
When you copy into the destination page, another macro should be
added, say:
ARCH_CHECK_BOUNCE_READ_COHERENCY(bio_vec);
#define ARCH_CHECK_BOUNCE_READ_COHERENCY(bio_vec) \
ARCH_CHECK_READ_PAGE_COHERENCY(bio_vec->bv_page)
Remote DMA based network file systems, e.g. Lustre on Quadrics,
Infiniband are also considered to be safe.
Thanks,
Zoltan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 12:05 ` Zoltan Menyhart
@ 2007-07-19 13:01 ` KAMEZAWA Hiroyuki
2007-07-19 13:32 ` KAMEZAWA Hiroyuki
2007-07-19 14:15 ` Zoltan Menyhart
0 siblings, 2 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-19 13:01 UTC (permalink / raw)
To: Zoltan Menyhart
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
On Thu, 19 Jul 2007 14:05:06 +0200
Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
> KAMEZAWA Hiroyuki wrote:
>
> > Then, what should I do more for fixing this SIGILL problem ?
> >
> > -Kame
>
> I can think of a relatively cheap solution:
>
Maybe I should take performance numbers with the patch.
But is it too costly that flushing icache page only if a page is newly
installed into the system (PG_arch1) && it is mapped as executable ?
I don't want to leak this (stupid) corner case to the file system layer.
Hmm...can't we do clever flushing (like your idea) in VM layer ?
-Kame
> New pages are allocated with the bit PG_arch_1 off, see
> page_cache_read() ... prep_new_page(), i.e. the I-cache is
> not coherent with the D-cache.
>
> page_cache_read() should add a macro, say:
>
> ARCH_PREP_PAGE_BEFORE_READ(page);
>
> before actually calling mapping->a_ops->readpage(file, page).
>
> This macro can be for ia64 something like:
>
> do { if (CPU_has_split_L2_I_cache) set_bit(PG_arch_1, &page->flags); }
>
> and empty for the the architectures non concerned.
>
> The file systems which are identified not to use HW tools to
> avoid split I-cache incoherency, e.g. nfs_readpage(), are required
> to add a macro, say:
>
> ARCH_CHECK_READ_PAGE_COHERENCY(page);
>
> This macro can be for ia64:
>
> do { if (CPU_has_split_L2_I_cache) clear_bit(PG_arch_1, &page->flags); }
>
> and empty for the the architectures non concerned.
>
> Back to do_no_page():
> if the new PTE includes the exec bit and PG_arch_1 is set,
> the page has to be flushed from the I-cache before the PTE is
> made globally visible.
>
> File systems using local block devices with DMA are considered
> to be safe, with the exceptions of the bounce buffers.
>
> When you copy into the destination page, another macro should be
> added, say:
>
> ARCH_CHECK_BOUNCE_READ_COHERENCY(bio_vec);
>
> #define ARCH_CHECK_BOUNCE_READ_COHERENCY(bio_vec) \
> ARCH_CHECK_READ_PAGE_COHERENCY(bio_vec->bv_page)
>
> Remote DMA based network file systems, e.g. Lustre on Quadrics,
> Infiniband are also considered to be safe.
>
> Thanks,
>
> Zoltan
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 13:01 ` KAMEZAWA Hiroyuki
@ 2007-07-19 13:32 ` KAMEZAWA Hiroyuki
2007-07-19 14:33 ` Zoltan Menyhart
2007-07-19 14:15 ` Zoltan Menyhart
1 sibling, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-19 13:32 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: Zoltan.Menyhart, linux-ia64, linux-kernel, linux-mm, tony.luck,
nickpiggin, mike, dmosberger, y-goto
On Thu, 19 Jul 2007 22:01:18 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 19 Jul 2007 14:05:06 +0200
> Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
>
> > KAMEZAWA Hiroyuki wrote:
> >
> > > Then, what should I do more for fixing this SIGILL problem ?
> > >
> > > -Kame
> >
> > I can think of a relatively cheap solution:
> >
> Maybe I should take performance numbers with the patch.
>
> But is it too costly that flushing icache page only if a page is newly
> installed into the system (PG_arch1) && it is mapped as executable ?
>
> I don't want to leak this (stupid) corner case to the file system layer.
> Hmm...can't we do clever flushing (like your idea) in VM layer ?
>
A bit new idea. How about this ?
==
- Set PG_arch_1 if "icache is *not* coherent"
- make flush_dcache_page() to be empty func.
- For Montecito, add kmap_atomic(). This function just set PG_arch1.
Then, "the page which is copied by the kernel" is marked as "not icache coherent page"
- icache_flush_page() just flushes a page which has PG_arch_1.
- Anonymous page is always has PG_arch_1. Tkae care of Copy-On-Write.
==
looks easy ?
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 13:32 ` KAMEZAWA Hiroyuki
@ 2007-07-19 14:33 ` Zoltan Menyhart
0 siblings, 0 replies; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-19 14:33 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
KAMEZAWA Hiroyuki wrote:
> A bit new idea. How about this ?
> ==
> - Set PG_arch_1 if "icache is *not* coherent"
page-flags.h:
* PG_arch_1 is an architecture specific page state bit. The generic code
* guarantees that this bit is cleared for a page when it first is entered into
* the page cache.
I do not think you can easily change it.
I can agree, making nfs_readpage() call an architecture dependent service
is not an easy stuff either. :-)
> - make flush_dcache_page() to be empty func.
> - For Montecito, add kmap_atomic(). This function just set PG_arch1.
kmap_atomic() is used at several places. Do you want to set
PG_arch1, everywhere kmap_atomic() is called?
> Then, "the page which is copied by the kernel" is marked as "not icache coherent page"
> - icache_flush_page() just flushes a page which has PG_arch_1.
> - Anonymous page is always has PG_arch_1. Tkae care of Copy-On-Write.
You can allocate (even in user mode) an anonymous page, hand-create or
read() in some code from a file, and mprotect(...., EXEC)-it. The page has
to become I-cache coherent.
I am not sure I can really understand your proposal.
I cannot see how the compatibility to the existing code is made sure.
Thanks,
Zoltan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 13:01 ` KAMEZAWA Hiroyuki
2007-07-19 13:32 ` KAMEZAWA Hiroyuki
@ 2007-07-19 14:15 ` Zoltan Menyhart
2007-07-19 14:51 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-19 14:15 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
KAMEZAWA Hiroyuki wrote:
> But is it too costly that flushing icache page only if a page is newly
> installed into the system (PG_arch1) && it is mapped as executable ?
Well it was a bit long time ago, I measured on a Tiger box with
CPUs of 1.3 GHz:
Flushing a page of 64 Kbytes, with modified data in D-cache
(it's slower that not having modified data in the D-cache):
13.1 ... 14.7 usec.
You may have quicker machines, but having more CPUs or a NUMA architecture
can slow it down considerably:
- more CPUs have to agree that that's the moment to carry out a flush
- NUMA adds delay
We may have, say 1 Gbyte / sec local i/o activity (using some RAIDs).
Assume a few % of this 1 Gbyte is the program execution, or program swap in.
It gives some hundreds of new exec pages / sec =>
some msec-s can be lost each sec.
I can agree that it should not be a big deal :-)
> I don't want to leak this (stupid) corner case to the file system layer.
> Hmm...can't we do clever flushing (like your idea) in VM layer ?
As the VM layer is designed to be independent of the page read in stuff...
Thanks,
Zoltan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 14:15 ` Zoltan Menyhart
@ 2007-07-19 14:51 ` KAMEZAWA Hiroyuki
2007-07-19 15:18 ` Zoltan Menyhart
0 siblings, 1 reply; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-19 14:51 UTC (permalink / raw)
To: Zoltan Menyhart
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
On Thu, 19 Jul 2007 16:15:03 +0200
Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
> We may have, say 1 Gbyte / sec local i/o activity (using some RAIDs).
> Assume a few % of this 1 Gbyte is the program execution, or program swap in.
> It gives some hundreds of new exec pages / sec =>
> some msec-s can be lost each sec.
>
> I can agree that it should not be a big deal :-)
>
Hmm...but the current code flushes the page. just do it in "lazy" way.
much difference ?
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 14:51 ` KAMEZAWA Hiroyuki
@ 2007-07-19 15:18 ` Zoltan Menyhart
2007-07-20 1:51 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-19 15:18 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
KAMEZAWA Hiroyuki wrote:
> Hmm...but the current code flushes the page. just do it in "lazy" way.
> much difference ?
I agree the current code flushes the I-cache for all kinds of file
systems (for PTEs with the exec bit on).
The error is that it does it after the PTE is written.
In addition, I wanted to optimize it to gain a few %.
Apparently this idea is not much welcome.
I can agree that flushing the I-cache (if the architecture requires it)
before setting the PTE eliminates the error.
Thanks,
Zoltan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 15:18 ` Zoltan Menyhart
@ 2007-07-20 1:51 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 13+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-20 1:51 UTC (permalink / raw)
To: Zoltan Menyhart
Cc: linux-ia64, linux-kernel, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, y-goto
On Thu, 19 Jul 2007 17:18:59 +0200
Zoltan Menyhart <Zoltan.Menyhart@bull.net> wrote:
> KAMEZAWA Hiroyuki wrote:
>
> > Hmm...but the current code flushes the page. just do it in "lazy" way.
> > much difference ?
>
> I agree the current code flushes the I-cache for all kinds of file
> systems (for PTEs with the exec bit on).
>
> The error is that it does it after the PTE is written.
>
> In addition, I wanted to optimize it to gain a few %.
I'm glad, too if we can do it. But it seems that we need a bit
clever way to do that.
> Apparently this idea is not much welcome.
>
> I can agree that flushing the I-cache (if the architecture requires it)
> before setting the PTE eliminates the error.
>
Hm, I'll refresh and repost the patch.
Thanks,
-Kame
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [BUGFIX]{PATCH] flush icache on ia64 take2
2007-07-19 6:56 ` KAMEZAWA Hiroyuki
2007-07-19 12:05 ` Zoltan Menyhart
@ 2007-07-19 12:15 ` Zoltan Menyhart
1 sibling, 0 replies; 13+ messages in thread
From: Zoltan Menyhart @ 2007-07-19 12:15 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: linux-ia64, LKML, linux-mm, tony.luck, nickpiggin, mike,
dmosberger, GOTO
> Back to do_no_page():
> if the new PTE includes the exec bit and PG_arch_1 is set,
> the page has to be flushed from the I-cache before the PTE is
> made globally visible.
Sorry, I wanted to say:
if the new PTE includes the exec bit and PG_arch_1 is NOT set
Thanks,
Zoltan
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2007-07-20 1:51 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-06 2:29 [BUGFIX]{PATCH] flush icache on ia64 take2 KAMEZAWA Hiroyuki
2007-07-06 9:41 ` Zoltan Menyhart
2007-07-06 9:50 ` KAMEZAWA Hiroyuki
2007-07-19 6:56 ` KAMEZAWA Hiroyuki
2007-07-19 12:05 ` Zoltan Menyhart
2007-07-19 13:01 ` KAMEZAWA Hiroyuki
2007-07-19 13:32 ` KAMEZAWA Hiroyuki
2007-07-19 14:33 ` Zoltan Menyhart
2007-07-19 14:15 ` Zoltan Menyhart
2007-07-19 14:51 ` KAMEZAWA Hiroyuki
2007-07-19 15:18 ` Zoltan Menyhart
2007-07-20 1:51 ` KAMEZAWA Hiroyuki
2007-07-19 12:15 ` Zoltan Menyhart
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox