* [RFC] [PATCH]kanoj-mm15-2.3.10 Fix ia32 SMP/clone pte races
@ 1999-07-16 0:24 Kanoj Sarcar
From: Kanoj Sarcar @ 1999-07-16 0:24 UTC
To: torvalds, Kanoj Sarcar; +Cc: linux-mm, linux-kernel
This note describes clone/SMP race problems, specifically on processors
like the ia32 where the "accessed" and "dirty" bits are updated by the
processor while the kernel is reading/modifying the pte on the assumption
that it is completely frozen. The "accessed" bit race is probably a minor
problem, since the worst that can happen is that kswapd sometimes decides
to steal the wrong page. On the other hand, letting the "dirty" bit
updates be racy leads kswapd (and possibly other code) to unwittingly
clear the bit, with possible loss of data as a result (e.g., kswapd
decides to steal a page but not write it out, or sync() misses syncing
a page).
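To make the race concrete, here is a minimal sketch of the problematic
pattern (illustrative code with a made-up function name, not an excerpt
from the patch):

    /*
     * Illustration only: the generic read-then-write pte pattern that can
     * lose the hardware dirty bit on ia32 SMP.  A clone running on another
     * CPU may write to the page -- and the processor then sets _PAGE_DIRTY
     * in the pte -- anywhere between the snapshot and the set_pte();
     * writing the stale snapshot back wipes out the dirty bit.
     */
    static void racy_pte_update_sketch(pte_t *page_table, pgprot_t newprot)
    {
            pte_t pte = *page_table;        /* snapshot: clean */

            /* ... clone on another CPU dirties the page here ... */

            set_pte(page_table, pte_modify(pte, newprot)); /* dirty bit lost */
    }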
In the presence of clones, a few paths need to be fixed, not just for
ia32 but for all processors: the platform-independent Linux code sometimes
lets clones update page data when such updates should really be
disallowed.
1. During munmap, clones should be restricted while the file pages are
being written out to disk, else some writes are not synced to disk at
unmap time.
2. Looking at how often the code does tlb/cache flushes in the sync()
path, this path can be heavily optimized by clumping all the flushes.
On ia32 SMP, clones might lead to the "dirty" bit getting updated when the
kernel does not expect it. These paths are in:
3. fork in copy_page_range.
4. mremap
5. mprotect
6. try_to_swap_out is racy if the page stealer is stealing for a different
process, or if the stealer is a clone, since the "dirty" bit could be
changing while the code looks at it.
Most of the other code that invokes set_pte or pte_clear is safe, because
it either sets the dirty bit unconditionally or has made sure there can be
no accesses/updates via the ptes.
For the sake of simplicity, a single macro, pte_freeze_mm_range, is used
for 1, 2, 3, 4 and 5 to prevent any clone from updating the data pages or
changing the pte "dirty"/"accessed" bits. This also reduces the number
of tlb flushes needed. While I have only coded an ia32 SMP
pte_freeze_mm_range, it will probably benefit all platforms to implement
this macro to fix problems 1 and 2. For 6, the pte_freeze_dirty macro is
used, which is a nop for all processors except the ia32. Note that the
ia32 pte_freeze_mm_range makes sure that no clone can update/access the
pte, or pull the translation into its tlb, until the kernel has made the
pte consistent (callers of pte_freeze_mm_range hold the mmap_sem). Because
the optimized implementation twiddles bits in the PDE instead of the PTE,
a matching pte_unfreeze_mm_range call must always be made.
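For reference, the intended calling convention (condensed from the
mprotect.c hunk further down; a sketch, not a literal excerpt) is:

    /*
     * Sketch of how callers use pte_freeze_mm_range/pte_unfreeze_mm_range.
     * On ia32 SMP the freeze flushes the tlbs itself and returns 1, so the
     * caller only flushes when the freeze was a nop (ret == 0).
     */
    flush_cache_range(current->mm, start, end);
    ret = pte_freeze_mm_range(current->mm, start, end);

    /* ... walk and rewrite the ptes in [start, end) here ... */

    if (ret == 0)
            flush_tlb_range(current->mm, start, end);
    pte_unfreeze_mm_range(current->mm, start, end);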
The patch also fixes other minor bugs in vmscan.c and mremap.c.
Note that an alternate solution to the ia32 SMP pte race is to change
PAGE_SHARED in include/asm-i386/pgtable.h to not drop in _PAGE_RW. On a
write fault, mark the pte dirty and drop the _PAGE_RW bit into the pte.
Basically, behave as if the processor does not have a hardware dirty bit,
and adopt the same strategy as in the alpha/mips code. The disadvantage
is an extra fault on shm/file mmap'ed pages when a program first accesses
the page and only *then* dirties it (no change if the first access is a
write).
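A minimal sketch of what the write-fault path would do under that
alternative (the helper name is hypothetical and this is not part of the
patch; pte_mkdirty/pte_mkwrite/set_pte/flush_tlb_page are the standard
primitives):

    /*
     * Software dirty bit scheme, as on alpha/mips: shared mappings are
     * created without _PAGE_RW, and the first write fault sets the dirty
     * and RW bits by hand.  Sketch only -- not code from this patch.
     */
    static void fixup_write_fault_sketch(struct vm_area_struct *vma,
                                         unsigned long address,
                                         pte_t *page_table)
    {
            set_pte(page_table, pte_mkwrite(pte_mkdirty(*page_table)));
            flush_tlb_page(vma, address);
    }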
Kanoj
--- /usr/tmp/p_rdiff_a001U7/smp.c Wed Jul 14 22:32:00 1999
+++ arch/i386/kernel/smp.c Wed Jul 14 21:24:37 1999
@@ -2157,3 +2157,45 @@
#undef APIC_DIVISOR
+#define PMD_TABLE_MASK ((PTRS_PER_PMD-1) * sizeof(pmd_t))
+
+void pte_range(struct mm_struct *mm, unsigned long start, unsigned long end,
+ int op)
+{
+ pgd_t * dir = pgd_offset(mm, start) - 1;
+
+ for(;;) {
+ pmd_t *pmd;
+
+ dir++;
+ if ((pgd_none(*dir)) || (pgd_bad(*dir))) {
+ start = (start + PGDIR_SIZE) & PGDIR_MASK;
+ if (start >= end)
+ return;
+ continue;
+ }
+
+ pmd = pmd_offset(dir, start);
+ do {
+ if ((pmd_none(*pmd)) || (pmd_bad(*pmd))) {
+ start = (start + PMD_SIZE) & PMD_MASK;
+ if (start >= end)
+ return;
+ pmd++;
+ continue;
+ }
+
+ switch(op) {
+ case 1:
+ pmd_mkabsent(pmd);
+ break;
+ case 0:
+ pmd_mkpresent(pmd);
+ break;
+ }
+ start = (start + PMD_SIZE) & PMD_MASK;
+ if (start >= end)
+ return;
+ } while ((unsigned long)pmd & PMD_TABLE_MASK);
+ }
+}
--- /usr/tmp/p_rdiff_a001SW/pgtable.h Wed Jul 14 22:32:08 1999
+++ include/asm-alpha/pgtable.h Wed Jul 14 16:37:51 1999
@@ -636,6 +636,11 @@
#define module_map vmalloc
#define module_unmap vfree
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
#define PageSkip(page) (0)
#define kern_addr_valid(addr) (1)
--- /usr/tmp/p_rdiff_a001Sq/pgtable.h Wed Jul 14 22:32:14 1999
+++ include/asm-arm/pgtable.h Wed Jul 14 16:39:13 1999
@@ -7,6 +7,11 @@
#define module_map vmalloc
#define module_unmap vfree
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
extern int do_check_pgt_cache(int, int);
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
--- /usr/tmp/p_rdiff_a001Tf/pgtable.h Wed Jul 14 22:32:22 1999
+++ include/asm-i386/pgtable.h Wed Jul 14 18:31:05 1999
@@ -316,9 +316,17 @@
#define pte_clear(xp) do { pte_val(*(xp)) = 0; } while (0)
#define pmd_none(x) (!pmd_val(x))
-#define pmd_bad(x) ((pmd_val(x) & (~PAGE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE)
+#define pmd_bad(x) ((pmd_val(x) & (~PAGE_MASK & ~(_PAGE_USER | _PAGE_PRESENT))) != (_KERNPG_TABLE & ~_PAGE_PRESENT))
#define pmd_present(x) (pmd_val(x) & _PAGE_PRESENT)
#define pmd_clear(xp) do { pmd_val(*(xp)) = 0; } while (0)
+#define pmd_mkabsent(pmd) do { \
+ pmd_val(*(pmd)) = \
+ (pmd_val(*(pmd)) & (~_PAGE_PRESENT));\
+ } while (0)
+#define pmd_mkpresent(pmd) do { \
+ pmd_val(*(pmd)) = \
+ (pmd_val(*(pmd)) | _PAGE_PRESENT); \
+ } while(0)
/*
* The "pgd_xxx()" functions here are trivial for a folded two-level
@@ -589,6 +597,50 @@
#define module_map vmalloc
#define module_unmap vfree
+
+
+#ifndef __SMP__
+
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
+#else /* !__SMP__ */
+
+extern void pte_range(struct mm_struct *, unsigned long, unsigned long, int);
+
+#define pte_freeze_mm_range(mm, s, e) \
+ ({ \
+ int ret = 1; \
+ if (atomic_read(&(mm)->count) != 1) \
+ pte_range((mm), (s), (e), 1); \
+ flush_tlb_range((mm), (s), (e)); \
+ ret; \
+ })
+
+#define pte_unfreeze_mm_range(mm, s, e) \
+ do { \
+ if (atomic_read(&(mm)->count) != 1) \
+ pte_range((mm), (s), (e), 0); \
+ } while(0)
+#define pte_freeze_dirty(vma, page_table, pte, wr) \
+ do { \
+ if ((pte_write(*(page_table))) && \
+ ((current->mm != (vma)->vm_mm) || \
+ (atomic_read(&(vma)->vm_mm->count) != 1))) { \
+ (wr) = 1; \
+ clear_bit(1, (unsigned long *)(page_table)); \
+ smp_flush_tlb(); \
+ (pte) = *(page_table); \
+ } \
+ } while(0)
+#define pte_unfreeze_dirty(page_table, wr) \
+ do { \
+ if (wr) pte_val(*(page_table)) |= _PAGE_RW; \
+ } while(0)
+
+#endif /* !__SMP__ */
#endif /* !__ASSEMBLY__ */
--- /usr/tmp/p_rdiff_a001Tp/pgtable.h Wed Jul 14 22:32:30 1999
+++ include/asm-m68k/pgtable.h Wed Jul 14 16:39:23 1999
@@ -824,6 +824,11 @@
#define module_map vmalloc
#define module_unmap vfree
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
#define PageSkip(page) (0)
#define kern_addr_valid(addr) (1)
--- /usr/tmp/p_rdiff_a001Uf/pgtable.h Wed Jul 14 22:32:38 1999
+++ include/asm-mips/pgtable.h Wed Jul 14 16:39:37 1999
@@ -587,6 +587,11 @@
#define module_map vmalloc
#define module_unmap vfree
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
#define PageSkip(page) (0)
#define kern_addr_valid(addr) (1)
--- /usr/tmp/p_rdiff_a001U8/pgtable.h Wed Jul 14 22:32:46 1999
+++ include/asm-ppc/pgtable.h Wed Jul 14 16:39:46 1999
@@ -615,6 +615,11 @@
#define module_map vmalloc
#define module_unmap vfree
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* CONFIG_APUS */
/* For virtual address to physical address conversion */
extern void cache_clear(__u32 addr, int length);
--- /usr/tmp/p_rdiff_a001UB/pgtable.h Wed Jul 14 22:32:54 1999
+++ include/asm-sparc/pgtable.h Wed Jul 14 16:39:56 1999
@@ -576,6 +576,11 @@
#define module_unmap vfree
extern unsigned long *sparc_valid_addr_bitmap;
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
#define PageSkip(page) (test_bit(PG_skip, &(page)->flags))
#define kern_addr_valid(addr) (test_bit(__pa((unsigned long)(addr))>>20, sparc_valid_addr_bitmap))
--- /usr/tmp/p_rdiff_a001V6/pgtable.h Wed Jul 14 22:33:02 1999
+++ include/asm-sparc64/pgtable.h Wed Jul 14 16:40:07 1999
@@ -648,6 +648,11 @@
extern void module_unmap (void *addr);
extern unsigned long *sparc64_valid_addr_bitmap;
+#define pte_freeze_mm_range(mm, s, e) 0
+#define pte_unfreeze_mm_range(mm, s, e)
+#define pte_freeze_dirty(vma, page_table, pte, wr)
+#define pte_unfreeze_dirty(page_table, wr)
+
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
#define PageSkip(page) (test_bit(PG_skip, &(page)->flags))
#define kern_addr_valid(addr) (test_bit(__pa((unsigned long)(addr))>>22, sparc64_valid_addr_bitmap))
--- /usr/tmp/p_rdiff_a001VU/fork.c Wed Jul 14 22:33:11 1999
+++ kernel/fork.c Wed Jul 14 21:03:46 1999
@@ -231,9 +231,10 @@
static inline int dup_mmap(struct mm_struct * mm)
{
struct vm_area_struct * mpnt, *tmp, **pprev;
- int retval;
+ int retval, ret;
flush_cache_mm(current->mm);
+ ret = pte_freeze_mm_range(mm, 0, TASK_SIZE);
pprev = &mm->mmap;
for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
struct file *file;
@@ -284,7 +285,9 @@
build_mmap_avl(mm);
fail_nomem:
- flush_tlb_mm(current->mm);
+ if (ret == 0)
+ flush_tlb_mm(current->mm);
+ pte_unfreeze_mm_range(mm, 0, TASK_SIZE);
return retval;
}
--- /usr/tmp/p_rdiff_a001VB/filemap.c Wed Jul 14 22:33:22 1999
+++ mm/filemap.c Wed Jul 14 18:08:37 1999
@@ -1604,13 +1604,11 @@
int error = 0;
dir = pgd_offset(vma->vm_mm, address);
- flush_cache_range(vma->vm_mm, end - size, end);
while (address < end) {
error |= filemap_sync_pmd_range(dir, address, end - address, vma, flags);
address = (address + PGDIR_SIZE) & PGDIR_MASK;
dir++;
}
- flush_tlb_range(vma->vm_mm, end - size, end);
return error;
}
@@ -1688,7 +1686,10 @@
unsigned long start, unsigned long end, int flags)
{
if (vma->vm_file && vma->vm_ops && vma->vm_ops->sync) {
- int error;
+ int error, ret;
+
+ flush_cache_range(vma->vm_mm, start, end);
+ ret = pte_freeze_mm_range(vma->vm_mm, start, end);
error = vma->vm_ops->sync(vma, start, end-start, flags);
if (!error && (flags & MS_SYNC)) {
struct file * file = vma->vm_file;
@@ -1697,6 +1698,9 @@
error = file_fsync(file, dentry);
}
}
+ if (ret == 0)
+ flush_tlb_range(vma->vm_mm, start, end);
+ pte_unfreeze_mm_range(vma->vm_mm, start, end);
return error;
}
return 0;
--- /usr/tmp/p_rdiff_a001Uj/mmap.c Wed Jul 14 22:33:30 1999
+++ mm/mmap.c Wed Jul 14 21:21:23 1999
@@ -671,6 +671,7 @@
*/
while ((mpnt = free) != NULL) {
unsigned long st, end, size;
+ int ret;
free = free->vm_next;
@@ -679,6 +680,8 @@
end = end > mpnt->vm_end ? mpnt->vm_end : end;
size = end - st;
+ flush_cache_range(mm, st, end);
+ ret = pte_freeze_mm_range(mm, st, end);
lock_kernel();
if (mpnt->vm_ops && mpnt->vm_ops->unmap)
mpnt->vm_ops->unmap(mpnt, st, size);
@@ -687,9 +690,10 @@
remove_shared_vm_struct(mpnt);
mm->map_count--;
- flush_cache_range(mm, st, end);
zap_page_range(mm, st, size);
- flush_tlb_range(mm, st, end);
+ if (ret == 0)
+ flush_tlb_range(mm, st, end);
+ pte_unfreeze_mm_range(mm, st, end);
/*
* Fix the mapping, and free the old area if it wasn't reused.
--- /usr/tmp/p_rdiff_a001T3/mprotect.c Wed Jul 14 22:33:37 1999
+++ mm/mprotect.c Wed Jul 14 18:02:45 1999
@@ -67,15 +67,19 @@
{
pgd_t *dir;
unsigned long beg = start;
+ int ret;
dir = pgd_offset(current->mm, start);
flush_cache_range(current->mm, beg, end);
+ ret = pte_freeze_mm_range(current->mm, beg, end);
while (start < end) {
change_pmd_range(dir, start, end - start, newprot);
start = (start + PGDIR_SIZE) & PGDIR_MASK;
dir++;
}
- flush_tlb_range(current->mm, beg, end);
+ if (ret == 0)
+ flush_tlb_range(current->mm, beg, end);
+ pte_unfreeze_mm_range(current->mm, beg, end);
return;
}
--- /usr/tmp/p_rdiff_a001W8/mremap.c Wed Jul 14 22:33:44 1999
+++ mm/mremap.c Wed Jul 14 18:26:01 1999
@@ -91,9 +91,10 @@
unsigned long new_addr, unsigned long old_addr, unsigned long len)
{
unsigned long offset = len;
+ int ret;
flush_cache_range(mm, old_addr, old_addr + len);
- flush_tlb_range(mm, old_addr, old_addr + len);
+ ret = pte_freeze_mm_range(mm, old_addr, old_addr + len);
/*
* This is not the clever way to do this, but we're taking the
@@ -105,6 +106,9 @@
if (move_one_page(mm, old_addr + offset, new_addr + offset))
goto oops_we_failed;
}
+ if (ret == 0)
+ flush_tlb_range(mm, old_addr, old_addr + len);
+ pte_unfreeze_mm_range(mm, old_addr, old_addr + len);
return 0;
/*
@@ -115,10 +119,10 @@
* the old page tables)
*/
oops_we_failed:
+ pte_unfreeze_mm_range(mm, old_addr, old_addr + len);
flush_cache_range(mm, new_addr, new_addr + len);
while ((offset += PAGE_SIZE) < len)
move_one_page(mm, new_addr + offset, old_addr + offset);
- zap_page_range(mm, new_addr, new_addr + len);
flush_tlb_range(mm, new_addr, new_addr + len);
return -1;
}
--- /usr/tmp/p_rdiff_a001TW/vmscan.c Wed Jul 14 22:33:51 1999
+++ mm/vmscan.c Wed Jul 14 16:02:44 1999
@@ -38,6 +38,7 @@
unsigned long entry;
unsigned long page_addr;
struct page * page;
+ int writable;
pte = *page_table;
if (!pte_present(pte))
@@ -49,8 +50,9 @@
page = mem_map + MAP_NR(page_addr);
spin_lock(&tsk->mm->page_table_lock);
if (pte_val(pte) != pte_val(*page_table))
- goto out_failed_unlock;
+ goto out_failed_unlock_nofreeze;
+ pte_freeze_dirty(vma, page_table, pte, writable);
/*
* Dont be too eager to get aging right if
* memory is dangerously low.
@@ -155,7 +157,7 @@
*/
entry = get_swap_page();
if (!entry)
- goto out_failed; /* No swap space left */
+ goto out_failed_unlock; /* No swap space left */
vma->vm_mm->rss--;
tsk->nswap++;
@@ -175,6 +177,8 @@
__free_page(page);
return 1;
out_failed_unlock:
+ pte_unfreeze_dirty(page_table, writable);
+out_failed_unlock_nofreeze:
spin_unlock(&tsk->mm->page_table_lock);
out_failed:
return 0;
* Re: [RFC] [PATCH]kanoj-mm15-2.3.10 Fix ia32 SMP/clone pte races
From: Linus Torvalds @ 1999-07-16 1:09 UTC
To: Kanoj Sarcar; +Cc: linux-mm, linux-kernel
On Thu, 15 Jul 1999, Kanoj Sarcar wrote:
>
> Note that an alternate solution to the ia32 SMP pte race is to change
> PAGE_SHARED in include/asm-i386/pgtable.h to not drop in _PAGE_RW.
That is imho preferable to the "freeze_range" thing.
However, the _most_ preferable solution is probably just to update the
page tables with locked read-modify-write operations. Not fun, but not
horrible either. We'll have to change some of the interfaces, but it's
probably not too bad.
Note that for "unmap()" and for a lot of the special cases, I don't care
about the race at all. If some thread writes to the mapping at the same
time as it is being unmapped, tough luck. If we lose the dirty bit it's
not our problem: a program that races on unmap gets what it deserves, I
don't think there is any valid use of that race.
It's not a security issue, it's more an issue of what to do in situations
that can not happen with well-behaved applications anyway. My opinion is
that if we screw badly behaved programs, that is not a problem (the same
way that anything that passes in a bad pointer to a system call is
immediately undefined behaviour: we return EFAULT just to be polite, that
does NOT imply that we actually do anything sane).
So we do need to handle some of the cases, but others might as well just
be left racy.
Linus
* Re: [RFC] [PATCH]kanoj-mm15-2.3.10 Fix ia32 SMP/clone pte races
From: Kanoj Sarcar @ 1999-07-16 18:20 UTC
To: Linus Torvalds, alan; +Cc: linux-mm, linux-kernel
>
>
> On Thu, 15 Jul 1999, Kanoj Sarcar wrote:
> >
> > Note that an alternate solution to the ia32 SMP pte race is to change
> > PAGE_SHARED in include/asm-i386/pgtable.h to not drop in _PAGE_RW.
>
> That is imho preferable to the "freeze_range" thing.
While changing the PAGE_SHARED value is a little cleaner (and probably
what a lot of other OSes do), it does penalize all programs whose first
write to a page comes later than their first access. I was trying not to
incur this cost, but let me know if you would like to see a patch along
those lines.
The freeze_range solution does not penalize most programs, assuming most
do not share their address space. It's a judgement call whether clone
programs are slowed down too much by the freeze_range, compared to the
slowdown seen by all programs with the PAGE_SHARED scheme.
The freeze_range solution has the added advantage that, in some cases, it
prevents updates to the page altogether. See my comments further down too.
>
> However, the _most_ preferable solution is probably just to update the
> page tables with locked read-modify-write operations. Not fun, but not
> horrible either. We'll have to change some of the interfaces, but it's
> probably not too bad.
>
I thought about this a bit, and was concerned about the amount of change
involved. A lot of code reads *pte and then uses the value to make
decisions. You would have to change all those places to
1. atomically turn off the RW bit,
2. flush the tlb,
3. read the pte, which is then unchangeable.
The problem of course is that when you are working on a set of
contiguous ptes, you incur the tlb flush for every pte ... especially
bad in a shared address space environment. The freeze_range solution
freezes all the ptes first, then incurs a single tlb flush. Also note
that with the freeze_range fix, you are not doing any more flushes than
happen today.
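To illustrate that cost, the per-pte sequence would look roughly like the
following (a sketch with a hypothetical helper name, mirroring what
pte_freeze_dirty in the patch does for one pte):

    /*
     * Per-pte locked read-modify-write, sketched.  Every pte needs a full
     * tlb flush before it can be examined safely; pte_freeze_mm_range
     * batches this into a single flush for the whole range instead.
     */
    static pte_t read_stable_pte_sketch(pte_t *page_table)
    {
            clear_bit(1, (unsigned long *) page_table); /* bit 1 == _PAGE_RW */
            smp_flush_tlb();        /* no stale writable translation survives */
            return *page_table;     /* cannot be dirtied under us any more */
    }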
For an example of how freeze_range can cut down the number of tlb
flushes, look at how many calls can be saved in filemap_sync_pte, while
looping over the ptes, if all platforms implemented freeze_range.
(It can also be done just as an ia32-specific optimization, if the
ia32 has a freeze_range.)
> Note that for "unmap()" and for a lot of the special cases, I don't care
> about the race at all. If some thread writes to the mapping at the same
> time as it is being unmapped, tough luck. If we lose the dirty bit it's
> not our problem: a program that races on unmap gets what it deserves, I
> don't think there is any valid use of that race.
I differ ... losing the dirty bit always seems to make the OS the
culprit. It's one thing to say that, while unmapping, racy writes from a
clone are not synced to disk (some apps will argue with that too, seeing
that this is not POSIX behavior, but at least their writes are not lost);
it's quite another to lose the dirty bit, and hence the writes to the
page, completely.
Especially since we have a solution with no performance impact that seems
to take care of this issue, why not just be safe and do it ...
Let me know if you want me to create the PAGE_SHARED patch, or if I
should try to come up with a different scheme ... It seems to me that
data corruption bugs like this need to be fixed in 2.2 (and 2.3) as soon
as possible.
Thanks.
Kanoj
>
> It's not a security issue, it's more an issue of what to do in situations
> that can not happen with well-behaved applications anyway. My opinion is
> that if we screw badly behaved programs, that is not a problem (the same
> way that anything that passes in a bad pointer to a system call is
> immediately undefined behaviour: we return EFAULT just to be polite, that
> does NOT imply that we actually do anything sane).
>
> So we do need to handle some of the cases, but others might as well just
> be left racy.
>
> Linus