From mboxrd@z Thu Jan  1 00:00:00 1970
From: kanoj@google.engr.sgi.com (Kanoj Sarcar)
Message-Id: <199909072027.NAA23624@google.engr.sgi.com>
Subject: [PATCH] [RFT] kanoj-mm16-2.3.16 rawio fixes
Date: Tue, 7 Sep 1999 13:27:51 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: torvalds@transmeta.com, sct@redhat.com, Kanoj Sarcar
Cc: linux-mm@kvack.org
List-ID:

Linus/Stephen,

Here are a couple of fixes to the vm part of rawio. Let me prepend a
little explanation of why each of these is needed.

1. map_user_kiobuf must enforce a security check that the caller is
not trying to do rawio into read-only memory (else it could mmap a
read-only file, then destroy the file contents by doing rawio into
it). A hypothetical userspace demonstration of this hole appears just
before the patch below. The other changes to map_user_kiobuf are
mostly picked from the fault handling code before it invokes
handle_mm_fault.

2. unmap_kiobuf must unlock the page before freeing it (else it will
trip the assert in __free_page). Also, unmap_kiobuf must release any
resources (like swap handles etc.) associated with the page, instead
of just freeing the page. This is especially important in a clone'd
environment: if one clone starts a rawio and another just unmaps the
target vma, this prevents resource leakage.

3. copy_page_range must break c-o-w for pages into which rawio is in
progress. To understand why, imagine clone C1 doing rawio into page P
while clone C2 does a fork. If C2 ends up marking P c-o-w, then turns
around and writes to the page, C1/C2 will get a new page, thus losing
the contents of the rawio. This is something the app cannot guard
against, given that C2 might be writing to the top half of the page
while C1 is doing rawio into the bottom half of the same page.

NOTE: the copy_page_range fix is *not perfect*, since the PageLocked
check might trigger for file pages too (decide which is the bigger
evil and pick the other one). The problem is that in the current
implementation there is no way to tell whether a rawio is in progress
on a page. Instead of using the PG_locked bit to mark pages with
rawio in flight, a raw_count field in the page structure would be a
better way to track this information, though even then a 100% perfect
fix to copy_page_range is not possible. On the other hand, having a
raw_count would mean multiple rawios could be in flight on the same
page (which I imagine is not a rare thing for database apps to want),
instead of the map_user_kiobuf code doing complicated repeat loops. A
rough sketch of this idea also appears before the patch. Linus, I
could look into this, if you are not inherently opposed to the idea
of a raw_count field in the page structure. Let me know.

I will be out of email access for the next 7 days, so all responses
will be answered after that.

Thanks.
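To make the hole in (1) concrete, here is a hypothetical userspace
demonstration (the device and file names are made up; assume a raw
device has been bound to the disk holding "victim"):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        int filefd = open("victim", O_RDONLY);   /* hypothetical file */
        int rawfd = open("/dev/raw1", O_RDONLY); /* hypothetical raw device */

        /* PROT_READ only: no legitimate way to modify "victim". */
        char *map = mmap(0, 4096, PROT_READ, MAP_SHARED, filefd, 0);

        /*
         * Without the -EACCES check in map_user_kiobuf, this read()
         * DMAs device data straight into the physical page backing
         * the read-only mapping, destroying the file contents.  With
         * the patch, the vma lacks VM_WRITE, so the read() fails
         * with EACCES.
         */
        read(rawfd, map, 4096);
        return 0;
}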
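And a rough, untested sketch of the raw_count idea (the field and the
helper names are invented here, and this is *not* part of the patch):

/*
 * struct page (include/linux/mm.h) would grow:
 *
 *      unsigned long raw_count;        -- number of rawios in flight
 */

/* map_user_kiobuf would pin each target page like this, under the
 * page_table_lock, instead of relying on the PG_locked bit: */
static inline void rawio_pin(struct page *page)
{
        atomic_inc(&page->count);       /* existing reference count */
        page->raw_count++;
}

/* unmap_kiobuf would undo it: */
static inline void rawio_unpin(struct page *page)
{
        page->raw_count--;
        free_page_and_swap_cache(page_address(page));
}

/* copy_page_range would then test (mem_map + page_nr)->raw_count
 * instead of PageLocked(mem_map + page_nr); file pages locked for
 * ordinary buffered io would no longer trigger the c-o-w break, and
 * several rawios could nest on the same page. */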
Kanoj

--- /usr/tmp/p_rdiff_a00CUW/memory.c	Tue Sep  7 12:49:38 1999
+++ linux/mm/memory.c	Tue Sep  7 11:45:10 1999
@@ -66,6 +66,17 @@
 	copy_bigpage(to, from);
 }
 
+static inline void break_cow(struct vm_area_struct * vma, unsigned long
+	old_page, unsigned long new_page, unsigned long address,
+	pte_t *page_table)
+{
+	copy_cow_page(old_page,new_page);
+	flush_page_to_ram(new_page);
+	flush_cache_page(vma, address);
+	set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
+	flush_tlb_page(vma, address);
+}
+
 mem_map_t * mem_map = NULL;
 
 /*
@@ -239,6 +250,20 @@
 				}
 				/* If it's a COW mapping, write protect it both in the parent and the child */
 				if (cow) {
+					/*
+					 * Raw io might be in progress on
+					 * this page via a different clone.
+					 * For dmaout, could still do cow,
+					 * except there's no way to make out.
+					 */
+					if (PageLocked(mem_map + page_nr)) {
+						unsigned long new_page =
+							__get_free_page(GFP_BIGUSER);
+						if (!new_page)
+							goto nomem;
+						break_cow(vma, pte_page(pte), new_page, address, dst_pte);
+						goto cont_copy_pte_range;
+					}
 					pte = pte_wrprotect(pte);
 					set_pte(src_pte, pte);
 				}
@@ -446,6 +471,7 @@
 	int doublepage = 0;
 	int repeat = 0;
 	int i;
+	int datain = (rw == READ);
 
 	/* Make sure the iobuf is not already mapped somewhere. */
 	if (iobuf->nr_pages)
@@ -478,8 +504,19 @@
 			vma = find_vma(current->mm, ptr);
 			if (!vma)
 				goto out_unlock;
+			if (vma->vm_start > ptr) {
+				if (!(vma->vm_flags & VM_GROWSDOWN))
+					goto out_unlock;
+				if (expand_stack(vma, ptr))
+					goto out_unlock;
+			}
+			if (((datain) && (!(vma->vm_flags & VM_WRITE))) ||
+			    (!(vma->vm_flags & VM_READ))) {
+				err = -EACCES;
+				goto out_unlock;
+			}
 		}
-		if (handle_mm_fault(current, vma, ptr, (rw==READ)) <= 0)
+		if (handle_mm_fault(current, vma, ptr, datain) <= 0)
 			goto out_unlock;
 		spin_lock(&mm->page_table_lock);
 		page = follow_page(ptr);
@@ -566,8 +603,8 @@
 		map = iobuf->maplist[i];
 
 		if (map && iobuf->locked) {
-			__free_page(map);
 			UnlockPage(map);
+			free_page_and_swap_cache(page_address(map));
 		}
 	}
 
@@ -824,12 +861,7 @@
 	if (pte_val(*page_table) == pte_val(pte)) {
 		if (PageReserved(page))
 			++vma->vm_mm->rss;
-		copy_cow_page(old_page,new_page);
-		flush_page_to_ram(new_page);
-		flush_cache_page(vma, address);
-		set_pte(page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
-		flush_tlb_page(vma, address);
-
+		break_cow(vma, old_page, new_page, address, page_table);
 		/* Free the old page.. */
 		new_page = old_page;
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/