KAMEZAWA Hiroyuki wrote: > On Thu, 18 Dec 2008 16:29:52 +0100 > Andrea Arcangeli wrote: > >> On Wed, Nov 19, 2008 at 05:58:19PM +0100, Andrea Arcangeli wrote: >>> On Wed, Nov 19, 2008 at 03:25:59PM +1100, Nick Piggin wrote: >>>> The solution either involves synchronising forks and get_user_pages, >>>> or probably better, to do copy on fork rather than COW in the case >>>> that we detect a page is subject to get_user_pages. The trick is in >>>> the details :) > >> From: Andrea Arcangeli >> Subject: fork-o_direct-race >> >> Think a thread writing constantly to the last 512bytes of a page, while another >> thread read and writes to/from the first 512bytes of the page. We can lose >> O_DIRECT reads, the very moment we mark any pte wrprotected because a third >> unrelated thread forks off a child. >> >> This fixes it by never wprotecting anon ptes if there can be any direct I/O in >> flight to the page, and by instantiating a readonly pte and triggering a COW in >> the child. The only trouble here are O_DIRECT reads (writes to memory, read >> from disk). Checking the page_count under the PT lock guarantees no >> get_user_pages could be running under us because if somebody wants to write to >> the page, it has to break any cow first and that requires taking the PT lock in >> follow_page before increasing the page count. >> >> The COW triggered inside fork will run while the parent pte is read-write, this >> is not usual but that's ok as it's only a page copy and it doesn't modify the >> page contents. >> >> In the long term there should be a smp_wmb() in between page_cache_get and >> SetPageSwapCache in __add_to_swap_cache and a smp_rmb in between the >> PageSwapCache and the page_count() to remove the trylock op. >> >> Fixed version of original patch from Nick Piggin. >> >> Signed-off-by: Andrea Arcangeli > > Confirmed this fixes the problem. > We tested with RHEL 5.2 + patch on i386 using the test program provided by Tim LaBerge, though the program can pass but sometimes hanged. strace log is attached, and we'll test it again with LOCKDEP enabled to see if we can get some other information. BTW, the patch works fine on IA64. > Hmm, but, fork() gets slower.