From: Andrew Morton <akpm@osdl.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Nick Piggin <npiggin@suse.de>,
Linux Memory Management <linux-mm@kvack.org>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [patch 2/5] mm: fault vs invalidate/truncate race fix
Date: Tue, 10 Oct 2006 23:10:36 -0700
Message-ID: <20061010231036.66f609ea.akpm@osdl.org>
In-Reply-To: <452C8613.7080708@yahoo.com.au>

On Wed, 11 Oct 2006 15:50:11 +1000
Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Andrew Morton wrote:
>
> >On Tue, 10 Oct 2006 16:21:49 +0200 (CEST)
> >Nick Piggin <npiggin@suse.de> wrote:
> >
> >
> >>--- linux-2.6.orig/mm/filemap.c
> >>+++ linux-2.6/mm/filemap.c
> >>@@ -1392,9 +1392,10 @@ struct page *filemap_nopage(struct vm_ar
> >> unsigned long size, pgoff;
> >> int did_readaround = 0, majmin = VM_FAULT_MINOR;
> >>
> >>+ BUG_ON(!(area->vm_flags & VM_CAN_INVALIDATE));
> >>+
> >> pgoff = ((address-area->vm_start) >> PAGE_CACHE_SHIFT) + area->vm_pgoff;
> >>
> >>-retry_all:
> >> size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> >> if (pgoff >= size)
> >> goto outside_data_content;
> >>@@ -1416,7 +1417,7 @@ retry_all:
> >> * Do we have something in the page cache already?
> >> */
> >> retry_find:
> >>- page = find_get_page(mapping, pgoff);
> >>+ page = find_lock_page(mapping, pgoff);
> >>
> >
> >Here's a little problem. Locking the page in the pagefault handler takes
> >our deadlock while writing from a mmapped copy of the page into the same
> >page from "extremely hard to hit" to "super-easy to hit". Try running
> >write-deadlock-demo.c from
> >http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
> >
> >It conveniently deadlocks while holding mmap_sem, so `ps' gets stuck too.
> >
> >So this whole idea of locking the page in the fault handler is off the
> >table until we fix that deadlock for real.
> >
>
> OK. Can it sit in -mm for now, though?

argh. It took me two goes to unpickle all the bits and pieces (please
patch things like cachefiles separately, unless you want your stuff to be
merged after that stuff) and now I've gone and deleted it all.

Maybe later? We do have that infinite-loop-on-EIO to look at as well.

> Or is this deadlock less theoretical
> than it sounds?

I _think_ people have hit it in the wild, due to memory pressure.

But no, it's a silly thing which will only hit when people are running
silly tests under silly amounts of load.

Or if they're trying to kill your computer...

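For readers without the tarball to hand, the scenario under discussion can be set up from userspace roughly as below. This is a minimal sketch, not the actual write-deadlock-demo.c: the function name write_page_from_own_mapping() and the use of madvise(MADV_DONTNEED) to drop the mapping's PTEs are assumptions made for illustration. It writes one page of a file using a shared mapping of that same page as the source buffer, so the copy inside write() has to fault on the very page it is writing to. Whether that fault lands while the page is locked depends on the kernel and on timing; a fixed kernel simply completes the write.

```c
#define _GNU_SOURCE           /* for pwrite() and MADV_DONTNEED on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the first page of `path` to itself, using a
 * MAP_SHARED mapping of that page as the write() source buffer.
 * Returns the number of bytes written, or -1 on any setup failure.
 */
static ssize_t write_page_from_own_mapping(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(map, 'A', (size_t)page);   /* give the page some content */

    /*
     * Drop our page table entries for the mapping. The data stays in
     * pagecache, but reading `map` now needs a page fault, which happens
     * at some point during the pwrite() below.
     */
    madvise(map, (size_t)page, MADV_DONTNEED);

    /* Source buffer and destination are the same pagecache page. */
    ssize_t n = pwrite(fd, map, (size_t)page, 0);

    munmap(map, (size_t)page);
    close(fd);
    unlink(path);
    return n;
}
```

Calling it with a scratch path such as "/tmp/write-self.bin" should return one page's worth of bytes on a kernel that handles the case; a real reproducer would loop and add memory pressure so the source page gets evicted between prefault and copy.
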
> At any rate, thanks for catching this.
>
> > Coincidentally I started coding
> >a fix for that a couple of weeks ago, but spent too much time with my nose
> >in other people's crap to get around to writing my own crap.
> >
> >The basic idea is
> >
> >- revert the recent changes to the core write() code (the ones which
> > killed writev() performance, especially on NFS overwrites).
> >
> >- clean some stuff up
> >
> >- modify the core of write() so that instead of doing copy_from_user(),
> > we do inc_preempt_count();copy_from_user_inatomic(). So we never enter
> > the pagefault handler while holding the lock on the pagecache page.
> >
> > If the fault happens, we run commit_write() on however much stuff we
> > managed to copy and then go back and try to fault the target page back in
> > again. Repeat up to ten times, then give up.
> >
>
> Without looking at any code, perhaps we could instead run get_user_pages
> and copy the memory that way.

That would certainly work, but we've always shied away from doing that
because of the performance implications.

> We'd still want to try the initial copy_from_user, because the TLB entry is
> quite likely to exist or at least the pte will exist so the low level TLB
> refill can reach it - so we don't want to walk the pagetables manually if
> we can help it.

Yeah, that's an alternative to the fault-it-in-ten-times-then-give-up
approach.

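The fault-it-in-ten-times-then-give-up scheme from the quoted list can be modelled in userspace roughly as follows. This is a toy sketch under stated assumptions, not kernel code: struct user_buf, copy_inatomic(), fault_in() and write_with_retry() are hypothetical names, and "faulting" is simulated by a counter of how many source bytes are currently readable. The point it illustrates is the ordering: the source is faulted in with no page lock held, the copy under the lock never faults, and a short copy is committed before going round again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTEMPTS 10   /* "ten times then give up" */

/* Hypothetical stand-in for a user buffer: bytes at or past `resident`
 * would need a page fault to read. */
struct user_buf {
    const char *data;
    size_t len;
    size_t resident;
};

/* Models copy_from_user_inatomic(): never faults, copies only what is
 * already resident, and returns the number of bytes NOT copied. */
static size_t copy_inatomic(char *dst, const struct user_buf *src,
                            size_t off, size_t n)
{
    size_t ok = 0;

    assert(off <= src->len && n <= src->len - off);
    if (src->resident > off)
        ok = src->resident - off;
    if (ok > n)
        ok = n;
    memcpy(dst, src->data + off, ok);
    return n - ok;
}

/* Models faulting the source in, `step` bytes at a time. In the real
 * scheme this runs with no pagecache page lock held. */
static void fault_in(struct user_buf *src, size_t step)
{
    src->resident += step;
    if (src->resident > src->len)
        src->resident = src->len;
}

/* Returns how many bytes ended up committed to `page`. */
static size_t write_with_retry(char *page, struct user_buf *src, size_t step)
{
    size_t done = 0;

    for (int attempt = 0; attempt < MAX_ATTEMPTS && done < src->len; attempt++) {
        fault_in(src, step);                 /* no lock held here */
        /* lock_page() would go here */
        size_t want = src->len - done;
        size_t left = copy_inatomic(page + done, src, done, want);
        done += want - left;                 /* commit the partial copy */
        /* unlock_page() would go here */
    }
    return done;
}
```

A source that becomes resident a few bytes per attempt completes in a few iterations; one that never becomes resident makes the loop give up after MAX_ATTEMPTS with nothing committed.
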
> At that point, if we end up doing the get_user_pages thing, do we even need
> to do the intermediate commit_write()?

Yes, we will. get_user_pages() will run the pagefault handler, which will
lock the page, so we're back to square one.

> Or just do the whole copy (the partially copied data is going to be in
> cache on physically indexed caches anyway, so it will be very low cost to
> copy again). And it should be a reasonably unlikely path... but I'll
> instrument it.

I'm not sure what you're suggesting here.

> > It gets tricky because it means that we'll need to go back to zeroing
> > out the uncopied part of the pagecache page before
> > commit_write+unlock_page(). This will resurrect the recently-fixed
> > problem where userspace can fleetingly see a bunch of zeroes in pagecache
> > where it expected to see either the old data or the new data.
> >
> > But I don't think that problem was terribly serious, and we can improve
> > the situation quite a lot by not doing that zeroing if the page is
> > already up-to-date.
> >
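The zeroing rule in the quoted paragraph can be illustrated with a toy model. This is a hedged sketch, not the patch itself: struct toy_page and finish_short_copy() are hypothetical names, and the `uptodate` flag stands in for the kernel's page up-to-date state. The assumption it encodes is that a page which is not up to date may hold garbage in the uncopied tail of the write range, so the tail is zeroed before commit and unlock, while an up-to-date page already holds valid old data there and is left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a pagecache page. */
struct toy_page {
    char data[64];
    bool uptodate;   /* true if data[] already matches the file contents */
};

/*
 * Called after a short atomic copy: `copied` of the `want` bytes at
 * offset `off` made it into the page. Decides what to do with the
 * uncopied tail before the page is committed and unlocked.
 */
static void finish_short_copy(struct toy_page *pg, size_t off,
                              size_t copied, size_t want)
{
    assert(copied <= want && off + want <= sizeof pg->data);

    if (!pg->uptodate) {
        /* Tail contents are undefined: zero them so that no stale
         * memory becomes visible through the page. */
        memset(pg->data + off + copied, 0, want - copied);
    }
    /* Up-to-date page: the tail still holds the old, valid data, so
     * readers see either old or new bytes and never zeroes. */
}
```

With a page that is not up to date, a copy of 4 out of 8 bytes leaves bytes 4..7 zeroed; with an up-to-date page the same call leaves them untouched.
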
> >Anyway, if you're feeling up to it I'll document the patches I have and hand
> >them over - they're not making much progress here.
> >
>
> Yeah I'll have a go.

Thanks.