From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@novell.com>,
	Hugh Dickins <hugh@veritas.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Date: Wed, 11 Mar 2009 14:28:08 -0700 (PDT)
Message-ID: <alpine.LFD.2.00.0903111417230.32478@localhost.localdomain>
In-Reply-To: <20090311205529.GR27823@random.random>

On Wed, 11 Mar 2009, Andrea Arcangeli wrote:
> On Wed, Mar 11, 2009 at 01:33:17PM -0700, Linus Torvalds wrote:
> > Btw, if we don't do that, then there are better alternatives. One is:
> >
> > - fork already always takes the write lock on mmap_sem (and f*ck no, I
> > doubt anybody will ever care one whit how "parallel" you can do forks
> > from threads, so I don't think this is an issue)
> >
> > - Just make the rule be that people who use get_user_pages() always
> > have to have the read-lock on mmap_sem until they've used the pages.
>
> How do you handle pages where gup already returned and I/O still in
> flight?
The rule is:

 - either keep the mmap_sem for reading until the IO is done

 - or admit the fact that IO is asynchronous, and has visible async behavior.
> Forcing gup-fast to be called with mmap_sem already hold (like
> gup used to require) only avoids the need of changes in gup-fast
> AFAICT. You'll still get pages that are pinned and calling gup-fast
> under mmap_sem (no matter if read or even write mode) won't make a
> difference, still those pages will be pinned while fork runs and with
> dma going to them (by O_DIRECT or some driver using gup, as long as
> PageReserved isn't set on them).
The point I'm trying to make is that anybody who thinks that pages are
stable over various behavior that runs in another thread - be it a fork, a
mmap/munmap, or anything else - is just fooling themselves. The pages are
going to show up in "random" places.

The fact that the non-fast "get_user_pages()" takes the mmap semaphore for
reading doesn't even protect that. It just means that the pages made sense
at the time the get_user_pages() happened, not necessarily at the time
when the actual use of them did.
> Releasing the mmap_sem read mode in the irq-completion handler context
> should be possible, however fork will end up throttled blocking for
> I/O which isn't very nice behavior. BTW, direct-io.c is a total mess,
> I couldn't even figure out where to release those locks in the I/O
> completion handlers when I tried something like this with PG_lock
> instead of the mmap_sem... Eventually I gave it up because this isn't
> just about O_DIRECT but all gup users have this trouble with fork.
O_DIRECT is actually the _simple_ case, since we won't be returning until
it is done (ie it's not actually an async interface). So no, O_DIRECT
doesn't need any interrupt handler games. It would just need to hold the
sem over the actual call to the filesystem (ie just over the ->direct_IO()
call).

Of course, I suspect that all users of O_DIRECT would be _very_ unhappy if
they cannot do mmap/unmap/brk on other areas while O_DIRECT is going on,
so it's almost certainly not reasonable.

People want the relaxed synchronization we give them, and that's literally
why get_user_pages_fast exists - because people don't want _more_
synchronization, they want _less_.

But the thing is, with less synchronization, the behavior really is
surprising in the edge cases. Which is why I think "threaded fork" plus
"get_user_pages_fast" just doesn't make sense to even _worry_ about. If
you use O_DIRECT and mix it with fork, you get what you get, and it's
random - exactly because people who want O_DIRECT don't want any locking.
It's a user-space issue, not a kernel issue.

		Linus