From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@novell.com>,
	Hugh Dickins <hugh@veritas.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Date: Wed, 11 Mar 2009 14:28:08 -0700 (PDT)
Message-ID: <alpine.LFD.2.00.0903111417230.32478@localhost.localdomain>
In-Reply-To: <20090311205529.GR27823@random.random>



On Wed, 11 Mar 2009, Andrea Arcangeli wrote:

> On Wed, Mar 11, 2009 at 01:33:17PM -0700, Linus Torvalds wrote:
> > Btw, if we don't do that, then there are better alternatives. One is:
> > 
> >  - fork already always takes the write lock on mmap_sem (and f*ck no, I 
> >    doubt anybody will ever care one whit how "parallel" you can do forks 
> >    from threads, so I don't think this is an issue)
> > 
> >  - Just make the rule be that people who use get_user_pages() always 
> >    have to have the read-lock on mmap_sem until they've used the pages.
> 
> How do you handle pages where gup already returned and I/O still in
> flight?

The rule is one of two things:
 - either keep the mmap_sem for reading until the IO is done
 - or admit the fact that IO is asynchronous, and has visible async behavior.

> Forcing gup-fast to be called with mmap_sem already held (like
> gup used to require) only avoids the need of changes in gup-fast
> AFAICT. You'll still get pages that are pinned and calling gup-fast
> under mmap_sem (no matter if read or even write mode) won't make a
> difference, still those pages will be pinned while fork runs and with
> dma going to them (by O_DIRECT or some driver using gup, as long as
> PageReserved isn't set on them).

The point I'm trying to make is that anybody who thinks that pages are 
stable over various behavior that runs in another thread - be it a fork, an 
mmap/munmap, or anything else - is just fooling themselves. The pages are 
going to show up in "random" places. 

The fact that the non-fast "get_user_pages()" takes the mmap semaphore for 
reading doesn't even protect that. It just means that the pages made sense 
at the time the get_user_pages() happened, not necessarily at the time 
when the actual use of them did. 

> Releasing the mmap_sem read mode in the irq-completion handler context
> should be possible, however fork will end up throttled blocking for
> I/O which isn't very nice behavior. BTW, direct-io.c is a total mess,
> I couldn't even figure out where to release those locks in the I/O
> completion handlers when I tried something like this with PG_lock
> instead of the mmap_sem...  Eventually I gave it up because this isn't
> just about O_DIRECT but all gup users have this trouble with fork.

O_DIRECT is actually the _simple_ case, since we won't be returning until 
it is done (ie it's not actually an async interface). So no, O_DIRECT 
doesn't need any interrupt handler games. It would just need to hold the 
sem over the actual call to the filesystem (ie just over the ->direct_IO() 
call).

Of course, I suspect that all users of O_DIRECT would be _very_ unhappy if 
they cannot do mmap/unmap/brk on other areas while O_DIRECT is going on, 
so it's almost certainly not reasonable.

People want the relaxed synchronization we give them, and that's literally 
why get_user_pages_fast exists - because people don't want _more_ 
synchronization, they want _less_.

But the thing is, with less synchronization, the behavior really is 
surprising in the edge cases. Which is why I think "threaded fork" plus 
"get_user_pages_fast" just doesn't make sense to even _worry_ about. If 
you use O_DIRECT and mix it with fork, you get what you get, and it's 
random - exactly because people who want O_DIRECT don't want any locking. 

It's a user-space issue, not a kernel issue.
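
One user-space way to take that responsibility, sketched under assumptions:
Linux has madvise(MADV_DONTFORK), which keeps a range out of any child
created by fork, so a buffer used for O_DIRECT never gets marked
copy-on-write by the fork in the first place. The helper name
alloc_dontfork_buffer() is hypothetical, and this is only an illustration,
not something this thread settled on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate a page-aligned anonymous buffer and mark
 * it MADV_DONTFORK so that fork() does not map it into the child.
 * Returns NULL on failure.
 */
static void *alloc_dontfork_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}
```

The child must not touch such a buffer after fork, because the range is
simply absent from its address space. The parent keeps using the same
physical pages, so in-flight DMA and later parent accesses keep agreeing on
where the data lives.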

			Linus

