From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@novell.com>,
Hugh Dickins <hugh@veritas.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Date: Wed, 11 Mar 2009 13:19:03 -0700 (PDT)
Message-ID: <alpine.LFD.2.00.0903111306080.32478@localhost.localdomain>
In-Reply-To: <20090311195935.GO27823@random.random>
On Wed, 11 Mar 2009, Andrea Arcangeli wrote:
>
> Did you notice the check after 'mark it gup' that will run in CPU0?
Ahh, no. I just read the patch through fairly quickly, and the whole
"(gup_get_pte & mask) != mask" didn't trigger as obvious. But yeah, I see
that it ends up re-checking the RW bit.
> gup-fast will _not_ succeed because of the set_wr_protect that just
> happened on CPU1. That's why I added the above check after
> setpagegup/get_page.
Ok, with the recheck I think it's fine.
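
The ordering discussed above can be sketched in user-space C. This is only an illustration, not the patch: `struct fake_page`, `PTE_PRESENT`, `PTE_RW`, `PAGE_GUP`, `pin_page()` and `recheck_or_unpin()` are hypothetical stand-ins, and C11 atomics stand in for the kernel's own primitives. It shows the two atomic operations per page, followed by a barrier and the "(pte & mask) != mask" recheck that backs the pin out if the pte was write-protected in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the kernel's real definitions. */
#define PTE_PRESENT 0x1UL
#define PTE_RW      0x2UL
#define PAGE_GUP    0x1UL   /* plays the role of the PG_GUP bit */

struct fake_page {
    atomic_ulong flags;
    atomic_int   refcount;
};

/* Step 1: mark the page and take a reference (two atomic RMW operations). */
static void pin_page(struct fake_page *page)
{
    atomic_fetch_or(&page->flags, PAGE_GUP);   /* atomic #1: "mark it gup" */
    atomic_fetch_add(&page->refcount, 1);      /* atomic #2: get_page      */
}

/*
 * Step 2: re-read the pte after the pin. If a concurrent fork has
 * write-protected it, the required bits are no longer all set, so the
 * reference is dropped and the caller must fall back to the slow path.
 * Leaving PAGE_GUP set on failure is an assumption of this sketch.
 */
static bool recheck_or_unpin(atomic_ulong *ptep, struct fake_page *page,
                             unsigned long mask)
{
    atomic_thread_fence(memory_order_seq_cst); /* the extra memory barrier */
    unsigned long pte = atomic_load(ptep);

    if ((pte & mask) != mask) {
        atomic_fetch_sub(&page->refcount, 1);  /* back out the pin */
        return false;
    }
    return true;
}
```

A caller would invoke pin_page() and then recheck_or_unpin() with mask set to PTE_PRESENT | PTE_RW, treating a false return as "fast path lost the race".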
> > Also, having to set the PG_GUP bit means that the "fast" gup is likely not
> > much faster than the slow one. It now has two atomics per page it looks
> > up, afaik, which sounds like it would delete any advantage it had over the
> > slow version that needed locking.
>
> gup-fast already has to get_page, so I don't see it.
That's my point. It used to have one atomic. Now it has two (and a memory
barrier). Those tend to be pretty expensive - even when there's no
cacheline bouncing.
> Furthermore starting from the second access GUP is already
> set
That's a totally bogus argument. It will be true for _benchmarks_, but if
somebody is trying to avoid buffered IO, one very possible common case is
that it's all going to be new pages all the time.
That said, I don't know who the crazy O_DIRECT users are. It may be true
that some O_DIRECT users end up using the same pages over and over again,
and that this is a good optimization for them.
> > What we _could_ try to do is to always make the COW breaking be a
> > _directed_ event - we'd make sure that we always break COW in the
> > direction of the first owner (going to the rmap chains). That might solve
> > everything, and be purely local to the logic in mm/memory.c (do_wp_page).
>
> That's a really interesting idea and frankly I didn't think about it.
The advantage of it is that it fixes the problem not just in one place,
but "forever". No hacks about exactly how you access the mappings etc.
Of course, nothing _really_ solves things. If you do some delayed IO after
having looked up the mapping and turned it into a physical page, and the
original allocator actually unmaps it (or exits), then the same issue can
still happen (well, not the _same_ one - but the very similar issue of the
child seeing changes even though the IO was started in the parent).
This is why I think any "look up by physical" is fundamentally flawed. It
very basically becomes a "I have a secret local TLB that cannot be changed
or flushed". And any single-bit solution (GUP) is always going to be
fairly broken.
> The cost of my fix to fork is not measurable with fork microbenchmark,
> while the cost of finding who owns the original shared page in
> do_wp_page would potentially be much bigger. The only slowdown to
> fork is in the O_DIRECT slow path which we don't care about and in the
> worst case is limited to the total amount of in-flight I/O.
Agreed. However, I really think this is an O_DIRECT problem. Just document
it. Tell people that O_DIRECT simply doesn't work with COW, and
fundamentally can never work well.
If you use O_DIRECT with threading, you had better know what the hell
you're doing anyway. I do not think that the kernel should do stupid
things just because stupid users don't understand the semantics of the
_non-stupid_ thing (which is to just let people think about COW for five
seconds).
Linus