From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
Tim LaBerge <tim.laberge@quantum.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: Corruption with O_DIRECT and unaligned user buffers
Date: Fri, 19 Dec 2008 16:19:11 +0900 [thread overview]
Message-ID: <20081219161911.dcf15331.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20081218152952.GW24856@random.random>
On Thu, 18 Dec 2008 16:29:52 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Wed, Nov 19, 2008 at 05:58:19PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 19, 2008 at 03:25:59PM +1100, Nick Piggin wrote:
> > > The solution either involves synchronising forks and get_user_pages,
> > > or probably better, to do copy on fork rather than COW in the case
> > > that we detect a page is subject to get_user_pages. The trick is in
> > > the details :)
> >
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: fork-o_direct-race
>
> Think a thread writing constantly to the last 512bytes of a page, while another
> thread read and writes to/from the first 512bytes of the page. We can lose
> O_DIRECT reads, the very moment we mark any pte wrprotected because a third
> unrelated thread forks off a child.
>
> This fixes it by never wprotecting anon ptes if there can be any direct I/O in
> flight to the page, and by instantiating a readonly pte and triggering a COW in
> the child. The only trouble here are O_DIRECT reads (writes to memory, read
> from disk). Checking the page_count under the PT lock guarantees no
> get_user_pages could be running under us because if somebody wants to write to
> the page, it has to break any cow first and that requires taking the PT lock in
> follow_page before increasing the page count.
>
> The COW triggered inside fork will run while the parent pte is read-write, this
> is not usual but that's ok as it's only a page copy and it doesn't modify the
> page contents.
>
> In the long term there should be a smp_wmb() in between page_cache_get and
> SetPageSwapCache in __add_to_swap_cache and a smp_rmb in between the
> PageSwapCache and the page_count() to remove the trylock op.
>
> Fixed version of original patch from Nick Piggin.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Confirmed this fixes the problem.
Hmm, but, fork() gets slower.
Result of cost-of-fork() on ia64.
==
size of memory before after
Anon=1M , 0.07ms, 0.08ms
Anon=10M , 0.17ms, 0.22ms
Anon=100M , 1.15ms, 1.64ms
Anon=1000M , 11.5ms, 15.821ms
==
fork() cost is 135% when the process has 1G of Anon.
test program is below. (used "/usr/bin/time" for measurement.)
==
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(int argc, char *argv[])
{
int size, i, status;
char *c;
size = atoi(argv[1]) * 1024 * 1024;
c = malloc(size);
memset(c, 0,size);
for (i = 0; i < 5000; i++) {
if (!fork()) {
exit(0);
}
wait(&status);
}
}
==
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-12-19 7:18 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-14 17:04 Tim LaBerge
2008-11-19 4:25 ` Nick Piggin
2008-11-19 6:52 ` Nick Piggin
2008-11-19 16:58 ` Andrea Arcangeli
2008-12-18 15:29 ` Andrea Arcangeli
2008-12-19 2:21 ` KAMEZAWA Hiroyuki
2008-12-19 5:06 ` KAMEZAWA Hiroyuki
2008-12-19 6:34 ` KOSAKI Motohiro
2008-12-20 16:02 ` Andrea Arcangeli
2008-12-19 7:19 ` KAMEZAWA Hiroyuki [this message]
2008-12-19 7:44 ` Li Zefan
2008-12-19 8:45 ` Li Zefan
2008-12-19 20:27 ` Andrea Arcangeli
2008-12-20 15:55 ` Andrea Arcangeli
2008-12-19 11:51 ` Li Zefan
2008-12-19 12:14 ` KOSAKI Motohiro
2008-12-19 12:58 ` Hugh Dickins
2008-12-19 20:34 ` Andrea Arcangeli
2014-06-27 2:08 Xiaoguang Wang
2014-07-01 4:18 ` Hugh Dickins
2014-07-02 11:39 ` Xiaoguang Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081219161911.dcf15331.kamezawa.hiroyu@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=aarcange@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nickpiggin@yahoo.com.au \
--cc=tim.laberge@quantum.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox