linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Tim LaBerge <tim.laberge@quantum.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: Corruption with O_DIRECT and unaligned user buffers
Date: Fri, 19 Dec 2008 16:19:11 +0900	[thread overview]
Message-ID: <20081219161911.dcf15331.kamezawa.hiroyu@jp.fujitsu.com> (raw)
In-Reply-To: <20081218152952.GW24856@random.random>

On Thu, 18 Dec 2008 16:29:52 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, Nov 19, 2008 at 05:58:19PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 19, 2008 at 03:25:59PM +1100, Nick Piggin wrote:
> > > The solution either involves synchronising forks and get_user_pages,
> > > or probably better, to do copy on fork rather than COW in the case
> > > that we detect a page is subject to get_user_pages. The trick is in
> > > the details :)
> > 

> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: fork-o_direct-race
> 
> Think a thread writing constantly to the last 512bytes of a page, while another
> thread read and writes to/from the first 512bytes of the page. We can lose
> O_DIRECT reads, the very moment we mark any pte wrprotected because a third
> unrelated thread forks off a child.
> 
> This fixes it by never wprotecting anon ptes if there can be any direct I/O in
> flight to the page, and by instantiating a readonly pte and triggering a COW in
> the child. The only trouble here are O_DIRECT reads (writes to memory, read
> from disk). Checking the page_count under the PT lock guarantees no
> get_user_pages could be running under us because if somebody wants to write to
> the page, it has to break any cow first and that requires taking the PT lock in
> follow_page before increasing the page count.
> 
> The COW triggered inside fork will run while the parent pte is read-write, this
> is not usual but that's ok as it's only a page copy and it doesn't modify the
> page contents.
> 
> In the long term there should be a smp_wmb() in between page_cache_get and
> SetPageSwapCache in __add_to_swap_cache and a smp_rmb in between the
> PageSwapCache and the page_count() to remove the trylock op.
> 
> Fixed version of original patch from Nick Piggin.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Confirmed this fixes the problem.

Hmm, but, fork() gets slower. 

Result of cost-of-fork() on ia64.
==
  size of memory  before  after
  Anon=1M   	, 0.07ms, 0.08ms
  Anon=10M  	, 0.17ms, 0.22ms
  Anon=100M 	, 1.15ms, 1.64ms
  Anon=1000M	, 11.5ms, 15.821ms
==

fork() cost is 135% when the process has 1G of Anon.

test program is below. (used "/usr/bin/time" for measurement.)
==
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>


int main(int argc, char *argv[])
{
        int size, i, status;
        char *c;

        size = atoi(argv[1]) * 1024 * 1024;
        c = malloc(size);
        memset(c, 0,size);
        for (i = 0; i < 5000; i++) {
                if (!fork()) {
                        exit(0);
                }
                wait(&status);
        }
}
==





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-12-19  7:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-14 17:04 Tim LaBerge
2008-11-19  4:25 ` Nick Piggin
2008-11-19  6:52   ` Nick Piggin
2008-11-19 16:58   ` Andrea Arcangeli
2008-12-18 15:29     ` Andrea Arcangeli
2008-12-19  2:21       ` KAMEZAWA Hiroyuki
2008-12-19  5:06         ` KAMEZAWA Hiroyuki
2008-12-19  6:34       ` KOSAKI Motohiro
2008-12-20 16:02         ` Andrea Arcangeli
2008-12-19  7:19       ` KAMEZAWA Hiroyuki [this message]
2008-12-19  7:44         ` Li Zefan
2008-12-19  8:45           ` Li Zefan
2008-12-19 20:27           ` Andrea Arcangeli
2008-12-20 15:55         ` Andrea Arcangeli
2008-12-19 11:51       ` Li Zefan
2008-12-19 12:14         ` KOSAKI Motohiro
2008-12-19 12:58         ` Hugh Dickins
2008-12-19 20:34         ` Andrea Arcangeli
2014-06-27  2:08 Xiaoguang Wang
2014-07-01  4:18 ` Hugh Dickins
2014-07-02 11:39   ` Xiaoguang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081219161911.dcf15331.kamezawa.hiroyu@jp.fujitsu.com \
    --to=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=aarcange@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=tim.laberge@quantum.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox