From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>,
Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Tony Luck <tony.luck@intel.com>
Subject: Re: Dirty/Access bits vs. page content
Date: Wed, 23 Apr 2014 12:33:15 -0700
Message-ID: <CA+55aFwm9BT4ecXF7dD+OM0-+1Wz5vd4ts44hOkS8JdQ74SLZQ@mail.gmail.com>
In-Reply-To: <20140423184145.GH17824@quack.suse.cz>
On Wed, Apr 23, 2014 at 11:41 AM, Jan Kara <jack@suse.cz> wrote:
>
> Now I'm not sure how to fix Linus' patches. For all I care we could just
> rip out pte dirty bit handling for file mappings. However last time I
> suggested this you corrected me that tmpfs & ramfs need this. I assume this
> is still the case - however, given we unconditionally mark the page dirty
> for write faults, where exactly do we need this?
Honza, you're missing the important part: it does not matter one whit
that we unconditionally mark the page dirty, when we do it *early*,
and it can then be marked clean before it's actually clean!

The problem is that page cleaning can clean the page when there are
still writers dirtying the page. Page table tear-down removes the
entry from the page tables, but it's still there in the TLB on other
CPUs. So other CPUs are possibly writing to the page when
clear_page_dirty_for_io() has marked it clean (because it didn't see
the page table entries that got torn down, and it hasn't seen the
dirty bit in the page yet).
I'm including Dave Hansen's "racewrite.c" with his commentary:
"This is a will-it-scale test-case which handles all the thread creation
and CPU binding for me: https://github.com/antonblanchard/will-it-scale .
Just stick the test case in tests/. I also loopback-mounted a ramfs
file as an ext4 filesystem on /mnt to make sure the writeback could
happen fast.
This reproduces the bug pretty darn quickly and with as few as 4 threads
running like this: ./racewrite_threads -t 4 -s 999"
and
"It reproduces in about 5 seconds on my 4770 on an unpatched kernel. It
also reproduces on a _normal_ filesystem and doesn't apparently need the
loopback-mounted ext4 ramfs file that I was trying before."
so this can actually be triggered.
Linus
[-- Attachment #2: racewrite.c --]
[-- Type: text/x-csrc, Size: 1584 bytes --]
#define _GNU_SOURCE
#define _XOPEN_SOURCE 500
#include <sched.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

#define BUFLEN 4096
#define FILE_SIZE (4096*1024)

static char wistmpfile[] = "/mnt/willitscale.XXXXXX";
char *testcase_description = "Same file pwrite";
char *buf;

/* Create the test file, extend it to FILE_SIZE, and map it shared. */
void testcase_prepare(void)
{
	int fd = mkstemp(wistmpfile);

	assert(fd >= 0);
	assert(pwrite(fd, "X", 1, FILE_SIZE-1) == 1);
	buf = mmap(NULL, FILE_SIZE, PROT_READ|PROT_WRITE,
		   MAP_SHARED, fd, 0);
	assert(buf != MAP_FAILED);
	close(fd);
}

void testcase(unsigned long long *iterations)
{
	int cpu = sched_getcpu();
	int fd = open(wistmpfile, O_RDWR);
	/* Each CPU works on its own page of the shared mapping. */
	off_t offset = cpu * BUFLEN;
	long counter = 0;
	long counterread = 0;
	long *counterbuf = (void *)&buf[offset];

	printf("offset: %ld\n", (long)offset);
	printf("       buf: %p\n", (void *)buf);
	printf("counterbuf: %p\n", (void *)counterbuf);
	assert(fd >= 0);

	while (1) {
		ssize_t ret;

		/* The thread on CPU 1 does nothing but tear down the mapping. */
		if (cpu == 1) {
			(void)madvise(buf, FILE_SIZE, MADV_DONTNEED);
			continue;
		}

		/* Write through the mapping, then drop the page cache copy... */
		*counterbuf = counter;
		posix_fadvise(fd, offset, BUFLEN, POSIX_FADV_DONTNEED);
		/* ...and read the value back through the file descriptor. */
		ret = pread(fd, &counterread, sizeof(counterread), offset);
		assert(ret == (ssize_t)sizeof(counterread));

		if (counterread != counter) {
			/* The mmap write was lost: report it and hang here. */
			printf("cpu: %d\n", cpu);
			printf("    counter %ld\n", counter);
			printf("counterread %ld\n", counterread);
			printf("*counterbuf %ld\n", *counterbuf);
			while (1)
				;
		}
		counter++;
		(*iterations)++;
	}
}

void testcase_cleanup(void)
{
	unlink(wistmpfile);
}