From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>, Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Russell King - ARM Linux <linux@arm.linux.org.uk>,
Tony Luck <tony.luck@intel.com>
Subject: Re: Dirty/Access bits vs. page content
Date: Tue, 22 Apr 2014 21:23:47 -0700 [thread overview]
Message-ID: <CA+55aFw7JjEBUJRHXuwc7bGBD5c=J41mt46ovwHKAoMfPowWOw@mail.gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1404221847120.1759@eggly.anvils>
On Tue, Apr 22, 2014 at 8:08 PM, Hugh Dickins <hughd@google.com> wrote:
>
> At first I thought you were right, and set_page_dirty_lock() needed;
> but now I think not. We only(?) need set_page_dirty_lock() when
> there's a danger that the struct address_space itself might be
> evicted beneath us.
>
> But here (even without relying on Al's delayed_fput) the fput()
> is done by remove_vma() called _after_ the tlb_finish_mmu().
> So I see no danger of struct address_space being freed:
Yeah, that's what the comments say. I'm a bit worried about
"page->mapping" races, though. So I don't think the address_space gets
freed, but I worry about truncate NULL'ing out page->mapping under us.
There, the page being either mapped in a page table (with the page
table lock held), or holding the page lock should protect us against
concurrent truncate doing that.
So I'm still a bit worried.
> page->mapping might be truncated to NULL at any moment without page
> lock and without mapping->tree_lock (and without page table lock
> helping to serialize page_mapped against unmap_mapping_range); but
> __set_page_dirty_nobuffers() (admittedly not the only set_page_dirty)
> is careful about that, and it's just not the problem Dave is seeing.
Yeah, I guess you're right. And no, page->mapping becoming NULL
doesn't actually explain Dave's issue anyway.
> However... Virginia and I get rather giddy when it comes to the
> clear_page_dirty_for_io() page_mkclean() dance: we trust you and
> Peter and Jan on that, and page lock there has some part to play.
>
> My suspicion, not backed up at all, is that by leaving the
> set_page_dirty() until after the page has been unmapped (and page
> table lock dropped), that dance has been disturbed in such a way
> that ext4 can be tricked into freeing page buffers before it will
> need them again for a final dirty writeout.
So I don't think it's new, but I agree that it opens a *much* wider
window where the dirty bit is not visible in either the page tables or
(yet) in the page dirty bit.
The same does happen in try_to_unmap_one() and in
clear_page_dirty_for_io(), but in both of those cases we hold the page
lock for the entire duration of the sequence, so this whole "page is
not visibly dirty and page lock is not held" is new. And yes, I could
imagine that that makes some code go "Ahh, we don't need buffers for
that page any more then".
In short, I think your suspicion is on the right track. I will have to
think about this.
But I'm starting to consider this whole thing to be a 3.16 issue by
now. It wasn't as simple as it looked, and while our old location of
set_page_dirty() is clearly wrong, and DaveH even got a test-case for
it (which I initially doubted would even be possible), I still
seriously doubt that anybody sane who cares about data consistency
will do concurrent unmaps (or MADV_DONTNEED) while another writer is
actively using that mapping.
Pretty much by definition, if you care about data consistency, you'd
never do insane things like that. You'd make damn sure that every
writer is all done before you unmap the area they are writing to.
So this is a "oops, we're clearly doing something wrong by marking the
page dirty too early early, but anybody who really hits it has it
coming to them" kind of situation. We want to fix it, but it doesn't
seem to be a high priority.
Linus
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-04-23 4:23 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1398032742.19682.11.camel@pasglop>
[not found] ` <CA+55aFz1sK+PF96LYYZY7OB7PBpxZu-uNLWLvPiRz-tJsBqX3w@mail.gmail.com>
[not found] ` <1398054064.19682.32.camel@pasglop>
[not found] ` <1398057630.19682.38.camel@pasglop>
[not found] ` <CA+55aFwWHBtihC3w9E4+j4pz+6w7iTnYhTf4N3ie15BM9thxLQ@mail.gmail.com>
[not found] ` <53558507.9050703@zytor.com>
[not found] ` <CA+55aFxGm6J6N=4L7exLUFMr1_siNGHpK=wApd9GPCH1=63PPA@mail.gmail.com>
[not found] ` <53559F48.8040808@intel.com>
2014-04-22 0:31 ` Linus Torvalds
2014-04-22 0:44 ` Linus Torvalds
2014-04-22 5:15 ` Tony Luck
2014-04-22 14:55 ` Linus Torvalds
2014-04-22 7:34 ` Peter Zijlstra
2014-04-22 7:54 ` Peter Zijlstra
2014-04-22 21:36 ` Linus Torvalds
2014-04-22 21:46 ` Dave Hansen
2014-04-22 22:08 ` Linus Torvalds
2014-04-22 22:41 ` Dave Hansen
2014-04-23 2:44 ` Linus Torvalds
2014-04-23 3:08 ` Hugh Dickins
2014-04-23 4:23 ` Linus Torvalds [this message]
2014-04-23 6:14 ` Benjamin Herrenschmidt
2014-04-23 18:41 ` Jan Kara
2014-04-23 19:33 ` Linus Torvalds
2014-04-24 6:51 ` Peter Zijlstra
2014-04-24 18:40 ` Hugh Dickins
2014-04-24 19:45 ` Linus Torvalds
2014-04-24 20:02 ` Hugh Dickins
2014-04-24 23:46 ` Linus Torvalds
2014-04-25 1:37 ` Benjamin Herrenschmidt
2014-04-25 2:41 ` Benjamin Herrenschmidt
2014-04-25 2:46 ` Linus Torvalds
2014-04-25 2:50 ` H. Peter Anvin
2014-04-25 3:03 ` Linus Torvalds
2014-04-25 12:01 ` Hugh Dickins
2014-04-25 13:51 ` Peter Zijlstra
2014-04-25 19:41 ` Hugh Dickins
2014-04-26 18:07 ` Peter Zijlstra
2014-04-27 7:20 ` Peter Zijlstra
2014-04-27 12:20 ` Hugh Dickins
2014-04-27 19:33 ` Peter Zijlstra
2014-04-27 19:47 ` Linus Torvalds
2014-04-27 20:09 ` Hugh Dickins
2014-04-28 9:25 ` Peter Zijlstra
2014-04-28 10:14 ` Peter Zijlstra
2014-04-27 16:21 ` Linus Torvalds
2014-04-27 23:13 ` Benjamin Herrenschmidt
2014-04-25 16:54 ` Dave Hansen
2014-04-25 18:41 ` Hugh Dickins
2014-04-25 22:00 ` Dave Hansen
2014-04-26 3:11 ` Hugh Dickins
2014-04-26 3:48 ` Linus Torvalds
2014-04-25 17:56 ` Linus Torvalds
2014-04-25 19:13 ` Hugh Dickins
2014-04-25 16:30 ` Dave Hansen
2014-04-23 20:11 ` Hugh Dickins
2014-04-24 8:49 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+55aFw7JjEBUJRHXuwc7bGBD5c=J41mt46ovwHKAoMfPowWOw@mail.gmail.com' \
--to=torvalds@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=dave.hansen@intel.com \
--cc=hpa@zytor.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=linux-arch@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@arm.linux.org.uk \
--cc=peterz@infradead.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox