linux-mm.kvack.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linux-arch@vger.kernel.org, rmk@arm.linux.org.uk,
	James.Bottomley@HansenPartnership.com
Subject: Re: [patch] mm: fix PageUptodate memory ordering bug
Date: Sun, 23 Dec 2007 07:54:46 +0100	[thread overview]
Message-ID: <20071223065446.GB29288@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0712221152370.7460@blonde.wat.veritas.com>

On Sat, Dec 22, 2007 at 12:14:42PM +0000, Hugh Dickins wrote:
> On Sat, 22 Dec 2007, Andrew Morton wrote:
> > On Tue, 18 Dec 2007 02:26:32 +0100 Nick Piggin <npiggin@suse.de> wrote:
> > 
> > > After running SetPageUptodate, preceding stores to the page contents to
> > > actually bring it uptodate may not be ordered with the store to set the page
> > > uptodate.
> > > 
> > > Therefore, another CPU which checks PageUptodate is true, then reads the
> > > page contents can get stale data.
> > > 
> > > Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
> > > PageUptodate.
> > > 
> > > Many places that test PageUptodate do so with the page locked, and this
> > > would be enough to ensure memory ordering in those places if SetPageUptodate
> > > were only called while the page is locked. Unfortunately that is not always
> > > the case for some filesystems, but it could be an idea for the future.
> > > 
> > > One thing I like about it is that it brings the handling of anonymous page
> > > uptodateness in line with that of file backed page management, by marking anon
> > > pages as uptodate when they _are_ uptodate, rather than when our implementation
> > > requires that they be marked as such.
> 
> Nick, you're welcome to make that a separate, less controversial patch,
> to send in ahead.  Though I think the last time this came around, I hit
> one of your BUGs in testing shmem.c swapout or swapin or swapoff:
> something missing there that I've lost the record of - please do
> try testing that, maybe it's already fixed this time around.

I've given it some hours in your patented swapping kbuild-on-ext2-on-loop-on-tmpfs
stress testing (including swapoff). Haven't seen a problem as yet (except the tmpfs
swapin deadlock, which I've been patching out).

But if you see anything, please let me know...


> > >  #ifdef CONFIG_S390
> > > +	page_clear_dirty(page);
> > > +#endif
> > > +}
> 
> That's an odd little extract, since page_clear_dirty only does anything
> on s390.

Ah yeah, we could just get rid of the ifdef. Although I don't mind it too much,
as it kind of helps the reader match the other ifdef there...

 
> > For an overall 0.5% increase in the i386 size of several core mm files.  If
> > you don't blow us up on the spot, you'll slowly bleed us to death.
> > 
> > Can it be improved?
> 
> I do wish it could be.
> 
> I never find the time to give it the thought it needs; and any criticism
> I make is probably unjust, probably patiently answered by Nick on a
> previous round.
> 
> I'm never convinced that SetPageUptodate is the right place for
> this: what's wrong with doing it in those page copying functions?
> Or flush_dcache_page?

There are various places we _could_ do it, but I think the PG_uptodate macros
are logically the best place, without being too intrusive.

Let me explain. Normally I think the convention would be to open-code the
barriers in the callers (i.e. between memset() and SetPageUptodate(), and
between the if (PageUptodate()) test and the subsequent reads from the page).
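
A rough userspace model of that open-coded pattern, offered only as a sketch
under stated assumptions: it uses C11 atomics and pthreads rather than kernel
primitives, with atomic_thread_fence(memory_order_release) standing in for
smp_wmb() and atomic_thread_fence(memory_order_acquire) standing in for
smp_rmb() (both fences are somewhat stronger than the kernel barriers). The
flag is an atomic_int because a plain int would be a data race in C11.
struct fake_page, fill_page(), read_page() and run_publish_demo() are
hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for struct page: contents plus a PG_uptodate-like flag. */
struct fake_page {
	unsigned char data[PAGE_SZ];
	atomic_int uptodate;	/* atomic so concurrent access is not a C11 data race */
};

static struct fake_page pg;

/* Writer: bring the page uptodate, then publish the flag. */
static void *fill_page(void *arg)
{
	(void)arg;
	memset(pg.data, 0xAB, PAGE_SZ);				/* memset(); */
	atomic_thread_fence(memory_order_release);		/* open-coded smp_wmb() analogue */
	atomic_store_explicit(&pg.uptodate, 1, memory_order_relaxed);	/* SetPageUptodate(); */
	return NULL;
}

/* Reader: wait until the flag is seen set, then read the contents. */
static void *read_page(void *arg)
{
	int *ok = arg;

	while (!atomic_load_explicit(&pg.uptodate, memory_order_relaxed))
		;						/* if (PageUptodate()) */
	atomic_thread_fence(memory_order_acquire);		/* open-coded smp_rmb() analogue */

	*ok = 1;
	for (size_t i = 0; i < PAGE_SZ; i++) {
		if (pg.data[i] != 0xAB) {			/* stale data would show up here */
			*ok = 0;
			break;
		}
	}
	return NULL;
}

/* Run one writer and one reader; returns 1 if the reader saw only uptodate data. */
int run_publish_demo(void)
{
	pthread_t writer, reader;
	int ok = 0;
	int rc;

	memset(pg.data, 0, PAGE_SZ);
	atomic_store(&pg.uptodate, 0);

	rc = pthread_create(&reader, NULL, read_page, &ok);
	assert(rc == 0);
	rc = pthread_create(&writer, NULL, fill_page, NULL);
	assert(rc == 0);

	pthread_join(writer, NULL);
	pthread_join(reader, NULL);
	return ok;
}
```

On x86 a run will usually pass even without the fences, since x86 does not
reorder stores with older stores or loads with older loads, so a passing run
proves little; the fences matter on weakly ordered CPUs, and they also stop
the compiler from reordering the accesses.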

However, I think that would require auditing quite a bit of code (including
filesystems). So I think having the barriers in these macros is pretty
reasonable, and requires less thinking from others.
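
In the same userspace model (C11 atomics as stand-ins for the kernel barriers,
and again only a sketch), folding the barriers into the flag accessors might
look like the following. set_page_uptodate() and page_uptodate() are
hypothetical analogues of SetPageUptodate()/PageUptodate(), not the actual
patch; the point is that fs_fill_page() and fs_read_first_byte(), also
hypothetical, contain no barriers at all.

```c
#include <stdatomic.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical stand-in for struct page (not the kernel's layout). */
struct fake_page {
	unsigned char data[PAGE_SZ];
	atomic_int uptodate;
};

/* Setter carries the write barrier, so every caller gets it for free. */
static inline void set_page_uptodate(struct fake_page *p)
{
	atomic_thread_fence(memory_order_release);	/* smp_wmb() analogue */
	atomic_store_explicit(&p->uptodate, 1, memory_order_relaxed);
}

/* Test carries the read barrier; only needed when the flag is set,
 * because only then does the caller go on to read the contents. */
static inline int page_uptodate(struct fake_page *p)
{
	int ret = atomic_load_explicit(&p->uptodate, memory_order_relaxed);

	if (ret)
		atomic_thread_fence(memory_order_acquire);	/* smp_rmb() analogue */
	return ret;
}

/* A "filesystem" fill path that knows nothing about barriers. */
void fs_fill_page(struct fake_page *p, unsigned char byte)
{
	memset(p->data, byte, PAGE_SZ);
	set_page_uptodate(p);
}

/* A reader that knows nothing about barriers: first byte, or -1 if not uptodate. */
int fs_read_first_byte(struct fake_page *p)
{
	if (!page_uptodate(p))
		return -1;
	return p->data[0];
}
```

With the barriers in the accessors, an audit only has to cover the two helpers
rather than every caller; the trade-off is that every set, and every positive
test, pays for a fence.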

Why don't I like doing it in the page copying functions? Because there are more
of them, and their uses are more varied. I can't think of any reason to prefer
the page copying functions, and I can think of some against.

flush_dcache_page? Well, this bug really is a problem of ordering the stores to
the page, and the store to the page flags, against loads from the same; it has
nothing to do with cache aliasing. So putting the smp_wmb in flush_dcache_page
leaves you without a natural complement in which to put the smp_rmb. Although it
could be done, I think it makes things more tangled than having the ordering done
in the macros. Also, we only need to order the *initial* stores which bring the
page uptodate, whereas with flush_dcache_page we would pay for it on every call.


>  Don't we need different kinds of barrier
> according to how the data got into the page (by DMA or not)?

I had thought of that very issue (my previous patch had an "XXX: help..." for
it). Without actually knowing what each underlying architecture does, I
"concluded" that it should be handled somewhere down in the block layer. I
think it would be silly for the block layer to signal completion if the
results were still incoherent with the CPU cache... but if the experts have
a different opinion, then this needs to be solved with another call anyway
(not in the page uptodate macros, and it's not exactly a memory ordering issue).
E.g. direct IO reads would need the same DMA cache synchronisation before they
complete to userspace, and that is completely independent of PG_uptodate...

> Doesn't that enter territory discussed down the years between
> James Bottomley and Russell King?  Worth CC'ing them the original?

... but since you bring this up again, I think that would be worthwhile. In
the interest of maintaining this thread I'll just link the original:

http://marc.info/?l=linux-mm&m=119794127303483&w=2

The question is this:

Must read from net/disk/etc into page P.
Device DMAs into P, signals completion

CPU0: handles completion, store to ram to mark P uptodate

CPU0/1: load from ram sees P uptodate, load from P must only see uptodate data

Are we guaranteed to get uptodate data from above the block layer, or do we
need to do anything special?



Thread overview: 29+ messages
2007-12-18  1:26 Nick Piggin
2007-12-22  8:57 ` Andrew Morton
2007-12-22 12:14   ` Hugh Dickins
2007-12-23  6:54     ` Nick Piggin [this message]
2007-12-23  5:57   ` Nick Piggin
2007-12-23  6:32     ` Andrew Morton
2007-12-23  7:15       ` Nick Piggin
2007-12-23  7:29         ` Andrew Morton
2007-12-23  9:14           ` Nick Piggin
2007-12-23  9:28             ` Andrew Morton
2007-12-23 16:02               ` Andi Kleen
2007-12-30 16:33             ` Ingo Molnar
2008-01-01 23:26               ` Nick Piggin
2008-01-02 21:01                 ` Andi Kleen
2008-01-03  3:32                   ` Nick Piggin
2008-01-03 13:08                     ` Andi Kleen
2007-12-23 17:22         ` Linus Torvalds
2007-12-23 21:35           ` Nick Piggin
2007-12-23 22:41           ` Nick Piggin
2008-01-01 23:41           ` Alan Cox
2008-01-02 11:02             ` [patch] i386: avoid expensive ppro ordering workaround for default 686 kernels Nick Piggin
2008-01-02 13:44               ` Alan Cox
2008-01-03  4:17                 ` Nick Piggin
2008-01-03 14:23                   ` Alan Cox
2008-01-03 20:20                     ` Benjamin Herrenschmidt
2008-01-03 22:23                       ` Alan Cox
2008-01-03 23:10                     ` Nick Piggin
2008-01-04 16:27                       ` Alan Cox
2008-01-07  0:12                         ` Nick Piggin
