Re: [PATCH] Remove unnecessary smp_wmb from clear_user_highpage()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: mel@skynet.ie (Mel Gorman)
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org
Subject: Re: [PATCH] Remove unnecessary smp_wmb from clear_user_highpage()
Date: Fri, 20 Jul 2007 14:08:49 +0100	[thread overview]
Message-ID: <20070720130848.GA15214@skynet.ie> (raw)
In-Reply-To: <20070719021743.GC23641@wotan.suse.de>

On (19/07/07 04:17), Nick Piggin didst pronounce:
> On Wed, Jul 18, 2007 at 05:45:22PM +0100, Hugh Dickins wrote:
> > On Wed, 18 Jul 2007, Mel Gorman wrote:
> > > 
> > > At the nudging of Andrew, I was checking to see if the architecture-specific
> > > implementations of alloc_zeroed_user_highpage() can be removed or not.
> > 
> > Ah, so that was part of the deal for getting MOVABLE in, eh ;-?
> > 
> > > With the exception of barriers, the differences are negligible and the main
> > > memory barrier is in clear_user_highpage(). However, it's unclear why it's
> > > needed. Do you mind looking at the following patch and telling me if it's
> > > wrong and if so, why?
> > > 
> > > Thanks a lot.
> > 
> > I laugh when someone approaches me with a question on barriers ;)
> > I usually get confused and have to go ask someone else.
> > 
> > And I should really to leave this query to Nick: he'll be glad of the
> > opportunity to post his PageUptodate memorder patches again (looking
> > in my mailbox I see versions from February, but I'm pretty sure he put
> > out a more compact, less scary one later on).  He contends that the
> > barrier in clear_user_highpage should not be there, but instead
> > barriers (usually) needed when setting and testing PageUptodate.
> > 
> > Andrew and I weren't entirely convinced: I don't think we found
> > him wrong, just didn't find time to think about it deeply enough,
> > suspicious of a fix in search of a problem, scared by the extent
> > of the first patch, put off by the usual host of __..._nolock
> > variants and micro-optimizations.  It is worth another look.
> 
> Well, at least I probably won't have to debug the remaining problem --
> the IBM guys will :)
> 

I weep for joy. I'll go looking for a test case for this. It sounds like
something that we'll need anyway if this area is to be kicked at all.

> > But setting aside PageUptodate futures...  "git blame" is handy,
> > and took me to the patch from Linus appended.  I think there's
> > as much need for that smp_wmb() now as there was then.  (But
> > am I really _thinking_?  No, just pointing you in directions.)
> > 
> > > ===
> > > 
> > >     This patch removes an unnecessary write barrier from clear_user_highpage().
> > >     
> > >     clear_user_highpage() is called from alloc_zeroed_user_highpage() on a
> > >     number of architectures and from clear_huge_page(). However, these callers
> > >     are already protected by the necessary memory barriers due to spinlocks
> > 
> > Be careful: as Linus indicates, spinlocks on x86 act as good barriers,
> > but on some architectures they guarantee no more than is strictly
> > necessary.  alpha, powerpc and ia64 spring to my mind as particularly
> > difficult ordering-wise, but I bet there are others too.
> 
> The problem cases here are those which don't provide an smp_mb() over
> locks (eg. ones which only give acquire semantics). I think these only
> are ia64 and powerpc.

If IA64 has these sort of semantics, then it's current behaviour is
buggy unless their call to flush_dcache_page() has a similar effect to
having a write barrier elsewhere. I'll ask them.

> Of those, I think only powerpc implementations have
> a really deep out of order memory system (at least on the store side)...
> which is probably why they see and have to fix most of our barrier
> problems :)
> 

Yeah, this could be more of the same.

> 
> > >     in the fault path and the page should not be visible on other CPUs anyway
> > 
> > The page may not be intentionally visible on another CPU yet.  But imagine
> > interesting stale data in the page being cleared, and another thread
> > peeking racily at unfaulted areas, hoping to catch sight of that data.
> > 
> > >     making the barrier unnecessary. A hint of lack of necessity is that there
> > >     does not appear to be a read barrier anywhere for this zeroed page.
> > 
> > Yes, I think Nick was similarly suspicious of a wmb without an rmb; but
> > Linus is _very_ barrier-savvy, so we might want to ask him about it (CC'ed).
> 
> I was not so suspicious in the page fault case: there is a causal
> ordering between loading the valid pte and dereferencing it to load
> the page data. Potentially I think alpha is the only thing that
> could have problems here, but a) if any implementations did hardware
> TLB fills, they would have to do the rmb in microcode; and b) the
> software path appears to use the regular fault handler, so it would
> be subject to synchronisatoin via ptl. But maybe they are unsafe...
> 

One way to find out. Minimally, I think the cleanup here if it exists at
all is to replace the arch-specific alloc_zeroed helpers with barrier and
no-barrier versions and have architectures specify when they do not require
barrier to exist so the default behaviour is the safer choice. At least the
issue will be a bit clearer then to the next guy. IA64 will still be the
different but maybe it can be brought in line with other arches behaviour.

> What I am worried about is exactly the same race at the read(2)/write(2)
> level where there is _no_ spinlock synchronisation, and no wmb, let
> alone a rmb :)
> 

Where is the race in read/write that is affected by the behaviour of
clear_user_highpage()? Is it where a sparse file is mmap()ed and being read
at the same time or what?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2007-07-20 13:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-18 15:05 Mel Gorman
2007-07-18 16:45 ` Hugh Dickins
2007-07-19  2:17   ` Nick Piggin
2007-07-20 13:08     ` Mel Gorman [this message]
2007-07-23  2:02       ` Nick Piggin
2007-07-19  2:28   ` Linus Torvalds
2007-07-19  2:58     ` Nick Piggin
2007-07-19  2:36   ` Nick Piggin
2007-07-19 11:16   ` Mel Gorman
2007-07-19  1:57 ` Nick Piggin
2007-07-20 21:06 Oleg Nesterov
2007-07-20 21:57 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070720130848.GA15214@skynet.ie \
    --to=mel@skynet.ie \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox