Re: smp_rmb in mm/memory.c in 2.6.10

From: Hugh Dickins <hugh@veritas.com>
To: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Anton Blanchard <anton@samba.org>, Andi Kleen <ak@suse.de>,
	William Lee Irwin III <wli@holomorphy.com>,
	Andrea Arcangeli <andrea@suse.de>,
	linux-mm@kvack.org, davem@redhat.com
Subject: Re: smp_rmb in mm/memory.c in 2.6.10
Date: Fri, 14 Jan 2005 22:09:17 +0000 (GMT)
Message-ID: <Pine.LNX.4.44.0501142127430.3050-100000@localhost.localdomain>
In-Reply-To: <20050114211441.59635.qmail@web14305.mail.yahoo.com>

On Fri, 14 Jan 2005, Kanoj Sarcar wrote:
> 
> Here are the relevant steps of the two procedures:
> 
> do_no_page()
> 1. sequence = atomic_read(&mapping->truncate_count);
> 2. smp_rmb();
> 3. vma->vm_ops->nopage()
> 4. spin_lock(&mm->page_table_lock);
> 5. Retry if sequence != atomic_read(&mapping->truncate_count)
> 5a. See later.
> 6. update_mmu_cache()
> 7. spin_unlock(&mm->page_table_lock);
> 
> unmap_mapping_range()
> 8. spin_lock(&mapping->i_mmap_lock); /* irrelevant */
> 9. atomic_inc(&mapping->truncate_count);
> 10. zap_page_range(): spin_lock(&mm->page_table_lock);
>     zap_page_range(): TLB cleaning
>     zap_page_range(): spin_unlock(&mm->page_table_lock);
> 11. spin_unlock(&mapping->i_mmap_lock);

Yes (except that 8 is somewhat relevant to removing atomicity;
I say somewhat because the exclusive i_sem also protects truncate_count).
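
For readers who want the retry in concrete form, here is a rough userspace
sketch of steps 1-7 and 9-10, using C11 atomics and a pthread mutex in place
of the kernel primitives. Every name ending in _sketch, and the race_pending
test hook, is made up for illustration; the acquire fence is only the closest
C11 analogue of smp_rmb() (it is somewhat stronger), and i_mmap_lock from
steps 8 and 11 is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the mapping plus mm in the steps above. */
struct mapping_sketch {
    atomic_int truncate_count;        /* bumped at step 9 */
    pthread_mutex_t page_table_lock;  /* stands in for mm->page_table_lock */
    int installed;                    /* faults that reached step 6 */
    int race_pending;                 /* test hook: truncations to inject in nopage */
};

/* Steps 9-10: announce the truncation, then zap under page_table_lock. */
static void unmap_range_sketch(struct mapping_sketch *m)
{
    atomic_fetch_add(&m->truncate_count, 1);
    pthread_mutex_lock(&m->page_table_lock);
    /* zap_page_range() would clear ptes and flush TLBs here */
    pthread_mutex_unlock(&m->page_table_lock);
}

/* Stand-in for vma->vm_ops->nopage(); can simulate a racing truncation. */
static void nopage_sketch(struct mapping_sketch *m)
{
    if (m->race_pending > 0) {
        m->race_pending--;
        unmap_range_sketch(m);
    }
}

/* Steps 1-7; returns how many attempts the fault needed. */
static int fault_sketch(struct mapping_sketch *m)
{
    int attempts = 0;

    assert(m != NULL);
    for (;;) {
        int sequence;

        attempts++;
        sequence = atomic_load_explicit(&m->truncate_count,
                                        memory_order_relaxed);    /* step 1 */
        atomic_thread_fence(memory_order_acquire);                /* step 2 */
        nopage_sketch(m);                                         /* step 3 */
        pthread_mutex_lock(&m->page_table_lock);                  /* step 4 */
        if (sequence != atomic_load_explicit(&m->truncate_count,
                                             memory_order_relaxed)) {
            pthread_mutex_unlock(&m->page_table_lock);            /* step 5 */
            continue;
        }
        m->installed++;                                           /* step 6 */
        pthread_mutex_unlock(&m->page_table_lock);                /* step 7 */
        return attempts;
    }
}
```

With race_pending set to 1, the first attempt sees the count change under it
and retries, so fault_sketch() returns 2; with race_pending at 0 it returns 1.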

> --- Hugh Dickins <hugh@veritas.com> wrote:
> > On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> > > 
> > > Thanks, I think this explains it. IE, if do_no_page()
> > > reads truncate_count, and then later goes on to
> > > acquire a lock in nopage(), the smp_rmb() is
> > > guaranteeing that the read of truncate_count completes
> > > before nopage() starts executing. 
> > > 
> > > For x86 at least, it seems to me that since the
> > > spin_lock (in nopage()) uses a "lock" instruction,
> > > that itself guarantees that the truncate_count read is
> > > completed, even without the smp_rmb(). (Refer to IA32
> > > SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
> > > x86, the smp_rmb is superfluous.
> > 
> > You're making me nervous.  If you look at 2.6.11-rc1 you'll find
> > that I too couldn't see the point of that smp_rmb(), on any architecture,
> > and so removed it; while also removing the "atomicity" of truncate_count.
> 
> I haven't looked at the 2.6.11 code,

Please do if you have time.

> but you could look at atomicity and smp_rmb()
> as two different changes.

Definitely (oh, the shame that I put them together in one patch!)

> I believe the ordering of the C code in steps
> 8 and 9 could be interchanged without any problems, ie
> truncate_count is not protected by i_mmap_lock. In
> that case, you would need truncate_count to be atomic,
> unless you can guarantee unmap_mapping_range() is
> single threaded wrt "mapping" from callers.  

Right, but given the ordering 8 before 9,
there is no point to truncate_count being atomic.
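
As a rough illustration of that point (not kernel code): if every increment
happens with the same lock held, a plain counter loses no updates. The sketch
below assumes pthreads in userspace; i_mmap_lock_sketch, truncate_count_sketch
and the WRITERS/BUMPS_PER_WRITER constants are hypothetical names chosen for
this example. It covers only the writers; the unlocked read in do_no_page()
is the separate smp_rmb() question.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WRITERS 4
#define BUMPS_PER_WRITER 10000

/* Hypothetical stand-ins: one lock, one plain (non-atomic) counter. */
static pthread_mutex_t i_mmap_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static unsigned int truncate_count_sketch;

/* Each writer repeats steps 8, 9 and 11 with an ordinary increment. */
static void *writer(void *unused)
{
    (void)unused;
    for (int i = 0; i < BUMPS_PER_WRITER; i++) {
        pthread_mutex_lock(&i_mmap_lock_sketch);    /* step 8 */
        truncate_count_sketch++;                    /* step 9, no atomic op */
        pthread_mutex_unlock(&i_mmap_lock_sketch);  /* step 11 */
    }
    return NULL;
}

/* Runs WRITERS concurrent incrementers and returns the final count. */
static unsigned int run_writers(void)
{
    pthread_t tid[WRITERS];

    truncate_count_sketch = 0;
    for (int i = 0; i < WRITERS; i++) {
        int err = pthread_create(&tid[i], NULL, writer, NULL);
        assert(err == 0);
        (void)err;
    }
    for (int i = 0; i < WRITERS; i++)
        pthread_join(tid[i], NULL);
    return truncate_count_sketch;
}
```

Because the lock serializes the read-modify-write, run_writers() always ends
at WRITERS * BUMPS_PER_WRITER. Swap steps 8 and 9, as Kanoj suggests could be
done, and the increment would need to be atomic again.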

> > Here was my comment to that patch:
> > > Why is mapping->truncate_count atomic?  It's incremented inside
> > > i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
> > > 
> > > And why smp_rmb() before call to ->nopage?  The compiler cannot reorder
> > > the initial assignment of sequence after the call to ->nopage, and no
> > > cpu (yet!) can read from the future, which is all that matters there.
> > 
> > Now I'm not so convinced by that "no cpu can read from the future".
> > 
> > I don't entirely follow your remarks above, but I do think people
> > on this thread have a better grasp of these matters than I have:
> > does anyone now think that smp_rmb() needs to be restored?
> 
> As to the smp_rmb() part, I believe it is required; we
> are not talking about compiler reorderings,

Compiler reordering did need to be considered, but I still agree
with myself that the function call makes it no problem.

> rather cpu
> reorderings. Given just steps 1 and 3 above, there is
> no guarantee from the cpu that the read of
> truncate_count would not be performed before nopage()
> is almost complete, even though the compiler generated
> the proper instruction order (ie the cpu could pull
> down the read of truncate_count).

This is your crucial point.  Now I think you're right.

But I have remembered how I was thinking at the time,
what's behind my "no cpu can read from the future" remark.

Suppose unmap_mapping_range is incrementing truncate_count
from 0 to 1.  I could conceive of do_no_page's read into
"sequence" not completing until the spin_lock at step 4.
But I believed that the read issued before ->nopage could
only err on the safe side, sometimes fetching 0 instead of 1
when 1 would already be safe, but never seeing 1 too soon.

That belief was naive, wasn't it?  I was thinking in terms
of "slow" instructions rather than reordered instructions.
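
The difference between a "slow" read and a reordered read can be shown with
a toy model that replays three events in a chosen order. It is a sketch under
assumptions, not a description of the kernel: the event names are invented,
and the model assumes the truncating side changes whatever state the ->nopage
lookup inspects before it bumps truncate_count, which the steps quoted above
do not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events; the fault's program order is READ_COUNT, then LOOKUP. */
enum event { READ_COUNT, LOOKUP, TRUNCATE };

struct outcome {
    bool installed;  /* the step 5 check passed and a page was mapped */
    bool stale;      /* ...although the truncation had already happened */
};

/* Replays the three events in the given order, then applies the step 5 check. */
static struct outcome replay(const enum event order[3])
{
    int truncate_count = 0;
    bool file_truncated = false;
    int sequence = -1;
    bool page_looked_valid = false;
    struct outcome o;

    assert(order != NULL);
    for (int i = 0; i < 3; i++) {
        switch (order[i]) {
        case READ_COUNT:  /* step 1 */
            sequence = truncate_count;
            break;
        case LOOKUP:      /* step 3: what ->nopage observes */
            page_looked_valid = !file_truncated;
            break;
        case TRUNCATE:    /* truncating side, count bumped as in step 9 */
            file_truncated = true;
            truncate_count++;
            break;
        }
    }
    /* Step 5: install only if the count is unchanged since step 1. */
    o.installed = page_looked_valid && sequence == truncate_count;
    o.stale = o.installed && file_truncated;
    return o;
}
```

In program order, a truncation landing between READ_COUNT and LOOKUP, or
after both, never yields an install in this model: at worst the fault
retries. Only the order LOOKUP, TRUNCATE, READ_COUNT, where the read of the
count is satisfied after the lookup, passes the check with a stale page. That
is the "seeing 1 too soon" case, and it is the reordering the barrier at
step 2 is meant to forbid.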

> Whoever wrote this code did a careful job.

It was Andrea (one reason I've copied him now -
as I did when posting the patch to remove it).

Unless someone sees this differently, I should send a patch to
restore the smp_rmb(), with a longer code comment on what it's for.

Thanks a lot for your detailed answer.

Hugh

