linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Hugh Dickins <hugh@veritas.com>,
	Anton Blanchard <anton@samba.org>, Andi Kleen <ak@suse.de>,
	William Lee Irwin III <wli@holomorphy.com>,
	linux-mm@kvack.org, davem@redhat.com,
	Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>
Subject: Re: smp_rmb in mm/memory.c in 2.6.10
Date: Fri, 14 Jan 2005 23:51:59 +0100	[thread overview]
Message-ID: <20050114225159.GO8709@dualathlon.random> (raw)
In-Reply-To: <20050114222210.51725.qmail@web14324.mail.yahoo.com>

On Fri, Jan 14, 2005 at 02:22:10PM -0800, Kanoj Sarcar wrote:
> handled. No? (Btw, I did not look at i_size_write() in
> the case of !CONFIG_SMP and CONFIG_PREEMPT, there
> might need to be some barriers put in there, not
> sure).

i_size_write is inode->i_size = i_size in common code terms, that's the
64bit case, and it's the one with the weaker semantics (in turn it's the
only one we can consider in terms of common code correctness). So it has
no barriers at all (nor cpu barriers, nor compiler barriers).

> But, based on what you said, yes, I believe an
> smp_wmb() is required _after_
> atomic_inc(truncate_count) in unmap_mapping_range() to
> ensure that the write happens before  it does the TLB
> shootdown. Right?

The smp_wmb() I mean should be put _before_ the truncate_count incrase.
It is mean to avoid the i_size_write to pass the truncate_count
increase (which can happen with spin_lock having inclusive semantics).

The order we must enforce at the cpu level to be correct is this: first
we set i_size, then we increase truncate_count to restart the page
fault, and finally we zap the ptes (and the zap starts by reading the
old contents set by the page fault). And it must be enforced with cpu
and compiler memory barries.

It seems you're right that we need an additional smp_mb() (not just
smp_wmb(), because the pte shootdown does reads first) even after the
truncate_count increase, but I thought the locking inside
unmap_mapping_range_list would avoid us to do that, sounds like it's not
the case.

I can only see the page_table_lock in there, so in theory the
truncate_write could enter the page_table_lock critical section on ia64.

So I guess we need both an explicit smp_wmb() before atomic_inc (to
serialize with i_size_write on ia64 which is 64bit and doesn't have any
implcit locking there) and smp_mb() after atomic_inc to fix this on ia64
too. But it really should be a smp_mb__after_atomic_inc kind of thing,
or we'll bite on x86 performance (in theory even the smp_wmb should be a
smp_wmb__before_atomic_inc, though smp_wmb is zerocost on x86 so it has
less impact ... ;).

As said in practice x86 and x86-64 are already rock solid with 2.6.10,
because atomic_add is an implicit smp_mb() before and after the
atomic_inc there.

> I am sure there might be other ways to clean up this
> code. Some documentation could not hurt, it could save
> everyone's head hurting when they look at this code!

Indeed.

> Btw, do all callers of vmtruncate() guarantee they do
> not concurrently invoke vmtruncate() on the same file?
> Seems like they could be stepping on each other while
> updating i_size ...

i_sem for that (and i_truncate_sem in 2.4), no problem.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  parent reply	other threads:[~2005-01-14 22:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-01-13 20:26 Kanoj Sarcar
2005-01-13 20:39 ` William Lee Irwin III
2005-01-13 21:02   ` Kanoj Sarcar
2005-01-13 21:06     ` Andi Kleen
2005-01-13 21:29       ` Kanoj Sarcar
2005-01-13 21:59         ` Anton Blanchard
2005-01-13 23:22           ` Kanoj Sarcar
2005-01-14 20:37             ` Hugh Dickins
2005-01-14 21:14               ` Kanoj Sarcar
2005-01-14 21:38                 ` Andrea Arcangeli
2005-01-14 22:09                 ` Hugh Dickins
2005-01-14 22:34                   ` Andrea Arcangeli
2005-01-14 21:25               ` Andrea Arcangeli
2005-01-14 21:32                 ` Andrea Arcangeli
2005-01-14 22:22                   ` Kanoj Sarcar
2005-01-14 22:47                     ` Hugh Dickins
2005-01-14 22:51                     ` Andrea Arcangeli [this message]
2005-01-14 23:14                       ` Kanoj Sarcar
2005-01-14 23:26                         ` Andrea Arcangeli
2005-01-14 22:36                   ` Hugh Dickins
2005-01-14 23:01                     ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050114225159.GO8709@dualathlon.random \
    --to=andrea@suse.de \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=anton@samba.org \
    --cc=davem@redhat.com \
    --cc=hugh@veritas.com \
    --cc=kanojsarcar@yahoo.com \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@osdl.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox