linux-mm.kvack.org archive mirror
* smp_rmb in mm/memory.c in 2.6.10
@ 2005-01-13 20:26 Kanoj Sarcar
  2005-01-13 20:39 ` William Lee Irwin III
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-13 20:26 UTC (permalink / raw)
  To: linux-mm

Hi folks,

I am trying to understand the usage of smp_rmb() in
mm/memory.c in 2.6.10.

This is my understanding of the relevant parts of
do_no_page():

1. If vma is file backed, snapshot truncate_count
(without holding page_table_lock).
2. smp_rmb() makes sure the above snapshot is
complete.
3. nopage() then figures out the physical address of
the file page.
4. Get page_table_lock.
5. Reread truncate_count, and decide whether to retry
operation.
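
For reference, here is roughly how that path looks in 2.6.10's
mm/memory.c (a paraphrased sketch, not the verbatim source; error
handling and the anonymous-page case are elided):

    unsigned int sequence = 0;
    struct address_space *mapping = NULL;

    if (vma->vm_file) {
        mapping = vma->vm_file->f_mapping;
        sequence = atomic_read(&mapping->truncate_count);  /* step 1 */
        smp_rmb();                                         /* step 2 */
    }
retry:
    new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);  /* step 3 */
    /* ... */
    spin_lock(&mm->page_table_lock);                       /* step 4 */
    if (mapping &&
        sequence != atomic_read(&mapping->truncate_count)) {  /* step 5 */
        sequence = atomic_read(&mapping->truncate_count);
        spin_unlock(&mm->page_table_lock);
        page_cache_release(new_page);
        goto retry;
    }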

What I don't understand is that in step 1,
truncate_count is read without page_table_lock, but in
step 5, it is read with page_table_lock. A consistent
approach would seem to be either always snapshot
truncate_count with page_table_lock, and hence do away
with the smp_rmb() in step 2; or always snapshot
truncate_count without page_table_lock, in which case
do not grab page_table_lock in step 4, but rather do
another smp_rmb() (page_table_lock can be grabbed as
step 6). Isn't that reasonable?

The second question is that even though truncate_count
is declared atomic (ie probably volatile on most
architectures), that does not make gcc guarantee
anything in terms of ordering, right?

Finally, does anyone really believe that a smp_rmb()
is required in step 2? My logic is that nopage() is
guaranteed to grab/release (spin)locks etc as part of
its processing, and that would force the snapshots of
truncate_count to be properly ordered.

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 20:26 smp_rmb in mm/memory.c in 2.6.10 Kanoj Sarcar
@ 2005-01-13 20:39 ` William Lee Irwin III
  2005-01-13 21:02   ` Kanoj Sarcar
  0 siblings, 1 reply; 21+ messages in thread
From: William Lee Irwin III @ 2005-01-13 20:39 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: linux-mm

On Thu, Jan 13, 2005 at 12:26:42PM -0800, Kanoj Sarcar wrote:
> The second question is that even though truncate_count
> is declared atomic (ie probably volatile on most
> architectures), that does not make gcc guarantee
> anything in terms of ordering, right?
> Finally, does anyone really believe that a smp_rmb()
> is required in step 2? My logic is that nopage() is
> guaranteed to grab/release (spin)locks etc as part of
> its processing, and that would force the snapshots of
> truncate_count to be properly ordered.

spin_unlock() does not imply a memory barrier. e.g. on ia32 it's
not even an atomic operation.


-- wli

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 20:39 ` William Lee Irwin III
@ 2005-01-13 21:02   ` Kanoj Sarcar
  2005-01-13 21:06     ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-13 21:02 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-mm

--- William Lee Irwin III <wli@holomorphy.com> wrote:

> On Thu, Jan 13, 2005 at 12:26:42PM -0800, Kanoj Sarcar wrote:
> > The second question is that even though truncate_count
> > is declared atomic (ie probably volatile on most
> > architectures), that does not make gcc guarantee
> > anything in terms of ordering, right?
> > Finally, does anyone really believe that a smp_rmb()
> > is required in step 2? My logic is that nopage() is
> > guaranteed to grab/release (spin)locks etc as part of
> > its processing, and that would force the snapshots of
> > truncate_count to be properly ordered.
> 
> spin_unlock() does not imply a memory barrier. e.g. on ia32 it's
> not even an atomic operation.

In include/asm-i386/spinlock.h, spin_unlock_string has
a "xchgb" (in case it's required). That should be
enough of a barrier for the hardware, no? 

Thanks.

Kanoj


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 21:02   ` Kanoj Sarcar
@ 2005-01-13 21:06     ` Andi Kleen
  2005-01-13 21:29       ` Kanoj Sarcar
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-01-13 21:06 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: William Lee Irwin III, linux-mm

> In include/asm-i386/spinlock.h, spin_unlock_string has
> a "xchgb" (in case it's required). That should be
> enough of a barrier for the hardware, no? 

It is, but only for broken PPros or OOSTORE systems
(currently only the VIA C3). For kernels compiled for non-broken CPUs
there isn't any kind of barrier. 
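
For reference, the two variants of 2.6.10's spin_unlock_string in
include/asm-i386/spinlock.h look roughly like this (a paraphrased
sketch, not the verbatim source):

    #if !defined(CONFIG_X86_OOSTORE) && !defined(CONFIG_X86_PPRO_FENCE)
    /* normal CPUs: a plain byte store; not atomic, and no barrier
     * beyond x86's naturally ordered stores */
    #define spin_unlock_string \
            "movb $1,%0" \
                    :"=m" (lock->lock) : : "memory"
    #else
    /* OOSTORE / PPro-errata builds: locked xchgb, a full barrier */
    #define spin_unlock_string \
            "xchgb %b0, %1" \
                    :"=q" (oldval), "=m" (lock->lock) \
                    :"0" (oldval) : "memory"
    #endif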

-Andi

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 21:06     ` Andi Kleen
@ 2005-01-13 21:29       ` Kanoj Sarcar
  2005-01-13 21:59         ` Anton Blanchard
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-13 21:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: William Lee Irwin III, linux-mm

--- Andi Kleen <ak@suse.de> wrote:

> > In include/asm-i386/spinlock.h, spin_unlock_string has
> > a "xchgb" (in case it's required). That should be
> > enough of a barrier for the hardware, no? 
> 
> It is, but only for broken PPros or OOSTORE systems
> (currently only the VIA C3). For kernels compiled for non-broken CPUs
> there isn't any kind of barrier. 
> 
> -Andi

Okay, I think I see what you and wli meant. But the
assumption that spin_lock will order memory operations
is still correct, right?

Going back to what I meant in the first place, the
memory.c code is doing something like 1. read
truncate_count, 2. invoke nopage, which will probably
get locks, which will ensure the read of
truncate_count is complete, right? So, the original
point that smp_rmb() is not required (at least in the
position it currently is in) still holds, correct?

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 21:29       ` Kanoj Sarcar
@ 2005-01-13 21:59         ` Anton Blanchard
  2005-01-13 23:22           ` Kanoj Sarcar
  0 siblings, 1 reply; 21+ messages in thread
From: Anton Blanchard @ 2005-01-13 21:59 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: Andi Kleen, William Lee Irwin III, linux-mm

 
Hi Kanoj,

> Okay, I think I see what you and wli meant. But the assumption that
> spin_lock will order memory operations is still correct, right?

A spin_lock will only guarantee loads and stores inside the locked
region don't leak outside. Loads and stores before the spin_lock may leak
into the critical region. Likewise loads and stores after the
spin_unlock may leak into the critical region.
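
In other words (an illustrative sketch, not kernel source):

    x = *a;           /* may be delayed into the critical section */
    spin_lock(&l);    /* acquire: accesses below cannot move up */
    y = *b;           /* guaranteed to stay inside */
    spin_unlock(&l);  /* release: accesses above cannot move down */
    z = *c;           /* may be hoisted back into the critical section */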

Also they don't guarantee ordering for cache-inhibited loads and stores.

Anton

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 21:59         ` Anton Blanchard
@ 2005-01-13 23:22           ` Kanoj Sarcar
  2005-01-14 20:37             ` Hugh Dickins
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-13 23:22 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Andi Kleen, William Lee Irwin III, linux-mm, davem

Hi Anton,

Thanks, I think this explains it. IE, if do_no_page()
reads truncate_count, and then later goes on to
acquire a lock in nopage(), the smp_rmb() is
guaranteeing that the read of truncate_count completes
before nopage() starts executing. 

For x86 at least, it seems to me that since the
spin_lock (in nopage()) uses a "lock" instruction,
that itself guarantees that the truncate_count read is
completed, even without the smp_rmb(). (Refer to IA32
SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
x86, the smp_rmb is superfluous.
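
For reference, the acquisition side in 2.6.10's
include/asm-i386/spinlock.h is roughly (a paraphrased sketch, not the
verbatim source):

    #define spin_lock_string \
            "\n1:\t" \
            "lock ; decb %0\n\t"  /* locked RMW: serializing on x86 */ \
            "jns 3f\n" \
            "2:\t" \
            "rep;nop\n\t" \
            "cmpb $0,%0\n\t" \
            "jle 2b\n\t" \
            "jmp 1b\n" \
            "3:\n\t"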

See below also.

--- Anton Blanchard <anton@samba.org> wrote:

> Hi Kanoj,
> 
> > Okay, I think I see what you and wli meant. But the assumption that
> > spin_lock will order memory operations is still correct, right?
> 
> A spin_lock will only guarantee loads and stores inside the locked
> region don't leak outside. Loads and stores before the spin_lock may leak
> into the critical region. Likewise loads and stores after the
> spin_unlock may leak into the critical region.

Looking at the membars in
include/asm-sparc64/spinlock.h, what the sparc port
seems to be doing is guaranteeing that no stores prior
to the spin_lock() can leak into the critical region,
and no stores in the critical region can leak outside
the spin_unlock().

Thanks.

Kanoj


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-13 23:22           ` Kanoj Sarcar
@ 2005-01-14 20:37             ` Hugh Dickins
  2005-01-14 21:14               ` Kanoj Sarcar
  2005-01-14 21:25               ` Andrea Arcangeli
  0 siblings, 2 replies; 21+ messages in thread
From: Hugh Dickins @ 2005-01-14 20:37 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III,
	Andrea Arcangeli, linux-mm, davem

On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> 
> Thanks, I think this explains it. IE, if do_no_page()
> reads truncate_count, and then later goes on to
> acquire a lock in nopage(), the smp_rmb() is
> guaranteeing that the read of truncate_count completes
> before nopage() starts executing. 
> 
> For x86 at least, it seems to me that since the
> spin_lock (in nopage()) uses a "lock" instruction,
> that itself guarantees that the truncate_count read is
> completed, even without the smp_rmb(). (Refer to IA32
> SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
> x86, the smp_rmb is superfluous.

You're making me nervous.  If you look at 2.6.11-rc1 you'll find
that I too couldn't see the point of that smp_rmb(), on any architecture,
and so removed it; while also removing the "atomicity" of truncate_count.

Here was my comment to that patch:
> Why is mapping->truncate_count atomic?  It's incremented inside
> i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
> 
> And why smp_rmb() before call to ->nopage?  The compiler cannot reorder
> the initial assignment of sequence after the call to ->nopage, and no
> cpu (yet!) can read from the future, which is all that matters there.

Now I'm not so convinced by that "no cpu can read from the future".

I don't entirely follow your remarks above, but I do think people
on this thread have a better grasp of these matters than I have:
does anyone now think that smp_rmb() needs to be restored?

Hugh


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 20:37             ` Hugh Dickins
@ 2005-01-14 21:14               ` Kanoj Sarcar
  2005-01-14 21:38                 ` Andrea Arcangeli
  2005-01-14 22:09                 ` Hugh Dickins
  2005-01-14 21:25               ` Andrea Arcangeli
  1 sibling, 2 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-14 21:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III,
	Andrea Arcangeli, linux-mm, davem

Hi Hugh,

Here are the relevant steps of the two procedures:

do_no_page()
1. sequence = atomic_read(&mapping->truncate_count);
2. smp_rmb();
3. vma->vm_ops->nopage()
4. spin_lock(&mm->page_table_lock);
5. Retry if sequence != atomic_read(&mapping->truncate_count)
5a. See later.
6. update_mmu_cache()
7. spin_unlock(&mm->page_table_lock);

unmap_mapping_range()
8. spin_lock(&mapping->i_mmap_lock); /* irrelevant */
9. atomic_inc(&mapping->truncate_count);
10. zap_page_range(): spin_lock(&mm->page_table_lock);
    zap_page_range(): tlb cleaning
    zap_page_range(): spin_unlock(&mm->page_table_lock);
11. spin_unlock(&mapping->i_mmap_lock);



--- Hugh Dickins <hugh@veritas.com> wrote:

> On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> > 
> > Thanks, I think this explains it. IE, if do_no_page()
> > reads truncate_count, and then later goes on to
> > acquire a lock in nopage(), the smp_rmb() is
> > guaranteeing that the read of truncate_count completes
> > before nopage() starts executing. 
> > 
> > For x86 at least, it seems to me that since the
> > spin_lock (in nopage()) uses a "lock" instruction,
> > that itself guarantees that the truncate_count read is
> > completed, even without the smp_rmb(). (Refer to IA32
> > SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
> > x86, the smp_rmb is superfluous.
> 
> You're making me nervous.  If you look at 2.6.11-rc1 you'll find
> that I too couldn't see the point of that smp_rmb(), on any architecture,
> and so removed it; while also removing the "atomicity" of truncate_count.

I haven't looked at the 2.6.11 code, but you could
look at atomicity and smp_rmb() as two different
changes. I believe the ordering of the C code in steps
8 and 9 could be interchanged without any problems, ie
truncate_count is not protected by i_mmap_lock. In
that case, you would need truncate_count to be atomic,
unless you can guarantee unmap_mapping_range() is
single threaded wrt "mapping" from callers.  

> Here was my comment to that patch:
> > Why is mapping->truncate_count atomic?  It's incremented inside
> > i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
> > 
> > And why smp_rmb() before call to ->nopage?  The compiler cannot reorder
> > the initial assignment of sequence after the call to ->nopage, and no
> > cpu (yet!) can read from the future, which is all that matters there.
> 
> Now I'm not so convinced by that "no cpu can read from the future".
> 
> I don't entirely follow your remarks above, but I do think people
> on this thread have a better grasp of these matters than I have:
> does anyone now think that smp_rmb() needs to be restored?

As to the smp_rmb() part, I believe it is required; we
are not talking about compiler reorderings, rather cpu
reorderings. Given just steps 1 and 3 above, there is
no guarantee from the cpu that the read of
truncate_count would not be performed before nopage()
is almost complete, even though the compiler generated
the proper instruction order (ie the cpu could pull
down the read of truncate_count). You do not need a
similar smp_rmb() before step 5, because the
spin_lock() in step 4 will prevent the cpu from pulling
up the read of truncate_count to any earlier than
step 4. The other part is that the spin_lock() in step 4
cannot be moved down to step 5a, because that opens a
race with the unmap_mapping_range() code.
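
To spell the race out (a hypothetical interleaving without the
smp_rmb(), using the step numbers above):

    CPU A: do_no_page()                CPU B: truncate
    -------------------                ---------------
    step 1 is issued, but the cpu
    defers the actual load of
    truncate_count
    step 3: nopage() reads the old
    (large) i_size and returns a
    page beyond the soon-to-be EOF
                                       i_size_write(): i_size shrinks
                                       step 9: truncate_count++
                                       step 10: zap ptes (nothing at
                                       A's address yet)
    the deferred step-1 load now
    completes: it sees the *new*
    truncate_count
    steps 4-5: sequence matches, no
    retry, and a pte is installed
    for a page beyond the new EOF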

Whoever wrote this code did a careful job.

Kanoj
 

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 20:37             ` Hugh Dickins
  2005-01-14 21:14               ` Kanoj Sarcar
@ 2005-01-14 21:25               ` Andrea Arcangeli
  2005-01-14 21:32                 ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 21:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem

On Fri, Jan 14, 2005 at 08:37:58PM +0000, Hugh Dickins wrote:
> On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> > 
> > Thanks, I think this explains it. IE, if do_no_page()
> > reads truncate_count, and then later goes on to
> > acquire a lock in nopage(), the smp_rmb() is
> > guaranteeing that the read of truncate_count completes
> > before nopage() starts executing. 
> > 
> > For x86 at least, it seems to me that since the
> > spin_lock (in nopage()) uses a "lock" instruction,
> > that itself guarantees that the truncate_count read is
> > completed, even without the smp_rmb(). (Refer to IA32
> > SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
> > x86, the smp_rmb is superfluous.
> 
> You're making me nervous.  If you look at 2.6.11-rc1 you'll find
> that I too couldn't see the point of that smp_rmb(), on any architecture,
> and so removed it; while also removing the "atomicity" of truncate_count.
> 
> Here was my comment to that patch:
> > Why is mapping->truncate_count atomic?  It's incremented inside
> > i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
> > 
> > And why smp_rmb() before call to ->nopage?  The compiler cannot reorder
> > the initial assignment of sequence after the call to ->nopage, and no
> > cpu (yet!) can read from the future, which is all that matters there.
> 
> Now I'm not so convinced by that "no cpu can read from the future".
> 
> I don't entirely follow your remarks above, but I do think people
> on this thread have a better grasp of these matters than I have:
> does anyone now think that smp_rmb() needs to be restored?

You could have asked even before breaking mainline ;).

The rmb serializes the read of truncate_count with the read of
inode->i_size. The rmb is definitely required, and I would leave it an
atomic op to be sure gcc doesn't outsmart unmap_mapping_range_list (gcc
can see the internals of unmap_mapping_range_list). I mean just in case.
We must increase that piece of ram before we truncate the ptes and after
we updated the i_size.

In fact it seems to me right now that we miss a smp_wmb() right before
atomic_inc(&mapping->truncate_count): the spin_lock has inclusive
semantics on ia64, and in turn the i_size update could happen after the
atomic_inc without a smp_wmb().

So please back out the buggy changes and add the smp_wmb() to fix this
ia64 altix race.

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 21:25               ` Andrea Arcangeli
@ 2005-01-14 21:32                 ` Andrea Arcangeli
  2005-01-14 22:22                   ` Kanoj Sarcar
  2005-01-14 22:36                   ` Hugh Dickins
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 21:32 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

Added Andrew and Linus since they were the other signers of the buggy
changeset, so they can check that I'm not missing something too.

	http://linux.bkbits.net:8080/linux-2.5/cset%401.2247.12.78?nav=index.html|ChangeSet@-7d

Thanks.

On Fri, Jan 14, 2005 at 10:25:33PM +0100, Andrea Arcangeli wrote:
> [...]

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 21:14               ` Kanoj Sarcar
@ 2005-01-14 21:38                 ` Andrea Arcangeli
  2005-01-14 22:09                 ` Hugh Dickins
  1 sibling, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 21:38 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 01:14:40PM -0800, Kanoj Sarcar wrote:
> 
> unmap_mapping_range()
> 8. spin_lock(&mapping->i_mmap_lock); /* irrelevant */
> 9. atomic_inc(&mapping->truncate_count);

The above trace doesn't start there, it's the i_size_write/read we're
serializing against. You should start the above with i_size_write, then
it would make sense.

We probably thought the above spinlock was relevant (and that's why
there's no smp_wmb there yet) but it isn't, because it has inclusive
semantics, and in turn the i_size_write can pass it and can even pass
the atomic_inc (in common code terms of course; not on x86). So we need
a smp_wmb() before the atomic_inc too to be correct (not an issue for
x86). The reader part (i.e. the smp_rmb erroneously removed) is an
issue at the very least for x86-64 and probably for x86 too, despite the
added 64bit locking for the long long i_size, since those x86* archs
can reorder reads (but not writes) (and atomic_read doesn't imply any
cpu barrier, it's only a compiler barrier). So the bug you opened up is
real, while the missing smp_wmb I just noticed is not real for x86* and
is theoretical only for ia64.

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 21:14               ` Kanoj Sarcar
  2005-01-14 21:38                 ` Andrea Arcangeli
@ 2005-01-14 22:09                 ` Hugh Dickins
  2005-01-14 22:34                   ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2005-01-14 22:09 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III,
	Andrea Arcangeli, linux-mm, davem

On Fri, 14 Jan 2005, Kanoj Sarcar wrote:
> 
> Here are the relevant steps of the two procedures:
> 
> do_no_page()
> 1. sequence = atomic_read(&mapping->truncate_count);
> 2. smp_rmb();
> 3. vma->vm_ops->nopage()
> 4. spin_lock(&mm->page_table_lock);
> 5. Retry if sequence != atomic_read(&mapping->truncate_count)
> 5a. See later.
> 6. update_mmu_cache()
> 7. spin_unlock(&mm->page_table_lock);
> 
> unmap_mapping_range()
> 8. spin_lock(&mapping->i_mmap_lock); /* irrelevant */
> 9. atomic_inc(&mapping->truncate_count);
> 10. zap_page_range(): spin_lock(&mm->page_table_lock);
>     zap_page_range(): tlb cleaning
>     zap_page_range(): spin_unlock(&mm->page_table_lock);
> 11. spin_unlock(&mapping->i_mmap_lock);

Yes (except that 8 is somewhat relevant to removing atomicity;
I say somewhat because there's also an exclusive i_sem protecting).

> --- Hugh Dickins <hugh@veritas.com> wrote:
> > On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> > > 
> > > Thanks, I think this explains it. IE, if do_no_page()
> > > reads truncate_count, and then later goes on to
> > > acquire a lock in nopage(), the smp_rmb() is
> > > guaranteeing that the read of truncate_count completes
> > > before nopage() starts executing. 
> > > 
> > > For x86 at least, it seems to me that since the
> > > spin_lock (in nopage()) uses a "lock" instruction,
> > > that itself guarantees that the truncate_count read is
> > > completed, even without the smp_rmb(). (Refer to IA32
> > > SDM Vol 3 section 7.2.4 last para page 7-11). Thus for
> > > x86, the smp_rmb is superfluous.
> > 
> > You're making me nervous.  If you look at 2.6.11-rc1 you'll find
> > that I too couldn't see the point of that smp_rmb(), on any architecture,
> > and so removed it; while also removing the "atomicity" of truncate_count.
> 
> I haven't looked at the 2.6.11 code,

Please do if you have time.

> but you could look at atomicity and smp_rmb()
> as two different changes.

Definitely (oh, the shame that I put them together in one patch!)

> I believe the ordering of the C code in steps
> 8 and 9 could be interchanged without any problems, ie
> truncate_count is not protected by i_mmap_lock. In
> that case, you would need truncate_count to be atomic,
> unless you can guarantee unmap_mapping_range() is
> single threaded wrt "mapping" from callers.  

Right, but given the ordering 8 before 9,
there is no point to truncate_count being atomic.

> > Here was my comment to that patch:
> > > Why is mapping->truncate_count atomic?  It's incremented inside
> > > i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
> > > 
> > > And why smp_rmb() before call to ->nopage?  The compiler cannot reorder
> > > the initial assignment of sequence after the call to ->nopage, and no
> > > cpu (yet!) can read from the future, which is all that matters there.
> > 
> > Now I'm not so convinced by that "no cpu can read from the future".
> > 
> > I don't entirely follow your remarks above, but I do think people
> > on this thread have a better grasp of these matters than I have:
> > does anyone now think that smp_rmb() needs to be restored?
> 
> As to the smp_rmb() part, I believe it is required; we
> are not talking about compiler reorderings,

Did need to be considered, but I still agree with
myself that the function call makes it no problem.

> rather cpu
> reorderings. Given just steps 1 and 3 above, there is
> no guarantee from the cpu that the read of
> truncate_count would not be performed before nopage()
> is almost complete, even though the compiler generated
> the proper instruction order (ie the cpu could pull
> down the read of truncate_count).

This is your crucial point.  Now I think you're right.

But I have remembered how I was thinking at the time,
what's behind my "no cpu can read from the future" remark.

Suppose unmap_mapping_range is incrementing truncate_count
from 0 to 1.  I could conceive of do_no_page's read into
"sequence" not completing until the spin_lock at step 4.
But I believed that the read issued before ->nopage could
only err on the safe side, sometimes fetching 0 instead of 1
when 1 would already be safe, but never seeing 1 too soon.

That belief was naive, wasn't it?  I was thinking in terms
of "slow" instructions rather than reordered instructions.

> Whoever wrote this code did a careful job.

It was Andrea (one reason I've copied him now -
as I did when posting the patch to remove it).

Unless someone sees this differently, I should send a patch to
restore the smp_rmb(), with a longer code comment on what it's for.

Thanks a lot for your detailed answer.

Hugh


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 21:32                 ` Andrea Arcangeli
@ 2005-01-14 22:22                   ` Kanoj Sarcar
  2005-01-14 22:47                     ` Hugh Dickins
  2005-01-14 22:51                     ` Andrea Arcangeli
  2005-01-14 22:36                   ` Hugh Dickins
  1 sibling, 2 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-14 22:22 UTC (permalink / raw)
  To: Andrea Arcangeli, Hugh Dickins
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

Hi Andrea,

I read through your other pieces of email, but am
responding to this one to catch Linus and Andrew. In a
prior mail, you said:

"Infact it seems to me right now that we miss a
smp_wmb() right before
atomic_inc(&mapping->truncate_count): the spin_lock
has inclusive semantics on ia64, and in turn the
i_size update could happen after the atomic_inc
without a smp_wmb().
                                                      
                         
So please backout the buggy changes and add the
smp_wmb() to fix this ia64 altix race."

I haven't really tracked 2.6 closely, so pardon any
omissions I make in these comments ...

Note that vmtruncate() does a i_size_write(), which
does a write_seqcount_end() after updating the i_size,
which has an embedded smp_wmb() right after the i_size
update, so the case you are talking about is already
handled. No? (Btw, I did not look at i_size_write() in
the case of !CONFIG_SMP and CONFIG_PREEMPT, there
might need to be some barriers put in there, not
sure).

But, based on what you said, yes, I believe an
smp_wmb() is required _after_
atomic_inc(truncate_count) in unmap_mapping_range() to
ensure that the write happens before  it does the TLB
shootdown. Right?

I am sure there might be other ways to clean up this
code. Some documentation could not hurt, it could save
everyone's head hurting when they look at this code!

Btw, do all callers of vmtruncate() guarantee they do
not concurrently invoke vmtruncate() on the same file?
Seems like they could be stepping on each other while
updating i_size ...

Thanks.

Kanoj



--- Andrea Arcangeli <andrea@suse.de> wrote:

> [...]

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 22:09                 ` Hugh Dickins
@ 2005-01-14 22:34                   ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 22:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem

On Fri, Jan 14, 2005 at 10:09:17PM +0000, Hugh Dickins wrote:
> > As to the smp_rmb() part, I believe it is required; we
> > are not talking about compiler reorderings,
> Did need to be considered, but I still agree with
> myself that the function call makes it no problem.

I believe gcc is learning how to optimize across function calls; in this
case it's a different file that we're calling, so it's very unlikely to
give us compiler problems.

But the real reason for the smp_rmb is the cpu, not the compiler.

> as I did when posting the patch to remove it).

Woops ... I must have missed it, sorry, I owe you an apology! It has been
a fairly busy week around here (some kernel testing stuff has been going
on here; eventually the kernel was not to blame so all completed well ;).

> Unless someone sees this differently, I should send a patch to
> restore the smp_rmb(), with a longer code comment on what it's for.

Sure, go ahead. I was thinking the same. Originally the code was more
obvious when I did it with two counters, and then Paul improved it with
a single counter, but now it deserves a bit more commentary.

Thanks Hugh and Kanoj!

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 21:32                 ` Andrea Arcangeli
  2005-01-14 22:22                   ` Kanoj Sarcar
@ 2005-01-14 22:36                   ` Hugh Dickins
  2005-01-14 23:01                     ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Hugh Dickins @ 2005-01-14 22:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, 14 Jan 2005, Andrea Arcangeli wrote:
> > 
> > You could have asked even before breaking mainline ;).

Sorry (but check your mailbox for 3rd October -
I'd hoped the patch would be more provocative than a question!)

> > The rmb serializes the read of truncate_count with the read of
> > inode->i_size.

Yes, that's a clearer way of putting it, thank you.

> > The rmb is definitely required, and I would leave it an
> > atomic op to be sure gcc doesn't outsmart unmap_mapping_range_list (gcc
> > can see the internals of unmap_mapping_range_list). I mean just in case.
> > We must increase that piece of ram before we truncate the ptes and after
> > we updated the i_size.

I don't follow your argument for atomic there - "just in case"?
I still see its atomic ops as serving no point (and it was
tiresome to extend their use in the patches that followed).

> > In fact it seems to me right now that we miss a smp_wmb() right before
> > atomic_inc(&mapping->truncate_count): the spin_lock has inclusive
> > semantics on ia64, and in turn the i_size update could happen after the
> > atomic_inc without a smp_wmb().

That's interesting, and I'm glad my screwup has borne some good fruit.
And an smp_rmb() in one place makes more sense to me if there's an
smp_wmb() in the complementary place (though I've a suspicion that
"making sense to me" is not the prime consideration here ;)

> > So please back out the buggy changes and add the smp_wmb() to fix this
> > ia64 altix race.

Will do, though not today.

Hugh


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 22:22                   ` Kanoj Sarcar
@ 2005-01-14 22:47                     ` Hugh Dickins
  2005-01-14 22:51                     ` Andrea Arcangeli
  1 sibling, 0 replies; 21+ messages in thread
From: Hugh Dickins @ 2005-01-14 22:47 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Andrea Arcangeli, Anton Blanchard, Andi Kleen,
	William Lee Irwin III, linux-mm, davem, Andrew Morton,
	Linus Torvalds

On Fri, 14 Jan 2005, Kanoj Sarcar wrote:
> 
> Note that vmtruncate() does a i_size_write(), which
> does a write_seqcount_end() after updating the i_size,
> which has an embedded smp_wmb() right after the i_size
> update, so the case you are talking about is already
> handled. No? (Btw, I did not look at i_size_write() in
> the case of !CONFIG_SMP and CONFIG_PREEMPT, there
> might need to be some barriers put in there, not
> sure).
> 
> But, based on what you said, yes, I believe an
> smp_wmb() is required _after_
> atomic_inc(truncate_count) in unmap_mapping_range() to
> ensure that the write happens before  it does the TLB
> shootdown. Right?

Hmm, I'd better look tomorrow to see where you and
Andrea have decided the smp_wmb()s should go.

> I am sure there might be other ways to clean up this
> code. Some documentation could not hurt, it could save
> everyone's head hurting when they look at this code!
> 
> Btw, do all callers of vmtruncate() guarantee they do
> not concurrently invoke vmtruncate() on the same file?
> Seems like they could be stepping on each other while
> updating i_size ...

We're on safer ground there: inode->i_sem is held.

Hugh


* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 22:22                   ` Kanoj Sarcar
  2005-01-14 22:47                     ` Hugh Dickins
@ 2005-01-14 22:51                     ` Andrea Arcangeli
  2005-01-14 23:14                       ` Kanoj Sarcar
  1 sibling, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 22:51 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 02:22:10PM -0800, Kanoj Sarcar wrote:
> handled. No? (Btw, I did not look at i_size_write() in
> the case of !CONFIG_SMP and CONFIG_PREEMPT, there
> might need to be some barriers put in there, not
> sure).

i_size_write is inode->i_size = i_size in common code terms, that's the
64bit case, and it's the one with the weaker semantics (in turn it's the
only one we can consider in terms of common code correctness). So it has
no barriers at all (nor cpu barriers, nor compiler barriers).
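
For reference, i_size_write() in include/linux/fs.h of this era looks
roughly like this (a paraphrased sketch, not the verbatim source):

    static inline void i_size_write(struct inode *inode, loff_t i_size)
    {
    #if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
            /* 32bit SMP: seqcount so i_size_read() sees a consistent
             * 64bit value; write_seqcount_{begin,end} imply smp_wmb() */
            write_seqcount_begin(&inode->i_size_seqcount);
            inode->i_size = i_size;
            write_seqcount_end(&inode->i_size_seqcount);
    #elif BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT)
            preempt_disable();
            inode->i_size = i_size;
            preempt_enable();
    #else
            /* 64bit: a plain store, no barriers of any kind */
            inode->i_size = i_size;
    #endif
    }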

> But, based on what you said, yes, I believe an
> smp_wmb() is required _after_
> atomic_inc(truncate_count) in unmap_mapping_range() to
> ensure that the write happens before  it does the TLB
> shootdown. Right?

The smp_wmb() I mean should be put _before_ the truncate_count increase.
It is meant to prevent the i_size_write from passing the truncate_count
increase (which can happen with spin_lock having inclusive semantics).

The order we must enforce at the cpu level to be correct is this: first
we set i_size, then we increase truncate_count to restart the page
fault, and finally we zap the ptes (and the zap starts by reading the
old contents set by the page fault). And it must be enforced with cpu
and compiler memory barriers.

It seems you're right that we need an additional smp_mb() (not just
smp_wmb(), because the pte shootdown does reads first) even after the
truncate_count increase, but I thought the locking inside
unmap_mapping_range_list would spare us that; sounds like it's not
the case.

I can only see the page_table_lock in there, so in theory the
truncate_count write could enter the page_table_lock critical section
on ia64.

So I guess we need both an explicit smp_wmb() before atomic_inc (to
serialize with i_size_write on ia64, which is 64bit and doesn't have any
implicit locking there) and an smp_mb() after atomic_inc to fix this on
ia64 too. But it really should be a smp_mb__after_atomic_inc kind of
thing, or we'll take a hit on x86 performance (in theory even the
smp_wmb should be a smp_wmb__before_atomic_inc, though smp_wmb is
zero-cost on x86 so it has less impact ... ;).
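
Sketched out, the truncate-side ordering being proposed is (a
hypothetical placement, not a patch):

    i_size_write(inode, new_size);         /* 1: shrink i_size first */
    smp_wmb();                             /* the i_size store must not
                                              pass the inc */
    atomic_inc(&mapping->truncate_count);  /* 2: make racing faults retry */
    smp_mb();                              /* inc visible before the
                                              zap's reads begin */
    /* 3: ... then zap the ptes (unmap_mapping_range_list etc.) */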

As said, in practice x86 and x86-64 are already rock solid with 2.6.10,
because atomic_inc there is an implicit smp_mb() both before and after.

> I am sure there might be other ways to clean up this
> code. Some documentation could not hurt, it could save
> everyone's head hurting when they look at this code!

Indeed.

> Btw, do all callers of vmtruncate() guarantee they do
> not concurrently invoke vmtruncate() on the same file?
> Seems like they could be stepping on each other while
> updating i_size ...

i_sem for that (and i_truncate_sem in 2.4), no problem.

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 22:36                   ` Hugh Dickins
@ 2005-01-14 23:01                     ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 23:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 10:36:17PM +0000, Hugh Dickins wrote:
> On Fri, 14 Jan 2005, Andrea Arcangeli wrote:
> > > 
> > > You could have asked even before breaking mainline ;).
> 
> Sorry (but check your mailbox for 3rd October -
> I'd hoped the patch would be more provocative than a question!)

Hmm, I thought it was more recent, so I guess it could have been during
the @novell.com email downtime. I lost email for several days, then I
got back to @suse.de. Sorry anyway!

> I don't follow your argument for atomic there - "just in case"?
> I still see its atomic ops as serving no point (and it was
> tiresome to extend their use in the patches that followed).

Actually, see the last email I posted; it seems we need both a
smp_wmb() before the increase and a smp_mb() after it. The reason is
that it must be done in that very order. And on x86 doing it with
atomic_inc would enforce it.

I definitely agree truncate_count can be done in C _after_ we add
smp_wmb() before the increase and smp_mb() after the increase.

In fact, now that I think about it, this will also save us from
implementing smp_wmb__before_atomic_add.

> That's interesting, and I'm glad my screwup has borne some good fruit.

Indeed ;). Me too.

> And an smp_rmb() in one place makes more sense to me if there's an
> smp_wmb() in the complementary place (though I've a suspicion that

Hmm, I assume you meant "there's _not_ an", otherwise I don't get it.

> Will do, though not today.

Thanks! The only problem here is ia64; few people run test kernels in
production, so there's no hurry.

I also need to rediff my pending VM stuff for Andrew but I've been
extremely busy with other kernel stuff in the last few days, so I had no
time for that yet.

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 22:51                     ` Andrea Arcangeli
@ 2005-01-14 23:14                       ` Kanoj Sarcar
  2005-01-14 23:26                         ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2005-01-14 23:14 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

Hi Andrea,

I will try to read and understand the rest of your
email later, but one point sticks out  ...

--- Andrea Arcangeli <andrea@suse.de> wrote:

> On Fri, Jan 14, 2005 at 02:22:10PM -0800, Kanoj Sarcar wrote:
> > handled. No? (Btw, I did not look at i_size_write() in
> > the case of !CONFIG_SMP and CONFIG_PREEMPT, there
> > might need to be some barriers put in there, not
> > sure).
> 
> i_size_write is inode->i_size = i_size in common code terms, that's the
> 64bit case, and it's the one with the weaker semantics (in turn it's the
> only one we can consider in terms of common code correctness). So it has
> no barriers at all (nor cpu barriers, nor compiler barriers).

Yes, I ignored the 64bit case, and I am surprised the
semantics are different. I can't see why it is a good
idea to want different memory barrier semantics out of
i_size_write() and friends depending on various CONFIG
options and architecture. Maybe you can argue
performance here, but is the slight advantage in
performance on certain architectures (which is
achieved nonetheless with over-barriering eg on x86)
worth the extra code maintenance hassles?

Thanks.

Kanoj
 

* Re: smp_rmb in mm/memory.c in 2.6.10
  2005-01-14 23:14                       ` Kanoj Sarcar
@ 2005-01-14 23:26                         ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2005-01-14 23:26 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III,
	linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 03:14:22PM -0800, Kanoj Sarcar wrote:
> Yes, I ignored the 64bit case, and I am surprised the
> semantics are different. I can't see why it is a good
> idea to want different memory barrier semantics out of
> i_size_write() and friends depending on various CONFIG
> options and architecture. Maybe you can argue
> performance here, but is the slight advantage in
> performance on certain architectures (which is
> achieved nonetheless with over-barriering eg on x86)
> worth the extra code maintenance hassles?

The only reason there are barriers in the non-64bit cases is to
serialize i_size_read vs i_size_write (long long is not atomic with
gcc on 32bit); it wasn't meant to be any more than that. On 64bit that's
automatic, since gcc handles long long atomically natively (this isn't
really true in theory, but the practical guys on l-k will never agree to
change that to assume standard C semantics that cannot guarantee
atomicity). See all the other theoretical race conditions I pointed out
in set_pte in all archs (which requires atomicity and in turn shouldn't
be implemented in C; it's exactly the same issue as i_size_write/read).
So in practice current code is fine, and the locking was not meant to
serialize the truncate vs nopage race.
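
The reader side, for comparison (again a paraphrased sketch of
include/linux/fs.h, not verbatim):

    static inline loff_t i_size_read(struct inode *inode)
    {
    #if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
            loff_t i_size;
            unsigned int seq;

            /* retry until the 64bit value is read without a
             * concurrent i_size_write in between */
            do {
                    seq = read_seqcount_begin(&inode->i_size_seqcount);
                    i_size = inode->i_size;
            } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
            return i_size;
    #elif BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT)
            loff_t i_size;

            preempt_disable();
            i_size = inode->i_size;
            preempt_enable();
            return i_size;
    #else
            /* 64bit: a plain load */
            return inode->i_size;
    #endif
    }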
