* smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-13 20:26 UTC
To: linux-mm

Hi folks,

I am trying to understand the usage of smp_rmb() in mm/memory.c in
2.6.10.
This is my understanding of the relevant parts of do_no_page()
(sketched in code below):

1. If the vma is file-backed, snapshot truncate_count (without
   holding page_table_lock).
2. smp_rmb() makes sure the above snapshot is complete.
3. nopage() then figures out the physical address of the file page.
4. Take page_table_lock.
5. Reread truncate_count, and decide whether to retry the operation.
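
In code terms, that flow looks roughly like this (a condensed sketch
paraphrased from 2.6.10 mm/memory.c, with error handling elided;
treat it as approximate rather than the literal source):

/* Condensed sketch of do_no_page()'s truncate handshake (paraphrase,
 * not verbatim 2.6.10 source). */
if (vma->vm_file) {
        mapping = vma->vm_file->f_mapping;
        sequence = atomic_read(&mapping->truncate_count);  /* step 1 */
        smp_rmb();                                         /* step 2 */
}
retry:
new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret); /* 3 */
spin_lock(&mm->page_table_lock);                           /* step 4 */
if (mapping &&
    unlikely(sequence != atomic_read(&mapping->truncate_count))) {
        /* step 5: a truncate ran meanwhile; drop the page and retry */
        sequence = atomic_read(&mapping->truncate_count);
        spin_unlock(&mm->page_table_lock);
        page_cache_release(new_page);
        goto retry;
}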

What I don't understand is that in step 1 truncate_count is read
without page_table_lock, but in step 5 it is read with
page_table_lock held. A consistent approach would seem to be either
to always snapshot truncate_count with page_table_lock held, and
hence do away with the smp_rmb() in step 2; or to always snapshot it
without page_table_lock, in which case do not grab page_table_lock in
step 4, but rather do another smp_rmb() (page_table_lock can then be
grabbed as step 6). Isn't that reasonable?

The second question is that even though truncate_count is declared
atomic (i.e. probably volatile on most architectures), that does not
make gcc guarantee anything in terms of ordering, right?

Finally, does anyone really believe that an smp_rmb() is required in
step 2? My logic is that nopage() is guaranteed to grab/release
(spin)locks etc. as part of its processing, and that would force the
snapshots of truncate_count to be properly ordered.
Thanks.
Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: William Lee Irwin III @ 2005-01-13 20:39 UTC
To: Kanoj Sarcar
Cc: linux-mm

On Thu, Jan 13, 2005 at 12:26:42PM -0800, Kanoj Sarcar wrote:
> Finally, does anyone really believe that an smp_rmb() is required in
> step 2? My logic is that nopage() is guaranteed to grab/release
> (spin)locks etc. as part of its processing, and that would force the
> snapshots of truncate_count to be properly ordered.

spin_unlock() does not imply a memory barrier; e.g. on ia32 it's not
even an atomic operation.

-- wli

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-13 21:02 UTC
To: William Lee Irwin III
Cc: linux-mm

--- William Lee Irwin III <wli@holomorphy.com> wrote:
> spin_unlock() does not imply a memory barrier; e.g. on ia32 it's not
> even an atomic operation.

In include/asm-i386/spinlock.h, spin_unlock_string has an "xchgb"
variant (in case it's required). That should be enough of a barrier
for the hardware, no?

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andi Kleen @ 2005-01-13 21:06 UTC
To: Kanoj Sarcar
Cc: William Lee Irwin III, linux-mm

> In include/asm-i386/spinlock.h, spin_unlock_string has an "xchgb"
> variant (in case it's required). That should be enough of a barrier
> for the hardware, no?

It is, but that variant is only used for broken PPros or OOSTORE
systems (currently only the VIA C3). For kernels compiled for
non-broken CPUs there isn't any kind of barrier in spin_unlock().

-Andi
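
For concreteness, the unlock path being discussed looked roughly like
the sketch below: a plain byte store on ordinary CPUs, with the locked
exchange compiled in only for the errata cases Andi mentions. This is
a paraphrase from memory of include/asm-i386/spinlock.h circa 2.6.10,
not the verbatim source, so the config symbols and asm constraints
should be treated as assumptions:

/* Paraphrase of include/asm-i386/spinlock.h circa 2.6.10 (approximate).
 * On ordinary CPUs the unlock is a plain byte store: no LOCK prefix
 * and no fence, hence wli's point that spin_unlock() is no barrier. */
#if !defined(CONFIG_X86_OOSTORE) && !defined(CONFIG_X86_PPRO_FENCE)
#define spin_unlock_string \
        "movb $1,%0" \
                :"=m" (lock->slock) : : "memory"   /* simple store */
#else
/* Broken PPro / OOSTORE (VIA C3): a locked exchange, which is a full
 * barrier on x86. */
#define spin_unlock_string \
        "xchgb %b0, %1" \
                :"=q" (oldval), "=m" (lock->slock) \
                :"0" (oldval) : "memory"
#endif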

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-13 21:29 UTC
To: Andi Kleen
Cc: William Lee Irwin III, linux-mm

--- Andi Kleen <ak@suse.de> wrote:
> It is, but that variant is only used for broken PPros or OOSTORE
> systems (currently only the VIA C3). For kernels compiled for
> non-broken CPUs there isn't any kind of barrier in spin_unlock().

Okay, I think I see what you and wli meant. But the assumption that
spin_lock() will order memory operations is still correct, right?

Going back to what I meant in the first place, the memory.c code is
doing something like: 1. read truncate_count; 2. invoke nopage(),
which will probably take locks, which will ensure the read of
truncate_count is complete, right? So the original point, that
smp_rmb() is not required (at least in the position it currently is
in), still holds, correct?

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Anton Blanchard @ 2005-01-13 21:59 UTC
To: Kanoj Sarcar
Cc: Andi Kleen, William Lee Irwin III, linux-mm

Hi Kanoj,

> Okay, I think I see what you and wli meant. But the assumption that
> spin_lock() will order memory operations is still correct, right?

A spin_lock only guarantees that loads and stores inside the locked
region don't leak outside it. Loads and stores before the spin_lock
may leak into the critical region; likewise, loads and stores after
the spin_unlock may leak into the critical region.

Also, they don't guarantee ordering for cache-inhibited loads and
stores.

Anton
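
In other words, each lock operation is only a one-way fence. A
hypothetical illustration (not kernel source) of the reordering a
weakly ordered CPU is still allowed to perform:

/* Hypothetical illustration of acquire/release lock semantics.
 *
 * Program order:                  A legal execution order:
 *
 *   x = A;                          spin_lock(&l);
 *   spin_lock(&l);                  z = D;   // moved up past unlock
 *   y = B;                          x = A;   // moved down past lock
 *   C = 1;                          y = B;
 *   spin_unlock(&l);                C = 1;
 *   z = D;                          spin_unlock(&l);
 *
 * Accesses outside the critical section may migrate IN; accesses
 * inside may never migrate OUT.  So the unlocked read of
 * truncate_count in step 1 may be deferred into, or past, a critical
 * section that nopage() takes later. */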

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-13 23:22 UTC
To: Anton Blanchard
Cc: Andi Kleen, William Lee Irwin III, linux-mm, davem

Hi Anton,

Thanks, I think this explains it. That is: if do_no_page() reads
truncate_count, and then later goes on to acquire a lock in nopage(),
the smp_rmb() is guaranteeing that the read of truncate_count
completes before nopage() starts executing.

For x86 at least, it seems to me that since the spin_lock (in
nopage()) uses a "lock" instruction, that itself guarantees that the
truncate_count read is completed, even without the smp_rmb(). (Refer
to the IA32 SDM Vol 3, section 7.2.4, last paragraph, page 7-11.)
Thus for x86 the smp_rmb() is superfluous. See below also.

--- Anton Blanchard <anton@samba.org> wrote:
> A spin_lock only guarantees that loads and stores inside the locked
> region don't leak outside it. Loads and stores before the spin_lock
> may leak into the critical region; likewise, loads and stores after
> the spin_unlock may leak into the critical region.

Looking at the membars in include/asm-sparc64/spinlock.h, what the
sparc port seems to be doing is guaranteeing that no stores prior to
the spin_lock can leak into the critical region, and no stores in the
critical region can leak outside the spin_unlock.

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Hugh Dickins @ 2005-01-14 20:37 UTC
To: Kanoj Sarcar
Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III, Andrea Arcangeli, linux-mm, davem

On Thu, 13 Jan 2005, Kanoj Sarcar wrote:
> Thanks, I think this explains it. That is: if do_no_page() reads
> truncate_count, and then later goes on to acquire a lock in
> nopage(), the smp_rmb() is guaranteeing that the read of
> truncate_count completes before nopage() starts executing.
>
> For x86 at least, it seems to me that since the spin_lock (in
> nopage()) uses a "lock" instruction, that itself guarantees that the
> truncate_count read is completed, even without the smp_rmb(). Thus
> for x86 the smp_rmb() is superfluous.

You're making me nervous. If you look at 2.6.11-rc1 you'll find that
I too couldn't see the point of that smp_rmb(), on any architecture,
and so removed it; while also removing the "atomicity" of
truncate_count.

Here was my comment to that patch:
> Why is mapping->truncate_count atomic? It's incremented inside
> i_mmap_lock (and i_sem), and the reads don't need it to be atomic.
>
> And why smp_rmb() before the call to ->nopage? The compiler cannot
> reorder the initial assignment of sequence after the call to
> ->nopage, and no cpu (yet!) can read from the future, which is all
> that matters there.

Now I'm not so convinced by that "no cpu can read from the future".

I don't entirely follow your remarks above, but I do think people on
this thread have a better grasp of these matters than I have: does
anyone now think that smp_rmb() needs to be restored?

Hugh

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-14 21:14 UTC
To: Hugh Dickins
Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III, Andrea Arcangeli, linux-mm, davem

Hi Hugh,

Here are the relevant steps of the two procedures:

do_no_page()
1.  sequence = atomic_read(&mapping->truncate_count);
2.  smp_rmb();
3.  vma->vm_ops->nopage()
4.  spin_lock(&mm->page_table_lock);
5.  Retry if sequence != atomic_read(&mapping->truncate_count)
5a. See later.
6.  update_mmu_cache()
7.  spin_unlock(&mm->page_table_lock);

unmap_mapping_range()
8.  spin_lock(&mapping->i_mmap_lock);  /* irrelevant */
9.  atomic_inc(&mapping->truncate_count);
10. zap_page_range(): spin_lock(&mm->page_table_lock);
    zap_page_range(): TLB cleaning
    zap_page_range(): spin_unlock(&mm->page_table_lock);
11. spin_unlock(&mapping->i_mmap_lock);

--- Hugh Dickins <hugh@veritas.com> wrote:
> You're making me nervous. If you look at 2.6.11-rc1 you'll find that
> I too couldn't see the point of that smp_rmb(), on any architecture,
> and so removed it; while also removing the "atomicity" of
> truncate_count.

I haven't looked at the 2.6.11 code, but you could look at atomicity
and smp_rmb() as two different changes. I believe the ordering of the
C code in steps 8 and 9 could be interchanged without any problems,
i.e. truncate_count is not protected by i_mmap_lock. In that case you
would need truncate_count to be atomic, unless you can guarantee that
unmap_mapping_range() is single-threaded wrt "mapping" from its
callers.

> I don't entirely follow your remarks above, but I do think people on
> this thread have a better grasp of these matters than I have: does
> anyone now think that smp_rmb() needs to be restored?

As to the smp_rmb() part, I believe it is required; we are not
talking about compiler reorderings, rather cpu reorderings. Given
just steps 1 and 3 above, there is no guarantee from the cpu that the
read of truncate_count will be performed before nopage() is almost
complete, even though the compiler generated the proper instruction
order (i.e. the cpu could pull the read of truncate_count down).

You do not need a similar smp_rmb() before step 5, because the
spin_lock() in step 4 will prevent the cpu from pulling the read of
truncate_count up to any earlier than step 4. The other part is that
the spin_lock() in step 4 cannot be moved down to step 5a, because
that opens a race with the unmap_mapping_range() code.

Whoever wrote this code did a careful job.

Kanoj
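
To spell out the failure mode being described: if the step-1 read can
drift down past step 3, the retry check can be defeated. A
hypothetical interleaving (an illustrative timeline, not kernel
source):

/* Hypothetical bad interleaving with the smp_rmb() of step 2 removed.
 * CPU0 runs do_no_page(); CPU1 runs the truncate path.
 *
 *   CPU0                                CPU1
 *   ----                                ----
 *   enter ->nopage(); its read of
 *     i_size is satisfied EARLY,
 *     returning the old size
 *                                       i_size_write(inode, new_size);
 *                                       atomic_inc(&truncate_count);
 *                                       zap_page_range();  // ptes gone
 *   sequence = truncate_count;
 *     // the step-1 read was DEFERRED
 *     // by the cpu and now sees the
 *     // NEW count
 *   spin_lock(&page_table_lock);
 *   sequence == truncate_count, so no
 *     retry: a pte for a page beyond
 *     the new i_size is installed
 *     AFTER the zap, and a stale
 *     mapping survives the truncate.
 */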

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 21:38 UTC
To: Kanoj Sarcar
Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 01:14:40PM -0800, Kanoj Sarcar wrote:
> unmap_mapping_range()
> 8. spin_lock(&mapping->i_mmap_lock);  /* irrelevant */
> 9. atomic_inc(&mapping->truncate_count);

The above trace doesn't start there: it's the i_size_write/i_size_read
that we're serializing against. You should start the trace with
i_size_write; then it would make sense.

We probably thought the above spinlock was relevant (and that's why
there's no smp_wmb there yet), but it isn't, because it has inclusive
semantics, and in turn the i_size_write can pass it and can even pass
the atomic_inc (in common-code terms of course; not on x86). So we
need an smp_wmb() before the atomic_inc too to be correct (not an
issue for x86).

The reader part (i.e. the smp_rmb erroneously removed) is an issue at
the very least for x86-64, and probably for x86 too despite the extra
locking around the long long i_size, since those x86* archs can
reorder reads (but not writes). (And atomic_read doesn't imply any
cpu barrier; it's only a compiler barrier.)

So the bug you opened up is real, while the missing smp_wmb I just
noticed is not real for x86* and is theoretical only for ia64.
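
The pairing being described, in schematic form (a hypothetical sketch,
not kernel source):

/* Writer (truncate side)             Reader (fault side)
 *
 * i_size_write(inode, size);         sequence = truncate_count;
 * smp_wmb();    <-- missing          smp_rmb();    <-- removed
 * atomic_inc(&truncate_count);       ... ->nopage() reads i_size ...
 *
 * The wmb keeps the i_size store ahead of the counter store; the rmb
 * keeps the counter read ahead of the i_size read.  Each barrier is
 * useless without its partner on the other cpu. */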

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Hugh Dickins @ 2005-01-14 22:09 UTC
To: Kanoj Sarcar
Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III, Andrea Arcangeli, linux-mm, davem

On Fri, 14 Jan 2005, Kanoj Sarcar wrote:
> Here are the relevant steps of the two procedures:
> [... the do_no_page() / unmap_mapping_range() step listing above ...]

Yes (except that 8 is somewhat relevant to removing atomicity; I say
somewhat because there's also an exclusive i_sem protecting).

> I haven't looked at the 2.6.11 code,

Please do if you have time.

> but you could look at atomicity and smp_rmb() as two different
> changes.

Definitely (oh, the shame that I put them together in one patch!)

> I believe the ordering of the C code in steps 8 and 9 could be
> interchanged without any problems, i.e. truncate_count is not
> protected by i_mmap_lock. In that case you would need truncate_count
> to be atomic, unless you can guarantee that unmap_mapping_range() is
> single-threaded wrt "mapping" from its callers.

Right, but given the ordering 8 before 9, there is no point to
truncate_count being atomic.

> As to the smp_rmb() part, I believe it is required; we are not
> talking about compiler reorderings,

Those did need to be considered, but I still agree with myself that
the function call makes them no problem.

> rather cpu reorderings. Given just steps 1 and 3 above, there is no
> guarantee from the cpu that the read of truncate_count will be
> performed before nopage() is almost complete, even though the
> compiler generated the proper instruction order (i.e. the cpu could
> pull the read of truncate_count down).

This is your crucial point. Now I think you're right. But I have
remembered how I was thinking at the time, what's behind my "no cpu
can read from the future" remark.

Suppose unmap_mapping_range is incrementing truncate_count from 0 to
1. I could conceive of do_no_page's read into "sequence" not
completing until the spin_lock at step 4. But I believed that the
read issued before ->nopage could only err on the safe side,
sometimes fetching 0 instead of 1 when 1 would already be safe, but
never seeing 1 too soon.

That belief was naive, wasn't it? I was thinking in terms of "slow"
instructions rather than reordered instructions.

> Whoever wrote this code did a careful job.

It was Andrea (one reason I've copied him now - as I did when posting
the patch to remove it).

Unless someone sees this differently, I should send a patch to
restore the smp_rmb(), with a longer code comment on what it's for.

Thanks a lot for your detailed answer.

Hugh

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 22:34 UTC
To: Hugh Dickins
Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem

On Fri, Jan 14, 2005 at 10:09:17PM +0000, Hugh Dickins wrote:
> > As to the smp_rmb() part, I believe it is required; we are not
> > talking about compiler reorderings,
>
> Those did need to be considered, but I still agree with myself that
> the function call makes them no problem.

I believe gcc is learning how to optimize around function calls; in
this case it's a different file that we're calling, so it's very
unlikely to give us compiler problems. But the real reason for the
smp_rmb is the cpu, not the compiler.

> as I did when posting the patch to remove it.

Whoops ... I must have missed it, sorry, I owe you an apology! It has
been a fairly busy week around here (some kernel testing stuff has
been going on; eventually the kernel was not to blame, so it all
completed well ;).

> Unless someone sees this differently, I should send a patch to
> restore the smp_rmb(), with a longer code comment on what it's for.

Sure, go ahead. I was thinking the same. Originally the code was more
obvious when I did it with two counters, and then Paul improved it
with a single counter, but now it deserves a bit more commentary.

Thanks Hugh and Kanoj!

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 21:25 UTC
To: Hugh Dickins
Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem

On Fri, Jan 14, 2005 at 08:37:58PM +0000, Hugh Dickins wrote:
> You're making me nervous. If you look at 2.6.11-rc1 you'll find that
> I too couldn't see the point of that smp_rmb(), on any architecture,
> and so removed it; while also removing the "atomicity" of
> truncate_count.
> [...]
> I don't entirely follow your remarks above, but I do think people on
> this thread have a better grasp of these matters than I have: does
> anyone now think that smp_rmb() needs to be restored?

You could have asked even before breaking mainline ;).

The rmb serializes the read of truncate_count with the read of
inode->i_size. The rmb is definitely required, and I would leave
truncate_count an atomic op to be sure gcc doesn't outsmart
unmap_mapping_range_list (gcc can see the internals of
unmap_mapping_range_list), just in case. We must increase that piece
of ram before we truncate the ptes and after we have updated the
i_size.

In fact it seems to me right now that we miss an smp_wmb() right
before atomic_inc(&mapping->truncate_count): the spin_lock has
inclusive semantics on ia64, and in turn the i_size update could
happen after the atomic_inc without an smp_wmb().

So please back out the buggy changes and add the smp_wmb() to fix
this ia64/Altix race.

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 21:32 UTC
To: Hugh Dickins
Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

Added Andrew and Linus since they were the other signers of the buggy
changeset, so they can review that I'm not missing something too.

http://linux.bkbits.net:8080/linux-2.5/cset%401.2247.12.78?nav=index.html|ChangeSet@-7d

Thanks.

On Fri, Jan 14, 2005 at 10:25:33PM +0100, Andrea Arcangeli wrote:
> You could have asked even before breaking mainline ;).
>
> The rmb serializes the read of truncate_count with the read of
> inode->i_size. The rmb is definitely required [...]
>
> In fact it seems to me right now that we miss an smp_wmb() right
> before atomic_inc(&mapping->truncate_count): the spin_lock has
> inclusive semantics on ia64, and in turn the i_size update could
> happen after the atomic_inc without an smp_wmb().
>
> So please back out the buggy changes and add the smp_wmb() to fix
> this ia64/Altix race.

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-14 22:22 UTC
To: Andrea Arcangeli, Hugh Dickins
Cc: Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

Hi Andrea,

I read through your other pieces of email, but am responding on this
one to catch Linus and Andrew. In a prior mail, you said:

> In fact it seems to me right now that we miss an smp_wmb() right
> before atomic_inc(&mapping->truncate_count): the spin_lock has
> inclusive semantics on ia64, and in turn the i_size update could
> happen after the atomic_inc without an smp_wmb(). So please back out
> the buggy changes and add the smp_wmb() to fix this ia64/Altix race.

I haven't really tracked 2.6 closely, so pardon any omissions I make
in these comments ...

Note that vmtruncate() does an i_size_write(), which does a
write_seqcount_end() after updating the i_size, which has an embedded
smp_wmb() right after the i_size update, so the case you are talking
about is already handled. No? (Btw, I did not look at i_size_write()
in the case of !CONFIG_SMP and CONFIG_PREEMPT; there might need to be
some barriers put in there, not sure.)

But, based on what you said, yes, I believe an smp_wmb() is required
_after_ atomic_inc(truncate_count) in unmap_mapping_range() to ensure
that the write happens before it does the TLB shootdown. Right?

I am sure there might be other ways to clean up this code. Some
documentation could not hurt; it could save everyone's head hurting
when they look at this code!

Btw, do all callers of vmtruncate() guarantee they do not
concurrently invoke vmtruncate() on the same file? Seems like they
could be stepping on each other while updating i_size ...

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Hugh Dickins @ 2005-01-14 22:47 UTC
To: Kanoj Sarcar
Cc: Andrea Arcangeli, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, 14 Jan 2005, Kanoj Sarcar wrote:
> But, based on what you said, yes, I believe an smp_wmb() is required
> _after_ atomic_inc(truncate_count) in unmap_mapping_range() to
> ensure that the write happens before it does the TLB shootdown.
> Right?

Hmm, I'd better look tomorrow to see where you and Andrea have
decided the smp_wmb()s should go.

> Btw, do all callers of vmtruncate() guarantee they do not
> concurrently invoke vmtruncate() on the same file? Seems like they
> could be stepping on each other while updating i_size ...

We're on safer ground there: inode->i_sem is held.

Hugh

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 22:51 UTC
To: Kanoj Sarcar
Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 02:22:10PM -0800, Kanoj Sarcar wrote:
> about is already handled. No? (Btw, I did not look at i_size_write()
> in the case of !CONFIG_SMP and CONFIG_PREEMPT; there might need to
> be some barriers put in there, not sure.)

i_size_write is "inode->i_size = i_size" in common-code terms: that's
the 64-bit case, and it's the one with the weakest semantics (in turn
it's the only one we can consider in terms of common-code
correctness). So it has no barriers at all (neither cpu barriers nor
compiler barriers).

> But, based on what you said, yes, I believe an smp_wmb() is required
> _after_ atomic_inc(truncate_count) in unmap_mapping_range() to
> ensure that the write happens before it does the TLB shootdown.
> Right?

The smp_wmb() I meant should be put _before_ the truncate_count
increase. It is meant to avoid the i_size_write passing the
truncate_count increase (which can happen with spin_lock having
inclusive semantics).

The order we must enforce at the cpu level to be correct is this:
first we set i_size, then we increase truncate_count to restart the
page fault, and finally we zap the ptes (and the zap starts by
reading the old contents set by the page fault). And it must be
enforced with cpu and compiler memory barriers.

It seems you're right that we need an additional smp_mb() (not just
smp_wmb(), because the pte shootdown does reads first) even after the
truncate_count increase; I thought the locking inside
unmap_mapping_range_list would save us from that, but it sounds like
that's not the case.

I can only see the page_table_lock in there, so in theory the
truncate_count write could enter the page_table_lock critical section
on ia64.

So I guess we need both an explicit smp_wmb() before the atomic_inc
(to serialize with i_size_write on ia64, which is 64-bit and doesn't
have any implicit locking there) and an smp_mb() after the atomic_inc
to fix this on ia64 too. But it really should be an
smp_mb__after_atomic_inc kind of thing, or we'll bite on x86
performance (in theory even the smp_wmb should be an
smp_wmb__before_atomic_inc, though smp_wmb is zero-cost on x86 so it
has less impact ... ;).

As said, in practice x86 and x86-64 are already rock solid with
2.6.10, because the locked atomic_add is an implicit smp_mb() before
and after the atomic_inc there.

> I am sure there might be other ways to clean up this code. Some
> documentation could not hurt; it could save everyone's head hurting
> when they look at this code!

Indeed.

> Btw, do all callers of vmtruncate() guarantee they do not
> concurrently invoke vmtruncate() on the same file? Seems like they
> could be stepping on each other while updating i_size ...

i_sem for that (and i_truncate_sem in 2.4); no problem.
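
Putting that required ordering into code, the truncate side would look
roughly like the sketch below. This is a condensed sketch of the
proposed fix, not a tested patch; the smp_*__*_atomic_inc variants
suggested above did not exist, so plain barriers are shown, and the
call sequence is collapsed from vmtruncate() into one listing:

/* Sketch of the ordering the truncate path must enforce (proposed
 * fix, not the actual 2.6.10 source). */

i_size_write(inode, new_size);        /* 1. shrink i_size first       */

spin_lock(&mapping->i_mmap_lock);     /* acquire: one-way barrier only */

smp_wmb();                            /* keep the i_size store ahead
                                       * of the counter increase (the
                                       * lock above doesn't guarantee
                                       * that on ia64)                 */
atomic_inc(&mapping->truncate_count); /* 2. invalidate fault snapshots */
smp_mb();                             /* the pte shootdown starts with
                                       * reads; keep them after the
                                       * counter store                 */

unmap_mapping_range_list(...);        /* 3. zap the ptes               */
spin_unlock(&mapping->i_mmap_lock);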

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Kanoj Sarcar @ 2005-01-14 23:14 UTC
To: Andrea Arcangeli
Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

Hi Andrea,

I will try to read and understand the rest of your email later, but
one point sticks out ...

--- Andrea Arcangeli <andrea@suse.de> wrote:
> i_size_write is "inode->i_size = i_size" in common-code terms:
> that's the 64-bit case, and it's the one with the weakest semantics
> (in turn it's the only one we can consider in terms of common-code
> correctness). So it has no barriers at all (neither cpu barriers nor
> compiler barriers).

Yes, I ignored the 64-bit case, and I am surprised the semantics are
different. I can't see why it is a good idea to want different memory
barrier semantics out of i_size_write() and friends depending on
various CONFIG options and architectures. Maybe you can argue
performance here, but is the slight advantage in performance on
certain architectures (which is achieved nonetheless with
over-barriering, e.g. on x86) worth the extra code-maintenance
hassle?

Thanks.

Kanoj

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 23:26 UTC
To: Kanoj Sarcar
Cc: Hugh Dickins, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 03:14:22PM -0800, Kanoj Sarcar wrote:
> Yes, I ignored the 64-bit case, and I am surprised the semantics are
> different. I can't see why it is a good idea to want different
> memory barrier semantics out of i_size_write() and friends depending
> on various CONFIG options and architectures.

The only reason there are barriers in the non-64-bit cases is to
serialize i_size_read vs i_size_write (a long long is not atomic with
gcc on 32-bit); it wasn't meant to be any more than that. On 64-bit
that's automatic, since gcc handles a long long atomically natively.
(This isn't really true in theory, but the practical guys on l-k will
never agree to change that to assume standard C semantics, which
cannot guarantee atomicity. See all the other theoretical race
conditions I pointed out in set_pte in all archs, which requires
atomicity and in turn shouldn't be implemented in C; it's exactly the
same issue as i_size_write/read.)

So in practice the current code is fine, and the locking was not
meant to serialize the truncate vs nopage race.
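
For reference, the i_size helpers being discussed look roughly like
the sketch below on 32-bit SMP, where the seqcount exists only to make
the 64-bit i_size tear-free. This is a simplified paraphrase of the
2.6-era include/linux/fs.h, so treat the field names as approximate:

/* Simplified paraphrase of i_size_read/i_size_write for the 32-bit
 * SMP case (BITS_PER_LONG == 32 && CONFIG_SMP).  The seqcount only
 * makes the 64-bit i_size tear-free; on 64-bit both helpers collapse
 * to a plain load/store with no barriers at all. */
static inline loff_t i_size_read(struct inode *inode)
{
        loff_t i_size;
        unsigned seq;

        do {
                seq = read_seqcount_begin(&inode->i_size_seqcount);
                i_size = inode->i_size;
        } while (read_seqcount_retry(&inode->i_size_seqcount, seq));
        return i_size;
}

static inline void i_size_write(struct inode *inode, loff_t i_size)
{
        write_seqcount_begin(&inode->i_size_seqcount);
        inode->i_size = i_size;
        write_seqcount_end(&inode->i_size_seqcount); /* embeds smp_wmb() */
}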

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Hugh Dickins @ 2005-01-14 22:36 UTC
To: Andrea Arcangeli
Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, 14 Jan 2005, Andrea Arcangeli wrote:
> You could have asked even before breaking mainline ;).

Sorry (but check your mailbox for 3rd October - I'd hoped the patch
would be more provocative than a question!)

> The rmb serializes the read of truncate_count with the read of
> inode->i_size.

Yes, that's a clearer way of putting it, thank you.

> The rmb is definitely required, and I would leave truncate_count an
> atomic op to be sure gcc doesn't outsmart unmap_mapping_range_list
> (gcc can see the internals of unmap_mapping_range_list), just in
> case. We must increase that piece of ram before we truncate the ptes
> and after we have updated the i_size.

I don't follow your argument for atomic there - "just in case"? I
still see its atomic ops as serving no point (and it was tiresome to
extend their use in the patches that followed).

> In fact it seems to me right now that we miss an smp_wmb() right
> before atomic_inc(&mapping->truncate_count): the spin_lock has
> inclusive semantics on ia64, and in turn the i_size update could
> happen after the atomic_inc without an smp_wmb().

That's interesting, and I'm glad my screwup has borne some good
fruit. And an smp_rmb() in one place makes more sense to me if
there's an smp_wmb() in the complementary place (though I've a
suspicion that "making sense to me" is not the prime consideration
here ;)

> So please back out the buggy changes and add the smp_wmb() to fix
> this ia64/Altix race.

Will do, though not today.

Hugh

* Re: smp_rmb in mm/memory.c in 2.6.10
From: Andrea Arcangeli @ 2005-01-14 23:01 UTC
To: Hugh Dickins
Cc: Kanoj Sarcar, Anton Blanchard, Andi Kleen, William Lee Irwin III, linux-mm, davem, Andrew Morton, Linus Torvalds

On Fri, Jan 14, 2005 at 10:36:17PM +0000, Hugh Dickins wrote:
> Sorry (but check your mailbox for 3rd October - I'd hoped the patch
> would be more provocative than a question!)

Hmm, I thought it was more recent, so I guess it could have been
during my @novell.com email downtime. I lost email for several days,
then I got back to @suse.de. Sorry anyway!

> I don't follow your argument for atomic there - "just in case"? I
> still see its atomic ops as serving no point (and it was tiresome to
> extend their use in the patches that followed).

Actually, see the last email I posted: it seems we need both an
smp_wmb() before the increase and an smp_mb() after it. The reason is
that it must be done in that very order, and on x86 doing it with
atomic_inc would enforce that. I definitely agree truncate_count can
be done in C _after_ we add the smp_wmb() before the increase and the
smp_mb() after it. In fact, now that I think about it, this will also
save us from having to implement smp_wmb__before_atomic_add.

> That's interesting, and I'm glad my screwup has borne some good
> fruit.

Indeed ;). Me too.

> And an smp_rmb() in one place makes more sense to me if there's an
> smp_wmb() in the complementary place

Hmm, I assume you meant "there's _not_ an", otherwise I don't get it.

> Will do, though not today.

Thanks! The only problem here is ia64; few people run test kernels in
production, so it's not a hurry. I also need to rediff my pending VM
stuff for Andrew, but I've been extremely busy with other kernel
stuff in the last few days, so I had no time for that yet.