From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@digeo.com>
Cc: paulmck@us.ibm.com, dmccr@us.ibm.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix vmtruncate race and distributed filesystem race
Date: Mon, 23 Jun 2003 09:43:53 +0200 [thread overview]
Message-ID: <20030623074353.GE19940@dualathlon.random> (raw)
In-Reply-To: <20030622233235.0924364d.akpm@digeo.com>
On Sun, Jun 22, 2003 at 11:32:35PM -0700, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> >
> > > but you can't trap this with a single counter increment in do_truncate:
> > >
> > > CPU 0 CPU 1
> > > ---------- -----------
> > > do_no_page
> > > truncate
>
> i_size = new_i_size;
>
> > > increment counter
> > > read counter
> > > ->nopage
>
> check i_size
>
> > > vmtruncate
> > > read counter again -> different so retry
> > >
> > > thanks to the second counter increment after vmtruncate in my fix, the
> > > above race couldn't happen.
> >
> > The trick is that CPU 0 is expected to have updated the filesystem's
> > idea of what pages are available before calling vmtruncate,
> > invalidate_mmap_range() or whichever.
>
> i_size has been updated, and filemap_nopage() will return NULL.
ok, I finally see your object now, so it's the i_size that was in theory
supposed to act as a "second" increment on the truncate side to
serialize against in do_no_page.
I was searching for any additional SMP locking, and infact there is
none. No way the current code can serialize against the i_size.
The problem is that the last patch posted on l-k isn't guaranteeing
anything like what you wrote above. What can happen is that the i_size
can be read still way before the counter.
CPU 0 CPU 1
---------- -----------
do_no_page
truncate
specualtive read i_size into l2 cache
i_size = new_i_size;
increment counter
read counter
->nopage
check i_size from l2 cache
vmtruncate
read counter again -> different so retry
For the single counter increment patch to close the race using the
implicit i_size update as a second increment, you need to add an
absolutely necessary smb_rmb() here:
sequence = atomic_read(&mapping->truncate_count);
+ smp_rmb();
new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, 0);
And for non-x86 you need a smb_wmb() between the i_size = new_i_size and
the atomic_inc on the truncate side too, because advanced archs with
finegrined locking can implement a one-way barrier, where the counter
increment could pass the down(&mapping->i_shared_sem). Last but not the
least it was IMHO misleading and dirty to do the counter increment in
truncate after taking a completely unrelated semaphore just to take
advantage of the implicit barrier on the x86. So doing it with a proper
smp_wmb() besides fixing advanced archs will kind of make the code
readable, so you can understand the i_size update is the second
increment (actually decrement because it's the i_size).
Now that I see what the second increment was supposed to be, it's a cute
idea to use the i_size for that, and I can appreciate its beauty (I
mean, in purist terms, in practice IMHO the seq_lock was more obivously
correct code with all the smp reordering hidden into the lock semantics
and I'm 100% sure you couldn't measure any performance difference in
truncate due the second increment ;). But if you prefer to keep the most
finegrined approch of using the i_size with the explicit memory
barriers, I'm of course not against it after you add the needed fixed I
outlined above that will finally close the race.
thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2003-06-23 7:43 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-06-12 20:16 Dave McCracken
2003-06-12 20:49 ` Andrew Morton
2003-06-12 21:00 ` Andrew Morton
2003-06-12 21:08 ` Dave McCracken
2003-06-12 21:44 ` Andrew Morton
2003-06-12 22:56 ` Dave McCracken
2003-06-12 23:07 ` Andrew Morton
2003-06-20 0:17 ` Andrea Arcangeli
2003-06-23 3:28 ` Paul E. McKenney
2003-06-23 6:29 ` Andrea Arcangeli
2003-06-23 6:32 ` Andrew Morton
2003-06-23 7:43 ` Andrea Arcangeli [this message]
2003-06-23 7:56 ` Andrew Morton
2003-06-23 8:10 ` Andrea Arcangeli
2003-06-24 1:37 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030623074353.GE19940@dualathlon.random \
--to=andrea@suse.de \
--cc=akpm@digeo.com \
--cc=dmccr@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=paulmck@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox