From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
Andrew Morton <akpm@osdl.org>,
Christoph Hellwig <hch@infradead.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>
Subject: Re: Non-GPL export of invalidate_mmap_range
Date: Thu, 19 Feb 2004 11:47:51 -0800
Message-ID: <20040219194751.GN1269@us.ibm.com>
In-Reply-To: <200402192106.02086.phillips@arcor.de>

On Thu, Feb 19, 2004 at 09:06:55PM -0500, Daniel Phillips wrote:
> On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> > GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> > change the underlying file. There are a number of things one can do,
> > but one must keep in mind that different processes can MAP_PRIVATE the
> > same file at different times, and that some processes might MAP_SHARED it
> > at the same time that others MAP_PRIVATE it. Here are the alternatives
> > I can imagine:
> >
> > 1. Any time a file changes, create a copy of the old version
> > for any MAP_PRIVATE vmas. This would essentially create
> > a point-in-time copy of any file that a process mapped
> > MAP_PRIVATE. This is arguably the most intuitive from the
> > user's standpoint, but (a) it would not be a small change and
> > (b) I haven't heard of anyone coming up with a good use for it.
> > Please enlighten me if I am missing a simple implementation or
> > compelling uses.
>
> This is MAP_COPY I think. Even if somebody did manage to sneak it by Linus
> one day it would certainly not be under the guise of MAP_PRIVATE.

Whew! That is a relief!!! ;-)

> > 2. Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > as suggested by Daniel.
>
> I did not suggest that, rather I described the existing practice in OpenGFS
> and Sistina GFS, which at least does not destroy anonymous data. The correct
> behaviour is the one you describe in option 3, and we are perfectly willing
> to change GFS to obtain that behaviour. To be precise: I suggest we change
> invalidate_mmap_range to skip anon pages, and change vmtruncate to use
> something else, having the current semantics.
>
> As a historical note: the behavior GFS obtains from option 2 is
> Posix-compliant, but falls short of Linus-compliance, who insists on
> completely accurate invalidation behavior as is right and proper.

OK, this is the OpenGFS zap_inode_mapping(), right?
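
To make the proposed change concrete, here is a toy user-space model of an invalidation pass that skips anonymous pages. This is only a sketch: struct toy_page and toy_invalidate_range() are hypothetical names invented for illustration, and nothing here is the kernel's invalidate_mmap_range() or its data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one page of a MAP_PRIVATE vma (not a kernel type). */
struct toy_page {
	bool present;	/* currently mapped into the process */
	bool anon;	/* true once the process has written (CoWed) the page */
};

/*
 * Toy model of the proposed behaviour: unmap file-backed pages in
 * [start, end) so that the next access refaults current file contents,
 * but leave anonymous (CoWed) pages alone.
 * Returns the number of pages that were invalidated.
 */
size_t toy_invalidate_range(struct toy_page *pages, size_t start, size_t end)
{
	size_t dropped = 0;

	for (size_t i = start; i < end; i++) {
		if (!pages[i].present || pages[i].anon)
			continue;	/* nothing mapped, or private data to keep */
		pages[i].present = false;
		dropped++;
	}
	return dropped;
}
```

A vmtruncate-style caller would need a different helper that also drops the anonymous pages, which is the split Daniel describes above.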
> > This would mean that a
> > process that had mapped a file MAP_PRIVATE and faulted
> > in parts of it would see different versions of the file
> > in different pages. This should be straightforward to
> > implement, but in what situation is this skewed view of
> > the file useful?
>
> You've got me there ;) However, Posix explicitly blesses this sloppy
> behaviour. I suppose that with additional user space locking, applications
> could make it work reliably. But it's still sloppy, and worse, it's
> different from Linux's local filesystem behaviour.

;-)

> > 3. Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > but invalidate those pages in the vma that have not yet been
> > modified (that are not anonymous) as suggested by Stephen.
> > This would mean that a process that had mapped a file MAP_PRIVATE
> > and written on parts of it would see different versions of the
> > file in different pages.
>
> This is the correct behaviour and is the current behaviour for local
> filesystems. In particular, all processes on all nodes will see the current
> contents of any file page that they have not yet faulted in, as of the last
> time any process wrote that file page via mmap or otherwise.
>
> Our goal for GFS, and the goal I'd like to hold up as definitive for any
> distributed filesystem, is to imitate local filesystem semantics exactly,
> even across the cluster.

OK, I surrender. I got some private email agreeing with this
viewpoint. Any dissenters, speak soon, or...
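
For anyone who wants to see the local-filesystem behavior from user space, here is a minimal sketch. It assumes a Linux system with a writable /tmp. The helper name observe_private_mapping() and struct private_view are made up for this example. POSIX leaves unspecified whether later changes to the file show up in a MAP_PRIVATE mapping, so the result for the untouched page is what I would expect on Linux, not a portable guarantee.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* First byte of each page, as seen through the MAP_PRIVATE mapping. */
struct private_view {
	char written_page;	/* page the process wrote before the file changed */
	char untouched_page;	/* page the process never touched beforehand */
};

/*
 * Map a two-page temporary file MAP_PRIVATE, write to page 0 through the
 * mapping (forcing a CoW copy), then overwrite both pages of the file
 * with pwrite().  Returns 0 and fills *view on success, -1 on failure.
 */
int observe_private_mapping(struct private_view *view)
{
	long psz = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/map_private_demo_XXXXXX";
	char *buf = NULL, *map = MAP_FAILED;
	int fd, ret = -1;

	if (psz <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until fd is closed */

	buf = malloc(2 * psz);
	if (!buf)
		goto done;
	memset(buf, 'A', 2 * psz);
	if (write(fd, buf, 2 * psz) != 2 * psz)
		goto done;

	map = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto done;

	map[0] = 'P';		/* page 0 becomes a private anonymous copy */

	memset(buf, 'B', 2 * psz);
	if (pwrite(fd, buf, 2 * psz, 0) != 2 * psz)
		goto done;	/* the underlying file changes under the mapping */

	view->written_page = map[0];
	view->untouched_page = map[psz];
	ret = 0;
done:
	if (map != MAP_FAILED)
		munmap(map, 2 * psz);
	free(buf);
	close(fd);
	return ret;
}
```

On a local Linux filesystem I would expect written_page to come back as 'P' (the private copy survives) and untouched_page as 'B' (the page faults in the current file contents), which is the mixed view described in option 3.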
> > Again, in what situation is this skewed view of the file useful?
>
> It's not skewed in any way that I can see. Though I am no linker expert, I
> dimly recall that these are precisely the semantics ld relies on.

I thought that the linker relied on people refraining (or being
prevented) from updating executables while they are in use.
But I am also no linker expert.

> > 5. The current behavior, where the process's writes do not
> > flow through to the file, but all changes to the file are
> > visible to the writing process.
>
> We all agree that's broken, I hope.

I can buy DFSes implementing semantics that are the same as local
filesystems. But no one has yet shown me anything that it breaks!

> > 6. Requiring that MAP_PRIVATE be applied only to unchanging
> > files, so that (for example) any change to the underlying
> > file removes that file from any MAP_PRIVATE address spaces.
> > Subsequent accesses would get a SEGV, rather than a
> > surprise from silently changing data.
>
> Creative :) Well, data that changes "silently" is a fact of life whenever
> data is shared. It's up to applications to ensure that shared data changes
> predictably.

Glad you liked it. ;-)

I think that predictability when using MAP_PRIVATE requires that one
refrain from modifying the underlying file while someone has it mmap()ed
with MAP_PRIVATE. I would welcome an example proving me wrong.

> > So, please help me out here... What do applications that MAP_PRIVATE
> > changing files really expect to happen?
>
> Number 3, is that ok with you? Incidentally, your list doesn't include the
> semantics we'd get by just exporting and using invalidate_mmap_range. I
> presume that is because you agree it's not correct (it will clobber CoWed
> anonymous pages).

I will give it a shot, though I would still like to hear about examples
where the difference in semantics affects a real application.

BTW, my list didn't include exporting and using the current
invalidate_mmap_range() because I didn't say what I meant to say.
Hate it when that happens! ;-)

Thanx, Paul