linux-mm.kvack.org archive mirror
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Daniel Phillips <phillips@arcor.de>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
	Andrew Morton <akpm@osdl.org>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Non-GPL export of invalidate_mmap_range
Date: Thu, 19 Feb 2004 11:47:51 -0800
Message-ID: <20040219194751.GN1269@us.ibm.com>
In-Reply-To: <200402192106.02086.phillips@arcor.de>

On Thu, Feb 19, 2004 at 09:06:55PM -0500, Daniel Phillips wrote:
> On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> > GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> > change the underlying file.  There are a number of things one can do,
> > but one must keep in mind that different processes can MAP_PRIVATE the
> > same file at different times, and that some processes might MAP_SHARED it
> > at the same time that others MAP_PRIVATE it.  Here are the alternatives
> > I can imagine:
> >
> > 1.	Any time a file changes, create a copy of the old version
> > 	for any MAP_PRIVATE vmas.  This would essentially create
> > 	a point-in-time copy of any file that a process mapped
> > 	MAP_PRIVATE.  This is arguably the most intuitive from the
> > 	user's standpoint, but (a) it would not be a small change and
> > 	(b) I haven't heard of anyone coming up with a good use for it.
> > 	Please enlighten me if I am missing a simple implementation or
> > 	compelling uses.
> 
> This is MAP_COPY I think.  Even if somebody did manage to sneak it by Linus 
> one day it would certainly not be under the guise of MAP_PRIVATE.

Whew!  That is a relief!!!  ;-)

> > 2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	as suggested by Daniel.
> 
> I did not suggest that, rather I described the existing practice in OpenGFS 
> and Sistina GFS, which at least does not destroy anonymous data.  The correct 
> behaviour is the one you describe in option 3, and we are perfectly willing 
> to change GFS to obtain that behaviour.  To be precise: I suggest we change 
> invalidate_mmap_range to skip anon pages, and change vmtruncate to use 
> something else, having the current semantics.
> 
> As a historical note: the behavior GFS obtains from option 2 is 
> Posix-compliant, but falls short of Linus-compliance, who insists on 
> completely accurate invalidation behavior as is right and proper.

OK, this is the OpenGFS zap_inode_mapping(), right?

> > 	This would mean that a
> > 	process that had mapped a file MAP_PRIVATE and faulted
> > 	in parts of it would see different versions of the file
> > 	in different pages.  This should be straightforward to
> > 	implement, but in what situation is this skewed view of
> > 	the file useful?
> 
> You've got me there ;)  However, Posix explicitly blesses this sloppy 
> behaviour.  I suppose that with additional user space locking, applications 
> could make it work reliably.  But it's still sloppy, and worse, it's 
> different from Linux's local filesystem behaviour.

;-)

> > 3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	but invalidate those pages in the vma that have not yet been
> > 	modified (that are not anonymous) as suggested by Stephen.
> > 	This would mean that a process that had mapped a file MAP_PRIVATE
> > 	and written on parts of it would see different versions of the
> > 	file in different pages.
> 
> This is the correct behaviour and is the current behaviour for local 
> filesystems.  In particular, all processes on all nodes will see the current 
> contents of any file page that they have not yet faulted in, as of the last 
> time any process wrote that file page via mmap or otherwise.
> 
> Our goal for GFS, and the goal I'd like to hold up as definitive for any 
> distributed filesystem, is to imitate local filesystem semantics exactly, 
> even across the cluster.

OK, I surrender.  I got some private email agreeing with this
viewpoint.  Any dissenters, speak soon, or...

> > Again, in what situation is this skewed view of the file useful?
> 
> It's not skewed in any way that I can see.  Though I am no linker expert, I 
> dimly recall that these are precisely the semantics ld relies on.

I thought that the linker relied on people refraining (or being
prevented) from updating executables while they are in use.
But I am also no linker expert.

> > 5.	The current behavior, where the process's writes do not
> > 	flow through to the file, but all changes to the file are
> > 	visible to the writing process.
> 
> We all agree that's broken, I hope.

I can buy DFSes implementing semantics that are the same as local
filesystems.  But no one has yet shown me anything that it breaks!

> > 6.	Requiring that MAP_PRIVATE be applied only to unchanging
> > 	files, so that (for example) any change to the underlying
> > 	file removes that file from any MAP_PRIVATE address spaces.
> > 	Subsequent accesses would get a SEGV, rather than a
> > 	surprise from silently changing data.
> 
> Creative :)  Well, data that changes "silently" is a fact of life whenever 
> data is shared.  It's up to applications to ensure that shared data changes 
> predictably.

Glad you liked it.  ;-)

I think that predictability when using MAP_PRIVATE requires that one
refrain from modifying the underlying file while someone has it mmap()ed
with MAP_PRIVATE.  I would welcome an example proving me wrong.
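
A minimal user-space sketch of the experiment this paragraph invites,
assuming Python 3 on a Unix-like system with a local filesystem.  The
function name private_mapping_demo() and the byte patterns are
illustrative assumptions, not anything from GPFS or GFS.  It maps a
two-page file MAP_PRIVATE, writes to the first page only, then changes
the whole file underneath the mapping and reports what each page shows.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_mapping_demo():
    """Return (cow_page, untouched_page, on_disk) after the file changes."""
    fd, path = tempfile.mkstemp()
    try:
        # Original file contents: two pages of 'A'.
        os.write(fd, b"A" * (2 * PAGE))
        m = mmap.mmap(fd, 2 * PAGE, flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        try:
            # Private write: page 0 is copied and becomes anonymous.
            m[0:1] = b"P"
            # Someone else rewrites both pages of the underlying file.
            os.pwrite(fd, b"B" * (2 * PAGE), 0)
            cow_page = m[0:2]                  # page we wrote on
            untouched_page = m[PAGE:PAGE + 2]  # page we never wrote on
            on_disk = os.pread(fd, 2, 0)       # file as other readers see it
        finally:
            m.close()
        return cow_page, untouched_page, on_disk
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(private_mapping_demo())
```

On a local Linux filesystem I would expect the written page to keep the
private data and the untouched page to show the new file contents,
which is the "different versions in different pages" view of option 3.
POSIX leaves the contents of the untouched page unspecified, so other
systems may legitimately differ there.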

> > So, please help me out here...  What do applications that MAP_PRIVATE
> > changing files really expect to happen?
> 
> Number 3, is that ok with you?  Incidentally, your list doesn't include the 
> semantics we'd get by just exporting and using invalidate_mmap_range.  I 
> presume that is because you agree it's not correct (it will clobber CoWed 
> anonymous pages).

I will give it a shot, though I would still like to hear about examples
where the difference in semantics affects a real application.
BTW, my list didn't include exporting and using the current
invalidate_mmap_range() because I didn't say what I meant to say.
Hate it when that happens!  ;-)
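
The three invalidation policies discussed above can be compared with a
toy model.  This is only a sketch under stated assumptions: it is not
kernel code, the policy names "zap_all", "skip_private_vma" and
"skip_anon" are hypothetical labels for exporting the current
invalidate_mmap_range() unchanged, option 2, and option 3 respectively,
and a "file" page here stands for a node-local cached copy that goes
stale until it is invalidated and refaulted.

```python
def fault_read(vma, file_pages, idx):
    """Read page idx, faulting in a cached copy of the file page if absent."""
    if idx not in vma:
        vma[idx] = ("file", file_pages[idx])
    return vma[idx][1]


def private_write(vma, idx, data):
    """Copy-on-write: the page becomes anonymous and private to the process."""
    vma[idx] = ("anon", data)


def invalidate(vma, policy):
    """Apply one of the (hypothetically named) invalidation policies."""
    if policy == "zap_all":
        # Drop everything, including CoWed anonymous pages.
        vma.clear()
    elif policy == "skip_private_vma":
        # Option 2: leave MAP_PRIVATE vmas completely alone.
        return
    elif policy == "skip_anon":
        # Option 3: drop only pages that are still file-backed.
        for idx in [i for i, (kind, _) in vma.items() if kind == "file"]:
            del vma[idx]
    else:
        raise ValueError(policy)


def run(policy):
    """One MAP_PRIVATE mapping; another node then rewrites the file."""
    file_pages = ["old0", "old1"]
    vma = {}
    fault_read(vma, file_pages, 0)
    fault_read(vma, file_pages, 1)
    private_write(vma, 0, "mine")
    file_pages[:] = ["new0", "new1"]   # remote update to the file
    invalidate(vma, policy)
    return [fault_read(vma, file_pages, i) for i in range(2)]


if __name__ == "__main__":
    for policy in ("zap_all", "skip_private_vma", "skip_anon"):
        print(policy, run(policy))
    # zap_all          ['new0', 'new1']  private write lost
    # skip_private_vma ['mine', 'old1']  untouched page stays stale
    # skip_anon        ['mine', 'new1']  matches local-filesystem behavior
```

In this model only "skip_anon" both preserves the process's private
write and refreshes the page it never wrote on.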

						Thanx, Paul
