From: Daniel Phillips <phillips@arcor.de>
To: paulmck@us.ibm.com
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
	Andrew Morton <akpm@osdl.org>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Non-GPL export of invalidate_mmap_range
Date: Thu, 19 Feb 2004 21:06:55 -0500
Message-ID: <200402192106.02086.phillips@arcor.de>
In-Reply-To: <20040219164213.GK1269@us.ibm.com>

On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> change the underlying file.  There are a number of things one can do,
> but one must keep in mind that different processes can MAP_PRIVATE the
> same file at different times, and that some processes might MAP_SHARED it
> at the same time that others MAP_PRIVATE it.  Here are the alternatives
> I can imagine:
>
> 1.	Any time a file changes, create a copy of the old version
> 	for any MAP_PRIVATE vmas.  This would essentially create
> 	a point-in-time copy of any file that a process mapped
> 	MAP_PRIVATE.  This is arguably the most intuitive from the
> 	user's standpoint, but (a) it would not be a small change and
> 	(b) I haven't heard of anyone coming up with a good use for it.
> 	Please enlighten me if I am missing a simple implementation or
> 	compelling uses.

This is MAP_COPY I think.  Even if somebody did manage to sneak it by Linus 
one day it would certainly not be under the guise of MAP_PRIVATE.

> 2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> 	as suggested by Daniel.

I did not suggest that, rather I described the existing practice in OpenGFS 
and Sistina GFS, which at least does not destroy anonymous data.  The correct 
behaviour is the one you describe in option 3, and we are perfectly willing 
to change GFS to obtain that behaviour.  To be precise: I suggest we change 
invalidate_mmap_range to skip anon pages, and change vmtruncate to use 
something else that keeps the current semantics.
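
To make the distinction concrete, here is a toy userspace model, not
kernel code.  The names slot_state, toy_invalidate and zap_anon are
hypothetical and exist only for this illustration.  Each slot stands
for one page of a MAP_PRIVATE mapping: empty, backed by the file, or
an anonymous CoWed copy.  With zap_anon false the routine models the
proposed invalidate behaviour (anon pages survive); with zap_anon true
it models what truncate needs (everything in the range goes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of one page slot in a toy MAP_PRIVATE mapping. */
enum slot_state { SLOT_EMPTY, SLOT_FILE, SLOT_ANON };

/*
 * Drop the pages in [start, end).  File-backed pages are always
 * dropped; anonymous (CoWed) pages are dropped only if zap_anon is
 * set.  Returns the number of slots that were dropped.
 */
size_t toy_invalidate(enum slot_state *slots, size_t nslots,
                      size_t start, size_t end, bool zap_anon)
{
    size_t dropped = 0;

    assert(slots != NULL);
    if (end > nslots)
        end = nslots;

    for (size_t i = start; i < end; i++) {
        if (slots[i] == SLOT_EMPTY)
            continue;               /* nothing mapped here */
        if (slots[i] == SLOT_ANON && !zap_anon)
            continue;               /* preserve private data */
        slots[i] = SLOT_EMPTY;      /* next access would refault */
        dropped++;
    }
    return dropped;
}
```

A caller modelling a cluster filesystem invalidation would pass
zap_anon as false, while a caller modelling truncate would pass true.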

As a historical note: the behavior GFS obtains from option 2 is 
Posix-compliant, but falls short of Linus-compliance: Linus insists on 
completely accurate invalidation behavior, as is right and proper.

> 	This would mean that a
> 	process that had mapped a file MAP_PRIVATE and faulted
> 	in parts of it would see different versions of the file
> 	in different pages.  This should be straightforward to
> 	implement, but in what situation is this skewed view of
> 	the file useful?

You've got me there ;)  However, Posix explicitly blesses this sloppy 
behaviour.  I suppose that with additional user space locking, applications 
could make it work reliably.  But it's still sloppy, and worse, it's 
different from Linux's local filesystem behaviour.

> 3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> 	but invalidate those pages in the vma that have not yet been
> 	modified (that are not anonymous) as suggested by Stephen.
> 	This would mean that a process that had mapped a file MAP_PRIVATE
> 	and written on parts of it would see different versions of the
> 	file in different pages.

This is the correct behaviour and is the current behaviour for local 
filesystems.  In particular, all processes on all nodes will see the current 
contents of any file page that they have not yet faulted in, as of the last 
time any process wrote that file page via mmap or otherwise.
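
The local-filesystem behaviour can be observed from user space with a
small sketch like the following.  It assumes Linux and a writable
/tmp; the function names fill_file and private_mapping_demo are made
up for this example.  It maps a two-page file MAP_PRIVATE, writes to
the first page through the mapping (making that page anonymous), then
overwrites the whole file with pwrite() and reports what the mapping
shows for each page.  Posix leaves the second result unspecified, so
treat this as a demonstration of Linux practice, not of a guarantee.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the first len bytes of the file with the byte c. */
static int fill_file(int fd, char c, size_t len)
{
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, c, len);
    ssize_t n = pwrite(fd, buf, len, 0);
    free(buf);
    return n == (ssize_t)len ? 0 : -1;
}

/*
 * On success returns 0 and stores in seen[0] and seen[1] the first
 * byte of page 0 and page 1 as read through the private mapping
 * after the underlying file has been rewritten.
 */
int private_mapping_demo(char seen[2])
{
    char path[] = "/tmp/map_private_demo_XXXXXX";
    long psz = sysconf(_SC_PAGESIZE);

    assert(seen != NULL);
    if (psz <= 0)
        return -1;
    size_t len = 2 * (size_t)psz;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                    /* file vanishes once closed */

    if (fill_file(fd, 'A', len) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    map[0] = 'P';                    /* CoW: page 0 is now anonymous */

    int rc = fill_file(fd, 'B', len); /* the file changes underneath */
    if (rc == 0) {
        seen[0] = map[0];            /* private copy of page 0 */
        seen[1] = map[psz];          /* page 1, never written by us */
    }

    munmap(map, len);
    close(fd);
    return rc;
}
```

On a local Linux filesystem the first page keeps the private 'P' and
the second page shows the new file contents 'B', which is the option 3
behaviour described above.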

Our goal for GFS, and the goal I'd like to hold up as definitive for any 
distributed filesystem, is to imitate local filesystem semantics exactly, 
even across the cluster.

> Again, in what situation is this skewed view of the file useful?

It's not skewed in any way that I can see.  Though I am no linker expert, I 
dimly recall that these are precisely the semantics ld relies on.

> 5.	The current behavior, where the process's writes do not
> 	flow through to the file, but all changes to the file are
> 	visible to the writing process.

We all agree that's broken, I hope.

> 6.	Requiring that MAP_PRIVATE be applied only to unchanging
> 	files, so that (for example) any change to the underlying
> 	file removes that file from any MAP_PRIVATE address spaces.
> 	Subsequent accesses would get a SEGV, rather than a
> 	surprise from silently changing data.

Creative :)  Well, data that changes "silently" is a fact of life whenever 
data is shared.  It's up to applications to ensure that shared data changes 
predictably.

> So, please help me out here...  What do applications that MAP_PRIVATE
> changing files really expect to happen?

Number 3, is that ok with you?  Incidentally, your list doesn't include the 
semantics we'd get by just exporting and using invalidate_mmap_range.  I 
presume that is because you agree it's not correct (it will clobber CoWed 
anonymous pages).

Regards,

Daniel

