Non-GPL export of invalidate_mmap_range

linux-mm.kvack.org archive mirror

* Non-GPL export of invalidate_mmap_range
@ 2004-02-16 19:09 Paul E. McKenney
  2004-02-17  2:31 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-16 19:09 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm

Hello, Andrew,

The attached patch to make invalidate_mmap_range() non-GPL exported
seems to have been lost somewhere between 2.6.1-mm4 and 2.6.1-mm5.
It still applies cleanly.  Could you please take it up again?

						Thanx, Paul

------------------------------------------------------------------------



It was EXPORT_SYMBOL_GPL(); however, IBM's GPFS is not GPL.

- the GPFS team contributed to the testing and development of
  invalidate_mmap_range().

- GPFS was developed under AIX and was ported to Linux, and hence meets
  Linus's "some binary modules are OK" exemption.

- The export makes sense: clustering filesystems need it for shootdowns to
  ensure cache coherency.



 25-akpm/mm/memory.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/memory.c~invalidate_mmap_range-non-gpl-export mm/memory.c
--- 25/mm/memory.c~invalidate_mmap_range-non-gpl-export	Mon Nov 24 11:33:19 2003
+++ 25-akpm/mm/memory.c	Mon Nov 24 11:33:34 2003
@@ -1164,7 +1164,7 @@ void invalidate_mmap_range(struct addres
 		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen);
 	up(&mapping->i_shared_sem);
 }
-EXPORT_SYMBOL_GPL(invalidate_mmap_range);
+EXPORT_SYMBOL(invalidate_mmap_range);
 
 /*
  * Handle all mappings that got truncated by a "truncate()"

_
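
For readers unfamiliar with the distinction the patch turns on, here is a
minimal userspace sketch (not kernel code).  It assumes a toy symbol table
in which each entry carries a GPL-only flag, loosely modelling how a symbol
exported with EXPORT_SYMBOL_GPL() is refused to modules that do not declare
a GPL-compatible license, while one exported with EXPORT_SYMBOL() is
available to any module.  The names toy_symbol, table_before, table_after
and toy_can_resolve are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one exported-symbol entry (hypothetical layout, not the
 * kernel's real data structure). */
struct toy_symbol {
	const char *name;
	bool gpl_only;	/* true models EXPORT_SYMBOL_GPL, false EXPORT_SYMBOL */
};

/* State before the patch: the symbol is exported GPL-only. */
static const struct toy_symbol table_before[] = {
	{ "invalidate_mmap_range", true },
};

/* State after the patch: the symbol is exported to all modules. */
static const struct toy_symbol table_after[] = {
	{ "invalidate_mmap_range", false },
};

/*
 * Return true if a module may link against `name`.
 * `module_is_gpl` stands in for the module declaring a GPL-compatible
 * license.  Symbols absent from the table are never resolvable.
 */
static bool toy_can_resolve(const struct toy_symbol *table, size_t n,
			    const char *name, bool module_is_gpl)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(table[i].name, name) != 0)
			continue;
		/* GPL modules see everything; others only non-GPL exports. */
		return module_is_gpl || !table[i].gpl_only;
	}
	return false;
}
```

Under these assumptions, a non-GPL module asking for
"invalidate_mmap_range" is refused against table_before and accepted
against table_after, which is the whole effect of the one-line change.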
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-16 19:09 Non-GPL export of invalidate_mmap_range Paul E. McKenney
@ 2004-02-17  2:31 ` Andrew Morton
  2004-02-17  7:35 ` Christoph Hellwig
  2004-02-17 22:22 ` David Weinehall
  2 siblings, 0 replies; 68+ messages in thread
From: Andrew Morton @ 2004-02-17  2:31 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel, linux-mm

"Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
>  The attached patch to make invalidate_mmap_range() non-GPL exported
>  seems to have been lost somewhere between 2.6.1-mm4 and 2.6.1-mm5.
>  It still applies cleanly.  Could you please take it up again?

I don't have any particular opinions either way but I do recall there was
some disquiet last time this came up.  I'm sure someone will remind us ;)


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-16 19:09 Non-GPL export of invalidate_mmap_range Paul E. McKenney
  2004-02-17  2:31 ` Andrew Morton
@ 2004-02-17  7:35 ` Christoph Hellwig
  2004-02-17 12:40   ` Paul E. McKenney
  2004-02-17 22:22 ` David Weinehall
  2 siblings, 1 reply; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-17  7:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: akpm, linux-kernel, linux-mm

On Mon, Feb 16, 2004 at 11:09:27AM -0800, Paul E. McKenney wrote:
> Hello, Andrew,
> 
> The attached patch to make invalidate_mmap_range() non-GPL exported
> seems to have been lost somewhere between 2.6.1-mm4 and 2.6.1-mm5.
> It still applies cleanly.  Could you please take it up again?

And there's still no reason to ease IBM's GPL violations by exporting
deep VM internals.  The GPLed DFS you claimed you needed this for still
hasn't shown up but instead you want to change the export all the time.

Tells a lot about IBM's promises..


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-17  7:35 ` Christoph Hellwig
@ 2004-02-17 12:40   ` Paul E. McKenney
  2004-02-18  0:19     ` Andrew Morton
  2004-02-18 12:12     ` Dominik Kubla
  0 siblings, 2 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-17 12:40 UTC (permalink / raw)
  To: Christoph Hellwig, akpm, linux-kernel, linux-mm

On Tue, Feb 17, 2004 at 07:35:22AM +0000, Christoph Hellwig wrote:
> On Mon, Feb 16, 2004 at 11:09:27AM -0800, Paul E. McKenney wrote:
> > Hello, Andrew,
> > 
> > The attached patch to make invalidate_mmap_range() non-GPL exported
> > seems to have been lost somewhere between 2.6.1-mm4 and 2.6.1-mm5.
> > It still applies cleanly.  Could you please take it up again?
> 
> And there's still no reason to ease IBM's GPL violations by exporting
> deep VM internals.  The GPLed DFS you claimed you needed this for still
> hasn't shown up but instead you want to change the export all the time.
> 
> Tells a lot about IBM's promises..

Hello, Christoph!

IBM shipped the promised SAN Filesystem some months ago.  The source
code for the Linux client was released under GPL, as promised, and may
be found at the following URL:

https://www6.software.ibm.com/dl/sanfsys/sanfsref-i?S_PKG=dl&S_TACT=&S_CMP=

A PDF of the protocol specification may be found at the following URL:

http://www.storage.ibm.com/software/virtualization/sfs/protocol.html

These URLs do require that you register, but there is no cost nor any
agreement other than the GPL itself.  The Linux client has not been
shipped as product yet.  The code is still quite rough, which is one
reason that it has not been submitted to, for example, LKML.  ;-)

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-17 12:40   ` Paul E. McKenney
@ 2004-02-18  0:19     ` Andrew Morton
  2004-02-18 12:51       ` Arjan van de Ven
  2004-02-19 20:56       ` Daniel Phillips
  2004-02-18 12:12     ` Dominik Kubla
  1 sibling, 2 replies; 68+ messages in thread
From: Andrew Morton @ 2004-02-18  0:19 UTC (permalink / raw)
  To: paulmck; +Cc: hch, linux-kernel, linux-mm

"Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> IBM shipped the promised SAN Filesystem some months ago.

Neat, but it's hard to see the relevance of this to your patch.

I don't see any licensing issues with the patch because the filesystem
which needs it clearly meets Linus's "this is not a derived work" criteria.

And I don't see a technical problem with the export: given that we export
truncate_inode_pages() it makes sense to also export the corresponding
pagetable shootdown function.

Yes, this is a sensitive issue.  Can we please evaluate it strictly
according to technical and licensing considerations?

Having said that, what concerns or issues remain with Paul's patch?

Thanks.

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18  0:19     ` Andrew Morton
@ 2004-02-18 12:51       ` Arjan van de Ven
  2004-02-18 14:00         ` Paul E. McKenney
  2004-02-18 18:04         ` Tim Bird
  2004-02-19 20:56       ` Daniel Phillips
  1 sibling, 2 replies; 68+ messages in thread
From: Arjan van de Ven @ 2004-02-18 12:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, hch, linux-kernel, linux-mm

On Wed, 2004-02-18 at 01:19, Andrew Morton wrote:
> "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> >
> > IBM shipped the promised SAN Filesystem some months ago.
> 
> Neat, but it's hard to see the relevance of this to your patch.
> 
> I don't see any licensing issues with the patch because the filesystem
> which needs it clearly meets Linus's "this is not a derived work" criteria.

it does?
It needed no changes to work on linux?
it only uses "core unix" apis ?
it needs no changes to the core kernel? *buzz*
It doesn't require knowledge of deep and changing internals ? *buzz*
It doesn't need changing for various kernel versions ?

I remember this baby overriding syscalls and the like not too long
ago...

The word "clearly" isn't correct imo. Just because something has a few
lines of code that started on another OS doesn't make it "clearly" not a
derived work, at least not in my eyes.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 12:51       ` Arjan van de Ven
@ 2004-02-18 14:00         ` Paul E. McKenney
  2004-02-18 21:10           ` Christoph Hellwig
  2004-02-18 18:04         ` Tim Bird
  1 sibling, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-18 14:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, hch, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 01:51:35PM +0100, Arjan van de Ven wrote:
> On Wed, 2004-02-18 at 01:19, Andrew Morton wrote:
> > "Paul E. McKenney" <paulmck@us.ibm.com> wrote:
> > >
> > > IBM shipped the promised SAN Filesystem some months ago.
> > 
> > Neat, but it's hard to see the relevance of this to your patch.
> > 
> > I don't see any licensing issues with the patch because the filesystem
> > which needs it clearly meets Linus's "this is not a derived work" criteria.
> 
> it does?

I believe so.

> It needed no changes to work on linux?

There is a small shim layer required, but the bulk of the code
implementing GPFS is common between AIX and Linux.  It was on AIX
first by quite a few years.

> it only uses "core unix" apis ?

If they are made available, yes.  That is the point of this patch,
after all.  ;-)

> it needs no changes to the core kernel? *buzz*

You -can- run GPFS in the 2.4 kernel without core-kernel patches,
as long as you don't mind putting up with mmap/page-fault races and
with NFS exports from different nodes handing out the same lock to two
different NFS clients.  ;-)

> It doesn't require knowledge of deep and changing internals ? *buzz*

That is indeed the idea.

> It doesn't need changing for various kernel versions ?

It is tested on specific kernel versions.  Clearly moving from 2.4 to
2.6 requires some change.

> I remember this baby overriding syscalls and the like not too long
> ago...

???

> The word "clearly" isn't correct imo. Just because something has a few
> lines of code that started on another OS doesn't make it "clearly" not a
> derived work, at least not in my eyes.

Hmmm...  You seem to have a rather expansive definition of
"a few lines of code".  ;-)

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 14:00         ` Paul E. McKenney
@ 2004-02-18 21:10           ` Christoph Hellwig
  2004-02-18 15:06             ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-18 21:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Arjan van de Ven, Andrew Morton, hch, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 06:00:21AM -0800, Paul E. McKenney wrote:
> There is a small shim layer required, but the bulk of the code
> implementing GPFS is common between AIX and Linux.  It was on AIX
> first by quite a few years.

Small glue layer?  Unfortunately ibm took it off the website, but
the thing is damn huge.

> > it only uses "core unix" apis ?
> 
> If they are made available, yes.  That is the point of this patch,
> after all.  ;-)

No, that's wrong.  It patches the syscall table and plays evilish
tricks with lowlevel MM code.

> > It doesn't require knowledge of deep and changing internals ? *buzz*
> 
> That is indeed the idea.

The one on the ibm website a little ago did.  You're free to upload
a new one that clearly doesn't need all this, but..



* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 21:10           ` Christoph Hellwig
@ 2004-02-18 15:06             ` Paul E. McKenney
  2004-02-18 22:21               ` Christoph Hellwig
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-18 15:06 UTC (permalink / raw)
  To: Christoph Hellwig, Arjan van de Ven, Andrew Morton, linux-kernel,
	linux-mm

On Wed, Feb 18, 2004 at 09:10:35PM +0000, Christoph Hellwig wrote:
> On Wed, Feb 18, 2004 at 06:00:21AM -0800, Paul E. McKenney wrote:
> > There is a small shim layer required, but the bulk of the code
> > implementing GPFS is common between AIX and Linux.  It was on AIX
> > first by quite a few years.
> 
> Small glue layer?  Unfortunately ibm took it off the website, but
> the thing is damn huge.

Perhaps it is huge, but it is a small fraction of the GPFS kernel
implementation.

> > > it only uses "core unix" apis ?
> > 
> > If they are made available, yes.  That is the point of this patch,
> > after all.  ;-)
> 
> No, that's wrong.  It patches the syscall table and plays evilish
> tricks with lowlevel MM code.

The sys_call_table stuff was under #ifdef, and was intended for
use by a research project that was later put out of its misery.
This stuff has since been removed from the source tree.

As to the evilish tricks with lowlevel MM code, the whole point
of the invalidate_mmap_range() patch is to be able to rid GPFS
of exactly these evilish tricks.

> > > It doesn't require knowledge of deep and changing internals ? *buzz*
> > 
> > That is indeed the idea.
> 
> The one on the ibm website a little ago did.  You're free to upload
> a new one that clearly doesn't need all this, but..

Again, the point of the invalidate_mmap_range() patch is to be able
to do precisely this.

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 15:06             ` Paul E. McKenney
@ 2004-02-18 22:21               ` Christoph Hellwig
  2004-02-18 22:51                 ` Andrew Morton
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-18 22:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Hellwig, Arjan van de Ven, Andrew Morton, linux-kernel,
	linux-mm

> The sys_call_table stuff was under #ifdef, and was intended for
> use by a research project that was later put out of its misery.
> This stuff has since been removed from the source tree.
> 
> As to the evilish tricks with lowlevel MM code, the whole point
> of the invalidate_mmap_range() patch is to be able to rid GPFS
> of exactly these evilish tricks.

It didn't look like that.

Really Paul, the GPL is pretty clear on the derived work thing,
and when you need changes to the core kernel and all kinds of nasty
hacks it's pretty clear it is a derived work.

And it's up to IBM anyway to show it's not a derived work, which is
pretty hard IMHO.

I don't understand why IBM is pushing this dubious change right now,
GPL violation and thus copyright violation issues in Linux is the
last thing IBM wants to see in the press with the current mess going
on, right?


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 22:21               ` Christoph Hellwig
@ 2004-02-18 22:51                 ` Andrew Morton
  2004-02-18 23:00                   ` Christoph Hellwig
                                     ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Andrew Morton @ 2004-02-18 22:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, arjanv, linux-kernel, linux-mm

Christoph Hellwig <hch@infradead.org> wrote:
>
> I don't understand why IBM is pushing this dubious change right now,

It isn't a dubious change, on technical grounds.  It is reasonable for a
distributed filesystem to want to be able to shoot down pte's which map
sections of pagecache.  Just as it is reasonable for the filesystem to be
able to shoot down the pagecache itself.

We've exported much lower-level stuff than this, because some in-kernel
module happened to use it.

> GPL violation and thus copyright violation issues in Linux is the
> last thing IBM wants to see in the press with the current mess going
> on, right?

Well this is a chicken-and-egg, isn't it.  The only way in which we can
audit the IBM code for its derivedness is for the source to be made
available.  Although not necessarily under GPL.  Or we accept Paul's claim,
which I personally am inclined to do.

Look, this isn't going anywhere.  We have a perfectly reasonable request
from Paul to make this symbol available for IBM's filesystem.  The usual
way to handle this sort of thing is to say "ooh.  shit.  hard." and not
reply to the email.  That is not adequate and hopefully Paul will not let
us get away with it.

We need to give Paul a reasoned and logically consistent answer to his
request.  For that we need to establish some sort of framework against
which to make a decision and then make the decision.  

One approach is a fait accompli from the top-level maintainer.  Here,
we're trying to do it in a different way.

I have proposed two criteria upon which this should be judged:

a) Does the export make technical sense?  Do filesystems have
   legitimate need for access to this symbol?

(really, a) is sufficient grounds, but for real-world reasons:)

b) Does the IBM filesystem meet the kernel's licensing requirements?

It appears that the answers are a): yes and b) probably.

Please, feel free to add additional criteria.  We could also ask "do we
want to withhold this symbol to encourage IBM to GPL the filesystem" or
"do we simply refuse to export any symbol which is not used by any GPL
software" (if so, why?).  Over to you.

But at the end of the day, if we decide to not export this symbol, we owe
Paul a good, solid reason, yes?

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 22:51                 ` Andrew Morton
@ 2004-02-18 23:00                   ` Christoph Hellwig
  2004-02-18 16:21                     ` Paul E. McKenney
                                       ` (2 more replies)
  2004-02-19  9:11                   ` David Weinehall
  2004-02-19 10:29                   ` Lars Marowsky-Bree
  2 siblings, 3 replies; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-18 23:00 UTC (permalink / raw)
  To: Andrew Morton, tovalds
  Cc: Christoph Hellwig, paulmck, arjanv, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 02:51:32PM -0800, Andrew Morton wrote:
> a) Does the export make technical sense?  Do filesystems have
>    legitimate need for access to this symbol?
> 
> (really, a) is sufficient grounds, but for real-world reasons:)
> 
> b) Does the IBM filesystem meet the kernel's licensing requirements?
> 
> 
> It appears that the answers are a): yes and b) probably.

Well, the answer to b) is most likely not.  I see it very hard to argue to
have something like gpfs not being a derived work.  The glue code they
had online certainly looked very much like a derived work, and if the new
version got better they wouldn't have any reason to remove it from the
website, right?

> Please, feel free to add additional criteria.  We could also ask "do we
> want to withhold this symbol to encourage IBM to GPL the filesystem" or
> "do we simply refuse to export any symbol which is not used by any GPL
> software" (if so, why?).

Yes.  Andrew, please read the GPL, it's very clear about derived works.
Then please tell me why you think gpfs is not a derived work.

> But at the end of the day, if we decide to not export this symbol, we owe
> Paul a good, solid reason, yes?

Yes.  We've traditionally not exported symbols unless we had an intree user,
and especially not if it's for a module that's not GPL licensed.

We had this discussion with Linus a few times, maybe he can comment again to
make it clear.

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 23:00                   ` Christoph Hellwig
@ 2004-02-18 16:21                     ` Paul E. McKenney
  2004-02-18 23:32                     ` Andrew Morton
  2004-02-19  0:28                     ` Andrew Morton
  2 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-18 16:21 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, tovalds, arjanv, linux-kernel,
	linux-mm

On Wed, Feb 18, 2004 at 11:00:55PM +0000, Christoph Hellwig wrote:
> On Wed, Feb 18, 2004 at 02:51:32PM -0800, Andrew Morton wrote:
> > a) Does the export make technical sense?  Do filesystems have
> >    legitimate need for access to this symbol?
> > 
> > (really, a) is sufficient grounds, but for real-world reasons:)
> > 
> > b) Does the IBM filesystem meet the kernel's licensing requirements?
> > 
> > 
> > It appears that the answers are a): yes and b) probably.
> 
> Well, the answer to b) is most likely not.  I see it very hard to argue to
> have something like gpfs not being a derived work.  The glue code they
> had online certainly looked very much like a derived work, and if the new
> version got better they wouldn't have any reason to remove it from the
> website, right?

Nice conspiracy theory!  ;-)

It was moved to a different website some time ago:

    http://techsupport.services.ibm.com/server/cluster/fixes/gpfsfixhome.html

The current version is 2.2.0-1.  You will get a tar.gz file, and
the glue code source will be in gpfs.gpl-2.2.0-1.noarch.rpm after
you unpack.

						Thanx, Paul

> > Please, feel free to add additional criteria.  We could also ask "do we
> > want to withhold this symbol to encourage IBM to GPL the filesystem" or
> > "do we simply refuse to export any symbol which is not used by any GPL
> > software" (if so, why?).
> 
> Yes.  Andrew, please read the GPL, it's very clear about derived works.
> Then please tell me why you think gpfs is not a derived work.
> 
> > But at the end of the day, if we decide to not export this symbol, we owe
> > Paul a good, solid reason, yes?
> 
> Yes.  We've traditionally not exported symbols unless we had an intree user,
> and especially not if it's for a module that's not GPL licensed.
> 
> We had this discussion with Linus a few times, maybe he can comment again to
> make it clear.
> 

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 23:00                   ` Christoph Hellwig
  2004-02-18 16:21                     ` Paul E. McKenney
@ 2004-02-18 23:32                     ` Andrew Morton
  2004-02-19 12:32                       ` Christoph Hellwig
  2004-02-19  0:28                     ` Andrew Morton
  2 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2004-02-18 23:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, arjanv, linux-kernel, linux-mm

Christoph Hellwig <hch@infradead.org> wrote:
>
> Yes.  Andrew, please read the GPL, it's very clear about derived works.
> Then please tell me why you think gpfs is not a derived work.

I haven't seen the code.

> > But at the end of the day, if we decide to not export this symbol, we owe
> > Paul a good, solid reason, yes?
> 
> Yes.  We've traditionally not exported symbols unless we had an intree user,
> and especially not if it's for a module that's not GPL licensed.

That's certainly a good rule of thumb and we (and I) have used it before.

What is the reasoning behind it?

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 23:32                     ` Andrew Morton
@ 2004-02-19 12:32                       ` Christoph Hellwig
  2004-02-19 18:56                         ` Andrew Morton
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-19 12:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, paulmck, arjanv, linux-kernel, linux-mm, torvalds

On Wed, Feb 18, 2004 at 03:32:34PM -0800, Andrew Morton wrote:
> > Yes.  We've traditionally not exported symbols unless we had an intree user,
> > and especially not if it's for a module that's not GPL licensed.
> 
> That's certainly a good rule of thumb and we (and I) have used it before.
> 
> What is the reasoning behind it?

The reason is that someone who wants to distribute a binary only module
has to show its module is not a derived work, and someone who needs new
core in the kernel and new exports pretty much shows his work is deeply
integrated with the kernel.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 12:32                       ` Christoph Hellwig
@ 2004-02-19 18:56                         ` Andrew Morton
  2004-02-19 19:01                           ` Christoph Hellwig
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2004-02-19 18:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, arjanv, linux-kernel, linux-mm, torvalds

Christoph Hellwig <hch@infradead.org> wrote:
>
> On Wed, Feb 18, 2004 at 03:32:34PM -0800, Andrew Morton wrote:
> > > Yes.  We've traditionally not exported symbols unless we had an intree user,
> > > and especially not if it's for a module that's not GPL licensed.
> > 
> > That's certainly a good rule of thumb and we (and I) have used it before.
> > 
> > What is the reasoning behind it?
> 
> The reason is that someone who wants to distribute a binary only module
> has to show its module is not a derived work, and someone who needs new
> core in the kernel and new exports pretty much shows his work is deeply
> integrated with the kernel.

Needing access to invalidate_mmap_range() is surely not an indication of a
derived work.  It is an indication of a need for a reliable way to achieve
inter-node cache consistency.  Other distributed filesystems will need this
and probably AIX already provides it.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 18:56                         ` Andrew Morton
@ 2004-02-19 19:01                           ` Christoph Hellwig
  2004-02-19 13:04                             ` Paul E. McKenney
  2004-02-20  3:17                             ` Anton Blanchard
  0 siblings, 2 replies; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-19 19:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, paulmck, arjanv, linux-kernel, linux-mm, torvalds

On Thu, Feb 19, 2004 at 10:56:08AM -0800, Andrew Morton wrote:
> inter-node cache consistency.  Other distributed filesystems will need this
> and probably AIX already provides it.

You've probably not seen the AIX VM architecture.  Good for you as it's
not good for your stomach.  I did when I still was SCAldera and although
my NDAs don't allow me to go into details I can tell you that the AIX
VM architecture is deeply tied into the segment architecture of the Power
CPU and significantly different from any other UNIX variant.

So porting code from AIX that touches anything VM related is a complete
rewrite.

Nice argumentation though, for everything but AIX it might actually have
worked :)


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 19:01                           ` Christoph Hellwig
@ 2004-02-19 13:04                             ` Paul E. McKenney
  2004-02-20  3:17                             ` Anton Blanchard
  1 sibling, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19 13:04 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, arjanv, linux-kernel, linux-mm,
	torvalds

On Thu, Feb 19, 2004 at 07:01:41PM +0000, Christoph Hellwig wrote:
> On Thu, Feb 19, 2004 at 10:56:08AM -0800, Andrew Morton wrote:
> > inter-node cache consistency.  Other distributed filesystems will need this
> > and probably AIX already provides it.
> 
> You've probably not seen the AIX VM architecture.  Good for you as it's
> not good for your stomach.  I did when I still was SCAldera and although
> my NDAs don't allow me to go into details I can tell you that the AIX
> VM architecture is deeply tied into the segment architecture of the Power
> CPU and significantly different from any other UNIX variant.
> 
> So porting code from AIX that touches anything VM related is a complete
> rewrite.

Or, alternatively, requires a surprisingly large glue-code layer.

							Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 19:01                           ` Christoph Hellwig
  2004-02-19 13:04                             ` Paul E. McKenney
@ 2004-02-20  3:17                             ` Anton Blanchard
  2004-02-20 21:46                               ` Valdis.Kletnieks
  1 sibling, 1 reply; 68+ messages in thread
From: Anton Blanchard @ 2004-02-20  3:17 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, paulmck, arjanv, linux-kernel,
	linux-mm, torvalds

 
> You've probably not seen the AIX VM architecture.  Good for you as it's
> not good for your stomach.  I did when I still was SCAldera and although
> my NDAs don't allow me to go into details I can tell you that the AIX
> VM architecture is deeply tied into the segment architecture of the Power
> CPU and significantly different from any other UNIX variant.

Interesting, what version of AIX did you get access to? And how can you
be sure that's still the case?

Anton

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20  3:17                             ` Anton Blanchard
@ 2004-02-20 21:46                               ` Valdis.Kletnieks
  0 siblings, 0 replies; 68+ messages in thread
From: Valdis.Kletnieks @ 2004-02-20 21:46 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Christoph Hellwig, Andrew Morton, paulmck, arjanv, linux-kernel,
	linux-mm, torvalds

On Fri, 20 Feb 2004 14:17:51 +1100, Anton Blanchard <anton@samba.org>  said:
>  
> > You've probably not seen the AIX VM architecture.  Good for you as it's
> > not good for your stomach.  I did when I still was SCAldera and although
> > my NDAs don't allow me to go into details I can tell you that the AIX
> > VM architecture is deeply tied into the segment architecture of the Power
> > CPU and significantly different from any other UNIX variant.
> 
> Interesting, what version of AIX did you get access to? And how can you
> be sure that's still the case?

You don't need access to AIX source.  Reading the IBM Redbook on writing a
device driver for AIX is sufficient proof. Or even reading up on how to get
more heap space than the usual number of segment registers using the 'ld'
command (yes, it's userspace visible).

And Christoph isn't pulling your leg - it's pretty bizarre...



* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 23:00                   ` Christoph Hellwig
  2004-02-18 16:21                     ` Paul E. McKenney
  2004-02-18 23:32                     ` Andrew Morton
@ 2004-02-19  0:28                     ` Andrew Morton
  2004-02-18 18:36                       ` Paul E. McKenney
                                         ` (2 more replies)
  2 siblings, 3 replies; 68+ messages in thread
From: Andrew Morton @ 2004-02-19  0:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, arjanv, linux-kernel, linux-mm

Christoph Hellwig <hch@infradead.org> wrote:
>
> Yes.  Andrew, please read the GPL, it's very clear about derived works.
> Then please tell me why you think gpfs is not a derived work.

OK, so I looked at the wrapper.  It wasn't a tremendously pleasant
experience.  It is huge, and uses fairly standard-looking filesystem
interfaces and locking primitives.  Also some awareness of NFSV4 for some
reason.

Still, the wrapper is GPL so this is not relevant.  Its only use is to tell
us whether or not the non-GPL bits are "derived" from Linux, and it
doesn't do that.

The GPL doesn't define a derived work.  It says

  "If identifiable sections of that work are not derived from the
   Program, and can be reasonably considered independent and separate works
   in themselves, then this License, and its terms, do not apply to those
   sections when you distribute them as separate works.  But when you
   distribute the same sections as part of a whole which is a work based on
   the Program, the distribution of the whole must be on the terms of this
   License, ..."

And the "But when you distribute..." part is what the Linus doctrine rubs
out.  Because it is unreasonable to say that a large piece of work such as
this is "derived" from Linux.

Why do you believe that GPFS represents a kernel licensing violation?

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19  0:28                     ` Andrew Morton
@ 2004-02-18 18:36                       ` Paul E. McKenney
  2004-02-19 12:31                       ` Christoph Hellwig
  2004-02-20  1:27                       ` David Schwartz
  2 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-18 18:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, arjanv, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 04:28:58PM -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > Yes.  Andrew, please read the GPL, it's very clear about derived works.
> > Then please tell me why you think gpfs is not a derived work.
> 
> OK, so I looked at the wrapper.  It wasn't a tremendously pleasant
> experience.  It is huge, and uses fairly standard-looking filesystem
> interfaces and locking primitives.  Also some awareness of NFSV4 for some
> reason.
>
> Still, the wrapper is GPL so this is not relevant.  Its only use is to tell
> us whether or not the non-GPL bits are "derived" from Linux, and it
> doesn't do that.

In the spirit of full disclosure, the wrapper is actually
distributed under the BSD license.  The GPFS guys tell
me that the "gpl" in the RPM name means "GPFS Portability
Layer".

					Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19  0:28                     ` Andrew Morton
  2004-02-18 18:36                       ` Paul E. McKenney
@ 2004-02-19 12:31                       ` Christoph Hellwig
  2004-02-19  9:11                         ` Paul E. McKenney
  2004-02-19 18:59                         ` Tim Bird
  2004-02-20  1:27                       ` David Schwartz
  2 siblings, 2 replies; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-19 12:31 UTC (permalink / raw)
  To: Andrew Morton, torvalds
  Cc: Christoph Hellwig, paulmck, arjanv, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 04:28:58PM -0800, Andrew Morton wrote:
> OK, so I looked at the wrapper.  It wasn't a tremendously pleasant
> experience.  It is huge, and uses fairly standard-looking filesystem
> interfaces and locking primitives.  Also some awareness of NFSV4 for some
> reason.

And pokes deep into internal structures that it shouldn't.

> Still, the wrapper is GPL so this is not relevant.

It's BSD licensed - they couldn't distribute it together with GPFS if
it was GPL.

> Its only use is to tell
> us whether or not the non-GPL bits are "derived" from Linux, and it
> doesn't do that.

Well, something that needs an almost one megabyte big wrapper by definition
is not a standalone work but something that's deeply intertwined with
the kernel.  The tons of kernel version checks certainly show it's poking
deeper than it should.

> Why do you believe that GPFS represents a kernel licensing violation?

See above.  Something that pokes deep into internal structures and even
needs new exports certainly is a derived work.  There's a few different
interpretations of the derived works clause in the GPL around, the FSF
one wouldn't allow binary modules at all, and Linus' one is also pretty
strict.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 12:31                       ` Christoph Hellwig
@ 2004-02-19  9:11                         ` Paul E. McKenney
  2004-02-19 18:32                           ` Lars Marowsky-Bree
  2004-02-19 18:59                         ` Tim Bird
  1 sibling, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19  9:11 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton, torvalds, arjanv, linux-kernel,
	linux-mm

On Thu, Feb 19, 2004 at 12:31:10PM +0000, Christoph Hellwig wrote:
> On Wed, Feb 18, 2004 at 04:28:58PM -0800, Andrew Morton wrote:
> > OK, so I looked at the wrapper.  It wasn't a tremendously pleasant
> > experience.  It is huge, and uses fairly standard-looking filesystem
> > interfaces and locking primitives.  Also some awareness of NFSV4 for some
> > reason.
> 
> And pokes deep into internal structures that it shouldn't.

Again, the point of the patch is to get rid of such poking.

> > Still, the wrapper is GPL so this is not relevant.
> 
> It's BSD licensed - they couldn't distribute it together with GPFS if
> it was GPL.

Yep.

> > Its only use is to tell
> > us whether or not the non-GPL bits are "derived" from Linux, and it
> > doesn't do that.
> 
> > Well, something that needs an almost one megabyte big wrapper by definition
> > is not a standalone work but something that's deeply intertwined with
> the kernel.  The tons of kernel version checks certainly show it's poking
> deeper than it should.

On the size, I beg to differ.  One of the reasons the glue module is
so large is because of the fact that GPFS was written to run in an AIX
kernel rather than a Linux kernel.  I would guess that if GPFS had
instead been derived from Linux, the glue module would be much
smaller.  On the kernel version checks, the point of the patch is
to get rid of at least some of these.

> > Why do you believe that GPFS represents a kernel licensing violation?
> 
> See above.  Something that pokes deep into internal structures and even
> needs new exports certainly is a derived work.  There's a few different
> interpretations of the derived works clause in the GPL around, the FSF
> one wouldn't allow binary modules at all, and Linus' one is also pretty
> strict.

So why are you coming out against something that you seem to believe
allows -better- alignment with Linus's rules?

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19  9:11                         ` Paul E. McKenney
@ 2004-02-19 18:32                           ` Lars Marowsky-Bree
  2004-02-19 18:38                             ` Arjan van de Ven
  2004-02-19 19:16                             ` viro
  0 siblings, 2 replies; 68+ messages in thread
From: Lars Marowsky-Bree @ 2004-02-19 18:32 UTC (permalink / raw)
  To: Paul E. McKenney, Christoph Hellwig, Andrew Morton, torvalds,
	arjanv, linux-kernel, linux-mm

On 2004-02-19T01:11:29,
   "Paul E. McKenney" <paulmck@us.ibm.com> said:

> > And pokes deep into internal structures that it shouldn't.
> Again, the point of the patch is to get rid of such poking.

I think this fiddling about this particular exported symbol is hiding
the real issue.

It seems that Christoph believes that _inherently_, any filesystem
kernel module on Linux must be a derived work, because it is intimately
tied into the kernel core / VFS. I can certainly see the reasoning
here, and it is a valid point of view.

Do we want to allow non-OSS filesystems in kernel space at all? That's
the entire question.

Personally, I would go with "No" and support the consequences of this,
because I believe in Open Source; and that the value proposition of
Linux is /not/ in binary-only modules, and I would /not/ sacrifice the
OSS principles of the literal core of the Linux project for a short term
pay-off.

(But I'm personally trying to solve that by making them superfluous and
putting them out of business by getting an OSS CFS, which seems to be
more amiable ;-)

Only if we can settle this, we can answer this export question. If we
want to allow them, the export is a perfectly reasonable thing to ask
for. If not, we probably need to add a few more _GPL barriers.

A rule of thumb might be whether any code in the tree uses a given
export, and if not, prune it. Anything which even we don't use or export
across the user-land boundary certainly qualifies as a kernel internal.

Currently, no kernel module seems to use this export. So I'd think such
a point could certainly be made.
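
For readers unfamiliar with what a "_GPL barrier" does in practice: a
symbol exported with EXPORT_SYMBOL_GPL() is resolved only for modules
that declare a GPL-compatible license, while a symbol exported with
EXPORT_SYMBOL() is available to any module.  What follows is a toy
userspace model of that rule, not kernel code.  The names toy_export,
toy_exports, toy_plain_symbol and toy_may_resolve are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one exported symbol. */
struct toy_export {
	const char *name;
	bool gpl_only;	/* true models EXPORT_SYMBOL_GPL, false models EXPORT_SYMBOL */
};

/* Hypothetical export table, modelling the state before the proposed patch. */
static const struct toy_export toy_exports[] = {
	{ "invalidate_mmap_range", true  },
	{ "toy_plain_symbol",      false },
};

/*
 * Return true if a module may link against the named symbol.
 * GPL-only exports are visible only to GPL-compatible modules;
 * symbols that are not exported at all are visible to nobody.
 */
static bool toy_may_resolve(const char *name, bool module_is_gpl_compatible)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(toy_exports) / sizeof(toy_exports[0]); i++) {
		if (strcmp(toy_exports[i].name, name) == 0)
			return module_is_gpl_compatible || !toy_exports[i].gpl_only;
	}
	return false;
}
```

In this model the patch under discussion amounts to flipping gpl_only
from true to false for invalidate_mmap_range, and pruning an export
amounts to deleting its table entry.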

Sincerely,
    Lars Marowsky-Bree <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 18:32                           ` Lars Marowsky-Bree
@ 2004-02-19 18:38                             ` Arjan van de Ven
  2004-02-19 19:16                             ` viro
  1 sibling, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2004-02-19 18:38 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-kernel, linux-mm


On Thu, Feb 19, 2004 at 07:32:10PM +0100, Lars Marowsky-Bree wrote:
> 
> A rule of thumb might be whether any code in the tree uses a given
> export, and if not, prune it. Anything which even we don't use or export
> across the user-land boundary certainly qualifies as a kernel internal.

political issues aside, this sounds like a decent rule-of-thumb in general;
if NO module uses it, it is most likely the wrong API (for example obsoleted API left
around) or something really internal.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 18:32                           ` Lars Marowsky-Bree
  2004-02-19 18:38                             ` Arjan van de Ven
@ 2004-02-19 19:16                             ` viro
  2004-02-19 16:15                               ` Paul E. McKenney
  1 sibling, 1 reply; 68+ messages in thread
From: viro @ 2004-02-19 19:16 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Paul E. McKenney, Christoph Hellwig, Andrew Morton, torvalds,
	arjanv, linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 07:32:10PM +0100, Lars Marowsky-Bree wrote:
> Only if we can settle this, we can answer this export question. If we
> want to allow them, the export is a perfectly reasonable thing to ask
> for. If not, we probably need to add a few more _GPL barriers.
> 
> A rule of thumb might be whether any code in the tree uses a given
> export, and if not, prune it. Anything which even we don't use or export
> across the user-land boundary certainly qualifies as a kernel internal.
> 
> Currently, no kernel module seems to use this export. So I'd think such
> a point could certainly be made.

I'm not sure.  I'm all for trimming the export list, but the real questions
are
	* does that export make sense?
	* does it impose extra restrictions on what we can do with core
code? (without breaking it, that is)
	* is it needed in the first place?  If it's redundant - to hell it
goes.

Note that the majority of the exported symbols fail at least one of the above
and _that_ is why they should be killed.  Whether their users are GPL or
not doesn't matter - if they don't make sense, they must die, no matter
what b0rken code might be using them.

IMNSHO the questions above should be answered first and AFAICS they hadn't
been even discussed in that case.

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 19:16                             ` viro
@ 2004-02-19 16:15                               ` Paul E. McKenney
  0 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19 16:15 UTC (permalink / raw)
  To: viro
  Cc: Lars Marowsky-Bree, Christoph Hellwig, Andrew Morton, torvalds,
	arjanv, linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 07:16:33PM +0000, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Thu, Feb 19, 2004 at 07:32:10PM +0100, Lars Marowsky-Bree wrote:
> > Only if we can settle this, we can answer this export question. If we
> > want to allow them, the export is a perfectly reasonable thing to ask
> > for. If not, we probably need to add a few more _GPL barriers.
> > 
> > A rule of thumb might be whether any code in the tree uses a given
> > export, and if not, prune it. Anything which even we don't use or export
> > across the user-land boundary certainly qualifies as a kernel internal.
> > 
> > Currently, no kernel module seems to use this export. So I'd think such
> > a point could certainly be made.

Good questions, see below for my nominations for the answers.

> I'm not sure.  I'm all for trimming the export list, but the real questions
> are
> 	* does that export make sense?

		Yes, invalidate_mmap_range() permits a distributed
		filesystem to shoot down mmap()s of a to-be-modified file
		so that all nodes see a consistent view of that file's
		data.  Having an export means that this functionality
		need not be reproduced in each and every DFS, reducing
		DFS intrusiveness.

		Of course, the issue pointed out by Daniel does need
		to be addressed.  More on that shortly.

> 	* does it impose extra restrictions on what we can do with core
> code? (without breaking it, that is)

		The invalidate_mmap_range() API is pretty generic.
		It takes an address_space structure, an offset, and a
		length.  The caller can treat the address_space structure
		pointer as a cookie, so the only sorts of changes that
		could break this API would be ones that entirely did away
		with the concept of an address space.  Or that introduced
		the concept of a file with non-integer offsets, in which
		case invalidate_mmap_range() is the least of our worries.

		Either case could happen, I suppose, but both seem a
		bit unlikely.

> 	* is it needed in the first place?  If it's redundant - to hell it
> goes.

		Yes, to prevent DFSes from having to reach so far
		into the guts of the Linux VM system.

							Thanx, Paul
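
The call shape described in the answers above (an address space, an
offset, and a length) can be sketched in userspace as follows.  This is
a toy model under stated assumptions, not the kernel implementation:
toy_mapping, toy_address_space and toy_invalidate_mmap_range are
hypothetical names, the toy returns a count purely so the effect can be
observed, and it only handles len > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one process's mapping of part of a file. */
struct toy_mapping {
	unsigned long start;	/* first file offset covered by the mapping */
	unsigned long len;	/* length of the mapping in bytes */
	bool present;		/* true while the pages are still mapped */
};

/* Hypothetical stand-in for the address space; callers treat it as a cookie. */
struct toy_address_space {
	struct toy_mapping *maps;
	size_t nr_maps;
};

/*
 * Mark every mapping that overlaps [offset, offset + len) as not present,
 * so that the next access would have to fetch the data again.
 * Returns the number of mappings that were shot down.
 */
static size_t toy_invalidate_mmap_range(struct toy_address_space *as,
					unsigned long offset, unsigned long len)
{
	size_t i, shot = 0;

	assert(as != NULL && len > 0);
	for (i = 0; i < as->nr_maps; i++) {
		struct toy_mapping *m = &as->maps[i];
		bool overlaps = m->start < offset + len &&
				offset < m->start + m->len;

		if (m->present && overlaps) {
			m->present = false;
			shot++;
		}
	}
	return shot;
}
```

A distributed filesystem node would make such a call just before another
node modifies the same byte range, which is the "shootdown" referred to
above.  The caller never looks inside the address space structure, which
is why the interface constrains the core code so little.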

> Note that the majority of the exported symbols fail at least one of the above
> and _that_ is why they should be killed.  Whether their users are GPL or
> not doesn't matter - if they don't make sense, they must die, no matter
> what b0rken code might be using them.
> 
> IMNSHO the questions above should be answered first and AFAICS they hadn't
> been even discussed in that case.
> 

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 12:31                       ` Christoph Hellwig
  2004-02-19  9:11                         ` Paul E. McKenney
@ 2004-02-19 18:59                         ` Tim Bird
  1 sibling, 0 replies; 68+ messages in thread
From: Tim Bird @ 2004-02-19 18:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, torvalds, paulmck, arjanv, linux-kernel, linux-mm

Christoph Hellwig wrote:
> On Wed, Feb 18, 2004 at 04:28:58PM -0800, Andrew Morton wrote: 
>>OK, so I looked at the wrapper.  It wasn't a tremendously pleasant
>>experience.  It is huge, and uses fairly standard-looking filesystem
>>interfaces and locking primitives.  Also some awareness of NFSV4 for some
>>reason.
>> 
>>Still, the wrapper is GPL so this is not relevant.
> 
> Well, something that needs an almost one megabyte big wrapper by definition
> is not a standalone work but something that's deeply intertwined with
> the kernel.  The tons of kernel version checks certainly show it's poking
> deeper than it should.
>...
>
> Something that pokes deep into internal structures and even
> needs new exports certainly is a derived work. 

I'd argue (again) that having a complex glue layer is not evidence
per se of the glued module being a derived work.  If anything,
it is evidence to the contrary.  But it depends on the circumstances.

The question for GPFS itself is whether it was modified to run with
Linux, and how it was modified, and how much it was modified.

If your argument is that Linux, after being modified with the glue
layer, is now a derivative work of the glued module, that seems
more likely.  I'm not sure how the GPL reads on that case.

=============================
Tim Bird
Architecture Group Co-Chair
CE Linux Forum
Senior Staff Engineer
Sony Electronics
E-mail: Tim.Bird@am.sony.com
=============================


* RE: Non-GPL export of invalidate_mmap_range
  2004-02-19  0:28                     ` Andrew Morton
  2004-02-18 18:36                       ` Paul E. McKenney
  2004-02-19 12:31                       ` Christoph Hellwig
@ 2004-02-20  1:27                       ` David Schwartz
  2 siblings, 0 replies; 68+ messages in thread
From: David Schwartz @ 2004-02-20  1:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: paulmck, arjanv, linux-mm

> Andrew Morton <akpm@osdl.org> wrote:

> And the "But when you distribute..." part is what the Linus doctrine rubs
> out.  Because it is unreasonable to say that a large piece of work such as
> this is "derived" from Linux.

	I think you misunderstand how the Linux kernel uses the term "derive". By a
"derived work", the GPL is invoking the legal copyright principle of a
"derivative work". You can google this term to get a better understanding of
it. The term "derived work" does not imply that the work is wholly derived.
Rather, it means that some part of the protected expression of the original
work is present in the work.

	In the specific case of Linux kernel modules, the question is whether some
part of the protectable expression in the Linux kernel is present in the
module. This is a major issue for compiled modules distributed in object
form because the compilation process, through header files, puts pieces of
the header files in the resultant object.

	If the distributed work is in source code form, however, the argument
becomes much different. You are not likely to find pieces of the kernel code
present in the source code that's distributed. However, one possible
argument is that the module is a "sequel" to the kernel. It takes the
framework the kernel creates and builds on it. I can't write and sell a Star
Trek novel for just this reason, it would be derived from previous such
novels because it borrows their universe.

	Another possible argument is that the module code is so intertwined with
kernel code that you can't consider the module by itself a work at all.

	In the present case, we have a shim that is distributed in source form. The
main module works with other operating systems and doesn't contain much
Linux-specific code. So the module itself is not a derived work of Linux.
The shim is probably a derived work, but the shim is open source.

	So if there's a license issue, I don't know what it is.

	DS



* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 22:51                 ` Andrew Morton
  2004-02-18 23:00                   ` Christoph Hellwig
@ 2004-02-19  9:11                   ` David Weinehall
  2004-02-19  8:58                     ` Paul E. McKenney
  2004-02-19 10:29                   ` Lars Marowsky-Bree
  2 siblings, 1 reply; 68+ messages in thread
From: David Weinehall @ 2004-02-19  9:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, paulmck, arjanv, linux-kernel, linux-mm

On Wed, Feb 18, 2004 at 02:51:32PM -0800, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > I don't understand why IBM is pushing this dubious change right now,
> 
> It isn't a dubious change, on technical grounds.  It is reasonable for a
> distributed filesystem to want to be able to shoot down pte's which map
> sections of pagecache.  Just as it is reasonable for the filesystem to be
> able to shoot down the pagecache itself.
> 
> We've exported much lower-level stuff than this, because some in-kernel
> module happened to use it.

Probably not always the right choice, though...  I highly suspect that
far too much of our intestines is easily available.

[snip]

> We need to give Paul a reasoned and logically consistent answer to his
> request.  For that we need to establish some sort of framework against
> which to make a decision and then make the decision.  
> 
> One approach is a fait accompli from the top-level maintainer.  Here,
> we're trying to do it in a different way.
> 
> I have proposed two criteria upon which this should be judged:
> 
> a) Does the export make technical sense?  Do filesystems have
>    legitimate need for access to this symbol?
> 
> (really, a) is sufficient grounds, but for real-world reasons:)
> 
> b) Does the IBM filesystem meet the kernel's licensing requirements?
> 
> 
> It appears that the answers are a): yes and b) probably.

a.) Definitely
b.) Perhaps
 
> Please, feel free to add additional criteria.  We could also ask "do we
> want to withhold this symbol to encourage IBM to GPL the filesystem" or
> "do we simply refuse to export any symbol which is not used by any GPL
> software" (if so, why?).  Over to you.

Well, I wasn't altogether joking when I suggested IBM should GPL gpfs.
A couple of questions:

* Is gpfs a commercial product in the sense that it's something IBM
  earns revenue from?
* Does gpfs contain third party "Intellectual Property" (no, I'm not
  particularly fond of using that expression, but I digress)

If the answer is NO to both of these questions, why _not_ GPL the code?
If the answer is NO to only the second question, is the revenue from
gpfs big enough to warrant keeping it proprietary?

> But at the end of the day, if we decide to not export this symbol, we owe
> Paul a good, solid reason, yes?

Yup.  Silence isn't always golden, sometimes it's outright shitty.


Regards: David Weinehall
-- 
 /) David Weinehall <tao@acc.umu.se> /) Northern lights wander      (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/    (/   Full colour fire           (/

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19  9:11                   ` David Weinehall
@ 2004-02-19  8:58                     ` Paul E. McKenney
  2004-03-04  5:51                       ` Mike Fedyk
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19  8:58 UTC (permalink / raw)
  To: Andrew Morton, Christoph Hellwig, arjanv, linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 10:11:32AM +0100, David Weinehall wrote:
> On Wed, Feb 18, 2004 at 02:51:32PM -0800, Andrew Morton wrote:
> > Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > I don't understand why IBM is pushing this dubious change right now,
> > 
> > It isn't a dubious change, on technical grounds.  It is reasonable for a
> > distributed filesystem to want to be able to shoot down pte's which map
> > sections of pagecache.  Just as it is reasonable for the filesystem to be
> > able to shoot down the pagecache itself.
> > 
> > We've exported much lower-level stuff than this, because some in-kernel
> > module happened to use it.
> 
> Probably not always the right choice, though...  I highly suspect that
> far too much of our intestines is easily available.

Again, the whole point of the patch is to -reduce- the degree of
intestinal export.

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19  8:58                     ` Paul E. McKenney
@ 2004-03-04  5:51                       ` Mike Fedyk
  0 siblings, 0 replies; 68+ messages in thread
From: Mike Fedyk @ 2004-03-04  5:51 UTC (permalink / raw)
  To: paulmck; +Cc: Andrew Morton, Christoph Hellwig, arjanv, linux-kernel, linux-mm

Paul E. McKenney wrote:
> On Thu, Feb 19, 2004 at 10:11:32AM +0100, David Weinehall wrote:
> 
>>On Wed, Feb 18, 2004 at 02:51:32PM -0800, Andrew Morton wrote:
>>
>>>Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>>>I don't understand why IBM is pushing this dubious change right now,
>>>
>>>It isn't a dubious change, on technical grounds.  It is reasonable for a
>>>distributed filesystem to want to be able to shoot down pte's which map
>>>sections of pagecache.  Just as it is reasonable for the filesystem to be
>>>able to shoot down the pagecache itself.
>>>
>>>We've exported much lower-level stuff than this, because some in-kernel
>>>module happened to use it.
>>
>>Probably not always the right choice, though...  I highly suspect that
>>far too much of our intestines is easily available.
> 
> 
> Again, the whole point of the patch is to -reduce- the degree of
> intestinal export.
> 
> 						Thanx, Paul

Paul, this still doesn't answer why GPFS can't be released under the GPL.

If this has been answered, I'd love to see a pointer to the archives
I should search.

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 22:51                 ` Andrew Morton
  2004-02-18 23:00                   ` Christoph Hellwig
  2004-02-19  9:11                   ` David Weinehall
@ 2004-02-19 10:29                   ` Lars Marowsky-Bree
  2004-02-19  9:00                     ` Paul E. McKenney
  2004-02-19 11:11                     ` Arjan van de Ven
  2 siblings, 2 replies; 68+ messages in thread
From: Lars Marowsky-Bree @ 2004-02-19 10:29 UTC (permalink / raw)
  To: Andrew Morton, Christoph Hellwig; +Cc: paulmck, arjanv, linux-kernel, linux-mm

On 2004-02-18T14:51:32,
   Andrew Morton <akpm@osdl.org> said:

> a) Does the export make technical sense?  Do filesystems have
>    legitimate need for access to this symbol?
> 
> (really, a) is sufficient grounds, but for real-world reasons:)

Technically, I assume OCFS, Lustre, (OpenGFS), PolyServe and
basically /everyone/ doing a cluster file system, proprietary or not,
will eventually need this capability. Vendors have included hooks for
this in 2.4 already anyway.

So on technical grounds, I'm strongly inclined to support it, but I
would like to make sure that the hook is sufficient for all of the
named cluster filesystems.

Paul, have you spoken with them?

> b) Does the IBM filesystem meet the kernel's licensing requirements?

If you are worried about this one, you can export it GPL-only, which
I'd appreciate as an Open Source developer, but would be unhappy about
from a real-world business perspective ;-)


Sincerely,
    Lars Marowsky-Bree <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 10:29                   ` Lars Marowsky-Bree
@ 2004-02-19  9:00                     ` Paul E. McKenney
  2004-02-19 11:11                     ` Arjan van de Ven
  1 sibling, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19  9:00 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Andrew Morton, Christoph Hellwig, arjanv, linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 11:29:00AM +0100, Lars Marowsky-Bree wrote:
> On 2004-02-18T14:51:32,
>    Andrew Morton <akpm@osdl.org> said:
> 
> > a) Does the export make technical sense?  Do filesystems have
> >    legitimate need for access to this symbol?
> > 
> > (really, a) is sufficient grounds, but for real-world reasons:)
> 
> Technically, I assume OCFS, Lustre, (OpenGFS), PolyServe and
> basically /everyone/ doing a cluster file system, proprietary or not,
> will eventually need this capability. Vendors have included hooks for
> this in 2.4 already anyway.
> 
> So on technical grounds, I'm strongly inclined to support it, but I
> would like to make sure that the hook is sufficient for all of the
> named cluster filesystems.
> 
> Paul, have you spoken with them?

Lustre, yes.  At OLS last summer, Peter Braam said that it was useful.
The others, no, but they are certainly free to chime in.

> > b) Does the IBM filesystem meet the kernel's licensing requirements?
> 
> If you are worried about this one, you can export it GPL-only, which
> I'd appreciate as an Open Source developer, but would be unhappy about
> from a real-world business perspective ;-)

Been there, done that.  ;-)

						Thanx, Paul

> Sincerely,
>     Lars Marowsky-Bree <lmb@suse.de>
> 
> -- 
> High Availability & Clustering	      \ ever tried. ever failed. no matter.
> SUSE Labs			      | try again. fail again. fail better.
> Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett
> 
> 

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 10:29                   ` Lars Marowsky-Bree
  2004-02-19  9:00                     ` Paul E. McKenney
@ 2004-02-19 11:11                     ` Arjan van de Ven
  2004-02-19 11:53                       ` Lars Marowsky-Bree
  1 sibling, 1 reply; 68+ messages in thread
From: Arjan van de Ven @ 2004-02-19 11:11 UTC (permalink / raw)
  To: Lars Marowsky-Bree
  Cc: Andrew Morton, Christoph Hellwig, paulmck, linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 11:29:00AM +0100, Lars Marowsky-Bree wrote:
> > b) Does the IBM filesystem meet the kernel's licensing requirements?
> 
> If you are worried about this one, you can export it GPL-only, which
> I'd appreciate as an Open Source developer, but would be unhappy about
> from a real-world business perspective ;-)

It is already exported GPL-only; this is all about changing it so that
binary-only modules can link against it as well...


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 11:11                     ` Arjan van de Ven
@ 2004-02-19 11:53                       ` Lars Marowsky-Bree
  0 siblings, 0 replies; 68+ messages in thread
From: Lars Marowsky-Bree @ 2004-02-19 11:53 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, linux-mm

On 2004-02-19T12:11:17,
   Arjan van de Ven <arjanv@redhat.com> said:

> It is already exported GPL-only; this is all about changing it so that
> binary-only modules can link against it as well...

I blame lack of coffee and want a brown paper bag. Sorry. ;)


Sincerely,
    Lars Marowsky-Bree <lmb@suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18 12:51       ` Arjan van de Ven
  2004-02-18 14:00         ` Paul E. McKenney
@ 2004-02-18 18:04         ` Tim Bird
  1 sibling, 0 replies; 68+ messages in thread
From: Tim Bird @ 2004-02-18 18:04 UTC (permalink / raw)
  To: arjanv; +Cc: Andrew Morton, paulmck, hch, linux-kernel, linux-mm

I should know better than to stir up a hornet's nest
by discussing GPL issues on this list... :)

Arjan van de Ven wrote:
> On Wed, 2004-02-18 at 01:19, Andrew Morton wrote:
>>Neat, but it's hard to see the relevance of this to your patch.
>>I don't see any licensing issues with the patch because the filesystem
>>which needs it clearly meets Linus's "this is not a derived work"
>>criteria.
> 
> it does?
...
> it needs no changes to the core kernel? *buzz*
Actually, this would tend towards an interpretation that
it was NOT a derived work.

That is, if the Linux kernel must be modified in order
to run with a piece of software, that's one indicator
that the piece of software (when standing alone) may not
be derived from the kernel.  I am purposely avoiding the
"but what about when it's linked" argument.

=============================
Tim Bird
Architecture Group Co-Chair
CE Linux Forum
Senior Staff Engineer
Sony Electronics
E-mail: Tim.Bird@am.sony.com
=============================


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-18  0:19     ` Andrew Morton
  2004-02-18 12:51       ` Arjan van de Ven
@ 2004-02-19 20:56       ` Daniel Phillips
  2004-02-19 22:06         ` Stephen C. Tweedie
  1 sibling, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-19 20:56 UTC (permalink / raw)
  To: Andrew Morton, paulmck; +Cc: hch, linux-kernel, linux-mm

On Tuesday 17 February 2004 19:19, Andrew Morton wrote:
> I don't see any licensing issues with the patch because the filesystem
> which needs it clearly meets Linus's "this is not a derived work" criteria.
>
> And I don't see a technical problem with the export: given that we export
> truncate_inode_pages() it makes sense to also export the corresponding
> pagetable shootdown function.
>
> Yes, this is a sensitive issue.  Can we please evaluate it strictly
> according to technical and licensing considerations?
>
> Having said that, what concerns remain with Paul's patch?

Hi Andrew,

OpenGFS and Sistina GFS use zap_page_range directly, essentially doing the 
same as invalidate_mmap_range but skipping any vmas belonging to MAP_PRIVATE 
mmaps.  This avoids destroying data on anon pages.  GPFS and every other DFS 
have the same problem as far as I can see, and it isn't addressed by 
exporting invalidate_mmap_range as it stands.  Paul?

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 20:56       ` Daniel Phillips
@ 2004-02-19 22:06         ` Stephen C. Tweedie
  2004-02-19 22:31           ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Stephen C. Tweedie @ 2004-02-19 22:06 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Andrew Morton, Paul E. McKenney, Christoph Hellwig, linux-kernel,
	linux-mm, Stephen Tweedie

Hi,

On Thu, 2004-02-19 at 20:56, Daniel Phillips wrote:

> OpenGFS and Sistina GFS use zap_page_range directly, essentially doing the 
> same as invalidate_mmap_range but skipping any vmas belonging to MAP_PRIVATE 
> mmaps.

Well, MAP_PRIVATE maps can contain shared pages too --- any page in a
MAP_PRIVATE map that has been mapped but not yet written to is still
shared, and still needs shot down on truncate().

--Stephen
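
The point above can be observed from user space. What follows is only a
sketch under stated assumptions: the helper name private_map_demo(), the
result struct, and the temporary file path are invented for illustration and
appear nowhere in this thread, and it exercises a local filesystem rather
than a cluster filesystem. It maps a two-page file MAP_PRIVATE, reads page 0
without writing to it, stores to page 1 (forcing a private copy), and then
rewrites the file through the descriptor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* What the experiment observed; every name here is illustrative. */
struct private_map_result {
	int private_write_stayed_out_of_file;	/* file still held 'A' after our store */
	int cow_page_kept_private_copy;		/* written page still reads 'P' then 'A' */
	int clean_page_saw_file_update;		/* untouched page reads 'B' (unspecified by POSIX) */
};

/* Returns 0 and fills *res on success, -1 if the setup itself failed. */
int private_map_demo(struct private_map_result *res)
{
	char path[] = "/tmp/map_private_demo_XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	char *buf = NULL, *map = MAP_FAILED;
	char probe = 0;
	int fd, ret = -1;

	assert(res != NULL);
	if (page <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until fd is closed */

	buf = malloc(2 * (size_t)page);
	if (!buf)
		goto out;
	memset(buf, 'A', 2 * (size_t)page);
	if (pwrite(fd, buf, 2 * (size_t)page, 0) != (ssize_t)(2 * page))
		goto out;

	map = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	probe = *(volatile char *)map;	/* page 0: faulted in, never written */
	map[page] = 'P';		/* page 1: the store forces a private copy */

	/* The private store must not have reached the file. */
	if (pread(fd, &probe, 1, page) != 1)
		goto out;
	res->private_write_stayed_out_of_file = (probe == 'A');

	/* Someone else now rewrites the whole file. */
	memset(buf, 'B', 2 * (size_t)page);
	if (pwrite(fd, buf, 2 * (size_t)page, 0) != (ssize_t)(2 * page))
		goto out;

	/* Page 1 is a private copy: our 'P' plus the old 'A' bytes. */
	res->cow_page_kept_private_copy =
		(map[page] == 'P' && map[page + 1] == 'A');
	/* Page 0 was never written, so it may still track the file. */
	res->clean_page_saw_file_update = (map[0] == 'B');
	ret = 0;
out:
	if (map != MAP_FAILED)
		munmap(map, 2 * (size_t)page);
	free(buf);
	close(fd);
	return ret;
}
```

The first two fields should come back set on any system. The third is
reported rather than relied upon: POSIX leaves it unspecified whether later
changes to the file show up in a MAP_PRIVATE region, so whether the clean page
reads the new data depends on the kernel and filesystem.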



* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 22:06         ` Stephen C. Tweedie
@ 2004-02-19 22:31           ` Daniel Phillips
  2004-02-19 16:42             ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-19 22:31 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Andrew Morton, Paul E. McKenney, Christoph Hellwig, linux-kernel,
	linux-mm

Hi Stephen,

On Thursday 19 February 2004 17:06, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2004-02-19 at 20:56, Daniel Phillips wrote:
> > OpenGFS and Sistina GFS use zap_page_range directly, essentially doing
> > the same as invalidate_mmap_range but skipping any vmas belonging to
> > MAP_PRIVATE mmaps.
>
> Well, MAP_PRIVATE maps can contain shared pages too --- any page in a
> MAP_PRIVATE map that has been mapped but not yet written to is still
> shared, and still needs shot down on truncate().

Exactly, and we ought to take this opportunity to do that properly, which is 
easy.  I'm just curious how GPFS deals with this issue, or if it simply 
doesn't support MAP_PRIVATE.

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 22:31           ` Daniel Phillips
@ 2004-02-19 16:42             ` Paul E. McKenney
  2004-02-20  2:06               ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19 16:42 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 05:31:33PM -0500, Daniel Phillips wrote:
> Hi Stephen,
> 
> On Thursday 19 February 2004 17:06, Stephen C. Tweedie wrote:
> > Hi,
> >
> > On Thu, 2004-02-19 at 20:56, Daniel Phillips wrote:
> > > OpenGFS and Sistina GFS use zap_page_range directly, essentially doing
> > > the same as invalidate_mmap_range but skipping any vmas belonging to
> > > MAP_PRIVATE mmaps.
> >
> > Well, MAP_PRIVATE maps can contain shared pages too --- any page in a
> > MAP_PRIVATE map that has been mapped but not yet written to is still
> > shared, and still needs shot down on truncate().
> 
> Exactly, and we ought to take this opportunity to do that properly, which is 
> easy.  I'm just curious how GPFS deals with this issue, or if it simply 
> doesn't support MAP_PRIVATE.

GPFS supports MAP_PRIVATE, but does not specify the behavior if you
change the underlying file.  There are a number of things one can do,
but one must keep in mind that different processes can MAP_PRIVATE the
same file at different times, and that some processes might MAP_SHARED it
at the same time that others MAP_PRIVATE it.  Here are the alternatives
I can imagine:

1.	Any time a file changes, create a copy of the old version
	for any MAP_PRIVATE vmas.  This would essentially create
	a point-in-time copy of any file that a process mapped
	MAP_PRIVATE.  This is arguably the most intuitive from the
	user's standpoint, but (a) it would not be a small change and
	(b) I haven't heard of anyone coming up with a good use for it.
	Please enlighten me if I am missing a simple implementation or
	compelling uses.

2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
	as suggested by Daniel.  This would mean that a
	process that had mapped a file MAP_PRIVATE and faulted
	in parts of it would see different versions of the file
	in different pages.  This should be straightforward to
	implement, but in what situation is this skewed view of
	the file useful?

3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
	but invalidate those pages in the vma that have not yet been
	modified (that are not anonymous) as suggested by Stephen.
	This would mean that a process that had mapped a file MAP_PRIVATE
	and written on parts of it would see different versions of the
	file in different pages.  Again, in what situation is this skewed
	view of the file useful?

5.	The current behavior, where the process's writes do not
	flow through to the file, but all changes to the file are
	visible to the writing process.

6.	Requiring that MAP_PRIVATE be applied only to unchanging
	files, so that (for example) any change to the underlying
	file removes that file from any MAP_PRIVATE address spaces.
	Subsequent accesses would get a SEGV, rather than a
	surprise from silently changing data.

So, please help me out here...  What do applications that MAP_PRIVATE
changing files really expect to happen?

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 16:42             ` Paul E. McKenney
@ 2004-02-20  2:06               ` Daniel Phillips
  2004-02-19 19:47                 ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-20  2:06 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> change the underlying file.  There are a number of things one can do,
> but one must keep in mind that different processes can MAP_PRIVATE the
> same file at different times, and that some processes might MAP_SHARED it
> at the same time that others MAP_PRIVATE it.  Here are the alternatives
> I can imagine:
>
> 1.	Any time a file changes, create a copy of the old version
> 	for any MAP_PRIVATE vmas.  This would essentially create
> 	a point-in-time copy of any file that a process mapped
> 	MAP_PRIVATE.  This is arguably the most intuitive from the
> 	user's standpoint, but (a) it would not be a small change and
> 	(b) I haven't heard of anyone coming up with a good use for it.
> 	Please enlighten me if I am missing a simple implementation or
> 	compelling uses.

This is MAP_COPY I think.  Even if somebody did manage to sneak it by Linus 
one day it would certainly not be under the guise of MAP_PRIVATE.

> 2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> 	as suggested by Daniel.

I did not suggest that, rather I described the existing practice in OpenGFS 
and Sistina GFS, which at least does not destroy anonymous data.  The correct 
behaviour is the one you describe in option 3, and we are perfectly willing 
to change GFS to obtain that behaviour.  To be precise: I suggest we change 
invalidate_mmap_range to skip anon pages, and change vmtruncate to use 
something else, having the current semantics.

As a historical note: the behavior GFS obtains from option 2 is 
Posix-compliant, but falls short of Linus-compliance, who insists on 
completely accurate invalidation behavior as is right and proper.

> 	This would mean that a
> 	process that had mapped a file MAP_PRIVATE and faulted
> 	in parts of it would see different versions of the file
> 	in different pages.  This should be straightforward to
> 	implement, but in what situation is this skewed view of
> 	the file useful?

You've got me there ;)  However, Posix explicitly blesses this sloppy 
behaviour.  I suppose that with additional user space locking, applications 
could make it work reliably.  But it's still sloppy, and worse, it's 
different from Linux's local filesystem behaviour.

> 3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> 	but invalidate those pages in the vma that have not yet been
> 	modified (that are not anonymous) as suggested by Stephen.
> 	This would mean that a process that had mapped a file MAP_PRIVATE
> 	and written on parts of it would see different versions of the
> 	file in different pages.

This is the correct behaviour and is the current behaviour for local 
filesystems.  In particular, all processes on all nodes will see the current 
contents of any file page that they have not yet faulted in, as of the last 
time any process wrote that file page via mmap or otherwise.

Our goal for GFS, and the goal I'd like to hold up as definitive for any 
distributed filesystem, is to imitate local filesystem semantics exactly, 
even across the cluster.

> Again, in what situation is this skewed view of the file useful?

It's not skewed in any way that I can see.  Though I am no linker expert, I 
dimly recall that these are precisely the semantics ld relies on.

> 5.	The current behavior, where the process's writes do not
> 	flow through to the file, but all changes to the file are
> 	visible to the writing process.

We all agree that's broken, I hope.

> 6.	Requiring that MAP_PRIVATE be applied only to unchanging
> 	files, so that (for example) any change to the underlying
> 	file removes that file from any MAP_PRIVATE address spaces.
> 	Subsequent accesses would get a SEGV, rather than a
> 	surprise from silently changing data.

Creative :)  Well, data that changes "silently" is a fact of life whenever 
data is shared.  It's up to applications to ensure that shared data changes 
predictably.

> So, please help me out here...  What do applications that MAP_PRIVATE
> changing files really expect to happen?

Number 3, is that ok with you?  Incidentally, your list doesn't include the 
semantics we'd get by just exporting and using invalidate_mmap_range.  I 
presume that is because you agree it's not correct (it will clobber CoWed 
anonymous pages).

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20  2:06               ` Daniel Phillips
@ 2004-02-19 19:47                 ` Paul E. McKenney
  2004-02-20  5:07                   ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-19 19:47 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Thu, Feb 19, 2004 at 09:06:55PM -0500, Daniel Phillips wrote:
> On Thursday 19 February 2004 11:42, Paul E. McKenney wrote:
> > GPFS supports MAP_PRIVATE, but does not specify the behavior if you
> > change the underlying file.  There are a number of things one can do,
> > but one must keep in mind that different processes can MAP_PRIVATE the
> > same file at different times, and that some processes might MAP_SHARED it
> > at the same time that others MAP_PRIVATE it.  Here are the alternatives
> > I can imagine:
> >
> > 1.	Any time a file changes, create a copy of the old version
> > 	for any MAP_PRIVATE vmas.  This would essentially create
> > 	a point-in-time copy of any file that a process mapped
> > 	MAP_PRIVATE.  This is arguably the most intuitive from the
> > 	user's standpoint, but (a) it would not be a small change and
> > 	(b) I haven't heard of anyone coming up with a good use for it.
> > 	Please enlighten me if I am missing a simple implementation or
> > 	compelling uses.
> 
> This is MAP_COPY I think.  Even if somebody did manage to sneak it by Linus 
> one day it would certainly not be under the guise of MAP_PRIVATE.

Whew!  That is a relief!!!  ;-)

> > 2.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	as suggested by Daniel.
> 
> I did not suggest that, rather I described the existing practice in OpenGFS 
> and Sistina GFS, which at least does not destroy anonymous data.  The correct 
> behaviour is the one you describe in option 3, and we are perfectly willing 
> to change GFS to obtain that behaviour.  To be precise: I suggest we change 
> invalidate_mmap_range to skip anon pages, and change vmtruncate to use 
> something else, having the current semantics.
> 
> As a historical note: the behavior GFS obtains from option 2 is 
> Posix-compliant, but falls short of Linus-compliance, who insists on 
> completely accurate invalidation behavior as is right and proper.

OK, this is the OpenGFS zap_inode_mapping(), right?

> > 	This would mean that a
> > 	process that had mapped a file MAP_PRIVATE and faulted
> > 	in parts of it would see different versions of the file
> > 	in different pages.  This should be straightforward to
> > 	implement, but in what situation is this skewed view of
> > 	the file useful?
> 
> You've got me there ;)  However, Posix explicitly blesses this sloppy 
> behaviour.  I suppose that with additional user space locking, applications 
> could make it work reliably.  But it's still sloppy, and worse, it's 
> different from Linux's local filesystem behaviour.

;-)

> > 3.	Modify invalidate_mmap_range() to leave MAP_PRIVATE vmas,
> > 	but invalidate those pages in the vma that have not yet been
> > 	modified (that are not anonymous) as suggested by Stephen.
> > 	This would mean that a process that had mapped a file MAP_PRIVATE
> > 	and written on parts of it would see different versions of the
> > 	file in different pages.
> 
> This is the correct behaviour and is the current behaviour for local 
> filesystems.  In particular, all processes on all nodes will see the current 
> contents of any file page that they have not yet faulted in, as of the last 
> time any process wrote that file page via mmap or otherwise.
> 
> Our goal for GFS, and the goal I'd like to hold up as definitive for any 
> distributed filesystem, is to imitate local filesystem semantics exactly, 
> even across the cluster.

OK, I surrender.  I got some private email agreeing with this
viewpoint.  Any dissenters, speak soon, or...

> > Again, in what situation is this skewed view of the file useful?
> 
> It's not skewed in any way that I can see.  Though I am no linker expert, I 
> dimly recall that these are precisely the semantics ld relies on.

I thought that the linker relied on people refraining (or being
prevented) from updating executables while they are in use.
But I am also no linker expert.

> > 5.	The current behavior, where the process's writes do not
> > 	flow through to the file, but all changes to the file are
> > 	visible to the writing process.
> 
> We all agree that's broken, I hope.

I can buy DFSes implementing semantics that are the same as local
filesystems.  But no one has yet shown me anything that it breaks!

> > 6.	Requiring that MAP_PRIVATE be applied only to unchanging
> > 	files, so that (for example) any change to the underlying
> > 	file removes that file from any MAP_PRIVATE address spaces.
> > 	Subsequent accesses would get a SEGV, rather than a
> > 	surprise from silently changing data.
> 
> Creative :)  Well, data that changes "silently" is a fact of life whenever 
> data is shared.  It's up to applications to ensure that shared data changes 
> predictably.

Glad you liked it.  ;-)

I think that predictability when using MAP_PRIVATE requires that one
refrain from modifying the underlying file while someone has it mmap()ed
with MAP_PRIVATE.  I would welcome an example proving me wrong.

> > So, please help me out here...  What do applications that MAP_PRIVATE
> > changing files really expect to happen?
> 
> Number 3, is that ok with you?  Incidentally, your list doesn't include the 
> semantics we'd get by just exporting and using invalidate_mmap_range.  I 
> presume that is because you agree it's not correct (it will clobber CoWed 
> anonymous pages).

I will give it a shot, though I would still like to hear about examples
where the difference in semantics affects a real application.
BTW, my list didn't include exporting and using the current
invalidate_mmap_range() because I didn't say what I meant to say.
Hate it when that happens!  ;-)

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-19 19:47                 ` Paul E. McKenney
@ 2004-02-20  5:07                   ` Daniel Phillips
  2004-02-20 12:02                     ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-20  5:07 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Thursday 19 February 2004 14:47, Paul E. McKenney wrote:
> OK, I surrender.  I got some private email agreeing with this
> viewpoint.  Any dissenters, speak soon, or...

An implementation is going to look something like the patch below. 
Unfortunately I don't think there is a way around passing an extra parameter
all the way down the unmap call chain.  Doubly unfortunately, this doesn't
give any benefit at all to anybody who doesn't use a clustered filesystem
(which is nearly everybody) while there is a marginal cost.  Do you know a
better way?  Anyway, this is the price of correct MAP_PRIVATE semantics for
clustered filesystems.  At least I have quantified it so we can decide if it's
worth it.  (My opinion: correctness is always worth it.)

Regards,

Daniel

--- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
+++ 2.6.3/include/linux/mm.h	2004-02-19 23:18:08.000000000 -0500
@@ -434,9 +434,7 @@
 			unsigned long size);
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted);
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long size);
+		unsigned long end_addr, unsigned long *nr_accounted, int zap);
 void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
@@ -444,8 +442,7 @@
 			unsigned long size, pgprot_t prot);
 
 extern void invalidate_mmap_range(struct address_space *mapping,
-				  loff_t const holebegin,
-				  loff_t const holelen);
+			  loff_t const holebegin,  loff_t const holelen, int zap);
 extern int vmtruncate(struct inode * inode, loff_t offset);
 extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
--- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
+++ 2.6.3/mm/memory.c	2004-02-19 23:48:23.000000000 -0500
@@ -386,7 +386,7 @@
 
 static void
 zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
-		unsigned long address, unsigned long size)
+		unsigned long address, unsigned long size, int zap)
 {
 	unsigned long offset;
 	pte_t *ptep;
@@ -414,7 +414,7 @@
 			tlb_remove_tlb_entry(tlb, ptep, address+offset);
 			if (pfn_valid(pfn)) {
 				struct page *page = pfn_to_page(pfn);
-				if (!PageReserved(page)) {
+				if (!PageReserved(page) && (zap || (page->mapping && !PageSwapCache(page)))) {
 					if (pte_dirty(pte))
 						set_page_dirty(page);
 					if (page->mapping && pte_young(pte) &&
@@ -436,7 +436,7 @@
 
 static void
 zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
-		unsigned long address, unsigned long size)
+		unsigned long address, unsigned long size, int zap)
 {
 	pmd_t * pmd;
 	unsigned long end;
@@ -453,14 +453,14 @@
 	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
 		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
 	do {
-		zap_pte_range(tlb, pmd, address, end - address);
-		address = (address + PMD_SIZE) & PMD_MASK; 
+		zap_pte_range(tlb, pmd, address, end - address, zap);
+		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
 }
 
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long end)
+static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+			unsigned long address, unsigned long end, int zap)
 {
 	pgd_t * dir;
 
@@ -474,7 +474,7 @@
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
-		zap_pmd_range(tlb, dir, address, end - address);
+		zap_pmd_range(tlb, dir, address, end - address, zap);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
@@ -524,7 +524,7 @@
  */
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted)
+		unsigned long end_addr, unsigned long *nr_accounted, int zap)
 {
 	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -568,7 +568,7 @@
 				tlb_start_valid = 1;
 			}
 
-			unmap_page_range(*tlbp, vma, start, start + block);
+			unmap_page_range(*tlbp, vma, start, start + block, zap);
 			start += block;
 			zap_bytes -= block;
 			if ((long)zap_bytes > 0)
@@ -594,8 +594,8 @@
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  */
-void zap_page_range(struct vm_area_struct *vma,
-			unsigned long address, unsigned long size)
+void invalidate_page_range(struct vm_area_struct *vma,
+			unsigned long address, unsigned long size, int zap)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -612,11 +612,17 @@
 	lru_add_drain();
 	spin_lock(&mm->page_table_lock);
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, zap);
 	tlb_finish_mmu(tlb, address, end);
 	spin_unlock(&mm->page_table_lock);
 }
 
+void zap_page_range(struct vm_area_struct *vma,
+			unsigned long address, unsigned long size)
+{
+	invalidate_page_range(vma, address, size, 1);
+}
+
 /*
  * Do a quick page-table lookup for a single page.
  * mm->page_table_lock must be held.
@@ -1095,9 +1101,9 @@
 		    	continue;	/* Mapping disjoint from hole. */
 		zba = (hba <= vba) ? vba : hba;
 		zea = (vea <= hea) ? vea : hea;
-		zap_page_range(vp,
+		invalidate_page_range(vp,
 			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
-			       (zea - zba + 1) << PAGE_SHIFT);
+			       (zea - zba + 1) << PAGE_SHIFT, 1);
 	}
 }
 
@@ -1116,7 +1122,7 @@
  * end of the file.
  */
 void invalidate_mmap_range(struct address_space *mapping,
-		      loff_t const holebegin, loff_t const holelen)
+		      loff_t const holebegin, loff_t const holelen, int zap)
 {
 	unsigned long hba = holebegin >> PAGE_SHIFT;
 	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1156,7 +1162,7 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	i_size_write(inode, offset);
-	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
+	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
 	goto out_truncate;
 
--- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
+++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
@@ -1134,7 +1134,7 @@
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 
 	if (is_hugepage_only_range(start, end - start))
@@ -1436,7 +1436,7 @@
 	flush_cache_mm(mm);
 	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					~0UL, &nr_accounted);
+					~0UL, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20  5:07                   ` Daniel Phillips
@ 2004-02-20 12:02                     ` Paul E. McKenney
  2004-02-20 20:37                       ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-20 12:02 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Fri, Feb 20, 2004 at 12:07:25AM -0500, Daniel Phillips wrote:
> On Thursday 19 February 2004 14:47, Paul E. McKenney wrote:
> > OK, I surrender.  I got some private email agreeing with this
> > viewpoint.  Any dissenters, speak soon, or...
> 
> An implementation is going to look something like the patch below. 
> Unfortunately I don't think there is a way around passing an extra parameter
> all the way down the unmap call chain.  Doubly unfortunately, this doesn't
> give any benefit at all to anybody who doesn't use a clustered filesystem
> (which is nearly everybody) while there is a marginal cost.  Do you know a
> better way?  Anyway, this is the price of correct MAP_PRIVATE semantics for
> clustered filesystems.  At least I have quantified it so we can decide if it's
> worth it.  (My opinion: correctness is always worth it.)

"My work is done!"  ;-)

Almost, anyway.  A few comments interspersed.  This would be in
addition to invalidate_mmap_range-non-gpl-export.patch, right?

I cannot think of any reasonable alternative to passing the parameter
down either, as it certainly would not be reasonable to duplicate the
code...

						Thanx, Paul

> Regards,
> 
> Daniel
> 
> --- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
> +++ 2.6.3/include/linux/mm.h	2004-02-19 23:18:08.000000000 -0500
> @@ -434,9 +434,7 @@
>  			unsigned long size);
>  int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
>  		struct vm_area_struct *start_vma, unsigned long start_addr,
> -		unsigned long end_addr, unsigned long *nr_accounted);
> -void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> -			unsigned long address, unsigned long size);
> +		unsigned long end_addr, unsigned long *nr_accounted, int zap);

How about something like "private_too" instead of "zap"?

(Ah!  unmap_page_range() converted to static, since it is used only
in memory.c.)

>  void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
>  int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			struct vm_area_struct *vma);
> @@ -444,8 +442,7 @@
>  			unsigned long size, pgprot_t prot);
>  
>  extern void invalidate_mmap_range(struct address_space *mapping,
> -				  loff_t const holebegin,
> -				  loff_t const holelen);
> +			  loff_t const holebegin,  loff_t const holelen, int zap);
>  extern int vmtruncate(struct inode * inode, loff_t offset);
>  extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
>  extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
> --- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
> +++ 2.6.3/mm/memory.c	2004-02-19 23:48:23.000000000 -0500
> @@ -386,7 +386,7 @@
>  
>  static void
>  zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
> -		unsigned long address, unsigned long size)
> +		unsigned long address, unsigned long size, int zap)
>  {
>  	unsigned long offset;
>  	pte_t *ptep;
> @@ -414,7 +414,7 @@
>  			tlb_remove_tlb_entry(tlb, ptep, address+offset);
>  			if (pfn_valid(pfn)) {
>  				struct page *page = pfn_to_page(pfn);
> -				if (!PageReserved(page)) {
> +				if (!PageReserved(page) && (zap || (page->mapping && !PageSwapCache(page)))) {

Longish line...

>  					if (pte_dirty(pte))
>  						set_page_dirty(page);
>  					if (page->mapping && pte_young(pte) &&
> @@ -436,7 +436,7 @@
>  
>  static void
>  zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
> -		unsigned long address, unsigned long size)
> +		unsigned long address, unsigned long size, int zap)
>  {
>  	pmd_t * pmd;
>  	unsigned long end;
> @@ -453,14 +453,14 @@
>  	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
>  		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
>  	do {
> -		zap_pte_range(tlb, pmd, address, end - address);
> -		address = (address + PMD_SIZE) & PMD_MASK; 
> +		zap_pte_range(tlb, pmd, address, end - address, zap);
> +		address = (address + PMD_SIZE) & PMD_MASK;
>  		pmd++;
>  	} while (address < end);
>  }
>  
> -void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> -			unsigned long address, unsigned long end)
> +static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> +			unsigned long address, unsigned long end, int zap)
>  {
>  	pgd_t * dir;
>  
> @@ -474,7 +474,7 @@
>  	dir = pgd_offset(vma->vm_mm, address);
>  	tlb_start_vma(tlb, vma);
>  	do {
> -		zap_pmd_range(tlb, dir, address, end - address);
> +		zap_pmd_range(tlb, dir, address, end - address, zap);
>  		address = (address + PGDIR_SIZE) & PGDIR_MASK;
>  		dir++;
>  	} while (address && (address < end));
> @@ -524,7 +524,7 @@
>   */
>  int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
>  		struct vm_area_struct *vma, unsigned long start_addr,
> -		unsigned long end_addr, unsigned long *nr_accounted)
> +		unsigned long end_addr, unsigned long *nr_accounted, int zap)
>  {
>  	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
>  	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
> @@ -568,7 +568,7 @@
>  				tlb_start_valid = 1;
>  			}
>  
> -			unmap_page_range(*tlbp, vma, start, start + block);
> +			unmap_page_range(*tlbp, vma, start, start + block, zap);
>  			start += block;
>  			zap_bytes -= block;
>  			if ((long)zap_bytes > 0)
> @@ -594,8 +594,8 @@
>   * @address: starting address of pages to zap
>   * @size: number of bytes to zap
>   */
> -void zap_page_range(struct vm_area_struct *vma,
> -			unsigned long address, unsigned long size)
> +void invalidate_page_range(struct vm_area_struct *vma,

Would it be useful for this to be inline?  (Wouldn't seem so,
zapping mappings has enough overhead that an extra level of
function call should be deep down in the noise...)

> +			unsigned long address, unsigned long size, int zap)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_gather *tlb;
> @@ -612,11 +612,17 @@
>  	lru_add_drain();
>  	spin_lock(&mm->page_table_lock);
>  	tlb = tlb_gather_mmu(mm, 0);
> -	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
> +	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, zap);
>  	tlb_finish_mmu(tlb, address, end);
>  	spin_unlock(&mm->page_table_lock);
>  }
>  
> +void zap_page_range(struct vm_area_struct *vma,
> +			unsigned long address, unsigned long size)
> +{
> +	invalidate_page_range(vma, address, size, 1);
> +}
> +
>  /*
>   * Do a quick page-table lookup for a single page.
>   * mm->page_table_lock must be held.
> @@ -1095,9 +1101,9 @@
>  		    	continue;	/* Mapping disjoint from hole. */
>  		zba = (hba <= vba) ? vba : hba;
>  		zea = (vea <= hea) ? vea : hea;
> -		zap_page_range(vp,
> +		invalidate_page_range(vp,
>  			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
> -			       (zea - zba + 1) << PAGE_SHIFT);
> +			       (zea - zba + 1) << PAGE_SHIFT, 1);
>  	}
>  }
>  
> @@ -1116,7 +1122,7 @@
>   * end of the file.
>   */
>  void invalidate_mmap_range(struct address_space *mapping,
> -		      loff_t const holebegin, loff_t const holelen)
> +		      loff_t const holebegin, loff_t const holelen, int zap)
>  {
>  	unsigned long hba = holebegin >> PAGE_SHIFT;
>  	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;

Doesn't the new argument need to be passed down through
invalidate_mmap_range_list()?

> @@ -1156,7 +1162,7 @@
>  	if (inode->i_size < offset)
>  		goto do_expand;
>  	i_size_write(inode, offset);
> -	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
> +	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
>  	truncate_inode_pages(mapping, offset);
>  	goto out_truncate;
>  
> --- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
> +++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
> @@ -1134,7 +1134,7 @@
>  
>  	lru_add_drain();
>  	tlb = tlb_gather_mmu(mm, 0);
> -	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
> +	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
>  	vm_unacct_memory(nr_accounted);
>  
>  	if (is_hugepage_only_range(start, end - start))
> @@ -1436,7 +1436,7 @@
>  	flush_cache_mm(mm);
>  	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
>  	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
> -					~0UL, &nr_accounted);
> +					~0UL, &nr_accounted, 1);
>  	vm_unacct_memory(nr_accounted);
>  	BUG_ON(mm->map_count);	/* This is just debugging */
>  	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
> 
> 

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 12:02                     ` Paul E. McKenney
@ 2004-02-20 20:37                       ` Daniel Phillips
  2004-02-20 14:01                         ` Paul E. McKenney
  2004-02-20 21:17                         ` Non-GPL export of invalidate_mmap_range Christoph Hellwig
  0 siblings, 2 replies; 68+ messages in thread
From: Daniel Phillips @ 2004-02-20 20:37 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

Hi Paul,

> I cannot think of any reasonable alternative to passing the parameter
> down either, as it certainly would not be reasonable to duplicate the
> code...

Yes, it's simply the (small) price that has to be paid in order to be able to 
boast about our accurate semantics.

> How about something like "private_too" instead of "zap"?

How about just "all", which is what we mean.

> > -void zap_page_range(struct vm_area_struct *vma,
> > -			unsigned long address, unsigned long size)
> > +void invalidate_page_range(struct vm_area_struct *vma,
>
> Would it be useful for this to be inline?  (Wouldn't seem so,
> zapping mappings has enough overhead that an extra level of
> function call should be deep down in the noise...)

Yes, it doesn't seem worth it just to save a stack frame.

Actually, I erred there in that invalidate_mmap_range should not export the 
flag, because it never makes sense to pass in non-zero from a DFS.

> Doesn't the new argument need to be passed down through
> invalidate_mmap_range_list()?

It does, thanks for the catch.  Please bear with me for a moment while I 
reroll this, then hopefully we can move on to the more interesting discussion 
of whether it's worth it.  (Yes it is :)

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 20:37                       ` Daniel Phillips
@ 2004-02-20 14:01                         ` Paul E. McKenney
  2004-02-20 23:00                           ` Daniel Phillips
  2004-02-20 21:17                         ` Non-GPL export of invalidate_mmap_range Christoph Hellwig
  1 sibling, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-20 14:01 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Fri, Feb 20, 2004 at 03:37:26PM -0500, Daniel Phillips wrote:
> Hi Paul,
> 
> > I cannot think of any reasonable alternative to passing the parameter
> > down either, as it certainly would not be reasonable to duplicate the
> > code...
> 
> Yes, it's simply the (small) price that has to be paid in order to be able to 
> boast about our accurate semantics.

;-)

> > How about something like "private_too" instead of "zap"?
> 
> How about just "all", which is what we mean.

Fair enough, certainly keeps a few more lines of code within 80 columns.

> > > -void zap_page_range(struct vm_area_struct *vma,
> > > -			unsigned long address, unsigned long size)
> > > +void invalidate_page_range(struct vm_area_struct *vma,
> >
> > Would it be useful for this to be inline?  (Wouldn't seem so,
> > zapping mappings has enough overhead that an extra level of
> > function call should be deep down in the noise...)
> 
> Yes, it doesn't seem worth it just to save a stack frame.
> 
> Actually, I erred there in that invalidate_mmap_range should not export the 
> flag, because it never makes sense to pass in non-zero from a DFS.

Doesn't vmtruncate() want to pass non-zero "all" in to
invalidate_mmap_range() in order to maintain compatibility with existing
Linux semantics?

> > Doesn't the new argument need to be passed down through
> > invalidate_mmap_range_list()?
> 
> It does, thanks for the catch.  Please bear with me for a moment while I 
> reroll this, then hopefully we can move on to the more interesting discussion 
> of whether it's worth it.  (Yes it is :)

;-)

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 14:01                         ` Paul E. McKenney
@ 2004-02-20 23:00                           ` Daniel Phillips
  2004-02-20 16:17                             ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-20 23:00 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Friday 20 February 2004 09:01, Paul E. McKenney wrote:
> On Fri, Feb 20, 2004 at 03:37:26PM -0500, Daniel Phillips wrote:
> > Actually, I erred there in that invalidate_mmap_range should not export
> > the flag, because it never makes sense to pass in non-zero from a DFS.
>
> Doesn't vmtruncate() want to pass non-zero "all" in to
> invalidate_mmap_range() in order to maintain compatibility with existing
> Linux semantics?

That comes from inside.  The DFS's truncate interface should just be 
vmtruncate.  If I missed something, please shout.

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 23:00                           ` Daniel Phillips
@ 2004-02-20 16:17                             ` Paul E. McKenney
  2004-02-21  3:19                               ` Daniel Phillips
  2004-02-21 19:00                               ` Daniel Phillips
  0 siblings, 2 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-20 16:17 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Fri, Feb 20, 2004 at 06:00:32PM -0500, Daniel Phillips wrote:
> On Friday 20 February 2004 09:01, Paul E. McKenney wrote:
> > On Fri, Feb 20, 2004 at 03:37:26PM -0500, Daniel Phillips wrote:
> > > Actually, I erred there in that invalidate_mmap_range should not export
> > > the flag, because it never makes sense to pass in non-zero from a DFS.
> >
> > Doesn't vmtruncate() want to pass non-zero "all" in to
> > invalidate_mmap_range() in order to maintain compatibility with existing
> > Linux semantics?
> 
> That comes from inside.  The DFS's truncate interface should just be 
> vmtruncate.  If I missed something, please shout.

Agreed, the DFS's truncate interface should be vmtruncate().

Your earlier patch has a call to invalidate_mmap_range() within
vmtruncate(), which passes "1" to the last arg, so as to get
rid of all mappings to the truncated portion of the file.
So either invalidate_mmap_range() needs to keep the fourth arg
or needs to be a wrapper for an underlying function that
vmtruncate() can call, or some such.

The latter may be what you intended to do.
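
To make the wrapper option concrete, here is a minimal userspace sketch,
not kernel code.  Every name in it (mapping_sketch, the *_sketch
functions, the two counters) is a hypothetical stand-in invented for
illustration.  It only shows the shape: a static four-argument inner
function, a three-argument entry point that always passes 0, and a
truncate path in the same file that passes 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space; not a kernel type. */
struct mapping_sketch {
	int file_unmaps;	/* times file-backed pages were unmapped */
	int anon_unmaps;	/* times private (anonymous) pages went too */
};

/* Inner four-argument flavour: static, so only this file can pass "all". */
static void invalidate_range_inner(struct mapping_sketch *m,
		long start, long length, int all)
{
	assert(m != NULL);
	(void)start;		/* the range is irrelevant to this sketch */
	(void)length;
	m->file_unmaps++;
	if (all)
		m->anon_unmaps++;
}

/* Three-argument flavour for a distributed filesystem: never "all". */
void invalidate_filemap_range_sketch(struct mapping_sketch *m,
		long start, long length)
{
	invalidate_range_inner(m, start, length, 0);
}

/* Truncate path: keeps the existing semantics by passing all = 1. */
void vmtruncate_sketch(struct mapping_sketch *m, long offset)
{
	invalidate_range_inner(m, offset, 0, 1);
}
```

With this split, a caller outside the file has no way to ask for the
"all" behaviour, while the truncate path still gets it.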

						Thanx, Paul

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 16:17                             ` Paul E. McKenney
@ 2004-02-21  3:19                               ` Daniel Phillips
  2004-02-21 19:00                               ` Daniel Phillips
  1 sibling, 0 replies; 68+ messages in thread
From: Daniel Phillips @ 2004-02-21  3:19 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Friday 20 February 2004 11:17, Paul E. McKenney wrote:
> Your earlier patch has a call to invalidate_mmap_range() within
> vmtruncate(), which passes "1" to the last arg, so as to get
> rid of all mappings to the truncated portion of the file.
> So either invalidate_mmap_range() needs to keep the fourth arg
> or needs to be a wrapper for an underlying function that
> vmtruncate() can call, or some such.
>
> The latter may be what you intended to do.

Yes, modulo nobody coming up with a legitimate use for the fourth argument.

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 16:17                             ` Paul E. McKenney
  2004-02-21  3:19                               ` Daniel Phillips
@ 2004-02-21 19:00                               ` Daniel Phillips
  2004-02-22 23:39                                 ` Paul E. McKenney
  1 sibling, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-21 19:00 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

Hi Paul et al,

Here is an updated patch.  The name of the exported function is changed to
"invalidate_filemap_range" to reflect the fact that only file-backed pages are
invalidated, and to distinguish the three parameter flavour from the four
parameter version called from vmtruncate.  The inner loop in zap_pte_range is
hopefully correct now.

While I'm in here, why is the assignment "pte =" at line 411 of memory.c not
redundant?

   http://lxr.linux.no/source/mm/memory.c?v=2.6.1#L411

As far as I can see, the ->filemap spinlock protects the pte from modification
and pte was already assigned at line 405.

Anyway, we can now see that the full cost of this DFS-specific feature in the inner
loop is a single (unlikely) branch.

I'll repeat my proposition here: providing local filesystem semantics for
MAP_PRIVATE on any distributed filesystem requires these decorations on the
unmap path.  Though there is no benefit for local filesystems, the cost is
insignificant.
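
As a rough userspace illustration of that inner-loop test (a sketch
only, not the kernel code): page_sketch and count_unmapped() are
hypothetical names, and the two fields stand in for page->mapping and
PageSwapCache().  With "all" clear, a page counts as anonymous and is
skipped when it has no mapping or sits in the swap cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of struct page: only what the test needs. */
struct page_sketch {
	const void *mapping;	/* non-NULL when the page belongs to a file */
	bool swap_cache;	/* true when the page is in the swap cache */
};

/* Same predicate as is_anon() in the patch: no mapping, or swap cache. */
static bool is_anon_sketch(const struct page_sketch *page)
{
	return !page->mapping || page->swap_cache;
}

/*
 * Walk n present entries and return how many would be unmapped.
 * With all == 0, anonymous pages are left in place.
 */
static size_t count_unmapped(const struct page_sketch *pages, size_t n,
		int all)
{
	size_t unmapped = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!all && is_anon_sketch(&pages[i]))
			continue;	/* privately modified copy survives */
		unmapped++;
	}
	return unmapped;
}
```

Given one file-backed page, one page with no mapping, and one swap-cache
page, all = 1 unmaps all three while all = 0 unmaps only the first.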
Regards,

Daniel

--- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
+++ 2.6.3/include/linux/mm.h	2004-02-21 12:59:16.000000000 -0500
@@ -430,23 +430,23 @@
 void shmem_lock(struct file * file, int lock);
 int shmem_zero_setup(struct vm_area_struct *);
 
-void zap_page_range(struct vm_area_struct *vma, unsigned long address,
-			unsigned long size);
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted);
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long size);
+		unsigned long end_addr, unsigned long *nr_accounted, int zap);
 void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
 			unsigned long size, pgprot_t prot);
-
-extern void invalidate_mmap_range(struct address_space *mapping,
-				  loff_t const holebegin,
-				  loff_t const holelen);
+extern void invalidate_filemap_range(struct address_space *mapping, loff_t const start, loff_t const length);
 extern int vmtruncate(struct inode * inode, loff_t offset);
+void invalidate_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, int all);
+
+static inline void zap_page_range(struct vm_area_struct *vma, ulong address, ulong size)
+{
+	invalidate_page_range(vma, address, size, 1);
+}
+
 extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
--- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
+++ 2.6.3/mm/memory.c	2004-02-21 13:23:36.000000000 -0500
@@ -384,9 +384,13 @@
 	return -ENOMEM;
 }
 
-static void
-zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
-		unsigned long address, unsigned long size)
+static inline int is_anon(struct page *page)
+{
+	return !page->mapping || PageSwapCache(page);
+}
+
+static void zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
+		unsigned long address, unsigned long size, int all)
 {
 	unsigned long offset;
 	pte_t *ptep;
@@ -409,7 +413,8 @@
 			continue;
 		if (pte_present(pte)) {
 			unsigned long pfn = pte_pfn(pte);
-
+			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
+				continue;
 			pte = ptep_get_and_clear(ptep);
 			tlb_remove_tlb_entry(tlb, ptep, address+offset);
 			if (pfn_valid(pfn)) {
@@ -426,7 +431,7 @@
 				}
 			}
 		} else {
-			if (!pte_file(pte))
+			if (!pte_file(pte) && all)
 				free_swap_and_cache(pte_to_swp_entry(pte));
 			pte_clear(ptep);
 		}
@@ -434,9 +439,8 @@
 	pte_unmap(ptep-1);
 }
 
-static void
-zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
-		unsigned long address, unsigned long size)
+static void zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
+		unsigned long address, unsigned long size, int all)
 {
 	pmd_t * pmd;
 	unsigned long end;
@@ -453,14 +457,14 @@
 	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
 		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
 	do {
-		zap_pte_range(tlb, pmd, address, end - address);
-		address = (address + PMD_SIZE) & PMD_MASK; 
+		zap_pte_range(tlb, pmd, address, end - address, all);
+		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
 }
 
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long end)
+static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, int all)
 {
 	pgd_t * dir;
 
@@ -474,7 +478,7 @@
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
-		zap_pmd_range(tlb, dir, address, end - address);
+		zap_pmd_range(tlb, dir, address, end - address, all);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
@@ -524,7 +528,7 @@
  */
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted)
+		unsigned long end_addr, unsigned long *nr_accounted, int all)
 {
 	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -568,7 +572,7 @@
 				tlb_start_valid = 1;
 			}
 
-			unmap_page_range(*tlbp, vma, start, start + block);
+			unmap_page_range(*tlbp, vma, start, start + block, all);
 			start += block;
 			zap_bytes -= block;
 			if ((long)zap_bytes > 0)
@@ -594,8 +598,8 @@
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  */
-void zap_page_range(struct vm_area_struct *vma,
-			unsigned long address, unsigned long size)
+void invalidate_page_range(struct vm_area_struct *vma,
+		unsigned long address, unsigned long size, int all)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -612,7 +616,7 @@
 	lru_add_drain();
 	spin_lock(&mm->page_table_lock);
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, all);
 	tlb_finish_mmu(tlb, address, end);
 	spin_unlock(&mm->page_table_lock);
 }
@@ -1071,10 +1075,8 @@
  * Both hba and hlen are page numbers in PAGE_SIZE units.
  * An hlen of zero blows away the entire portion file after hba.
  */
-static void
-invalidate_mmap_range_list(struct list_head *head,
-			   unsigned long const hba,
-			   unsigned long const hlen)
+static void invalidate_mmap_range_list(struct list_head *head,
+		 unsigned long const hba,  unsigned long const hlen, int all)
 {
 	struct list_head *curr;
 	unsigned long hea;	/* last page of hole. */
@@ -1095,9 +1097,9 @@
 		    	continue;	/* Mapping disjoint from hole. */
 		zba = (hba <= vba) ? vba : hba;
 		zea = (vea <= hea) ? vea : hea;
-		zap_page_range(vp,
+		invalidate_page_range(vp,
 			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
-			       (zea - zba + 1) << PAGE_SHIFT);
+			       (zea - zba + 1) << PAGE_SHIFT, all);
 	}
 }
 
@@ -1115,8 +1117,8 @@
  * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
  * end of the file.
  */
-void invalidate_mmap_range(struct address_space *mapping,
-		      loff_t const holebegin, loff_t const holelen)
+static void invalidate_mmap_range(struct address_space *mapping,
+		loff_t const holebegin, loff_t const holelen, int all)
 {
 	unsigned long hba = holebegin >> PAGE_SHIFT;
 	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1133,12 +1135,19 @@
 	/* Protect against page fault */
 	atomic_inc(&mapping->truncate_count);
 	if (unlikely(!list_empty(&mapping->i_mmap)))
-		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen, all);
 	if (unlikely(!list_empty(&mapping->i_mmap_shared)))
-		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen, all);
 	up(&mapping->i_shared_sem);
 }
-EXPORT_SYMBOL_GPL(invalidate_mmap_range);
+
+void invalidate_filemap_range(struct address_space *mapping,
+		loff_t const start, loff_t const length)
+{
+	invalidate_mmap_range(mapping, start, length, 0);
+}
+
+EXPORT_SYMBOL_GPL(invalidate_filemap_range);
 
 /*
  * Handle all mappings that got truncated by a "truncate()"
@@ -1156,7 +1165,7 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	i_size_write(inode, offset);
-	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
+	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
 	goto out_truncate;
 
--- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
+++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
@@ -1134,7 +1134,7 @@
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 
 	if (is_hugepage_only_range(start, end - start))
@@ -1436,7 +1436,7 @@
 	flush_cache_mm(mm);
 	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					~0UL, &nr_accounted);
+					~0UL, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
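
For anyone following the page arithmetic in invalidate_mmap_range_list(),
the clamp of the hole against each vma can be sketched in userspace as
below.  This is an illustration under assumptions, not the kernel
function: vma_sketch, clamp_hole() and the 4 KiB page size are
hypothetical, and the computation of hea (including treating hlen == 0
as "to end of file") is reconstructed from the comments in the patch
context rather than from lines the diff shows.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical miniature of a vma: only the fields the arithmetic needs. */
struct vma_sketch {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_end;	/* one past the last virtual address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Clamp the hole starting at page hba, hlen pages long, to the pages
 * the vma maps.  Returns false when the two are disjoint; otherwise
 * fills in the virtual address and byte count to invalidate.
 */
static bool clamp_hole(const struct vma_sketch *vp,
		unsigned long hba, unsigned long hlen,
		unsigned long *address, unsigned long *size)
{
	unsigned long hea = hba + hlen - 1;	/* last page of hole */
	unsigned long vba, vea, zba, zea;

	assert(vp->vm_end > vp->vm_start);
	if (hea < hba)		/* hlen == 0 or overflow: to end of file */
		hea = ULONG_MAX;
	vba = vp->vm_pgoff;
	vea = vba + ((vp->vm_end - vp->vm_start) >> SKETCH_PAGE_SHIFT) - 1;
	if (hea < vba || vea < hba)
		return false;	/* mapping disjoint from hole */
	zba = (hba <= vba) ? vba : hba;
	zea = (vea <= hea) ? vea : hea;
	*address = ((zba - vba) << SKETCH_PAGE_SHIFT) + vp->vm_start;
	*size = (zea - zba + 1) << SKETCH_PAGE_SHIFT;
	return true;
}
```

For example, a vma mapping file pages 4 through 11 at 0x10000 and a hole
covering pages 6 through 8 gives address 0x12000 and size 0x3000; with
hlen == 0 the same hole start gives size 0x6000.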


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-21 19:00                               ` Daniel Phillips
@ 2004-02-22 23:39                                 ` Paul E. McKenney
  2004-02-25 21:04                                   ` [RFC] Distributed mmap API Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-22 23:39 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

Hello, Dan,

How about the following?

EXPORT_SYMBOL(invalidate_filemap_range);

						Thanx, Paul

On Sat, Feb 21, 2004 at 02:00:16PM -0500, Daniel Phillips wrote:
> Hi Paul et al,
> 
> Here is an updated patch.  The name of the exported function is changed to
> "invalidate_filemap_range" to reflect the fact that only file-backed pages are
> invalidated, and to distinguish the three-parameter flavour from the
> four-parameter version called from vmtruncate.  The inner loop in zap_pte_range
> is hopefully correct now.
> 
> While I'm in here, why is the assignment "pte =" at line 411 of memory.c not
> redundant?
> 
>    http://lxr.linux.no/source/mm/memory.c?v=2.6.1#L411
> 
> As far as I can see, the ->filemap spinlock protects the pte from modification
> and pte was already assigned at line 405.
> 
> Anyway, we can now see that the full cost of this DFS-specific feature in the inner
> loop is a single (unlikely) branch.
> 
> I'll repeat my proposition here: providing local filesystem semantics for
> MAP_PRIVATE on any distributed filesystem requires these decorations on the
> unmap path.  Though there is no benefit for local filesystems, the cost is
> insignificant.
> 
> Regards,
> 
> Daniel
> 
> --- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
> +++ 2.6.3/include/linux/mm.h	2004-02-21 12:59:16.000000000 -0500
> @@ -430,23 +430,23 @@
>  void shmem_lock(struct file * file, int lock);
>  int shmem_zero_setup(struct vm_area_struct *);
>  
> -void zap_page_range(struct vm_area_struct *vma, unsigned long address,
> -			unsigned long size);
>  int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
>  		struct vm_area_struct *start_vma, unsigned long start_addr,
> -		unsigned long end_addr, unsigned long *nr_accounted);
> -void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> -			unsigned long address, unsigned long size);
> +		unsigned long end_addr, unsigned long *nr_accounted, int zap);
>  void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
>  int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
>  			struct vm_area_struct *vma);
>  int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
>  			unsigned long size, pgprot_t prot);
> -
> -extern void invalidate_mmap_range(struct address_space *mapping,
> -				  loff_t const holebegin,
> -				  loff_t const holelen);
> +extern void invalidate_filemap_range(struct address_space *mapping, loff_t const start, loff_t const length);
>  extern int vmtruncate(struct inode * inode, loff_t offset);
> +void invalidate_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, int all);
> +
> +static inline void zap_page_range(struct vm_area_struct *vma, ulong address, ulong size)
> +{
> +	invalidate_page_range(vma, address, size, 1);
> +}
> +
>  extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
>  extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
>  extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
> --- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
> +++ 2.6.3/mm/memory.c	2004-02-21 13:23:36.000000000 -0500
> @@ -384,9 +384,13 @@
>  	return -ENOMEM;
>  }
>  
> -static void
> -zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
> -		unsigned long address, unsigned long size)
> +static inline int is_anon(struct page *page)
> +{
> +	return !page->mapping || PageSwapCache(page);
> +}
> +
> +static void zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
> +		unsigned long address, unsigned long size, int all)
>  {
>  	unsigned long offset;
>  	pte_t *ptep;
> @@ -409,7 +413,8 @@
>  			continue;
>  		if (pte_present(pte)) {
>  			unsigned long pfn = pte_pfn(pte);
> -
> +			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
> +				continue;
>  			pte = ptep_get_and_clear(ptep);
>  			tlb_remove_tlb_entry(tlb, ptep, address+offset);
>  			if (pfn_valid(pfn)) {
> @@ -426,7 +431,7 @@
>  				}
>  			}
>  		} else {
> -			if (!pte_file(pte))
> +			if (!pte_file(pte) && all)
>  				free_swap_and_cache(pte_to_swp_entry(pte));
>  			pte_clear(ptep);
>  		}
> @@ -434,9 +439,8 @@
>  	pte_unmap(ptep-1);
>  }
>  
> -static void
> -zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
> -		unsigned long address, unsigned long size)
> +static void zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
> +		unsigned long address, unsigned long size, int all)
>  {
>  	pmd_t * pmd;
>  	unsigned long end;
> @@ -453,14 +457,14 @@
>  	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
>  		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
>  	do {
> -		zap_pte_range(tlb, pmd, address, end - address);
> -		address = (address + PMD_SIZE) & PMD_MASK; 
> +		zap_pte_range(tlb, pmd, address, end - address, all);
> +		address = (address + PMD_SIZE) & PMD_MASK;
>  		pmd++;
>  	} while (address < end);
>  }
>  
> -void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> -			unsigned long address, unsigned long end)
> +static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> +		unsigned long address, unsigned long end, int all)
>  {
>  	pgd_t * dir;
>  
> @@ -474,7 +478,7 @@
>  	dir = pgd_offset(vma->vm_mm, address);
>  	tlb_start_vma(tlb, vma);
>  	do {
> -		zap_pmd_range(tlb, dir, address, end - address);
> +		zap_pmd_range(tlb, dir, address, end - address, all);
>  		address = (address + PGDIR_SIZE) & PGDIR_MASK;
>  		dir++;
>  	} while (address && (address < end));
> @@ -524,7 +528,7 @@
>   */
>  int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
>  		struct vm_area_struct *vma, unsigned long start_addr,
> -		unsigned long end_addr, unsigned long *nr_accounted)
> +		unsigned long end_addr, unsigned long *nr_accounted, int all)
>  {
>  	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
>  	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
> @@ -568,7 +572,7 @@
>  				tlb_start_valid = 1;
>  			}
>  
> -			unmap_page_range(*tlbp, vma, start, start + block);
> +			unmap_page_range(*tlbp, vma, start, start + block, all);
>  			start += block;
>  			zap_bytes -= block;
>  			if ((long)zap_bytes > 0)
> @@ -594,8 +598,8 @@
>   * @address: starting address of pages to zap
>   * @size: number of bytes to zap
>   */
> -void zap_page_range(struct vm_area_struct *vma,
> -			unsigned long address, unsigned long size)
> +void invalidate_page_range(struct vm_area_struct *vma,
> +		unsigned long address, unsigned long size, int all)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_gather *tlb;
> @@ -612,7 +616,7 @@
>  	lru_add_drain();
>  	spin_lock(&mm->page_table_lock);
>  	tlb = tlb_gather_mmu(mm, 0);
> -	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
> +	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, all);
>  	tlb_finish_mmu(tlb, address, end);
>  	spin_unlock(&mm->page_table_lock);
>  }
> @@ -1071,10 +1075,8 @@
>   * Both hba and hlen are page numbers in PAGE_SIZE units.
>   * An hlen of zero blows away the entire portion file after hba.
>   */
> -static void
> -invalidate_mmap_range_list(struct list_head *head,
> -			   unsigned long const hba,
> -			   unsigned long const hlen)
> +static void invalidate_mmap_range_list(struct list_head *head,
> +		 unsigned long const hba,  unsigned long const hlen, int all)
>  {
>  	struct list_head *curr;
>  	unsigned long hea;	/* last page of hole. */
> @@ -1095,9 +1097,9 @@
>  		    	continue;	/* Mapping disjoint from hole. */
>  		zba = (hba <= vba) ? vba : hba;
>  		zea = (vea <= hea) ? vea : hea;
> -		zap_page_range(vp,
> +		invalidate_page_range(vp,
>  			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
> -			       (zea - zba + 1) << PAGE_SHIFT);
> +			       (zea - zba + 1) << PAGE_SHIFT, all);
>  	}
>  }
>  
> @@ -1115,8 +1117,8 @@
>   * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
>   * end of the file.
>   */
> -void invalidate_mmap_range(struct address_space *mapping,
> -		      loff_t const holebegin, loff_t const holelen)
> +static void invalidate_mmap_range(struct address_space *mapping,
> +		loff_t const holebegin, loff_t const holelen, int all)
>  {
>  	unsigned long hba = holebegin >> PAGE_SHIFT;
>  	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
> @@ -1133,12 +1135,19 @@
>  	/* Protect against page fault */
>  	atomic_inc(&mapping->truncate_count);
>  	if (unlikely(!list_empty(&mapping->i_mmap)))
> -		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen);
> +		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen, all);
>  	if (unlikely(!list_empty(&mapping->i_mmap_shared)))
> -		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen);
> +		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen, all);
>  	up(&mapping->i_shared_sem);
>  }
> -EXPORT_SYMBOL_GPL(invalidate_mmap_range);
> +
> +void invalidate_filemap_range(struct address_space *mapping,
> +		loff_t const start, loff_t const length)
> +{
> +	invalidate_mmap_range(mapping, start, length, 0);
> +}
> +
> +EXPORT_SYMBOL_GPL(invalidate_filemap_range);
>  
>  /*
>   * Handle all mappings that got truncated by a "truncate()"
> @@ -1156,7 +1165,7 @@
>  	if (inode->i_size < offset)
>  		goto do_expand;
>  	i_size_write(inode, offset);
> -	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
> +	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
>  	truncate_inode_pages(mapping, offset);
>  	goto out_truncate;
>  
> --- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
> +++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
> @@ -1134,7 +1134,7 @@
>  
>  	lru_add_drain();
>  	tlb = tlb_gather_mmu(mm, 0);
> -	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
> +	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
>  	vm_unacct_memory(nr_accounted);
>  
>  	if (is_hugepage_only_range(start, end - start))
> @@ -1436,7 +1436,7 @@
>  	flush_cache_mm(mm);
>  	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
>  	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
> -					~0UL, &nr_accounted);
> +					~0UL, &nr_accounted, 1);
>  	vm_unacct_memory(nr_accounted);
>  	BUG_ON(mm->map_count);	/* This is just debugging */
>  	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
> 
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [RFC] Distributed mmap API
  2004-02-22 23:39                                 ` Paul E. McKenney
@ 2004-02-25 21:04                                   ` Daniel Phillips
  2004-02-25 19:12                                     ` Paul E. McKenney
                                                       ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Daniel Phillips @ 2004-02-25 21:04 UTC (permalink / raw)
  To: paulmck
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

This is the function formerly known as invalidate_mmap_range, with the
addition of a new code path in the zap_ call chain to handle MAP_PRIVATE
properly.  This function by itself is enough to support a crude but useful
form of distributed mmap where a shared file is cached only on one cluster
node at a time.
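
As a rough userspace illustration (not kernel code), the effect of the new
"all" argument can be modelled with a toy page table.  Everything prefixed
with toy_ is a hypothetical stand-in invented for this sketch; the real test
in the patch is is_anon() applied to the page behind each present pte, and
the handling of non-present (swap or file) entries is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not kernel types: each slot models one pte. */
enum toy_page_kind { TOY_FILE_BACKED, TOY_ANON };

struct toy_pte {
	bool present;
	enum toy_page_kind kind;
};

/*
 * Clear the mappings in ptes[0..n-1].  When all is zero, anonymous pages
 * (for example the private copies behind a MAP_PRIVATE mapping) are left
 * in place and only file-backed pages are unmapped.
 * Returns the number of entries cleared.
 */
size_t toy_zap_range(struct toy_pte *ptes, size_t n, int all)
{
	size_t cleared = 0;

	assert(ptes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!ptes[i].present)
			continue;
		if (!all && ptes[i].kind == TOY_ANON)
			continue;	/* keep the private copy mapped */
		ptes[i].present = false;
		cleared++;
	}
	return cleared;
}
```

Calling toy_zap_range() with all == 0 corresponds to the exported
three-argument entry point; all == 1 corresponds to the truncate and munmap
paths, which must remove every mapping.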

To use this, the distributed filesystem has to hook do_no_page to intercept
page faults and carry out the needed global locking.  The locking itself does
not require any new kernel hooks.  In brief, the patch here and another patch
to be presented for the do_no_page hook, together provide the core kernel API
for a simplified, distributed mmap.  (Note that there may be a workaround for
the lack of a do_no_page hook, but certainly not as simple and robust.)

To put this in perspective, I'll mention the two big limitations of the
simplified API:

  1) Invalidation is always a whole file at a time
  2) Multiple readers may not cache the same data simultaneously

To handle sub-file cache granularity, we also need to be able to flush dirty
data and evict cache pages with sub-file granularity, giving a trio of cache
management functions:

    unmap_mapping_range(mapping, start, length) /* this patch */
    write_mapping_range(mapping, start, length) /* start IO for dirty cache */
    evict_mapping_range(mapping, start, length) /* wait on IO and evict cache */

To handle (2) above, the distributed filesystem will need to hook and modify
the behaviour of do_wp_page so that it can intercept memory writes to shared
cache pages.

To summarize the current proposal, and where we need to go in the future:

  Simple core kernel API for simplistic distributed memory map
  ------------------------------------------------------------

     - unmap_mapping_range export (this patch)
     - do_no_page hook

  Improved core kernel API for optimal distributed memory map
  -----------------------------------------------------------

     - unmap_mapping_range export (this patch)
     - write_mapping_range export
     - evict_mapping_range export
     - do_no_page hook
     - do_wp_page hook

There's no big rush to move on to the optimal version just now, since the simplistic
version is already a big step forward.

I'd like to take this opportunity to apologize to Paul for derailing his more
modest proposal, but unfortunately, the semantics that could be obtained that
way are fatally flawed: private mmaps just won't work.  What I've written here
is about the minimum that supports acceptable mmap semantics.

And finally, the EXPORT_SYMBOL_GPL issue: after much fretting I've changed it
to just EXPORT_SYMBOL in this patch, because I feel that we have better ways
to further our goals of free and open software than to try to use this
particular API as a battering ram.  Of course it's not my decision, I just
want to register my vote here.

Regards,

Daniel

--- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
+++ 2.6.3/include/linux/mm.h	2004-02-21 12:59:16.000000000 -0500
@@ -430,23 +430,23 @@
 void shmem_lock(struct file * file, int lock);
 int shmem_zero_setup(struct vm_area_struct *);
 
-void zap_page_range(struct vm_area_struct *vma, unsigned long address,
-			unsigned long size);
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted);
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long size);
+		unsigned long end_addr, unsigned long *nr_accounted, int zap);
 void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
 			unsigned long size, pgprot_t prot);
-
-extern void invalidate_mmap_range(struct address_space *mapping,
-				  loff_t const holebegin,
-				  loff_t const holelen);
+extern void invalidate_filemap_range(struct address_space *mapping, loff_t const start, loff_t const length);
 extern int vmtruncate(struct inode * inode, loff_t offset);
+void invalidate_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, int all);
+
+static inline void zap_page_range(struct vm_area_struct *vma, ulong address, ulong size)
+{
+	invalidate_page_range(vma, address, size, 1);
+}
+
 extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
--- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
+++ 2.6.3/mm/memory.c	2004-02-25 13:34:57.000000000 -0500
@@ -384,9 +384,13 @@
 	return -ENOMEM;
 }
 
-static void
-zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
-		unsigned long address, unsigned long size)
+static inline int is_anon(struct page *page)
+{
+	return !page->mapping || PageSwapCache(page);
+}
+
+static void zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
+		unsigned long address, unsigned long size, int all)
 {
 	unsigned long offset;
 	pte_t *ptep;
@@ -409,8 +413,9 @@
 			continue;
 		if (pte_present(pte)) {
 			unsigned long pfn = pte_pfn(pte);
-
-			pte = ptep_get_and_clear(ptep);
+			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
+				continue;
+			pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
 			tlb_remove_tlb_entry(tlb, ptep, address+offset);
 			if (pfn_valid(pfn)) {
 				struct page *page = pfn_to_page(pfn);
@@ -426,17 +431,19 @@
 				}
 			}
 		} else {
-			if (!pte_file(pte))
+			if (!pte_file(pte)) {
+				if (!all)
+					continue;
 				free_swap_and_cache(pte_to_swp_entry(pte));
+			}
 			pte_clear(ptep);
 		}
 	}
 	pte_unmap(ptep-1);
 }
 
-static void
-zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
-		unsigned long address, unsigned long size)
+static void zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
+		unsigned long address, unsigned long size, int all)
 {
 	pmd_t * pmd;
 	unsigned long end;
@@ -453,14 +460,14 @@
 	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
 		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
 	do {
-		zap_pte_range(tlb, pmd, address, end - address);
-		address = (address + PMD_SIZE) & PMD_MASK; 
+		zap_pte_range(tlb, pmd, address, end - address, all);
+		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
 }
 
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long end)
+static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, int all)
 {
 	pgd_t * dir;
 
@@ -474,7 +481,7 @@
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
-		zap_pmd_range(tlb, dir, address, end - address);
+		zap_pmd_range(tlb, dir, address, end - address, all);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
@@ -524,7 +531,7 @@
  */
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted)
+		unsigned long end_addr, unsigned long *nr_accounted, int all)
 {
 	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -568,7 +575,7 @@
 				tlb_start_valid = 1;
 			}
 
-			unmap_page_range(*tlbp, vma, start, start + block);
+			unmap_page_range(*tlbp, vma, start, start + block, all);
 			start += block;
 			zap_bytes -= block;
 			if ((long)zap_bytes > 0)
@@ -594,8 +601,8 @@
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  */
-void zap_page_range(struct vm_area_struct *vma,
-			unsigned long address, unsigned long size)
+void invalidate_page_range(struct vm_area_struct *vma,
+		unsigned long address, unsigned long size, int all)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -612,7 +619,7 @@
 	lru_add_drain();
 	spin_lock(&mm->page_table_lock);
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, all);
 	tlb_finish_mmu(tlb, address, end);
 	spin_unlock(&mm->page_table_lock);
 }
@@ -1071,10 +1078,8 @@
  * Both hba and hlen are page numbers in PAGE_SIZE units.
  * An hlen of zero blows away the entire portion file after hba.
  */
-static void
-invalidate_mmap_range_list(struct list_head *head,
-			   unsigned long const hba,
-			   unsigned long const hlen)
+static void invalidate_mmap_range_list(struct list_head *head,
+		 unsigned long const hba,  unsigned long const hlen, int all)
 {
 	struct list_head *curr;
 	unsigned long hea;	/* last page of hole. */
@@ -1095,9 +1100,9 @@
 		    	continue;	/* Mapping disjoint from hole. */
 		zba = (hba <= vba) ? vba : hba;
 		zea = (vea <= hea) ? vea : hea;
-		zap_page_range(vp,
+		invalidate_page_range(vp,
 			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
-			       (zea - zba + 1) << PAGE_SHIFT);
+			       (zea - zba + 1) << PAGE_SHIFT, all);
 	}
 }
 
@@ -1115,8 +1120,8 @@
  * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
  * end of the file.
  */
-void invalidate_mmap_range(struct address_space *mapping,
-		      loff_t const holebegin, loff_t const holelen)
+static void invalidate_mmap_range(struct address_space *mapping,
+		loff_t const holebegin, loff_t const holelen, int all)
 {
 	unsigned long hba = holebegin >> PAGE_SHIFT;
 	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1133,12 +1138,19 @@
 	/* Protect against page fault */
 	atomic_inc(&mapping->truncate_count);
 	if (unlikely(!list_empty(&mapping->i_mmap)))
-		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen, all);
 	if (unlikely(!list_empty(&mapping->i_mmap_shared)))
-		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen, all);
 	up(&mapping->i_shared_sem);
 }
-EXPORT_SYMBOL_GPL(invalidate_mmap_range);
+
+void unmap_mapping_range(struct address_space *mapping,
+		loff_t const start, loff_t const length)
+{
+	invalidate_mmap_range(mapping, start, length, 0);
+}
+
+EXPORT_SYMBOL(unmap_mapping_range);
 
 /*
  * Handle all mappings that got truncated by a "truncate()"
@@ -1156,7 +1168,7 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	i_size_write(inode, offset);
-	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
+	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
 	goto out_truncate;
 
--- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
+++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
@@ -1134,7 +1134,7 @@
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 
 	if (is_hugepage_only_range(start, end - start))
@@ -1436,7 +1436,7 @@
 	flush_cache_mm(mm);
 	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					~0UL, &nr_accounted);
+					~0UL, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 21:04                                   ` [RFC] Distributed mmap API Daniel Phillips
@ 2004-02-25 19:12                                     ` Paul E. McKenney
  2004-02-25 19:14                                     ` Paul E. McKenney
  2004-02-25 22:07                                     ` Andrew Morton
  2 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-25 19:12 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Wed, Feb 25, 2004 at 04:04:19PM -0500, Daniel Phillips wrote:

Very cool!

> This is the function formerly known as invalidate_mmap_range, with the
> addition of a new code path in the zap_ call chain to handle MAP_PRIVATE
> properly.  This function by itself is enough to support a crude but useful
> form of distributed mmap where a shared file is cached only on one cluster
> node at a time.
> 
> To use this, the distributed filesystem has to hook do_no_page to intercept
> page faults and carry out the needed global locking.  The locking itself does
> not require any new kernel hooks.  In brief, the patch here and another patch
> to be presented for the do_no_page hook, together provide the core kernel API
> for a simplified, distributed mmap.  (Note that there may be a workaround for
> the lack of a do_no_page hook, but certainly not as simple and robust.)
> 
> To put this in perspective, I'll mention the two big limitations of the
> simplified API:
> 
>   1) Invalidation is always a whole file at a time

I must be missing something subtle here...  It looks to me like
the new unmap_mapping_range() API is capable of invalidating
portions of files, based on the "start" and "length" arguments.

What am I missing?
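
For reference, the byte-to-page conversion visible in the patch context
(hba, hlen, and the zba/zea clipping) can be sketched in userspace roughly as
below.  This is only an approximation under stated assumptions: 4 KiB pages,
no overflow handling, and the "last page of hole" and disjointness tests are
reconstructed from the comments rather than copied from the hunks shown.
All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12UL			/* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/* Inclusive range of page indexes within a file. */
struct toy_page_range {
	unsigned long first;
	unsigned long last;
};

/*
 * Convert a byte range into page indexes: the start is rounded down and
 * the length is rounded up to whole pages.  A length of zero means "to the
 * end of the file", modelled here as a last index of ~0UL.
 */
struct toy_page_range toy_hole_pages(unsigned long long start,
				     unsigned long long length)
{
	struct toy_page_range r;
	unsigned long hlen;

	hlen = (unsigned long)((length + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT);
	r.first = (unsigned long)(start >> TOY_PAGE_SHIFT);
	r.last = hlen ? r.first + hlen - 1 : ~0UL;
	return r;
}

/*
 * Clip the hole to one mapping covering file pages vba..vea (inclusive).
 * Returns 0 if the mapping is disjoint from the hole, 1 otherwise.
 */
int toy_clip(struct toy_page_range hole, unsigned long vba,
	     unsigned long vea, struct toy_page_range *out)
{
	assert(out != NULL);
	if (hole.last < vba || vea < hole.first)
		return 0;
	out->first = (hole.first <= vba) ? vba : hole.first;
	out->last = (vea <= hole.last) ? vea : hole.last;
	return 1;
}
```

With these definitions, a call such as toy_hole_pages(8192, 4097) selects
pages 2 and 3 only, so a sub-file range is expressible at this level.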

						Thanx, Paul

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 21:04                                   ` [RFC] Distributed mmap API Daniel Phillips
  2004-02-25 19:12                                     ` Paul E. McKenney
@ 2004-02-25 19:14                                     ` Paul E. McKenney
  2004-02-25 22:07                                     ` Andrew Morton
  2 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-02-25 19:14 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Wed, Feb 25, 2004 at 04:04:19PM -0500, Daniel Phillips wrote:
>
> I'd like to take this opportunity to apologize to Paul for derailing his more
> modest proposal, but unfortunately, the semantics that could be obtained that
> way are fatally flawed: private mmaps just won't work.  What I've written here
> is about the minimum that supports acceptable mmap semantics.

No problem -- it looks like we are getting a much better result than
I was proposing, thank you for helping me to see the light!

						Thanx, Paul

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 21:04                                   ` [RFC] Distributed mmap API Daniel Phillips
  2004-02-25 19:12                                     ` Paul E. McKenney
  2004-02-25 19:14                                     ` Paul E. McKenney
@ 2004-02-25 22:07                                     ` Andrew Morton
  2004-02-25 22:07                                       ` Daniel Phillips
  2004-03-03  3:00                                       ` Daniel Phillips
  2 siblings, 2 replies; 68+ messages in thread
From: Andrew Morton @ 2004-02-25 22:07 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

Daniel Phillips <phillips@arcor.de> wrote:
>
> -			pte = ptep_get_and_clear(ptep);
> +			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
> +				continue;
> +			pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
>  			tlb_remove_tlb_entry(tlb, ptep, address+offset);
>  			if (pfn_valid(pfn)) {

I think you need to check pfn_valid() before running is_anon(pfn_to_page())

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 22:07                                     ` Andrew Morton
@ 2004-02-25 22:07                                       ` Daniel Phillips
  2004-02-25 22:16                                         ` Andrew Morton
  2004-03-03  3:00                                       ` Daniel Phillips
  1 sibling, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-02-25 22:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

On Wednesday 25 February 2004 17:07, Andrew Morton wrote:
> Daniel Phillips <phillips@arcor.de> wrote:
> > -			pte = ptep_get_and_clear(ptep);
> > +			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
> > +				continue;
> > +			pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
> >  			tlb_remove_tlb_entry(tlb, ptep, address+offset);
> >  			if (pfn_valid(pfn)) {
>
> I think you need to check pfn_valid() before running is_anon(pfn_to_page())

Easy enough:

	if (unlikely(!all) && pfn_valid(pfn) && is_anon(pfn_to_page(pfn)))

but how can we legitimately get !pfn_valid there?

Regards,

Daniel


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 22:07                                       ` Daniel Phillips
@ 2004-02-25 22:16                                         ` Andrew Morton
  2004-02-25 22:46                                           ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2004-02-25 22:16 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

Daniel Phillips <phillips@arcor.de> wrote:
>
> On Wednesday 25 February 2004 17:07, Andrew Morton wrote:
> > Daniel Phillips <phillips@arcor.de> wrote:
> > > -			pte = ptep_get_and_clear(ptep);
> > > +			if (unlikely(!all) && is_anon(pfn_to_page(pfn)))
> > > +				continue;
> > > +			pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
> > >  			tlb_remove_tlb_entry(tlb, ptep, address+offset);
> > >  			if (pfn_valid(pfn)) {
> >
> > I think you need to check pfn_valid() before running is_anon(pfn_to_page())
> 
> Easy enough:
> 
> 	if (unlikely(!all) && pfn_valid(pfn) && is_anon(pfn_to_page(pfn)))

You can probably factor this into

	page = NULL;
	if (pfn_valid(..))
		page = pfn_to_page(..)
	if (page)
		..
	if (page)
		..

> but how can we legitimately get !pfn_valid there?

A mapping of some I/O region?
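
A minimal userspace sketch of that factoring, assuming a tiny fixed-size
page array in place of the real mem_map: the pfn is validated once, the page
pointer is NULL for anything outside the array (as an I/O mapping might be),
and the anonymous-page test only runs when a page exists.  All toy_ names
are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4UL

/* Toy stand-in for struct page: only the fields is_anon() looks at. */
struct toy_page {
	void *mapping;
	bool swap_cache;
};

/* Zero-initialised, so every page starts out anonymous (no mapping). */
struct toy_page toy_mem_map[TOY_NR_PAGES];

bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_NR_PAGES;
}

/* Look the page up once; NULL means there is no struct page to inspect. */
struct toy_page *toy_page_or_null(unsigned long pfn)
{
	return toy_pfn_valid(pfn) ? &toy_mem_map[pfn] : NULL;
}

bool toy_is_anon(const struct toy_page *page)
{
	return !page->mapping || page->swap_cache;
}

/* Should a present pte pointing at pfn be left alone when !all? */
bool toy_should_skip(unsigned long pfn, int all)
{
	struct toy_page *page = toy_page_or_null(pfn);

	return !all && page && toy_is_anon(page);
}
```

An out-of-range pfn is never dereferenced here and is never skipped, which
matches the one-line form quoted above.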


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC] Distributed mmap API
  2004-02-25 22:16                                         ` Andrew Morton
@ 2004-02-25 22:46                                           ` Daniel Phillips
  0 siblings, 0 replies; 68+ messages in thread
From: Daniel Phillips @ 2004-02-25 22:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

On Wednesday 25 February 2004 17:16, Andrew Morton wrote:
> > but how can we legitimately get !pfn_valid there?
>
> A mapping of some I/O region?

With MAP_PRIVATE, on a distributed filesystem?  OK...

Can we recognize those I/O vmas and handle them with their own separate loop, 
saving a few cycles for the common case?  Or just:

	if (pte_present(pte)) {
		unsigned long pfn = pte_pfn(pte);
		struct page *page;
		if (unlikely(!pfn_valid(pfn))) {
			ptep_get_and_clear(ptep);
			tlb_remove_tlb_entry(tlb, ptep, address+offset);
			continue;
		}
		page = pfn_to_page(pfn);
		if (unlikely(!all) && is_anon(page))
			continue;
		pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
		tlb_remove_tlb_entry(tlb, ptep, address+offset);
		if (PageReserved(page))
			continue;
		if (pte_dirty(pte))
			set_page_dirty(page);
		if (page->mapping && pte_young(pte) && !PageSwapCache(page))
			mark_page_accessed(page);
		tlb->freed++;
		page_remove_rmap(page, ptep);
		tlb_remove_page(tlb, page);
	} else {

Regards,

Daniel


* Re: [RFC] Distributed mmap API
  2004-02-25 22:07                                     ` Andrew Morton
  2004-02-25 22:07                                       ` Daniel Phillips
@ 2004-03-03  3:00                                       ` Daniel Phillips
  2004-03-03  3:15                                         ` Andrew Morton
  1 sibling, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-03-03  3:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

On Wednesday 25 February 2004 17:07, Andrew Morton wrote:
> I think you need to check pfn_valid() before running is_anon(pfn_to_page())

Hi Andrew,

Here is a rearranged zap_pte_range that avoids any operations for out-of-range
pfns.  The only annoyance with this factoring is that tlb_remove_tlb_entry is
expanded in two places.  For most architectures the macro is null anyway, and
for the rest it's hardly any code at all, except for ppc64, which has
__tlb_remove_tlb_entry as an inline that looks like it expands into a fair
amount of code.  But probably not enough to worry about.

I took the opportunity to remove some indents by liberal use of continues. 
This version reads pretty easily.

	if (pte_present(pte)) {
		unsigned long pfn = pte_pfn(pte);
		struct page *page;

		if (unlikely(!pfn_valid(pfn))) {
			pte_clear(ptep);
			tlb_remove_tlb_entry(tlb, ptep, address+offset);
			continue;
		}
		page = pfn_to_page(pfn);
		if (unlikely(!all) && is_anon(page))
			continue;
		pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
		tlb_remove_tlb_entry(tlb, ptep, address+offset);
		if (PageReserved(page))
			continue;
		if (pte_dirty(pte))
			set_page_dirty(page);
		if (page->mapping && pte_young(pte) && !PageSwapCache(page))
			mark_page_accessed(page);
		tlb->freed++;
		page_remove_rmap(page, ptep);
		tlb_remove_page(tlb, page);
		continue;
	}

I also tried your "if (page)" suggestion, which looks like this:

	if (pte_present(pte)) {
		unsigned long pfn = pte_pfn(pte);
		struct page *page = NULL;

		if (likely(pfn_valid(pfn))) {
			page = pfn_to_page(pfn);
			if (unlikely(!all) && is_anon(page))
				continue;
		}
		pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
		tlb_remove_tlb_entry(tlb, ptep, address+offset);
		if (unlikely(!page) || PageReserved(page))
			continue;
		if (pte_dirty(pte))
			set_page_dirty(page);
		if (page->mapping && pte_young(pte) && !PageSwapCache(page))
			mark_page_accessed(page);
		tlb->freed++;
		page_remove_rmap(page, ptep);
		tlb_remove_page(tlb, page);
		continue;
	}

It came out ok too - only one "if (page)", a little shorter and no extra macro
expansions, though it's a little harder to follow and might be microscopically
slower.  The complete patch below uses the first form, and does away with the
is_anon inline.

Regards,

Daniel

--- 2.6.3.clean/include/linux/mm.h	2004-02-17 22:57:13.000000000 -0500
+++ 2.6.3/include/linux/mm.h	2004-02-21 12:59:16.000000000 -0500
@@ -430,23 +430,23 @@
 void shmem_lock(struct file * file, int lock);
 int shmem_zero_setup(struct vm_area_struct *);
 
-void zap_page_range(struct vm_area_struct *vma, unsigned long address,
-			unsigned long size);
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *start_vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted);
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long size);
+		unsigned long end_addr, unsigned long *nr_accounted, int all);
 void clear_page_tables(struct mmu_gather *tlb, unsigned long first, int nr);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 			struct vm_area_struct *vma);
 int zeromap_page_range(struct vm_area_struct *vma, unsigned long from,
 			unsigned long size, pgprot_t prot);
-
-extern void invalidate_mmap_range(struct address_space *mapping,
-				  loff_t const holebegin,
-				  loff_t const holelen);
+extern void unmap_mapping_range(struct address_space *mapping, loff_t const start, loff_t const length);
 extern int vmtruncate(struct inode * inode, loff_t offset);
+void invalidate_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, int all);
+
+static inline void zap_page_range(struct vm_area_struct *vma, ulong address, ulong size)
+{
+	invalidate_page_range(vma, address, size, 1);
+}
+
 extern pmd_t *FASTCALL(__pmd_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_kernel(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
 extern pte_t *FASTCALL(pte_alloc_map(struct mm_struct *mm, pmd_t *pmd, unsigned long address));
--- 2.6.3.clean/mm/memory.c	2004-02-17 22:57:47.000000000 -0500
+++ 2.6.3/mm/memory.c	2004-03-02 20:59:58.000000000 -0500
@@ -384,9 +384,8 @@
 	return -ENOMEM;
 }
 
-static void
-zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
-		unsigned long address, unsigned long size)
+static void zap_pte_range(struct mmu_gather *tlb, pmd_t * pmd,
+		unsigned long address, unsigned long size, int all)
 {
 	unsigned long offset;
 	pte_t *ptep;
@@ -409,34 +408,41 @@
 			continue;
 		if (pte_present(pte)) {
 			unsigned long pfn = pte_pfn(pte);
+			struct page *page;
 
-			pte = ptep_get_and_clear(ptep);
-			tlb_remove_tlb_entry(tlb, ptep, address+offset);
-			if (pfn_valid(pfn)) {
-				struct page *page = pfn_to_page(pfn);
-				if (!PageReserved(page)) {
-					if (pte_dirty(pte))
-						set_page_dirty(page);
-					if (page->mapping && pte_young(pte) &&
-							!PageSwapCache(page))
-						mark_page_accessed(page);
-					tlb->freed++;
-					page_remove_rmap(page, ptep);
-					tlb_remove_page(tlb, page);
-				}
+			if (unlikely(!pfn_valid(pfn))) {
+				pte_clear(ptep);
+				tlb_remove_tlb_entry(tlb, ptep, address+offset);
+				continue;
 			}
-		} else {
-			if (!pte_file(pte))
-				free_swap_and_cache(pte_to_swp_entry(pte));
-			pte_clear(ptep);
+			page = pfn_to_page(pfn);
+			if (unlikely(!all) && (!page->mapping || PageSwapCache(page)))
+				continue;
+			pte = ptep_get_and_clear(ptep); /* get dirty bit atomically */
+			tlb_remove_tlb_entry(tlb, ptep, address+offset);
+			if (PageReserved(page))
+				continue;
+			if (pte_dirty(pte))
+				set_page_dirty(page);
+			if (page->mapping && pte_young(pte) && !PageSwapCache(page))
+				mark_page_accessed(page);
+			tlb->freed++;
+			page_remove_rmap(page, ptep);
+			tlb_remove_page(tlb, page);
+			continue;
 		}
+		if (!pte_file(pte)) {
+			if (!all)
+				continue;
+			free_swap_and_cache(pte_to_swp_entry(pte));
+		}
+		pte_clear(ptep);
 	}
 	pte_unmap(ptep-1);
 }
 
-static void
-zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
-		unsigned long address, unsigned long size)
+static void zap_pmd_range(struct mmu_gather *tlb, pgd_t * dir,
+		unsigned long address, unsigned long size, int all)
 {
 	pmd_t * pmd;
 	unsigned long end;
@@ -453,14 +459,14 @@
 	if (end > ((address + PGDIR_SIZE) & PGDIR_MASK))
 		end = ((address + PGDIR_SIZE) & PGDIR_MASK);
 	do {
-		zap_pte_range(tlb, pmd, address, end - address);
-		address = (address + PMD_SIZE) & PMD_MASK; 
+		zap_pte_range(tlb, pmd, address, end - address, all);
+		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
 }
 
-void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
-			unsigned long address, unsigned long end)
+static void unmap_page_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		unsigned long address, unsigned long end, int all)
 {
 	pgd_t * dir;
 
@@ -474,7 +480,7 @@
 	dir = pgd_offset(vma->vm_mm, address);
 	tlb_start_vma(tlb, vma);
 	do {
-		zap_pmd_range(tlb, dir, address, end - address);
+		zap_pmd_range(tlb, dir, address, end - address, all);
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		dir++;
 	} while (address && (address < end));
@@ -524,7 +530,7 @@
  */
 int unmap_vmas(struct mmu_gather **tlbp, struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		unsigned long end_addr, unsigned long *nr_accounted)
+		unsigned long end_addr, unsigned long *nr_accounted, int all)
 {
 	unsigned long zap_bytes = ZAP_BLOCK_SIZE;
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
@@ -568,7 +574,7 @@
 				tlb_start_valid = 1;
 			}
 
-			unmap_page_range(*tlbp, vma, start, start + block);
+			unmap_page_range(*tlbp, vma, start, start + block, all);
 			start += block;
 			zap_bytes -= block;
 			if ((long)zap_bytes > 0)
@@ -594,8 +600,8 @@
  * @address: starting address of pages to zap
  * @size: number of bytes to zap
  */
-void zap_page_range(struct vm_area_struct *vma,
-			unsigned long address, unsigned long size)
+void invalidate_page_range(struct vm_area_struct *vma,
+		unsigned long address, unsigned long size, int all)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_gather *tlb;
@@ -612,7 +618,7 @@
 	lru_add_drain();
 	spin_lock(&mm->page_table_lock);
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, all);
 	tlb_finish_mmu(tlb, address, end);
 	spin_unlock(&mm->page_table_lock);
 }
@@ -1071,10 +1077,8 @@
  * Both hba and hlen are page numbers in PAGE_SIZE units.
  * An hlen of zero blows away the entire portion file after hba.
  */
-static void
-invalidate_mmap_range_list(struct list_head *head,
-			   unsigned long const hba,
-			   unsigned long const hlen)
+static void invalidate_mmap_range_list(struct list_head *head,
+		 unsigned long const hba,  unsigned long const hlen, int all)
 {
 	struct list_head *curr;
 	unsigned long hea;	/* last page of hole. */
@@ -1095,9 +1099,9 @@
 		    	continue;	/* Mapping disjoint from hole. */
 		zba = (hba <= vba) ? vba : hba;
 		zea = (vea <= hea) ? vea : hea;
-		zap_page_range(vp,
+		invalidate_page_range(vp,
 			       ((zba - vba) << PAGE_SHIFT) + vp->vm_start,
-			       (zea - zba + 1) << PAGE_SHIFT);
+			       (zea - zba + 1) << PAGE_SHIFT, all);
 	}
 }
 
@@ -1115,8 +1119,8 @@
  * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
  * end of the file.
  */
-void invalidate_mmap_range(struct address_space *mapping,
-		      loff_t const holebegin, loff_t const holelen)
+static void invalidate_mmap_range(struct address_space *mapping,
+		loff_t const holebegin, loff_t const holelen, int all)
 {
 	unsigned long hba = holebegin >> PAGE_SHIFT;
 	unsigned long hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -1133,12 +1137,19 @@
 	/* Protect against page fault */
 	atomic_inc(&mapping->truncate_count);
 	if (unlikely(!list_empty(&mapping->i_mmap)))
-		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap, hba, hlen, all);
 	if (unlikely(!list_empty(&mapping->i_mmap_shared)))
-		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen);
+		invalidate_mmap_range_list(&mapping->i_mmap_shared, hba, hlen, all);
 	up(&mapping->i_shared_sem);
 }
-EXPORT_SYMBOL_GPL(invalidate_mmap_range);
+
+void unmap_mapping_range(struct address_space *mapping,
+		loff_t const start, loff_t const length)
+{
+	invalidate_mmap_range(mapping, start, length, 0);
+}
+
+EXPORT_SYMBOL(unmap_mapping_range);
 
 /*
  * Handle all mappings that got truncated by a "truncate()"
@@ -1156,7 +1167,7 @@
 	if (inode->i_size < offset)
 		goto do_expand;
 	i_size_write(inode, offset);
-	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0);
+	invalidate_mmap_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
 	goto out_truncate;
 
--- 2.6.3.clean/mm/mmap.c	2004-02-17 22:58:32.000000000 -0500
+++ 2.6.3/mm/mmap.c	2004-02-19 22:46:01.000000000 -0500
@@ -1134,7 +1134,7 @@
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
-	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted);
+	unmap_vmas(&tlb, mm, vma, start, end, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 
 	if (is_hugepage_only_range(start, end - start))
@@ -1436,7 +1436,7 @@
 	flush_cache_mm(mm);
 	/* Use ~0UL here to ensure all VMAs in the mm are unmapped */
 	mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
-					~0UL, &nr_accounted);
+					~0UL, &nr_accounted, 1);
 	vm_unacct_memory(nr_accounted);
 	BUG_ON(mm->map_count);	/* This is just debugging */
 	clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
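
The page-range arithmetic that invalidate_mmap_range_list() feeds into
invalidate_page_range() can be checked in userspace.  What follows is a
minimal sketch, not the kernel code: model_vma, model_clip_hole and
MODEL_PAGE_SHIFT are hypothetical names, 4 KiB pages are assumed, and the
computation of hea, vba, vea and the disjointness test is an assumption,
since those lines fall outside the hunks shown above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the few vm_area_struct fields used here. */
struct model_vma {
	unsigned long vm_start;	/* first user address of the mapping */
	unsigned long vm_end;	/* one past the last user address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Clip a file hole of hlen pages starting at page hba against one mapping.
 * hlen == 0 means "to end of file".  On overlap, store the user address
 * and byte count to invalidate and return true; otherwise return false.
 */
static bool model_clip_hole(const struct model_vma *vp,
			    unsigned long hba, unsigned long hlen,
			    unsigned long *address, unsigned long *size)
{
	unsigned long vba, vea, hea, zba, zea;

	assert(vp->vm_end > vp->vm_start);

	hea = hba + hlen - 1;		/* last page of hole */
	if (hea < hba)			/* hlen == 0, or the sum wrapped */
		hea = ULONG_MAX;

	vba = vp->vm_pgoff;		/* first file page of the mapping */
	vea = vba + ((vp->vm_end - vp->vm_start) >> MODEL_PAGE_SHIFT) - 1;
	if (hea < vba || vea < hba)
		return false;		/* mapping disjoint from hole */

	zba = (hba <= vba) ? vba : hba;	/* first page to invalidate */
	zea = (vea <= hea) ? vea : hea;	/* last page to invalidate */
	*address = ((zba - vba) << MODEL_PAGE_SHIFT) + vp->vm_start;
	*size = (zea - zba + 1) << MODEL_PAGE_SHIFT;
	return true;
}
```

For a mapping of file pages 4..11 at user address 0x10000, a hole covering
pages 6..7 comes out as address 0x12000 and size 0x2000 under these
assumptions.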


* Re: [RFC] Distributed mmap API
  2004-03-03  3:00                                       ` Daniel Phillips
@ 2004-03-03  3:15                                         ` Andrew Morton
  2004-03-03 13:06                                           ` Daniel Phillips
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2004-03-03  3:15 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

Daniel Phillips <phillips@arcor.de> wrote:
>
> Here is a rearranged zap_pte_range that avoids any operations for out-of-range
> pfns.

Please remind us why Linux needs this patch?

> +static void invalidate_mmap_range_list(struct list_head *head,
> +		 unsigned long const hba,  unsigned long const hlen, int all)
>  {

I forget what `all' does?  anon+swapcache as well as pagecache?

A bit of API documentation here would be appropriate.

* Re: [RFC] Distributed mmap API
  2004-03-03  3:15                                         ` Andrew Morton
@ 2004-03-03 13:06                                           ` Daniel Phillips
  2004-03-04 18:55                                             ` Paul E. McKenney
  0 siblings, 1 reply; 68+ messages in thread
From: Daniel Phillips @ 2004-03-03 13:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: paulmck, sct, hch, linux-kernel, linux-mm

On Tuesday 02 March 2004 22:15, Andrew Morton wrote:
> Daniel Phillips <phillips@arcor.de> wrote:
> > Here is a rearranged zap_pte_range that avoids any operations for
> > out-of-range pfns.
>
> Please remind us why Linux needs this patch?

This is purely to support mmap, including MAP_PRIVATE, accurately on 
distributed filesystems, where "accurately" is defined as "with local 
filesystem semantics".

If the same file region is mmapped by more than one node, only one of them is 
allowed to have a given page of the mmap valid in the page tables at any 
time.  When a memory write occurs on one of the other nodes, it must fault so 
that the distributed filesystem can arrange for exclusive ownership of the 
file page (or as GFS currently implements it, the whole file) to change from 
one node to the other.  At this time, any pages already faulted in must be 
unmapped so that future memory accesses will properly fault.  This unmapping 
is done by zap_page_range, which has nearly the semantics we want except that 
it will also unmap private pages of a MAP_PRIVATE mapping, destroying the 
only copy of that data.  A user would observe the privately written data 
spontaneously revert to the current file contents.  The purpose of this patch 
is to fix that.
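
The ownership rule in the paragraph above can be modelled in a few lines of
userspace C.  This is only an illustrative sketch, not GFS or kernel code:
the mapped[][] table, write_fault(), unmap_on_node() and owners() are
made-up names, and only write faults are modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 3
#define NR_PAGES 4

/* Hypothetical model: mapped[n][p] is true when node n has file page p
 * valid in its page tables. */
static bool mapped[NR_NODES][NR_PAGES];

/* Stand-in for the unmap step: drop page p from node n's page tables. */
static void unmap_on_node(int n, int p)
{
	mapped[n][p] = false;
}

/* A write on node n to page p faults; ownership moves to n and every
 * other node loses its mapping, so its next access faults as well. */
static void write_fault(int n, int p)
{
	assert(n >= 0 && n < NR_NODES && p >= 0 && p < NR_PAGES);

	for (int other = 0; other < NR_NODES; other++)
		if (other != n)
			unmap_on_node(other, p);
	mapped[n][p] = true;
}

/* Count how many nodes currently have page p mapped. */
static int owners(int p)
{
	int count = 0;

	assert(p >= 0 && p < NR_PAGES);
	for (int n = 0; n < NR_NODES; n++)
		count += mapped[n][p] ? 1 : 0;
	return count;
}
```

Calling write_fault(0, 2) and then write_fault(1, 2) leaves page 2 mapped
on node 1 only, which is the invariant the unmap step has to maintain.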

This patch allows a distributed filesystem to unmap file-backed memory without 
unmapping anonymous pages or deleting swap cache, avoiding the above data 
destruction.  Since zap_page_range is the only function that knows how to 
unmap memory, it needs to be taught how to skip anonymous pages.
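
As a rough userspace model of that skip (a sketch under stated assumptions,
not the kernel's zap_pte_range): model_page, model_pte, model_is_anon and
model_zap_range are hypothetical names, and the anonymous test copies the
"!page->mapping || PageSwapCache(page)" condition from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real struct page or pte_t. */
struct model_page {
	bool has_mapping;	/* models page->mapping != NULL (file-backed) */
	bool swap_cache;	/* models PageSwapCache(page) */
};

struct model_pte {
	bool present;
	struct model_page *page;
};

/* Same test as the patch: no mapping, or sitting in the swap cache. */
static bool model_is_anon(const struct model_page *page)
{
	return !page->has_mapping || page->swap_cache;
}

/*
 * Clear present entries in ptes[0..n).  With all == 0, anonymous pages
 * stay mapped so privately written data survives the invalidation.
 * Returns the number of entries cleared.
 */
static size_t model_zap_range(struct model_pte *ptes, size_t n, int all)
{
	size_t zapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ptes[i].present)
			continue;
		assert(ptes[i].page != NULL);
		if (!all && model_is_anon(ptes[i].page))
			continue;	/* keep the private copy */
		ptes[i].present = false;
		ptes[i].page = NULL;
		zapped++;
	}
	return zapped;
}
```

With one file-backed and one anonymous entry, a call with all == 0 clears
only the file-backed one; a later call with all == 1 clears the rest.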

An alternative to this patch is simply to export zap_page_range; then the 
distributed filesystem can walk the lists of mmapped vmas itself, skipping 
any that are MAP_PRIVATE.  This achieves POSIX local filesystem semantics, 
but not Linux local filesystem semantics, because updates to the mmap from 
other nodes become visible unpredictably.  Earlier this year, Linus said that 
he wants tighter semantics for distributed MAP_PRIVATE.

This patch presses zap_page_range into service in a way that was not 
originally intended, that is, for invalidation as opposed to destruction of 
memory regions.  The requirements are identical except for the MAP_PRIVATE 
detail.  Forking the whole zap_ chain would be even more distasteful than 
grafting on this option flag.  It's also impractical to implement a zap_ 
variant within a DFS module because of the heavy use of per-arch APIs.  As
far as I can see, this patch is the minimum cost of having accurate semantics
for distributed MAP_PRIVATE mmap.

I'll take the opportunity to beat my chest once again about the fact that
this doesn't benefit anything other than distributed filesystems.  On the
other hand, the cost is minuscule: 54 bytes, a little stack and likely no
measurable CPU.

> I forget what `all' does?  anon+swapcache as well as pagecache?

Yes

> A bit of API documentation here would be appropriate.

Oops, sorry:

/**
 * invalidate_page_range - remove user pages in a given range
 * @vma: vm_area_struct holding the applicable pages
 * @address: starting address of pages to zap
 * @size: number of bytes to zap
 * @all: if nonzero, also unmap anonymous and swap cache pages;
 *       if zero, unmap only file-backed pages
 */
void invalidate_page_range(struct vm_area_struct *vma,
                           unsigned long address, unsigned long size, int all)

Regards,

Daniel


* Re: [RFC] Distributed mmap API
  2004-03-03 13:06                                           ` Daniel Phillips
@ 2004-03-04 18:55                                             ` Paul E. McKenney
  0 siblings, 0 replies; 68+ messages in thread
From: Paul E. McKenney @ 2004-03-04 18:55 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrew Morton, sct, hch, linux-kernel, linux-mm

This matches what we are after here!

						Thanx, Paul

On Wed, Mar 03, 2004 at 08:06:20AM -0500, Daniel Phillips wrote:
> On Tuesday 02 March 2004 22:15, Andrew Morton wrote:
> > Daniel Phillips <phillips@arcor.de> wrote:
> > > Here is a rearranged zap_pte_range that avoids any operations for
> > > out-of-range pfns.
> >
> > Please remind us why Linux needs this patch?
> 
> This is purely to support mmap, including MAP_PRIVATE, accurately on 
> distributed filesystems, where "accurately" is defined as "with local 
> filesystem semantics".
> 
> If the same file region is mmapped by more than one node, only one of them is 
> allowed to have a given page of the mmap valid in the page tables at any 
> time.  When a memory write occurs on one of the other nodes, it must fault so 
> that the distributed filesystem can arrange for exclusive ownership of the 
> file page (or as GFS currently implements it, the whole file) to change from 
> one node to the other.  At this time, any pages already faulted in must be 
> unmapped so that future memory accesses will properly fault.  This unmapping 
> is done by zap_page_range, which has nearly the semantics we want except that 
> it will also unmap private pages of a MAP_PRIVATE mapping, destroying the 
> only copy of that data.  A user would observe the privately written data 
> spontaneously revert to the current file contents.  The purpose of this patch 
> is to fix that.
> 
> This patch allows a distributed filesystem to unmap file-backed memory without 
> unmapping anonymous pages or deleting swap cache, avoiding the above data 
> destruction.  Since zap_page_range is the only function that knows how to 
> unmap memory, it needs to be taught how to skip anonymous pages.
> 
> An alternative to this patch is simply to export zap_page_range; then the 
> distributed filesystem can walk the lists of mmapped vmas itself, skipping 
> any that are MAP_PRIVATE.  This achieves POSIX local filesystem semantics, 
> but not Linux local filesystem semantics, because updates to the mmap from 
> other nodes become visible unpredictably.  Earlier this year, Linus said that 
> he wants tighter semantics for distributed MAP_PRIVATE.
> 
> This patch presses zap_page_range into service in a way that was not 
> originally intended, that is, for invalidation as opposed to destruction of 
> memory regions.  The requirements are identical except for the MAP_PRIVATE 
> detail.  Forking the whole zap_ chain would be even more distasteful than 
> grafting on this option flag.  It's also impractical to implement a zap_ 
> variant within a DFS module because of the heavy use of per-arch APIs.  As
> far as I can see, this patch is the minimum cost of having accurate semantics
> for distributed MAP_PRIVATE mmap.
> 
> I'll take the opportunity to beat my chest once again about the fact that
> this doesn't benefit anything other than distributed filesystems.  On the
> other hand, the cost is minuscule: 54 bytes, a little stack and likely no
> measurable CPU.
> 
> > I forget what `all' does?  anon+swapcache as well as pagecache?
> 
> Yes
> 
> > A bit of API documentation here would be appropriate.
> 
> Oops, sorry:
> 
> /**
>  * invalidate_page_range - remove user pages in a given range
>  * @vma: vm_area_struct holding the applicable pages
>  * @address: starting address of pages to zap
>  * @size: number of bytes to zap
>  * @all: if nonzero, also unmap anonymous and swap cache pages;
>  *       if zero, unmap only file-backed pages
>  */
> void invalidate_page_range(struct vm_area_struct *vma,
>                            unsigned long address, unsigned long size, int all)
> 
> Regards,
> 
> Daniel
> 
> 

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 20:37                       ` Daniel Phillips
  2004-02-20 14:01                         ` Paul E. McKenney
@ 2004-02-20 21:17                         ` Christoph Hellwig
  2004-02-20 22:16                           ` Daniel Phillips
  1 sibling, 1 reply; 68+ messages in thread
From: Christoph Hellwig @ 2004-02-20 21:17 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: paulmck, Stephen C. Tweedie, Andrew Morton, Christoph Hellwig,
	linux-kernel, linux-mm

On Fri, Feb 20, 2004 at 03:37:26PM -0500, Daniel Phillips wrote:
> It does, thanks for the catch.  Please bear with me for a moment while I 
> reroll this, then hopefully we can move on to the more interesting discussion 
> of whether it's worth it.  (Yes it is :)

What about the more interesting question of who needs it?  I think this
whole discussion of who needs what and which approach is better is pretty
much moot as long as we don't have any in-tree users.

Instead of wasting your time on different designs you should hurry up and
get your filesystems encumbrance-reviewed, cleaned up and merged -
with in-tree users we have a chance of finding the right API.  And your
newly started discussion pretty much shows that with only out-of-tree users
we'll never get a sane API.


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-20 21:17                         ` Non-GPL export of invalidate_mmap_range Christoph Hellwig
@ 2004-02-20 22:16                           ` Daniel Phillips
  0 siblings, 0 replies; 68+ messages in thread
From: Daniel Phillips @ 2004-02-20 22:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: paulmck, Stephen C. Tweedie, Andrew Morton, linux-kernel, linux-mm

On Friday 20 February 2004 16:17, Christoph Hellwig wrote:
> On Fri, Feb 20, 2004 at 03:37:26PM -0500, Daniel Phillips wrote:
> > It does, thanks for the catch.  Please bear with me for a moment while I
> > reroll this, then hopefully we can move on to the more interesting
> > discussion of whether it's worth it.  (Yes it is :)
>
> What about the more interesting question of who needs it?  I think this
> whole discussion of who needs what and which approach is better is pretty
> much moot as long as we don't have any in-tree users.

We settled that question in this case, see Paul's "surrender" above ;)

> Instead of wasting your time on different designs you should hurry up and
> get your filesystems encumbrance-reviewed, cleaned up and merged -
> with in-tree users we have a chance of finding the right API.  And your
> newly started discussion pretty much shows that with only out-of-tree users
> we'll never get a sane API.

Again, we (everybody who cared to jump in) now agree on what is sane here, 
it's quite logical.  As for supplying background material so this makes sense 
to a wider group of people, sorry it's been on my to-do list for a while.  
Getting a DFS, namely Sistina GFS, into the tree is underway as you know from 
the press release; however, turning the ship takes time.  Meanwhile, the API
discussion can't wait because the rudder on that ship is even smaller.

Regards,

Daniel


* Re: Non-GPL export of invalidate_mmap_range
  2004-02-17 12:40   ` Paul E. McKenney
  2004-02-18  0:19     ` Andrew Morton
@ 2004-02-18 12:12     ` Dominik Kubla
  1 sibling, 0 replies; 68+ messages in thread
From: Dominik Kubla @ 2004-02-18 12:12 UTC (permalink / raw)
  To: paulmck; +Cc: Christoph Hellwig, akpm, linux-kernel, linux-mm

On Tuesday 17 February 2004 13:40, Paul E. McKenney wrote:

> These URLs do require that you register, but there is no cost nor any
> agreement other than the GPL itself.  The Linux client has not been
> shipped as product yet.  The code is still quite rough, which is one
> reason that it has not been submitted to, for example, LKML.  ;-)

But registering requires disclosing an unreasonable amount of personal
data. This is not acceptable.

Kind regards,
  Dominik Kubla
-- 
Steal my cash, car and TV - but leave the computer!
	-- Soenke Lange <soenke@escher.north.de>

* Re: Non-GPL export of invalidate_mmap_range
  2004-02-16 19:09 Non-GPL export of invalidate_mmap_range Paul E. McKenney
  2004-02-17  2:31 ` Andrew Morton
  2004-02-17  7:35 ` Christoph Hellwig
@ 2004-02-17 22:22 ` David Weinehall
  2 siblings, 0 replies; 68+ messages in thread
From: David Weinehall @ 2004-02-17 22:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: akpm, linux-kernel, linux-mm

On Mon, Feb 16, 2004 at 11:09:27AM -0800, Paul E. McKenney wrote:
> Hello, Andrew,
> 
> The attached patch to make invalidate_mmap_range() non-GPL exported
> seems to have been lost somewhere between 2.6.1-mm4 and 2.6.1-mm5.
> It still applies cleanly.  Could you please take it up again?
> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> 
> 
> It was EXPORT_SYMBOL_GPL(), however IBM's GPFS is not GPL.

Ahhh, but it would be really nice if it was, even if it's irksome to get
decent performance out of it ;-)

[snip]


Regards: David Weinehall
-- 
 /) David Weinehall <tao@acc.umu.se> /) Northern lights wander      (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/    (/   Full colour fire           (/

end of thread, other threads:[~2004-03-04 18:55 UTC | newest]

Thread overview: 68+ messages
2004-02-16 19:09 Non-GPL export of invalidate_mmap_range Paul E. McKenney
2004-02-17  2:31 ` Andrew Morton
2004-02-17  7:35 ` Christoph Hellwig
2004-02-17 12:40   ` Paul E. McKenney
2004-02-18  0:19     ` Andrew Morton
2004-02-18 12:51       ` Arjan van de Ven
2004-02-18 14:00         ` Paul E. McKenney
2004-02-18 21:10           ` Christoph Hellwig
2004-02-18 15:06             ` Paul E. McKenney
2004-02-18 22:21               ` Christoph Hellwig
2004-02-18 22:51                 ` Andrew Morton
2004-02-18 23:00                   ` Christoph Hellwig
2004-02-18 16:21                     ` Paul E. McKenney
2004-02-18 23:32                     ` Andrew Morton
2004-02-19 12:32                       ` Christoph Hellwig
2004-02-19 18:56                         ` Andrew Morton
2004-02-19 19:01                           ` Christoph Hellwig
2004-02-19 13:04                             ` Paul E. McKenney
2004-02-20  3:17                             ` Anton Blanchard
2004-02-20 21:46                               ` Valdis.Kletnieks
2004-02-19  0:28                     ` Andrew Morton
2004-02-18 18:36                       ` Paul E. McKenney
2004-02-19 12:31                       ` Christoph Hellwig
2004-02-19  9:11                         ` Paul E. McKenney
2004-02-19 18:32                           ` Lars Marowsky-Bree
2004-02-19 18:38                             ` Arjan van de Ven
2004-02-19 19:16                             ` viro
2004-02-19 16:15                               ` Paul E. McKenney
2004-02-19 18:59                         ` Tim Bird
2004-02-20  1:27                       ` David Schwartz
2004-02-19  9:11                   ` David Weinehall
2004-02-19  8:58                     ` Paul E. McKenney
2004-03-04  5:51                       ` Mike Fedyk
2004-02-19 10:29                   ` Lars Marowsky-Bree
2004-02-19  9:00                     ` Paul E. McKenney
2004-02-19 11:11                     ` Arjan van de Ven
2004-02-19 11:53                       ` Lars Marowsky-Bree
2004-02-18 18:04         ` Tim Bird
2004-02-19 20:56       ` Daniel Phillips
2004-02-19 22:06         ` Stephen C. Tweedie
2004-02-19 22:31           ` Daniel Phillips
2004-02-19 16:42             ` Paul E. McKenney
2004-02-20  2:06               ` Daniel Phillips
2004-02-19 19:47                 ` Paul E. McKenney
2004-02-20  5:07                   ` Daniel Phillips
2004-02-20 12:02                     ` Paul E. McKenney
2004-02-20 20:37                       ` Daniel Phillips
2004-02-20 14:01                         ` Paul E. McKenney
2004-02-20 23:00                           ` Daniel Phillips
2004-02-20 16:17                             ` Paul E. McKenney
2004-02-21  3:19                               ` Daniel Phillips
2004-02-21 19:00                               ` Daniel Phillips
2004-02-22 23:39                                 ` Paul E. McKenney
2004-02-25 21:04                                   ` [RFC] Distributed mmap API Daniel Phillips
2004-02-25 19:12                                     ` Paul E. McKenney
2004-02-25 19:14                                     ` Paul E. McKenney
2004-02-25 22:07                                     ` Andrew Morton
2004-02-25 22:07                                       ` Daniel Phillips
2004-02-25 22:16                                         ` Andrew Morton
2004-02-25 22:46                                           ` Daniel Phillips
2004-03-03  3:00                                       ` Daniel Phillips
2004-03-03  3:15                                         ` Andrew Morton
2004-03-03 13:06                                           ` Daniel Phillips
2004-03-04 18:55                                             ` Paul E. McKenney
2004-02-20 21:17                         ` Non-GPL export of invalidate_mmap_range Christoph Hellwig
2004-02-20 22:16                           ` Daniel Phillips
2004-02-18 12:12     ` Dominik Kubla
2004-02-17 22:22 ` David Weinehall
