linux-mm.kvack.org archive mirror
* [patch 1/2] split mmap
@ 2007-04-05  9:29 Miklos Szeredi
  2007-04-05  9:30 ` [patch 2/2] only allow nonlinear vmas for ram backed filesystems Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2007-04-05  9:29 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-mm

Resending this non-linear-fix mini-series, unchanged but with an updated
description.

----
From: Miklos Szeredi <mszeredi@suse.cz>

This is a straightforward split of do_mmap_pgoff() into two functions:

 - do_mmap_pgoff() checks the parameters, and calculates the vma
   flags.  Then it calls

 - mmap_region(), which does the actual mapping

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

Index: linux/mm/mmap.c
===================================================================
--- linux.orig/mm/mmap.c	2007-04-04 19:34:36.000000000 +0200
+++ linux/mm/mmap.c	2007-04-05 10:51:01.000000000 +0200
@@ -893,14 +893,11 @@ unsigned long do_mmap_pgoff(struct file 
 			unsigned long flags, unsigned long pgoff)
 {
 	struct mm_struct * mm = current->mm;
-	struct vm_area_struct * vma, * prev;
 	struct inode *inode;
 	unsigned int vm_flags;
-	int correct_wcount = 0;
 	int error;
-	struct rb_node ** rb_link, * rb_parent;
 	int accountable = 1;
-	unsigned long charged = 0, reqprot = prot;
+	unsigned long reqprot = prot;
 
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
@@ -1025,7 +1022,25 @@ unsigned long do_mmap_pgoff(struct file 
 	error = security_file_mmap(file, reqprot, prot, flags);
 	if (error)
 		return error;
-		
+
+	return mmap_region(file, addr, len, flags, vm_flags, pgoff,
+			   accountable);
+}
+EXPORT_SYMBOL(do_mmap_pgoff);
+
+unsigned long mmap_region(struct file *file, unsigned long addr,
+			  unsigned long len, unsigned long flags,
+			  unsigned int vm_flags, unsigned long pgoff,
+			  int accountable)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma, *prev;
+	int correct_wcount = 0;
+	int error;
+	struct rb_node **rb_link, *rb_parent;
+	unsigned long charged = 0;
+	struct inode *inode =  file ? file->f_path.dentry->d_inode : NULL;
+
 	/* Clear old maps */
 	error = -ENOMEM;
 munmap_back:
@@ -1174,8 +1189,6 @@ unacct_error:
 	return error;
 }
 
-EXPORT_SYMBOL(do_mmap_pgoff);
-
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
  *
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h	2007-04-04 19:34:35.000000000 +0200
+++ linux/include/linux/mm.h	2007-04-05 10:51:01.000000000 +0200
@@ -1074,6 +1074,10 @@ extern unsigned long get_unmapped_area(s
 extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long pgoff);
+extern unsigned long mmap_region(struct file *file, unsigned long addr,
+	unsigned long len, unsigned long flags,
+	unsigned int vm_flags, unsigned long pgoff,
+	int accountable);
 
 static inline unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* [patch 2/2] only allow nonlinear vmas for ram backed filesystems
  2007-04-05  9:29 [patch 1/2] split mmap Miklos Szeredi
@ 2007-04-05  9:30 ` Miklos Szeredi
  2007-04-05  9:36   ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2007-04-05  9:30 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-mm

page_mkclean() doesn't re-protect ptes for non-linear mappings, so a
later re-dirty through such a mapping will not generate a fault,
PG_dirty will not reflect the dirty state and the dirty count will be
skewed.  This implies that msync() is also currently broken for
nonlinear mappings.

Peter Zijlstra writes:
> In order to make page_mkclean() work for nonlinear vmas we need to do a
> full pte scan for each invocation (we could perhaps only scan 1 in n
> times to try and limit the damage) and that hurts. This will basically
> render it useless.
> 
> The other solution is adding rmap information to nonlinear vmas but
> doubling the memory overhead for nonlinear mappings was not deemed a
> good idea.

The easiest solution is to emulate remap_file_pages on non-linear
mappings with simple mmap() for non ram-backed filesystems.
Applications continue to work (albeit slower), as long as the number
of remappings remains below the maximum vma count.

However, all currently known real uses of non-linear mappings are for
ram backed filesystems, which this patch doesn't affect.

William Lee Irwin III writes:
> It's used for > 3GB files on tmpfs and also ramfs, sometimes
> substantially larger than 3GB.
> 
> It's not used for the database proper. It's used for the buffer pool,
> which is the in-core destination and source of direct I/O, the on-disk
> source and destination of the I/O being the database.
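
For illustration only (this is not part of the patch): a minimal
userspace sketch, in C, of what emulating remap_file_pages() with plain
mmap() amounts to.  The helper names emulate_remap_page() and
remap_demo() are hypothetical, and the temporary file under /tmp is an
assumption of the example, not something the patch relies on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): make page `win_page` of an
 * existing shared mapping at `base` show file page `file_page`, using a
 * plain mmap(MAP_FIXED) instead of remap_file_pages().
 */
static int emulate_remap_page(char *base, int fd, size_t win_page,
			      size_t file_page, size_t page_size)
{
	void *want = base + win_page * page_size;
	void *got = mmap(want, page_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, fd,
			 (off_t)(file_page * page_size));

	if (got == MAP_FAILED)
		return -1;
	assert(got == want);	/* MAP_FIXED maps exactly where asked */
	return 0;
}

/* Returns 0 if the window shows the file pages in reversed order. */
int remap_demo(void)
{
	enum { NPAGES = 4 };
	size_t ps = (size_t)sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/remap-demo-XXXXXX";	/* assumed writable */
	char *win;
	int fd, i, rc = 0;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)(NPAGES * ps)) != 0) {
		close(fd);
		return -1;
	}

	/* Start with an ordinary linear shared mapping of the file. */
	win = mmap(NULL, NPAGES * ps, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	if (win == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Tag every file page with a distinct byte: 'A', 'B', ... */
	for (i = 0; i < NPAGES; i++)
		memset(win + (size_t)i * ps, 'A' + i, ps);

	/* Rearrange: window page i now shows file page NPAGES-1-i. */
	for (i = 0; i < NPAGES; i++)
		if (emulate_remap_page(win, fd, (size_t)i,
				       (size_t)(NPAGES - 1 - i), ps) != 0)
			rc = -1;

	/* Verify that each window page carries the expected tag. */
	for (i = 0; rc == 0 && i < NPAGES; i++)
		if (win[(size_t)i * ps] != (char)('A' + (NPAGES - 1 - i)))
			rc = -1;

	munmap(win, NPAGES * ps);
	close(fd);
	return rc;
}
```

Calling remap_demo() from a main() should return 0 on a Linux system
with a writable /tmp.  Because the file offsets of neighbouring window
pages are no longer contiguous, each remapped page typically ends up in
its own vma, which is why the maximum vma count bounds this approach.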

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

Index: linux/mm/fremap.c
===================================================================
--- linux.orig/mm/fremap.c	2007-04-05 11:18:21.000000000 +0200
+++ linux/mm/fremap.c	2007-04-05 11:18:25.000000000 +0200
@@ -181,6 +181,24 @@ asmlinkage long sys_remap_file_pages(uns
 			goto retry;
 		}
 		mapping = vma->vm_file->f_mapping;
+		/*
+		 * page_mkclean doesn't work on nonlinear vmas, so if dirty
+		 * pages need to be accounted, emulate with linear vmas.
+		 */
+		if (mapping_cap_account_dirty(mapping)) {
+			unsigned long addr;
+
+			flags &= MAP_NONBLOCK;
+			addr = mmap_region(vma->vm_file, start, size, flags,
+					   vma->vm_flags, pgoff, 1);
+			if (IS_ERR_VALUE(addr))
+				err = addr;
+			else {
+				BUG_ON(addr != start);
+				err = 0;
+			}
+			goto out;
+		}
 		spin_lock(&mapping->i_mmap_lock);
 		flush_dcache_mmap_lock(mapping);
 		vma->vm_flags |= VM_NONLINEAR;

* Re: [patch 2/2] only allow nonlinear vmas for ram backed filesystems
  2007-04-05  9:30 ` [patch 2/2] only allow nonlinear vmas for ram backed filesystems Miklos Szeredi
@ 2007-04-05  9:36   ` Peter Zijlstra
  2007-04-05  9:39     ` Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2007-04-05  9:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-mm

On Thu, 2007-04-05 at 11:30 +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> page_mkclean() doesn't re-protect ptes for non-linear mappings, so a
> later re-dirty through such a mapping will not generate a fault,
> PG_dirty will not reflect the dirty state and the dirty count will be
> skewed.  This implies that msync() is also currently broken for
> nonlinear mappings.
> 
> Peter Zijlstra writes:
> > In order to make page_mkclean() work for nonlinear vmas we need to do a
> > full pte scan for each invocation (we could perhaps only scan 1 in n
> > times to try and limit the damage) and that hurts. This will basically
> > render it useless.
> > 
> > The other solution is adding rmap information to nonlinear vmas but
> > doubling the memory overhead for nonlinear mappings was not deemed a
> > good idea.
> 
> The easiest solution is to emulate remap_file_pages on non-linear
> mappings with simple mmap() for non ram-backed filesystems.
> Applications continue to work (albeit slower), as long as the number
> of remappings remains below the maximum vma count.
> 
> However all currently known real uses of non-linear mappings are for
> ram backed filesystems, which this patch doesn't affect.
> 
> William Lee Irwin III writes:
> > It's used for > 3GB files on tmpfs and also ramfs, sometimes
> > substantially larger than 3GB.
> > 
> > It's not used for the database proper. It's used for the buffer pool,
> > which is the in-core destination and source of direct I/O, the on-disk
> > source and destination of the I/O being the database.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> 
> Index: linux/mm/fremap.c
> ===================================================================
> --- linux.orig/mm/fremap.c	2007-04-05 11:18:21.000000000 +0200
> +++ linux/mm/fremap.c	2007-04-05 11:18:25.000000000 +0200
> @@ -181,6 +181,24 @@ asmlinkage long sys_remap_file_pages(uns
>  			goto retry;
>  		}
>  		mapping = vma->vm_file->f_mapping;
> +		/*
> +		 * page_mkclean doesn't work on nonlinear vmas, so if dirty
> +		 * pages need to be accounted, emulate with linear vmas.
> +		 */
> +		if (mapping_cap_account_dirty(mapping)) {

Perhaps this should read:

		if (vma_wants_writenotify(vma)) {

That way we would even allow read only non-linear mappings of 'real'
filesystem files.

> +			unsigned long addr;
> +
> +			flags &= MAP_NONBLOCK;
> +			addr = mmap_region(vma->vm_file, start, size, flags,
> +					   vma->vm_flags, pgoff, 1);
> +			if (IS_ERR_VALUE(addr))
> +				err = addr;
> +			else {
> +				BUG_ON(addr != start);
> +				err = 0;
> +			}
> +			goto out;
> +		}
>  		spin_lock(&mapping->i_mmap_lock);
>  		flush_dcache_mmap_lock(mapping);
>  		vma->vm_flags |= VM_NONLINEAR;

* Re: [patch 2/2] only allow nonlinear vmas for ram backed filesystems
  2007-04-05  9:36   ` Peter Zijlstra
@ 2007-04-05  9:39     ` Miklos Szeredi
  2007-04-05  9:50       ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Miklos Szeredi @ 2007-04-05  9:39 UTC (permalink / raw)
  To: a.p.zijlstra; +Cc: akpm, linux-kernel, linux-mm

> > +		/*
> > +		 * page_mkclean doesn't work on nonlinear vmas, so if dirty
> > +		 * pages need to be accounted, emulate with linear vmas.
> > +		 */
> > +		if (mapping_cap_account_dirty(mapping)) {
> 
> Perhaps this should read:
> 
> 		if (vma_wants_writenotify(vma)) {
> 

I looked at that, but IIRC vma_wants_writenotify() doesn't work after
mmap(), because of the updated protection bits.

> That way we would even allow read only non-linear mappings of 'real'
> filesystem files.

Well, we could do that, but is it really worth the hassle?  The real
question is whether anyone would want to use non-linear
shared-read-only mappings or not.

Thanks,
Miklos

* Re: [patch 2/2] only allow nonlinear vmas for ram backed filesystems
  2007-04-05  9:39     ` Miklos Szeredi
@ 2007-04-05  9:50       ` Peter Zijlstra
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2007-04-05  9:50 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-mm

On Thu, 2007-04-05 at 11:39 +0200, Miklos Szeredi wrote:
> > > +		/*
> > > +		 * page_mkclean doesn't work on nonlinear vmas, so if dirty
> > > +		 * pages need to be accounted, emulate with linear vmas.
> > > +		 */
> > > +		if (mapping_cap_account_dirty(mapping)) {
> > 
> > Perhaps this should read:
> > 
> > 		if (vma_wants_writenotify(vma)) {
> > 
> 
> I looked at that, but IIRC vma_wants_writenotify() doesn't work after
> mmap(), because of the updated protection bits.

Right, bother, that again. I fudged it in mprotect by setting the pgprot
bits to what was expected although I had a parametrised version earlier.
But that was disliked.

> > That way we would even allow read only non-linear mappings of 'real'
> > filesystem files.
> 
> Well, we could do that, but is it really worth the hassle?  The real
> question is whether anyone would want to use non-linear
> shared-read-only mappings or not.

Hmm, yeah, I thought that was the case with that code snippet Andrew
pulled off the interweb, but on second inspection they do map it writable
too. I was led astray by the fact that they map the same file twice.

Oh well, let's just keep the patch as is then.
