linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
       [not found]   ` <20080130172646.GA2355@tv-sign.ru>
@ 2008-02-02 20:52     ` Matt Helsley
  2008-02-02 21:17     ` Matt Helsley
  1 sibling, 0 replies; 6+ messages in thread
From: Matt Helsley @ 2008-02-02 20:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	stable, linux-mm

On Wed, 2008-01-30 at 20:26 +0300, Oleg Nesterov wrote:
> On 01/30, Miklos Szeredi wrote:
> > 
> > On Wed, 2008-01-30 at 17:20 +0300, Oleg Nesterov wrote:
> > > Fix ->vm_file accounting, mmap_region() may do do_munmap().
> > 
> > There's a small problem with the patch: the vma itself is freed at
> > unmap, so the fput(vma->vm_file) may crash.  Here's an updated patch.
> 
> Ah, indeed, thanks!
> 
> 
> Offtopic. I noticed this problem while looking at this patch:
> 
> 	http://marc.info/?l=linux-mm-commits&m=120141116911711
> 
> So this (the old vma could be removed before we create the new mapping)
> means that the patch above has another problem: if we are remapping the
> whole VM_EXECUTABLE vma, removed_exe_file_vma() can clear ->exe_file
> while it shouldn't (Matt Helsley cc'ed).
> 
> Oleg.

Only shared VMAs can be remapped by sys_remap_file_pages but all
executable mappings are MAP_PRIVATE and hence lack the shared flag. I
don't know of a way for userspace to change that flag so I think there's
nothing that needs to be done here.

Cc'ing linux-mm.

Cheers,
	-Matt Helsley

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
       [not found]   ` <20080130172646.GA2355@tv-sign.ru>
  2008-02-02 20:52     ` [PATCH] sys_remap_file_pages: fix ->vm_file accounting Matt Helsley
@ 2008-02-02 21:17     ` Matt Helsley
  2008-02-03 18:21       ` Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Matt Helsley @ 2008-02-02 21:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	stable, linux-mm

On Wed, 2008-01-30 at 20:26 +0300, Oleg Nesterov wrote:
> On 01/30, Miklos Szeredi wrote:
> > 
> > On Wed, 2008-01-30 at 17:20 +0300, Oleg Nesterov wrote:
> > > Fix ->vm_file accounting, mmap_region() may do do_munmap().
> > 
> > There's a small problem with the patch: the vma itself is freed at
> > unmap, so the fput(vma->vm_file) may crash.  Here's an updated patch.
> 
> Ah, indeed, thanks!
> 
> 
> Offtopic. I noticed this problem while looking at this patch:
> 
> 	http://marc.info/?l=linux-mm-commits&m=120141116911711
> 
> So this (the old vma could be removed before we create the new mapping)
> means that the patch above has another problem: if we are remapping the
> whole VM_EXECUTABLE vma, removed_exe_file_vma() can clear ->exe_file
> while it shouldn't (Matt Helsley cc'ed).
> 
> Oleg.

	Looking at sys_remap_file_pages() it appears that the shared flag must
be set in order to remap. Executable mappings are always MAP_PRIVATE and
hence lack the shared flag so that any modifications to those areas
don't get written back to the executable. I don't think userspace can
change this flag -- even using plain mremap. So, unless there's a way to
change that flag, I don't think there's anything related to
VM_EXECUTABLE vmas that needs to be done here.

Cc'ing linux-mm.

Cheers,
	-Matt Helsley

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
  2008-02-02 21:17     ` Matt Helsley
@ 2008-02-03 18:21       ` Oleg Nesterov
  2008-02-06 20:33         ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2008-02-03 18:21 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	linux-mm

(remove stable@kernel.org from CC)

On 02/02, Matt Helsley wrote:
> 
> On Wed, 2008-01-30 at 20:26 +0300, Oleg Nesterov wrote:
> > 
> > Offtopic. I noticed this problem while looking at this patch:
> > 
> > 	http://marc.info/?l=linux-mm-commits&m=120141116911711
> > 
> > So this (the old vma could be removed before we create the new mapping)
> > means that the patch above has another problem: if we are remapping the
> > whole VM_EXECUTABLE vma, removed_exe_file_vma() can clear ->exe_file
> > while it shouldn't (Matt Helsley cc'ed).
> > 
> > Oleg.
> 
> 	Looking at sys_remap_file_pages() it appears that the shared flag must
> be set in order to remap. Executable mappings are always MAP_PRIVATE and
> hence lack the shared flag so that any modifications to those areas
> don't get written back to the executable. I don't think userspace can
> change this flag

Yes, userspace can't change it. But if MVFS changes ->vm_file it could also
change vm_flags... But I think you are right anyway, we shouldn't care.


So I have to try to find another bug ;) Suppose that ->load_binary() does
a series of do_mmap(MAP_EXECUTABLE). It is possible that mmap_region() can
merge 2 vmas. In that case we "leak" ->num_exe_file_vmas. Unless I missed
something, mmap_region() should do removed_exe_file_vma() when vma_merge()
succeds (near fput(file)).

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
  2008-02-03 18:21       ` Oleg Nesterov
@ 2008-02-06 20:33         ` Hugh Dickins
  2008-02-07  0:16           ` Matt Helsley
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2008-02-06 20:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Matt Helsley, Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	linux-mm

On Sun, 3 Feb 2008, Oleg Nesterov wrote:
> 
> So I have to try to find another bug ;) Suppose that ->load_binary() does
> a series of do_mmap(MAP_EXECUTABLE). It is possible that mmap_region() can
> merge 2 vmas. In that case we "leak" ->num_exe_file_vmas. Unless I missed
> something, mmap_region() should do removed_exe_file_vma() when vma_merge()
> succeds (near fput(file)).

Or there's the complementary case of a VM_EXECUTABLE vma being
split in two, for example by an mprotect of a part of it.

Sorry, Matt, I don't like your patch at all.  It seems to add a fair
amount of ugliness and unmaintainablity, all for a peculiar MVFS case
(you've tried to argue other advantages, but not always convinced!).

And I found it quite hard to see where the crucial difference comes.
I guess it's that MVFS changes vma->vm_file in its ->mmap?  Well, if
MVFS does that, maybe something else does that too, but precisely to
rely on the present behaviour of /proc/pid/exe - so in fixing for
MVFS, we'd be breaking that hypothetical other?

I can understand patches to avoid mmap_sem for /proc/pid/exe, but
this one just seems too messy for too special an out-of-tree case.
(I've no last word on this, but that's my opinion.)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
  2008-02-06 20:33         ` Hugh Dickins
@ 2008-02-07  0:16           ` Matt Helsley
  2008-02-07 16:40             ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Matt Helsley @ 2008-02-07  0:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	linux-mm

On Wed, 2008-02-06 at 20:33 +0000, Hugh Dickins wrote:
> On Sun, 3 Feb 2008, Oleg Nesterov wrote:
> > 
> > So I have to try to find another bug ;) Suppose that ->load_binary() does
> > a series of do_mmap(MAP_EXECUTABLE). It is possible that mmap_region() can
> > merge 2 vmas. In that case we "leak" ->num_exe_file_vmas. Unless I missed
> > something, mmap_region() should do removed_exe_file_vma() when vma_merge()
> > succeds (near fput(file)).
> 
> Or there's the complementary case of a VM_EXECUTABLE vma being
> split in two, for example by an mprotect of a part of it.
> 
> Sorry, Matt, I don't like your patch at all.  It seems to add a fair
> amount of ugliness and unmaintainablity, all for a peculiar MVFS case

I thought that getting rid of the separate versions of proc_exe_link()
improved maintainability. Do you have any specific details on what you
think makes the code introduced by the patch unmaintainable?

> (you've tried to argue other advantages, but not always convinced!).

Yup -- looking at how the VM_EXECUTABLE flag affects the vma walk it's
clear one of my arguments was wrong. So I can't blame you for being
unconvinced by that. :)

I still think it would help any stacking filesystems that can't use the
solution adopted by unionfs.

> And I found it quite hard to see where the crucial difference comes.
> I guess it's that MVFS changes vma->vm_file in its ->mmap?  Well, if

Yup.

> MVFS does that, maybe something else does that too, but precisely to
> rely on the present behaviour of /proc/pid/exe - so in fixing for
> MVFS, we'd be breaking that hypothetical other?

	I'm not completely certain that I understand your point. Are you
suggesting that some hypothetical code would want to use this "quirk"
of /proc/pid/exe for a legitimate purpose?

	Assuming that is your point, I thought my non-hypothetical java example
clearly demonstrated that at least one non-hypothetical program doesn't
expect the "quirk" and breaks because of it. Frankly,
given /proc/pid/exe's output in the non-stacking case, I can't see how
its output in the stacking case we're discussing could be considered
anything but buggy.

Cheers,
	-Matt Helsley

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sys_remap_file_pages: fix ->vm_file accounting
  2008-02-07  0:16           ` Matt Helsley
@ 2008-02-07 16:40             ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2008-02-07 16:40 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Oleg Nesterov, Miklos Szeredi, Andrew Morton, Peter Zijlstra,
	William Lee Irwin III, Nick Piggin, Ingo Molnar, linux-kernel,
	linux-mm

On Wed, 6 Feb 2008, Matt Helsley wrote:
> On Wed, 2008-02-06 at 20:33 +0000, Hugh Dickins wrote:
> > 
> > Sorry, Matt, I don't like your patch at all.  It seems to add a fair
> > amount of ugliness and unmaintainablity, all for a peculiar MVFS case
> 
> I thought that getting rid of the separate versions of proc_exe_link()
> improved maintainability.

That's a plus, but I don't see it balances the minus.

> Do you have any specific details on what you
> think makes the code introduced by the patch unmaintainable?

The fact that you now have to shadow operations on vm_file by
operations on your exe_file, easy to get out of step; and that
we'll tend not to notice when it goes wrong, because the common
case will have them both in the root mount (that will hide busy
errors, won't it? or am I mistaken?).

Though I should concede that your latest patch looks like it catches
all the places it needs to, and that they don't really stretch beyond
mm/mmap.c.  If you provided functions to do the get_file/fput/VM_EXE-
CUTABLE stuff, most of the ugliness would be confined to fs/proc/base.c.

I much preferred Andrew's take-a-copy-of-the-pathname approach, but
admit I'm not one to appreciate the namespace arguments against it.
I wish I had a more constructive solution.

> I still think it would help any stacking filesystems that can't use the
> solution adopted by unionfs.

If you think it's going to help lots of others, please get them to
speak up in support.  But we're rather short of stacking filesystems
in the kernel at present, so it can be hard to argue.

> > And I found it quite hard to see where the crucial difference comes.
> > I guess it's that MVFS changes vma->vm_file in its ->mmap?  Well, if
> 
> Yup.
> 
> > MVFS does that, maybe something else does that too, but precisely to
> > rely on the present behaviour of /proc/pid/exe - so in fixing for
> > MVFS, we'd be breaking that hypothetical other?
> 
> 	I'm not completely certain that I understand your point. Are you
> suggesting that some hypothetical code would want to use this "quirk"
> of /proc/pid/exe for a legitimate purpose?

Yes, but our different viewpoints give us a different idea of what's
a quirk.  The quirk I see is that MVFS is fiddling with vma->vm_file
in its ->mmap, which is rather unusual (though not unique); and then
complaining when this doesn't work as it wants.  But you see it as a
quirk that the VM_EXECUTABLE's vm_file gets used for /proc/pid/exe?

> 	Assuming that is your point, I thought my non-hypothetical java example
> clearly demonstrated that at least one non-hypothetical program doesn't
> expect the "quirk" and breaks because of it.

Yes, and I'm suggesting that if we change the behaviour to suit you,
we might be causing a regression in something else.  No evidence for
that, but it's a possibility (though probably not one I'd be arguing,
if I actually liked the patch involved).

When and if the VFS provides integrated support for stacking filesystems,
it would make sense to do something about this.  But as things stand, it's
for the stacking filesystem to manage its stack.  Why uglify the kernel
code (which doesn't include any user for this), instead of managing that
stacking yourself i.e. why not leave vm_file as is (or if necessary
point it to a proxy), and use its private_data for your optimizations?

> Frankly,
> given /proc/pid/exe's output in the non-stacking case, I can't see how
> its output in the stacking case we're discussing could be considered
> anything but buggy.

But whose bug?

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-02-07 16:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20080130142014.GA2164@tv-sign.ru>
     [not found] ` <1201712101.31222.22.camel@tucsk.pomaz.szeredi.hu>
     [not found]   ` <20080130172646.GA2355@tv-sign.ru>
2008-02-02 20:52     ` [PATCH] sys_remap_file_pages: fix ->vm_file accounting Matt Helsley
2008-02-02 21:17     ` Matt Helsley
2008-02-03 18:21       ` Oleg Nesterov
2008-02-06 20:33         ` Hugh Dickins
2008-02-07  0:16           ` Matt Helsley
2008-02-07 16:40             ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox