* [RFC][PATCH 0/3] Procfs Task exe Symlinks
@ 2007-11-01 3:35 Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks Matt Helsley
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Matt Helsley @ 2007-11-01 3:35 UTC (permalink / raw)
To: Linux-Kernel, linux-mm; +Cc: Andrew Morton, Al Viro, Dave Hansen
The kernel implements readlink of /proc/pid/exe by getting the file from the
first executable VMA. Then the path to the file is reconstructed and reported as
the result. While this method is often correct it does not always identify the
correct path.
I've described the problem in much greater detail in the first patch of this
series.
Once case I had in mind while trying to solve this problem involves the use
of the out-of-tree MVFS filesystem. While that motivates my work on this patch
series I think there are issues with /proc/self/exe independent of MVFS and
which this patch series can solve. I've also explained the additional wrinkles
MVFS adds in the first patch.
One question that came up while solving this problem was trying to determine
what /proc/self/exe should point to. The common case is easy enough but does
not seem to unambiguously define /proc/self/exe in all possible cases. When
discussing the idea with Dave Hansen off-list I went through some possible
definitions such as:
"what's running"
"used to start"
"last exec"
For now I've chosen "last exec". I'd like to know if this doesn't fit with
what anyone would expect and/or if there's a better definition.
The first patch in this series attempts to solve the problem by keeping
an additional reference to the last exec'd file in the mm struct rather than
walking the VMA list for the first, executable, file-backed VMA.
The next patch adds a spinlock to protect the new reference rather than
reusing the mmap semaphore. This exists mainly to show where the mmap semaphore
is absolutely necessary and where there are other alternatives.
The last patch allows applications to change their exe symlink to fix the
case where a restart loader is designed to map the executable in rather than
use the exec system call.
If there are no objections to the direction of the patches I plan on making
the series acceptable for eventual inclusion.
Cheers,
-Matt Helsley
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks
2007-11-01 3:35 [RFC][PATCH 0/3] Procfs Task exe Symlinks Matt Helsley
@ 2007-11-01 3:35 ` Matt Helsley
2007-11-01 19:25 ` Andrew Morton
2007-11-01 3:35 ` [RFC][PATCH 2/3] [RFC][PATCH] Add spinlock in mm to protext exe reference Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 3/3] [RFC][PATCH] Make /proc/pid/exe symlink changeable Matt Helsley
2 siblings, 1 reply; 6+ messages in thread
From: Matt Helsley @ 2007-11-01 3:35 UTC (permalink / raw)
To: Linux-Kernel, linux-mm; +Cc: Andrew Morton, Al Viro, Dave Hansen
[-- Attachment #1: mm-cache-exe-struct-file --]
[-- Type: text/plain, Size: 15876 bytes --]
The kernel implements readlink of /proc/pid/exe by getting the file from the
first executable VMA. Then the path to the file is reconstructed and reported as
the result. While this method is often correct it does not always identify the
correct path.
Some applications may play games with the executable mappings. Al Viro mentioned
an example where the executable region is copied and subsequently unmapped.
In this case the kernel wanders down the VMA list to the next file-backed,
executable VMA -- which is not necessarily the destination of the copy. Worse,
any transformation of the executable could conceivably take place and the
symlink could be deceptive. In this case, since we can't be certain what is
being executed it might be better to simply "delete" the symlink.
Another scenario where the symlink might be broken includes some potential
implementations of checkpoint/restart. If restart is implemented as a loader
application then it will be reported in the symlink instead of the application
being restarted. This would break restarted Java applications for example
(see the java example below).
For executables on the stackable MVFS filesystem the current procfs methods for
implementing a task's exe symlink do not point to the correct file and
applications relying on the symlink fail (see the java example below).
So there are multiple ways in which a task's exe symlink in /proc can break
using the current VMA walk method.
I am trying to address the case of running Java installed on an MVFS filesystem.
However the patch solves the problems with the symlink for more than just
MVFS hence I think it's worth posting for comments. Furthermore, if there are
no objections to the direction of the patch I plan on making it acceptable for
eventual inclusion.
Java uses the /proc/self/exe symlink to determine JAVAHOME by reading the link
and trimming the path. This breaks under MVFS because MVFS reorganizes
files and directories to enable versioning and stores these on a
filesystem lower in the "stack". This is further complicated by the need for
efficient IO because reads and mapping of the file must access the file
contents from the "lower" filesystem. The symlink points to the mapped/read
file and not the MVFS file. Because MVFS utilizes a different organization of
the files in the lower filesystem the symlink cannot be used to
correctly determine JAVAHOME. This could be a problem for any stacking
filesystem which reorganizes files and directories -- though I don't know of
another besides MVFS.
Top FS (e.g. MVFS) Lower FS (e.g. ext3)
/foo/bar/jvm/bin/java /qux/baz/java
/foo/bar/jvm/lib /bee/buzz
When the executable file is opened MVFS returns its own file struct.
When the MVFS file is subsequently mmap'd or read MVFS transforms the path to
/qux/baz/java, opens, and uses the contents of the lower filesystem's file
to satisfy the read or the mmap.
Since the bytes of the java executable are at /qux/baz/java, /proc/self/exe
points to /qux/baz/java instead of /foo/bar/jvm/bin/java. Hence JAVAHOME
points to the wrong directory and Java subsequently fails to find the files
it needs to run.
One question that came up while solving this problem was trying to determine
what /proc/self/exe should point to. The common case is easy enough but does
not seem to unambiguously define /proc/self/exe. When discussing the idea with
Dave Hansen off list we ran through some possible "definitions" such as:
"what's running"
"used to start"
"last exec"
I even went back to earlier versions of the kernel and found an implementation
similar to what an earlier posting of this patch proposed (see Changelog below).
However my search revealed no comments elaborating on why it was changed
so it didn't reveal much useful information.
For now I've chosen "last exec". I'd like to know if this doesn't fit with
what anyone would expect and/or if there's a better definition.
To solve the problem this patch changes the way that the kernel resolves a
task's exe symlink. Instead of walking the VMAs to find the first
executable file-backed VMA we store a reference to the exec'd file in the
mm_struct -- /foo/bar/jvm/bin/java in the example above.
That reference would prevent the filesystem holding the executable file from
being unmounted even if the VMA were unmapped. The first time we
remove a VMA mapping the same file we also drop the reference added by
this patch. This avoids pinning the mounted filesystem in all but the
stacking case. In the case where the open_exec() file and VMA file do not
match (e.g. MVFS) only lazy unmount will work.
The semi-independent reference also allows a future patch to make the symlink
writable which can fix the potential checkpoint/restart case I mentioned
earlier.
A minor added benefit of this approach is we no longer need separate code to
offer this symlink on mmu-less systems.
Dave Hansen suggested droppping the new reference only when the last VMA
of the original mapping is dropped. This requires alot more code but, at least
to me, makes more sense.
I'm planning on working on Dave's suggestion next. That may require a new type
of VMA flag and a count of the VMAs referring to the file. To maintain them I'll
probably need to increment the count in split_vma() and decrement the count in
vma_merge().
Alternative ideas on how to fix this would certainly be very nice to
hear since I've not been able to think of anything better and I'm not especially
pleased with this patch.
This novel will get much shorter if/when this patch gets submitted
for inclusion :).
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
---
Changelog:
Moved fput() calls out of areas holding task_lock
(pointed out by Andrew Morton. see:
http://lkml.org/lkml/2007/7/12/402)
Moved the exe_file reference to the mm_struct from the task struct
(suggested by Dave Hansen)
Avoid pinning most mounted fs by dropping both file refs when the VMA is
removed (problem pointed out by Al Viro. see:
http://lkml.org/lkml/2007/7/12/398)
fs/exec.c | 2 ++
fs/proc/base.c | 25 +++++++++++++++++++++++++
fs/proc/internal.h | 1 -
fs/proc/task_mmu.c | 34 ----------------------------------
fs/proc/task_nommu.c | 34 ----------------------------------
include/linux/proc_fs.h | 19 +++++++++++++++++++
include/linux/sched.h | 3 +++
kernel/fork.c | 3 +++
mm/mmap.c | 32 ++++++++++++++++++++++++++++++++
9 files changed, 84 insertions(+), 69 deletions(-)
Index: linux-2.6.23/include/linux/sched.h
===================================================================
--- linux-2.6.23.orig/include/linux/sched.h
+++ linux-2.6.23/include/linux/sched.h
@@ -430,10 +430,13 @@ struct mm_struct {
struct completion *core_startup_done, core_done;
/* aio bits */
rwlock_t ioctx_list_lock;
struct kioctx *ioctx_list;
+
+ /* store ref to file /proc/<pid>/exe symlink points to */
+ struct file *exe_file;
};
struct sighand_struct {
atomic_t count;
struct k_sigaction action[_NSIG];
Index: linux-2.6.23/fs/exec.c
===================================================================
--- linux-2.6.23.orig/fs/exec.c
+++ linux-2.6.23/fs/exec.c
@@ -1089,10 +1089,12 @@ int flush_old_exec(struct linux_binprm *
current->self_exec_id++;
flush_signal_handlers(current, 0);
flush_old_files(current->files);
+ get_file(bprm->file);
+ set_mm_exe_file(current->mm, bprm->file);
return 0;
mmap_failed:
reset_files_struct(current, files);
out:
Index: linux-2.6.23/fs/proc/base.c
===================================================================
--- linux-2.6.23.orig/fs/proc/base.c
+++ linux-2.6.23/fs/proc/base.c
@@ -930,10 +930,35 @@ static const struct file_operations proc
.release = single_release,
};
#endif
+static int proc_exe_link(struct inode *inode, struct dentry **dentry,
+ struct vfsmount **mnt)
+{
+ struct task_struct *task;
+ struct mm_struct *mm;
+ struct file *exe_file;
+
+ task = get_proc_task(inode);
+ if (!task)
+ return -ENOENT;
+ mm = get_task_mm(task);
+ put_task_struct(task);
+ if (!mm)
+ return -ENOENT;
+ exe_file = get_mm_exe_file(mm);
+ mmput(mm);
+ if (exe_file) {
+ *mnt = mntget(exe_file->f_path.mnt);
+ *dentry = dget(exe_file->f_path.dentry);
+ fput(exe_file);
+ return 0;
+ } else
+ return -ENOENT;
+}
+
static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode = dentry->d_inode;
int error = -EACCES;
Index: linux-2.6.23/kernel/fork.c
===================================================================
--- linux-2.6.23.orig/kernel/fork.c
+++ linux-2.6.23/kernel/fork.c
@@ -48,10 +48,11 @@
#include <linux/freezer.h>
#include <linux/delayacct.h>
#include <linux/taskstats_kern.h>
#include <linux/random.h>
#include <linux/tty.h>
+#include <linux/proc_fs.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
#include <asm/uaccess.h>
#include <asm/mmu_context.h>
@@ -390,10 +391,11 @@ void mmput(struct mm_struct *mm)
might_sleep();
if (atomic_dec_and_test(&mm->mm_users)) {
exit_aio(mm);
exit_mmap(mm);
+ set_mm_exe_file(mm, NULL);
if (!list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
list_del(&mm->mmlist);
spin_unlock(&mmlist_lock);
}
@@ -506,10 +508,11 @@ static struct mm_struct *dup_mm(struct t
err = dup_mmap(mm, oldmm);
if (err)
goto free_pt;
+ mm->exe_file = get_mm_exe_file(oldmm);
mm->hiwater_rss = get_mm_rss(mm);
mm->hiwater_vm = mm->total_vm;
return mm;
Index: linux-2.6.23/fs/proc/task_mmu.c
===================================================================
--- linux-2.6.23.orig/fs/proc/task_mmu.c
+++ linux-2.6.23/fs/proc/task_mmu.c
@@ -70,44 +70,10 @@ int task_statm(struct mm_struct *mm, int
*data = mm->total_vm - mm->shared_vm;
*resident = *shared + get_mm_counter(mm, anon_rss);
return mm->total_vm;
}
-int proc_exe_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
-{
- struct vm_area_struct * vma;
- int result = -ENOENT;
- struct task_struct *task = get_proc_task(inode);
- struct mm_struct * mm = NULL;
-
- if (task) {
- mm = get_task_mm(task);
- put_task_struct(task);
- }
- if (!mm)
- goto out;
- down_read(&mm->mmap_sem);
-
- vma = mm->mmap;
- while (vma) {
- if ((vma->vm_flags & VM_EXECUTABLE) && vma->vm_file)
- break;
- vma = vma->vm_next;
- }
-
- if (vma) {
- *mnt = mntget(vma->vm_file->f_path.mnt);
- *dentry = dget(vma->vm_file->f_path.dentry);
- result = 0;
- }
-
- up_read(&mm->mmap_sem);
- mmput(mm);
-out:
- return result;
-}
-
static void pad_len_spaces(struct seq_file *m, int len)
{
len = 25 + sizeof(void*) * 6 - len;
if (len < 1)
len = 1;
Index: linux-2.6.23/fs/proc/task_nommu.c
===================================================================
--- linux-2.6.23.orig/fs/proc/task_nommu.c
+++ linux-2.6.23/fs/proc/task_nommu.c
@@ -102,44 +102,10 @@ int task_statm(struct mm_struct *mm, int
up_read(&mm->mmap_sem);
*resident = size;
return size;
}
-int proc_exe_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
-{
- struct vm_list_struct *vml;
- struct vm_area_struct *vma;
- struct task_struct *task = get_proc_task(inode);
- struct mm_struct *mm = get_task_mm(task);
- int result = -ENOENT;
-
- if (!mm)
- goto out;
- down_read(&mm->mmap_sem);
-
- vml = mm->context.vmlist;
- vma = NULL;
- while (vml) {
- if ((vml->vma->vm_flags & VM_EXECUTABLE) && vml->vma->vm_file) {
- vma = vml->vma;
- break;
- }
- vml = vml->next;
- }
-
- if (vma) {
- *mnt = mntget(vma->vm_file->f_path.mnt);
- *dentry = dget(vma->vm_file->f_path.dentry);
- result = 0;
- }
-
- up_read(&mm->mmap_sem);
- mmput(mm);
-out:
- return result;
-}
-
/*
* display mapping lines for a particular process's /proc/pid/maps
*/
static int show_map(struct seq_file *m, void *_vml)
{
Index: linux-2.6.23/fs/proc/internal.h
===================================================================
--- linux-2.6.23.orig/fs/proc/internal.h
+++ linux-2.6.23/fs/proc/internal.h
@@ -38,11 +38,10 @@ extern int nommu_vma_show(struct seq_fil
#endif
extern int maps_protect;
extern void create_seq_entry(char *name, mode_t mode, const struct file_operations *f);
-extern int proc_exe_link(struct inode *, struct dentry **, struct vfsmount **);
extern int proc_tid_stat(struct task_struct *, char *);
extern int proc_tgid_stat(struct task_struct *, char *);
extern int proc_pid_status(struct task_struct *, char *);
extern int proc_pid_statm(struct task_struct *, char *);
Index: linux-2.6.23/include/linux/proc_fs.h
===================================================================
--- linux-2.6.23.orig/include/linux/proc_fs.h
+++ linux-2.6.23/include/linux/proc_fs.h
@@ -211,10 +211,18 @@ static inline struct proc_dir_entry *pro
static inline void proc_net_remove(const char *name)
{
remove_proc_entry(name,proc_net);
}
+/* While the foo_mm_exe_file accessor functions are for mm_structs, they are
+ * only needed to implement /proc/<pid>|self/exe so we define them here. */
+
+/* Takes a reference to new_exe_file from caller -- does not get a new
+ * reference; only puts old ones */
+extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
+extern struct file *get_mm_exe_file(struct mm_struct *mm);
+
#else
#define proc_root_driver NULL
#define proc_net NULL
#define proc_bus NULL
@@ -246,10 +254,21 @@ struct tty_driver;
static inline void proc_tty_register_driver(struct tty_driver *driver) {};
static inline void proc_tty_unregister_driver(struct tty_driver *driver) {};
extern struct proc_dir_entry proc_root;
+/* Takes a reference to new_exe_file from caller -- does not get a new
+ * reference; only puts old ones */
+static inline void set_mm_exe_file(struct mm_struct *mm,
+ struct file *new_exe_file)
+{}
+
+static inline struct file *get_mm_exe_file(struct mm_struct *mm)
+{
+ return NULL;
+}
+
#endif /* CONFIG_PROC_FS */
#if !defined(CONFIG_PROC_KCORE)
static inline void kclist_add(struct kcore_list *new, void *addr, size_t size)
{
Index: linux-2.6.23/mm/mmap.c
===================================================================
--- linux-2.6.23.orig/mm/mmap.c
+++ linux-2.6.23/mm/mmap.c
@@ -1699,10 +1699,38 @@ find_extend_vma(struct mm_struct * mm, u
make_pages_present(addr, start);
return vma;
}
#endif
+#ifdef CONFIG_PROC_FS
+/* Takes a reference to new_exe_file from caller -- does not get a new
+ * reference; only puts old ones */
+void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
+{
+ struct file *old_exe_file;
+
+ down_write(&mm->mmap_sem);
+ old_exe_file = mm->exe_file;
+ mm->exe_file = new_exe_file;
+ up_write(&mm->mmap_sem);
+ if (old_exe_file)
+ fput(old_exe_file);
+}
+
+struct file *get_mm_exe_file(struct mm_struct *mm)
+{
+ struct file *exe_file;
+
+ down_read(&mm->mmap_sem);
+ exe_file = mm->exe_file;
+ if (exe_file)
+ get_file(exe_file);
+ up_read(&mm->mmap_sem);
+ return exe_file;
+}
+#endif
+
/*
* Ok - we have the memory areas we should free on the vma list,
* so release them, and do the vma updates.
*
* Called with the mm semaphore held.
@@ -1716,10 +1744,14 @@ static void remove_vma_list(struct mm_st
mm->total_vm -= nrpages;
if (vma->vm_flags & VM_LOCKED)
mm->locked_vm -= nrpages;
vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
+ if (mm->exe_file && (vma->vm_file == mm->exe_file)) {
+ fput(mm->exe_file);
+ mm->exe_file = NULL;
+ }
vma = remove_vma(vma);
} while (vma);
validate_mm(mm);
}
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC][PATCH 2/3] [RFC][PATCH] Add spinlock in mm to protext exe reference
2007-11-01 3:35 [RFC][PATCH 0/3] Procfs Task exe Symlinks Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks Matt Helsley
@ 2007-11-01 3:35 ` Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 3/3] [RFC][PATCH] Make /proc/pid/exe symlink changeable Matt Helsley
2 siblings, 0 replies; 6+ messages in thread
From: Matt Helsley @ 2007-11-01 3:35 UTC (permalink / raw)
To: Linux-Kernel, linux-mm; +Cc: Andrew Morton, Al Viro, Dave Hansen
[-- Attachment #1: proc_pid_exe_avoid_mmap_sem --]
[-- Type: text/plain, Size: 2945 bytes --]
The new and relatively unused (compared to VM ops) mm_struct exe file reference
uses the mmap semaphore. It may be preferrable to avoid using the mmap
semaphore at some point in the future. This patch demonstrates one way to avoid
using the mmap semaphore for the exe file reference inside /proc/pid/exe ops.
Unfortunately we can't entirely avoid using the mmap semaphore because we need
to drop the exe file reference when the VMA mapping the executable file does --
otherwise we'd pin mounted filesystems until all applications executed from
them exitted.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
include/linux/sched.h | 1 +
mm/mmap.c | 17 +++++++++++------
2 files changed, 12 insertions(+), 6 deletions(-)
Index: linux-2.6.23/include/linux/sched.h
===================================================================
--- linux-2.6.23.orig/include/linux/sched.h
+++ linux-2.6.23/include/linux/sched.h
@@ -432,10 +432,11 @@ struct mm_struct {
/* aio bits */
rwlock_t ioctx_list_lock;
struct kioctx *ioctx_list;
/* store ref to file /proc/<pid>/exe symlink points to */
+ spinlock_t exe_file_lock;
struct file *exe_file;
};
struct sighand_struct {
atomic_t count;
Index: linux-2.6.23/mm/mmap.c
===================================================================
--- linux-2.6.23.orig/mm/mmap.c
+++ linux-2.6.23/mm/mmap.c
@@ -1706,27 +1706,27 @@ find_extend_vma(struct mm_struct * mm, u
* reference; only puts old ones */
void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
{
struct file *old_exe_file;
- down_write(&mm->mmap_sem);
+ spin_lock(&mm->exe_file_lock);
old_exe_file = mm->exe_file;
mm->exe_file = new_exe_file;
- up_write(&mm->mmap_sem);
+ spin_unlock(&mm->exe_file_lock);
if (old_exe_file)
fput(old_exe_file);
}
struct file *get_mm_exe_file(struct mm_struct *mm)
{
struct file *exe_file;
- down_read(&mm->mmap_sem);
+ spin_lock(&mm->exe_file_lock);
exe_file = mm->exe_file;
if (exe_file)
get_file(exe_file);
- up_read(&mm->mmap_sem);
+ spin_unlock(&mm->exe_file_lock);
return exe_file;
}
#endif
/*
@@ -1744,14 +1744,19 @@ static void remove_vma_list(struct mm_st
mm->total_vm -= nrpages;
if (vma->vm_flags & VM_LOCKED)
mm->locked_vm -= nrpages;
vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
+ spin_lock(&mm->exe_file_lock);
if (mm->exe_file && (vma->vm_file == mm->exe_file)) {
- fput(mm->exe_file);
+ struct file *old_exe_file = mm->exe_file;
+
mm->exe_file = NULL;
- }
+ spin_unlock(&mm->exe_file_lock);
+ fput(old_exe_file);
+ } else
+ spin_unlock(&mm->exe_file_lock);
vma = remove_vma(vma);
} while (vma);
validate_mm(mm);
}
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC][PATCH 3/3] [RFC][PATCH] Make /proc/pid/exe symlink changeable
2007-11-01 3:35 [RFC][PATCH 0/3] Procfs Task exe Symlinks Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 2/3] [RFC][PATCH] Add spinlock in mm to protext exe reference Matt Helsley
@ 2007-11-01 3:35 ` Matt Helsley
2 siblings, 0 replies; 6+ messages in thread
From: Matt Helsley @ 2007-11-01 3:35 UTC (permalink / raw)
To: Linux-Kernel, linux-mm; +Cc: Andrew Morton, Al Viro, Dave Hansen
[-- Attachment #1: writeable_proc_pid_exe --]
[-- Type: text/plain, Size: 2745 bytes --]
This patch makes the /proc/<pid>|self/exe symlink writeable. This
functionality could be useful to potential checkpoint/restart implementations
restarting Java VMs, for example. Java uses this symlink to locate
JAVAHOME so any restarted Java program requires that /proc/self/exe points to
the jvm and not the restart exectuable.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
fs/proc/base.c | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
Index: linux-2.6.23/fs/proc/base.c
===================================================================
--- linux-2.6.23.orig/fs/proc/base.c
+++ linux-2.6.23/fs/proc/base.c
@@ -930,10 +930,37 @@ static const struct file_operations proc
.release = single_release,
};
#endif
+static int proc_pid_exe_symlink(struct inode *inode, struct dentry *dentry,
+ const char *path)
+{
+ struct file *new_exe_file;
+ struct task_struct *task;
+ struct mm_struct *mm;
+ int error;
+
+ if (!proc_fd_access_allowed(dentry->d_inode))
+ return -EACCES;
+ task = get_proc_task(inode);
+ if (!task)
+ return -ENOENT;
+ mm = get_task_mm(task);
+ put_task_struct(task);
+ if (!mm)
+ return -ENOENT;
+ new_exe_file = open_exec(path);
+ error = PTR_ERR(new_exe_file);
+ if (!IS_ERR(error)) {
+ set_mm_exe_file(mm, new_exe_file);
+ error = 0;
+ }
+ mmput(mm);
+ return error;
+}
+
static int proc_exe_link(struct inode *inode, struct dentry **dentry,
struct vfsmount **mnt)
{
struct task_struct *task;
struct mm_struct *mm;
@@ -1027,10 +1054,16 @@ static const struct inode_operations pro
.readlink = proc_pid_readlink,
.follow_link = proc_pid_follow_link,
.setattr = proc_setattr,
};
+static const struct inode_operations proc_pid_exe_inode_operations = {
+ .readlink = proc_pid_readlink,
+ .follow_link = proc_pid_follow_link,
+ .symlink = proc_pid_exe_symlink,
+ .setattr = proc_setattr,
+};
/* building an inode */
static int task_dumpable(struct task_struct *task)
{
@@ -2087,11 +2120,13 @@ static const struct pid_entry tgid_base_
REG("numa_maps", S_IRUGO, numa_maps),
#endif
REG("mem", S_IRUSR|S_IWUSR, mem),
LNK("cwd", cwd),
LNK("root", root),
- LNK("exe", exe),
+ NOD("exe", (S_IFLNK|S_IRWXUGO),
+ &proc_pid_exe_inode_operations, NULL,
+ { .proc_get_link = &proc_exe_link }),
REG("mounts", S_IRUGO, mounts),
REG("mountstats", S_IRUSR, mountstats),
#ifdef CONFIG_MMU
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks
2007-11-01 3:35 ` [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks Matt Helsley
@ 2007-11-01 19:25 ` Andrew Morton
2007-11-01 19:52 ` Matt Helsley
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2007-11-01 19:25 UTC (permalink / raw)
To: Matt Helsley; +Cc: linux-kernel, linux-mm, viro, haveblue
On Wed, 31 Oct 2007 20:35:09 -0700
Matt Helsley <matthltc@us.ibm.com> wrote:
> +++ linux-2.6.23/include/linux/sched.h
> @@ -430,10 +430,13 @@ struct mm_struct {
> struct completion *core_startup_done, core_done;
>
> /* aio bits */
> rwlock_t ioctx_list_lock;
> struct kioctx *ioctx_list;
> +
> + /* store ref to file /proc/<pid>/exe symlink points to */
> + struct file *exe_file;
> };
I guess with a little work this could be made conditional on
CONFIG_PROC_FS. ie: change get_mm_exe_file() to
void get_mm_exe_file(struct mm_struct *newmm, struct mm_struct *old_mm);
> @@ -1716,10 +1744,14 @@ static void remove_vma_list(struct mm_st
>
> mm->total_vm -= nrpages;
> if (vma->vm_flags & VM_LOCKED)
> mm->locked_vm -= nrpages;
> vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
> + if (mm->exe_file && (vma->vm_file == mm->exe_file)) {
> + fput(mm->exe_file);
> + mm->exe_file = NULL;
> + }
> vma = remove_vma(vma);
> } while (vma);
> validate_mm(mm);
> }
hm, fput() while holding mmap_sem. I wonder if that's a problem.
I assume you've runtime tested this with lockdep enabled, but fput() is one
of those funny things which can call all sorts of things which one least
expects and where testers hit things which developers don't.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks
2007-11-01 19:25 ` Andrew Morton
@ 2007-11-01 19:52 ` Matt Helsley
0 siblings, 0 replies; 6+ messages in thread
From: Matt Helsley @ 2007-11-01 19:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, viro, haveblue
On Thu, 2007-11-01 at 12:25 -0700, Andrew Morton wrote:
> On Wed, 31 Oct 2007 20:35:09 -0700
> Matt Helsley <matthltc@us.ibm.com> wrote:
>
> > +++ linux-2.6.23/include/linux/sched.h
> > @@ -430,10 +430,13 @@ struct mm_struct {
> > struct completion *core_startup_done, core_done;
> >
> > /* aio bits */
> > rwlock_t ioctx_list_lock;
> > struct kioctx *ioctx_list;
> > +
> > + /* store ref to file /proc/<pid>/exe symlink points to */
> > + struct file *exe_file;
> > };
>
> I guess with a little work this could be made conditional on
> CONFIG_PROC_FS. ie: change get_mm_exe_file() to
Sorry, I thought ifdefs were generally frowned upon in structs. I'll add
them here and make sure everthing still works properly.
> void get_mm_exe_file(struct mm_struct *newmm, struct mm_struct *old_mm);
>
> > @@ -1716,10 +1744,14 @@ static void remove_vma_list(struct mm_st
> >
> > mm->total_vm -= nrpages;
> > if (vma->vm_flags & VM_LOCKED)
> > mm->locked_vm -= nrpages;
> > vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
> > + if (mm->exe_file && (vma->vm_file == mm->exe_file)) {
> > + fput(mm->exe_file);
> > + mm->exe_file = NULL;
> > + }
> > vma = remove_vma(vma);
> > } while (vma);
> > validate_mm(mm);
> > }
>
> hm, fput() while holding mmap_sem. I wonder if that's a problem.
>
> I assume you've runtime tested this with lockdep enabled, but fput() is one
> of those funny things which can call all sorts of things which one least
> expects and where testers hit things which developers don't.
It's being used under the mmap semaphore in remove_vma() so, even if
fput() with mmap_sem held is a bug, I'm not introducing any new bugs :).
Thanks,
-Matt Helsley
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-11-01 19:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-01 3:35 [RFC][PATCH 0/3] Procfs Task exe Symlinks Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 1/3] [RFC][PATCH] Fix procfs task exe symlinks Matt Helsley
2007-11-01 19:25 ` Andrew Morton
2007-11-01 19:52 ` Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 2/3] [RFC][PATCH] Add spinlock in mm to protext exe reference Matt Helsley
2007-11-01 3:35 ` [RFC][PATCH 3/3] [RFC][PATCH] Make /proc/pid/exe symlink changeable Matt Helsley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox