From: Andrew Morton <akpm@linux-foundation.org>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>,
riel@redhat.com, hugh@veritas.com,
kosaki.motohiro@jp.fujitsu.com, linux-abi@vger.kernel.org
Subject: Re: [PATCH] - support inheritance of mlocks across fork/exec V2
Date: Sat, 6 Dec 2008 22:07:29 -0800 [thread overview]
Message-ID: <20081206220729.042a926e.akpm@linux-foundation.org> (raw)
In-Reply-To: <1228331069.6693.73.camel@lts-notebook>
(cc linux-abi. Again.)
On Wed, 03 Dec 2008 14:04:29 -0500 Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
>
>
> Against; 2.6.28-rc7-mmotm-081203
>
> V02: rework vetting of flags argument as suggested by
> Kosaki Motohiro.
> enhance description as requested by Andrew Morton.
>
> Add support for mlockall(MCL_INHERIT|MCL_RECURSIVE):
>
> MCL_CURRENT[|MCL_FUTURE]|MCL_INHERIT - inherit memory locks
> [vmas' VM_LOCKED flags] across fork(), and inherit
> MCL_FUTURE behavior [mm's def_flags] across fork()
> and exec(). Behaves as if child and/or new task
> called mlockall(MCL_CURRENT|MCL_FUTURE) as first
> instruction.
>
> MCL_RECURSIVE - inherit MCL_CURRENT|MCL_FUTURE|MCL_INHERIT
> [vmas' VM_LOCKED flags for fork() and mm's def_flags
> and mcl_inherit across fork() and exec()] for all
> future generations of calling process's descendants.
> Behaves as if child and/or new task called
> mlockall(MCL_CURRENT|MCL_FUTURE|MCL_INHERIT|MCL_RECURSIVE)
> as the first instruction.
>
> In support of a "lock prefix command"--e.g., mlock <cmd> <args> ...
> Analogous to taskset(1) for cpu affinity or numactl(8) for numa memory
> policy.
>
> Together with patches to keep mlocked pages off the LRU, this will
> allow users/admins to lock down applications without modifying them,
> if their RLIMIT_MEMLOCK is sufficiently large, keeping their pages
> off the LRU and out of consideration for reclaim.
>
> Potentially useful, as well, in real-time environments to force
> prefaulting and residency for applications that don't mlock themselves.
>
> Jeff Sharkey at Montana State developed a similar patch for Linux
> [link no longer accessible], but apparently he never submitted the patch.
>
> I submitted an earlier version of this patch around a year ago. I
> resurrected it to test the unevictable lru/mlocked pages patches--
> e.g., by "mlock -r make -j<N*nr_cpus> all". This did shake out a few
> races and vmstat accounting bugs, but NOT something I'd recommend as
> general practice--for kernel builds, that is.
>
> ----
>
> Define MCL_INHERIT, MCL_RECURSIVE in <asm-*/mman.h>.
> + x86 and ia64 versions included.
> + other arch can/will be created, if this patch deemed merge-worthy.
>
> Similarly, I'll provide kernel man page update if/when needed.
>
> Example "lock prefix command" in Documentation/vm/mlock.c
>
>
> ...
>
> --- linux-2.6.28-rc7-mmotm-081203.orig/mm/mlock.c 2008-12-03 10:33:11.000000000 -0500
> +++ linux-2.6.28-rc7-mmotm-081203/mm/mlock.c 2008-12-03 10:33:29.000000000 -0500
> @@ -573,15 +573,18 @@ asmlinkage long sys_munlock(unsigned lon
> static int do_mlockall(int flags)
> {
> struct vm_area_struct * vma, * prev = NULL;
> + struct mm_struct *mm = current->mm;
> unsigned int def_flags = 0;
>
> if (flags & MCL_FUTURE)
> - def_flags = VM_LOCKED;
> - current->mm->def_flags = def_flags;
> - if (flags == MCL_FUTURE)
> + def_flags = VM_LOCKED;;
You get paid by the semicolon?
> + mm->def_flags = def_flags;
> + if (flags & MCL_INHERIT)
> + mm->mcl_inherit = flags & (MCL_INHERIT | MCL_RECURSIVE);
aaaaaaarrrrrrggggggghhhhhhh!
So mm->mcl_inherit is _not_ a boolean meaning "this mm has MCL_INHERIT
set", as I naively believed when I saw it. Is it in fact a composite
of both MCL_INHERIT and MCL_RECURSIVE, given a badly misleading name
and a waffly comment.
Can we please make this clearer?
>
> ...
>
> --- linux-2.6.28-rc7-mmotm-081203.orig/kernel/fork.c 2008-12-03 10:18:15.000000000 -0500
> +++ linux-2.6.28-rc7-mmotm-081203/kernel/fork.c 2008-12-03 10:33:29.000000000 -0500
> @@ -278,7 +278,8 @@ static int dup_mmap(struct mm_struct *mm
> */
> down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
>
> - mm->locked_vm = 0;
> + if (!mm->mcl_inherit)
> + mm->locked_vm = 0;
> mm->mmap = NULL;
> mm->mmap_cache = NULL;
> mm->free_area_cache = oldmm->mmap_base;
> @@ -316,7 +317,8 @@ static int dup_mmap(struct mm_struct *mm
> if (IS_ERR(pol))
> goto fail_nomem_policy;
> vma_set_policy(tmp, pol);
> - tmp->vm_flags &= ~VM_LOCKED;
> + if (!mm->mcl_inherit)
> + tmp->vm_flags &= ~VM_LOCKED;
> tmp->vm_mm = mm;
> tmp->vm_next = NULL;
> anon_vma_link(tmp);
> @@ -406,6 +408,8 @@ __cacheline_aligned_in_smp DEFINE_SPINLO
>
> static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
> {
> + unsigned long def_flags = 0;
> +
> atomic_set(&mm->mm_users, 1);
> atomic_set(&mm->mm_count, 1);
> init_rwsem(&mm->mmap_sem);
> @@ -422,9 +426,14 @@ static struct mm_struct * mm_init(struct
> mm->free_area_cache = TASK_UNMAPPED_BASE;
> mm->cached_hole_size = ~0UL;
> mm_init_owner(mm, p);
> + if (current->mm && current->mm->mcl_inherit) {
> + def_flags = current->mm->def_flags & VM_LOCKED;
> + if (mm->mcl_inherit & MCL_RECURSIVE)
> + mm->mcl_inherit = current->mm->mcl_inherit;
> + }
hm. That code looks familiar. Not worth a helper function?
> if (likely(!mm_alloc_pgd(mm))) {
> - mm->def_flags = 0;
> + mm->def_flags = def_flags;
> mmu_notifier_mm_init(mm);
> return mm;
> }
> Index: linux-2.6.28-rc7-mmotm-081203/fs/binfmt_elf.c
> ===================================================================
> --- linux-2.6.28-rc7-mmotm-081203.orig/fs/binfmt_elf.c 2008-12-03 10:19:21.000000000 -0500
> +++ linux-2.6.28-rc7-mmotm-081203/fs/binfmt_elf.c 2008-12-03 10:33:29.000000000 -0500
> @@ -585,6 +585,7 @@ static int load_elf_binary(struct linux_
> unsigned long reloc_func_desc = 0;
> int executable_stack = EXSTACK_DEFAULT;
> unsigned long def_flags = 0;
> + int mcl_inherit = 0;
> struct {
> struct elfhdr elf_ex;
> struct elfhdr interp_elf_ex;
> @@ -749,6 +750,13 @@ static int load_elf_binary(struct linux_
> SET_PERSONALITY(loc->elf_ex);
> }
>
> + /* Optionally inherit MCL_FUTURE state before destroying old mm */
> + if (current->mm && current->mm->mcl_inherit) {
umm, OK, it looks like current->mm can be null here.
> + def_flags = current->mm->def_flags & VM_LOCKED;
ooh, we finally used local variable def_flags for something. That's
been wasting space since 2.2.26 (or earlier).
> + if (current->mm->mcl_inherit & MCL_RECURSIVE)
> + mcl_inherit = current->mm->mcl_inherit;
> + }
> +
> /* Flush all traces of the currently running executable */
> retval = flush_old_exec(bprm);
> if (retval)
> @@ -757,6 +765,7 @@ static int load_elf_binary(struct linux_
> /* OK, This is the point of no return */
> current->flags &= ~PF_FORKNOEXEC;
> current->mm->def_flags = def_flags;
> + current->mm->mcl_inherit = mcl_inherit;
>
> /* Do this immediately, since STACK_TOP as used in setup_arg_pages
> may depend on the personality. */
Why is this done in binfmt_elf.c? Doesn't this preclude the operation
of these new flags for other binary formats?
> Index: linux-2.6.28-rc7-mmotm-081203/arch/x86/include/asm/mman.h
> ===================================================================
> --- linux-2.6.28-rc7-mmotm-081203.orig/arch/x86/include/asm/mman.h 2008-12-03 10:16:26.000000000 -0500
> +++ linux-2.6.28-rc7-mmotm-081203/arch/x86/include/asm/mman.h 2008-12-03 10:33:29.000000000 -0500
> @@ -16,5 +16,8 @@
>
> #define MCL_CURRENT 1 /* lock all current mappings */
> #define MCL_FUTURE 2 /* lock all future mappings */
> +#define MCL_INHERIT 4 /* inherit mlocks across fork */
> + /* inherit '_FUTURE flag across fork/exec */
> +#define MCL_RECURSIVE 8 /* inherit mlocks recursively */
>
> #endif /* _ASM_X86_MMAN_H */
> Index: linux-2.6.28-rc7-mmotm-081203/include/linux/mm_types.h
> ===================================================================
> --- linux-2.6.28-rc7-mmotm-081203.orig/include/linux/mm_types.h 2008-12-03 10:18:01.000000000 -0500
> +++ linux-2.6.28-rc7-mmotm-081203/include/linux/mm_types.h 2008-12-03 10:33:29.000000000 -0500
> @@ -235,6 +235,8 @@ struct mm_struct {
> unsigned int token_priority;
> unsigned int last_interval;
>
> + int mcl_inherit; /* inherit current/future locks */
> +
> unsigned long flags; /* Must use atomic bitops to access the bits */
I was thinking that the boolean mcl_inherit could become a bit in
mm.flags. That was before I unmisled myself about it.
Perhaps we could use two bits. That might get a bit nasty.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-12-07 6:07 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-24 21:21 [PATCH/RFC] - support inheritance of mlocks across fork/exec Lee Schermerhorn
2008-11-25 4:05 ` KOSAKI Motohiro
2008-11-25 16:21 ` Rik van Riel
2008-11-25 23:26 ` Andrew Morton
2008-12-03 19:04 ` [PATCH] - support inheritance of mlocks across fork/exec V2 Lee Schermerhorn
2008-12-04 1:57 ` KOSAKI Motohiro
2008-12-07 6:07 ` Andrew Morton [this message]
2008-12-08 15:01 ` Lee Schermerhorn
2008-12-08 21:05 ` Lee Schermerhorn
2008-12-08 21:33 ` Matt Mackall
2008-12-09 19:40 ` Lee Schermerhorn
2008-12-09 20:41 ` Matt Mackall
2009-06-05 4:39 ` Jon Masters
2009-06-05 4:49 ` KOSAKI Motohiro
2009-06-05 5:12 ` Jon Masters
2008-11-26 8:37 ` [PATCH/RFC] - support inheritance of mlocks across fork/exec KOSAKI Motohiro
2008-11-29 22:38 ` Lee Schermerhorn
2008-11-30 5:21 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081206220729.042a926e.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=Lee.Schermerhorn@hp.com \
--cc=hugh@veritas.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-abi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox