* MAP_NOZERO revisited
From: Arun Sharma @ 2012-01-05 0:37 UTC
To: linux-mm; +Cc: Davide Libenzi, Johannes Weiner, Balbir Singh
A few years ago, Davide posted patches to address clear_page() showing
up high in the kernel profiles.
http://thread.gmane.org/gmane.linux.kernel/548928
With malloc implementations that try to conserve RSS by madvising
away unused dirty (i.e. already faulted-in) pages, we pay a high cost
in clear_page() if such a page is needed again later by the same process.
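For context, the pattern in question looks roughly like this (a minimal
sketch; the arena size and fill value are illustrative, not taken from
any particular malloc):

#include <string.h>
#include <sys/mman.h>

#define ARENA_SIZE (16 * 1024 * 1024)

int main(void)
{
	/* The malloc-style pattern: dirty pages are handed back with
	 * MADV_DONTNEED to trim RSS, but the next fault on the same
	 * range gets a freshly zeroed page, paying clear_page() again. */
	char *arena = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (arena == MAP_FAILED)
		return 1;

	memset(arena, 0xAB, ARENA_SIZE);           /* fault in and dirty */
	madvise(arena, ARENA_SIZE, MADV_DONTNEED); /* give the pages back */
	arena[0] = 1;  /* refault: the kernel zeroes the page first */
	return 0;
}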
Now that we have memcgs with their own LRU lists, I was thinking of a
MAP_NOZERO implementation that tries to avoid zeroing the page if it's
coming from the same memcg.
This will probably need an extra PCG_* flag tracking whether the page
has moved between memcgs since its last use.
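Very roughly, the fault-time check could be something like this (pure
sketch: pcg_flags(), PCG_SAME_MEMCG and page_memcg() are invented names
for the proposed bookkeeping, not existing API; only mm_match_cgroup()
exists today):

/*
 * Hypothetical sketch of the proposed check; the helpers named here
 * don't exist and only illustrate the bookkeeping described above.
 */
static bool map_nozero_may_skip_clear(struct page *page,
				      struct mm_struct *mm)
{
	/* The extra PCG_* bit: cleared when the page changes memcg. */
	if (!(pcg_flags(page) & PCG_SAME_MEMCG))
		return false;

	/* Reuse dirty contents only within the owning memcg. */
	return mm_match_cgroup(mm, page_memcg(page));
}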
Security implications: this is not as good as the UID based checks in
Davide's implementation, so should probably be an opt-in instead of
being enabled by default.
Comments?
-Arun
* Re: MAP_NOZERO revisited
From: KAMEZAWA Hiroyuki @ 2012-01-05 7:23 UTC
To: Arun Sharma; +Cc: linux-mm, Davide Libenzi, Johannes Weiner, Balbir Singh
On Wed, 4 Jan 2012 16:37:13 -0800
Arun Sharma <asharma@fb.com> wrote:
>
> A few years ago, Davide posted patches to address clear_page() showing
> up high in the kernel profiles.
>
> http://thread.gmane.org/gmane.linux.kernel/548928
>
> With malloc implementations that try to conserve RSS by madvising
> away unused dirty (i.e. already faulted-in) pages, we pay a high cost
> in clear_page() if such a page is needed again later by the same process.
>
> Now that we have memcgs with their own LRU lists, I was thinking of a
> MAP_NOZERO implementation that tries to avoid zeroing the page if it's
> coming from the same memcg.
>
> This will probably need an extra PCG_* flag tracking whether the page
> has moved between memcgs since its last use.
>
When pages are freed, they go back to the global page allocator;
memcg has no page allocator hooks for alloc/free.
We memcg folks are trying to reduce the size of page_cgroup and to
remove page_cgroup->flags, and eventually want to integrate it into
struct page. So I don't like your idea very much;
please find another way.
> Security implications: this is not as good as the UID based checks in
> Davide's implementation, so should probably be an opt-in instead of
> being enabled by default.
>
I think you need another page allocator, as hugetlb.c has, and need to
maintain a 'page pool'.
Thanks,
-Kame
* MAP_UNINITIALIZED (Was Re: MAP_NOZERO revisited)
From: Arun Sharma @ 2012-01-11 18:50 UTC
To: KAMEZAWA Hiroyuki
Cc: Arun Sharma, linux-mm, Davide Libenzi, Johannes Weiner, Balbir Singh
On Thu, Jan 05, 2012 at 04:23:11PM +0900, KAMEZAWA Hiroyuki wrote:
> When pages are freed, they go back to the global page allocator;
> memcg has no page allocator hooks for alloc/free.
I missed this part. Thanks for reminding me.
> We memcg folks are trying to reduce the size of page_cgroup and to
> remove page_cgroup->flags, and eventually want to integrate it into
> struct page. So I don't like your idea very much;
> please find another way.
Thinking a bit more, it may be possible to implement this without
page_cgroup->flags, using mm_match_cgroup(current->mm, page->mem_cgroup).
>
> > Security implications: this is not as good as the UID based checks in
> > Davide's implementation, so should probably be an opt-in instead of
> > being enabled by default.
> >
>
> I think you need another page allocator, as hugetlb.c has, and need to
> maintain a 'page pool'.
That sounds like a bigger change. All I need is a way of computing
"was this page previously mapped into the current cgroup?"
without affecting allocator performance. I'm thinking this more relaxed
check is sufficient for many real-world use cases.
I also realized that I could use MAP_UNINITIALIZED for this purpose.
Attached is a completely insecure patch, which may be interesting for
embedded use cases on CPUs with an MMU.
Yeah, the VM_SAO hack is ugly. Any better suggestions?
-Arun
commit 37b83f3fb77a177a2f81ebb8aeaec28c2a46e503
Author: Arun Sharma <asharma@fb.com>
Date: Tue Jan 10 17:02:46 2012 -0800
mm: Enable MAP_UNINITIALIZED for archs with mmu
This enables malloc optimizations where we might
madvise(..,MADV_DONTNEED) a page only to fault it
back at a different virtual address.
Signed-off-by: Arun Sharma <asharma@fb.com>
diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h
index 787abbb..71e079f 100644
--- a/include/asm-generic/mman-common.h
+++ b/include/asm-generic/mman-common.h
@@ -19,11 +19,7 @@
#define MAP_TYPE 0x0f /* Mask for type of mapping */
#define MAP_FIXED 0x10 /* Interpret addr exactly */
#define MAP_ANONYMOUS 0x20 /* don't use a file */
-#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
-# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
-#else
-# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
-#endif
+#define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
#define MS_ASYNC 1 /* sync memory asynchronously */
#define MS_INVALIDATE 2 /* invalidate the caches */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 3a93f73..04d838e 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -156,6 +156,11 @@ __alloc_zeroed_user_highpage(gfp_t movableflags,
struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
vma, vaddr);
+#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+ if (!vma->vm_file && vma->vm_flags & VM_UNINITIALIZED)
+ return page;
+#endif
+
if (page)
clear_user_highpage(page, vaddr);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4baadd1..6345c57 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -118,6 +118,8 @@ extern unsigned int kobjsize(const void *objp);
#define VM_SAO 0x20000000 /* Strong Access Ordering (powerpc) */
#define VM_PFN_AT_MMAP 0x40000000 /* PFNMAP vma that is fully mapped at mmap time */
#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#define VM_UNINITIALIZED VM_SAO /* Steal a powerpc bit for now, since we're out
+ of bits for 32-bit archs */
/* Bits set in the VMA until the stack is in its final location */
#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 51647b4..f7d4f60 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -88,6 +88,7 @@ calc_vm_flag_bits(unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
_calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
+ _calc_vm_trans(flags, MAP_UNINITIALIZED, VM_UNINITIALIZED) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
}
#endif /* __KERNEL__ */
diff --git a/init/Kconfig b/init/Kconfig
index 43298f9..428e047 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1259,7 +1259,7 @@ endchoice
config MMAP_ALLOW_UNINITIALIZED
bool "Allow mmapped anonymous memory to be uninitialized"
- depends on EXPERT && !MMU
+ depends on EXPERT
default n
help
Normally, and according to the Linux spec, anonymous memory obtained
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index c3fdbcb..e6dd642 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1868,6 +1868,12 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
put_mems_allowed();
return page;
}
+
+#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
+ if (!vma->vm_file && vma->vm_flags & VM_UNINITIALIZED)
+ gfp &= ~__GFP_ZERO;
+#endif
+
/*
* fast path: default or task policy
*/
* Re: MAP_UNINITIALIZED (Was Re: MAP_NOZERO revisited)
From: Balbir Singh @ 2012-01-12 5:10 UTC
To: Arun Sharma; +Cc: KAMEZAWA Hiroyuki, linux-mm, Davide Libenzi, Johannes Weiner
> commit 37b83f3fb77a177a2f81ebb8aeaec28c2a46e503
> Author: Arun Sharma <asharma@fb.com>
> Date: Tue Jan 10 17:02:46 2012 -0800
>
> mm: Enable MAP_UNINITIALIZED for archs with mmu
>
> This enables malloc optimizations where we might
> madvise(..,MADV_DONTNEED) a page only to fault it
> back at a different virtual address.
>
> Signed-off-by: Arun Sharma <asharma@fb.com>
>
> diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h
> index 787abbb..71e079f 100644
> --- a/include/asm-generic/mman-common.h
> +++ b/include/asm-generic/mman-common.h
> @@ -19,11 +19,7 @@
> #define MAP_TYPE 0x0f /* Mask for type of mapping */
> #define MAP_FIXED 0x10 /* Interpret addr exactly */
> #define MAP_ANONYMOUS 0x20 /* don't use a file */
> -#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
> -# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
> -#else
> -# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
> -#endif
> +#define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */
>
Define MAP_UNINITIALIZED - are you referring to not zeroing out pages
before handing them down? Is this safe even between threads?
> #define MS_ASYNC 1 /* sync memory asynchronously */
> #define MS_INVALIDATE 2 /* invalidate the caches */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 3a93f73..04d838e 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -156,6 +156,11 @@ __alloc_zeroed_user_highpage(gfp_t movableflags,
> struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
> vma, vaddr);
>
> +#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
> + if (!vma->vm_file && vma->vm_flags & VM_UNINITIALIZED)
> + return page;
> +#endif
> +
> if (page)
> clear_user_highpage(page, vaddr);
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4baadd1..6345c57 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -118,6 +118,8 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_SAO 0x20000000 /* Strong Access Ordering (powerpc) */
> #define VM_PFN_AT_MMAP 0x40000000 /* PFNMAP vma that is fully mapped at mmap time */
> #define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
> +#define VM_UNINITIALIZED VM_SAO /* Steal a powerpc bit for now, since we're out
> + of bits for 32-bit archs */
Without proper checks on whether it can be reused?
>
> /* Bits set in the VMA until the stack is in its final location */
> #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 51647b4..f7d4f60 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -88,6 +88,7 @@ calc_vm_flag_bits(unsigned long flags)
> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> _calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
> _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
> + _calc_vm_trans(flags, MAP_UNINITIALIZED, VM_UNINITIALIZED) |
> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
> }
> #endif /* __KERNEL__ */
> diff --git a/init/Kconfig b/init/Kconfig
> index 43298f9..428e047 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1259,7 +1259,7 @@ endchoice
>
> config MMAP_ALLOW_UNINITIALIZED
> bool "Allow mmapped anonymous memory to be uninitialized"
> - depends on EXPERT && !MMU
> + depends on EXPERT
> default n
> help
> Normally, and according to the Linux spec, anonymous memory obtained
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index c3fdbcb..e6dd642 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1868,6 +1868,12 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> put_mems_allowed();
> return page;
> }
> +
> +#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
> + if (!vma->vm_file && vma->vm_flags & VM_UNINITIALIZED)
> + gfp &= ~__GFP_ZERO;
> +#endif
> +
> /*
> * fast path: default or task policy
> */
Balbir
* Re: MAP_UNINITIALIZED (Was Re: MAP_NOZERO revisited)
From: Arun Sharma @ 2012-01-12 18:16 UTC
To: Balbir Singh; +Cc: KAMEZAWA Hiroyuki, linux-mm, Davide Libenzi, Johannes Weiner
On 1/11/12 9:10 PM, Balbir Singh wrote:
>
> Define MAP_UNINITIALIZED - are you referring to not zeroing out pages
> before handing them down? Is this safe even between threads?
>
If it doesn't work for an app, that app shouldn't be asking for this
behavior via an mmap flag.
Only calloc() specifies that the returned memory will be zeroed. There
is no such guarantee for malloc().
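A tiny illustration of that contract, in plain userspace C:

#include <stdlib.h>
#include <string.h>

int main(void)
{
	/* calloc() guarantees zero-filled memory. */
	int *a = calloc(1024, sizeof(*a));

	/* malloc() contents are indeterminate: a correct program must
	 * initialize them itself if it needs zeros. */
	int *b = malloc(1024 * sizeof(*b));

	if (!a || !b)
		return 1;
	memset(b, 0, 1024 * sizeof(*b));

	free(a);
	free(b);
	return 0;
}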
>> +#define VM_UNINITIALIZED VM_SAO /* Steal a powerpc bit for now, since we're out
>> + of bits for 32-bit archs */
>
> Without proper checks on whether it can be reused?
Yeah - this is a complete hack. I'm trying to convince people that this
is a viable idea, before asking for a vm_flags bit.
Microbenchmark data:
# time -p ./test2
real 7.60
user 0.78
sys 6.81
# time -p ./test2 xx
real 4.40
user 0.78
sys 3.62
# cat test2.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MMAP_SIZE (20 * 1024 * 1024)
#define PAGE_SIZE 4096
#define MAP_UNINITIALIZED 0x4000000	/* not yet in userspace headers */

int main(int argc, char *argv[])
{
	void *addr;
	char *p, *end;
	int flag = 0;
	int i;

	/* Any command line argument enables MAP_UNINITIALIZED. */
	if (argc > 1)
		flag = MAP_UNINITIALIZED;

	addr = mmap(NULL, MMAP_SIZE, PROT_READ | PROT_WRITE,
		    flag | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(-1);
	}
	end = (char *) addr + MMAP_SIZE;

	for (i = 0; i < 1000; i++) {
		int ret;

		/* Throw all the pages away... */
		ret = madvise(addr, MMAP_SIZE, MADV_DONTNEED);
		if (ret == -1)
			perror("madvise");

		/* ...then fault each one back in by dirtying it. */
		for (p = (char *) addr; p < end; p += PAGE_SIZE)
			*p = 0xAB;
	}
	return 0;
}
-Arun