* [PATCH] lazy freeing of memory through MADV_FREE
@ 2007-04-17 7:15 Rik van Riel
2007-04-19 21:15 ` [PATCH] lazy freeing of memory through MADV_FREE 2/2 Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-17 7:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]
Make it possible for applications to have the kernel free memory
lazily. This reduces a repeated free/malloc cycle from freeing
pages and allocating them, to just marking them freeable. If the
application wants to reuse them before the kernel needs the memory,
not even a page fault will happen.
This patch, together with Ulrich's glibc change, increases
MySQL sysbench performance by a factor of 2 on my quad core
test system.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
Ulrich Drepper has test glibc RPMS for this functionality at:
http://people.redhat.com/drepper/rpms
Andrew, I have stress tested this patch for a few days now and
have not been able to find any more bugs. I believe it is ready
to be merged in -mm, and upstream at the next merge window.
When the patch goes upstream, I will submit a small follow-up
patch to revert MADV_DONTNEED behaviour to what it did previously
and have the new behaviour trigger only on MADV_FREE: at that
point people will have to get new test RPMs of glibc.
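
For readers who want to see the userspace side of the behaviour described
in the changelog, here is a minimal sketch. It assumes a kernel and libc
headers that expose MADV_FREE; the function name lazy_free_then_reuse() and
its return codes are made up for illustration and are not part of the patch.
The point it shows is that data written after the madvise() call must
survive, because writing to a page cancels the lazy free for that page.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: mark an anonymous region lazily freeable, then
 * reuse it.  Returns 0 if the reused data is intact, 1 if MADV_FREE is
 * not available (headers or kernel), -1 on any other failure.
 */
int lazy_free_then_reuse(size_t len, unsigned char fill)
{
    assert(len > 0);

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAA, len);     /* old contents we no longer need */

    int advised = -1;
#ifdef MADV_FREE
    /* Tell the kernel it may drop these pages whenever it needs memory. */
    advised = madvise(buf, len, MADV_FREE);
#endif
    if (advised != 0) {
        munmap(buf, len);
        return 1;
    }

    /* Reuse before (or after) reclaim: the new data must be preserved. */
    memset(buf, fill, len);
    int intact = 1;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != fill)
            intact = 0;

    munmap(buf, len);
    return intact ? 0 : -1;
}
```

Calling lazy_free_then_reuse(8192, 0x42) should return 0 on a kernel with
MADV_FREE support and 1 where the advice value is rejected.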
[-- Attachment #2: linux-2.6.21-rc6-mm1-madv_free.patch --]
[-- Type: text/x-patch, Size: 11514 bytes --]
--- linux-2.6.21-rc6-mm1/include/asm-parisc/mman.h.madv_free 2007-04-17 02:17:19.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/asm-parisc/mman.h 2007-04-17 02:22:46.000000000 -0400
@@ -38,6 +38,7 @@
#define MADV_SPACEAVAIL 5 /* insure that resources are reserved */
#define MADV_VPS_PURGE 6 /* Purge pages from VM page cache */
#define MADV_VPS_INHERIT 7 /* Inherit parents page size */
+#define MADV_FREE 8 /* don't need the pages or the data */
/* common/generic parameters */
#define MADV_REMOVE 9 /* remove these pages & resources */
--- linux-2.6.21-rc6-mm1/include/asm-mips/mman.h.madv_free 2007-04-17 02:17:19.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/asm-mips/mman.h 2007-04-17 02:22:46.000000000 -0400
@@ -65,6 +65,7 @@
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */
/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
--- linux-2.6.21-rc6-mm1/include/asm-xtensa/mman.h.madv_free 2007-04-17 02:17:19.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/asm-xtensa/mman.h 2007-04-17 02:22:46.000000000 -0400
@@ -72,6 +72,7 @@
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */
/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
--- linux-2.6.21-rc6-mm1/include/linux/swap.h.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/linux/swap.h 2007-04-17 02:22:46.000000000 -0400
@@ -182,6 +182,7 @@ extern void FASTCALL(lru_cache_add(struc
extern void FASTCALL(lru_cache_add_active(struct page *));
extern void FASTCALL(lru_cache_add_tail(struct page *));
extern void FASTCALL(activate_page(struct page *));
+extern void FASTCALL(deactivate_tail_page(struct page *));
extern void FASTCALL(mark_page_accessed(struct page *));
extern void lru_add_drain(void);
extern int lru_add_drain_all(void);
--- linux-2.6.21-rc6-mm1/include/linux/mm.h.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/linux/mm.h 2007-04-17 02:22:46.000000000 -0400
@@ -767,6 +767,7 @@ struct zap_details {
pgoff_t last_index; /* Highest page->index to unmap */
spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */
unsigned long truncate_count; /* Compare vm_truncate_count */
+ short madv_free; /* MADV_FREE anonymous memory */
};
struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
--- linux-2.6.21-rc6-mm1/include/linux/page-flags.h.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/linux/page-flags.h 2007-04-17 02:23:16.000000000 -0400
@@ -91,6 +91,7 @@
#define PG_booked 20 /* Has blocks reserved on-disk */
#define PG_readahead 21 /* Reminder to do read-ahead */
+#define PG_lazyfree 22 /* MADV_FREE potential throwaway */
/* PG_owner_priv_1 users should have descriptive aliases */
#define PG_checked PG_owner_priv_1 /* Used by some filesystems */
@@ -216,6 +217,11 @@ static inline void SetPageUptodate(struc
#define ClearPageReclaim(page) clear_bit(PG_reclaim, &(page)->flags)
#define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags)
+#define PageLazyFree(page) test_bit(PG_lazyfree, &(page)->flags)
+#define SetPageLazyFree(page) set_bit(PG_lazyfree, &(page)->flags)
+#define ClearPageLazyFree(page) clear_bit(PG_lazyfree, &(page)->flags)
+#define __ClearPageLazyFree(page) __clear_bit(PG_lazyfree, &(page)->flags)
+
#define PageCompound(page) test_bit(PG_compound, &(page)->flags)
#define __SetPageCompound(page) __set_bit(PG_compound, &(page)->flags)
#define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags)
--- linux-2.6.21-rc6-mm1/include/asm-alpha/mman.h.madv_free 2007-04-17 02:17:19.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/asm-alpha/mman.h 2007-04-17 02:22:46.000000000 -0400
@@ -42,6 +42,7 @@
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_SPACEAVAIL 5 /* ensure resources are available */
#define MADV_DONTNEED 6 /* don't need these pages */
+#define MADV_FREE 7 /* don't need the pages or the data */
/* common/generic parameters */
#define MADV_REMOVE 9 /* remove these pages & resources */
--- linux-2.6.21-rc6-mm1/include/asm-generic/mman.h.madv_free 2007-04-17 02:17:19.000000000 -0400
+++ linux-2.6.21-rc6-mm1/include/asm-generic/mman.h 2007-04-17 02:22:46.000000000 -0400
@@ -29,6 +29,7 @@
#define MADV_SEQUENTIAL 2 /* expect sequential page references */
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* don't need these pages */
+#define MADV_FREE 5 /* don't need the pages or the data */
/* common parameters: try to keep these consistent across architectures */
#define MADV_REMOVE 9 /* remove these pages & resources */
--- linux-2.6.21-rc6-mm1/mm/memory.c.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/memory.c 2007-04-17 02:22:46.000000000 -0400
@@ -432,6 +432,7 @@ copy_one_pte(struct mm_struct *dst_mm, s
unsigned long vm_flags = vma->vm_flags;
pte_t pte = *src_pte;
struct page *page;
+ int dirty = 0;
/* pte contains position in swap or file, so copy. */
if (unlikely(!pte_present(pte))) {
@@ -466,6 +467,7 @@ copy_one_pte(struct mm_struct *dst_mm, s
* in the parent and the child
*/
if (is_cow_mapping(vm_flags)) {
+ dirty = pte_dirty(pte);
ptep_set_wrprotect(src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
@@ -483,6 +485,8 @@ copy_one_pte(struct mm_struct *dst_mm, s
get_page(page);
page_dup_rmap(page, vma, addr);
rss[!!PageAnon(page)]++;
+ if (dirty && PageLazyFree(page))
+ ClearPageLazyFree(page);
}
out_set_pte:
@@ -661,6 +665,25 @@ static unsigned long zap_pte_range(struc
(page->index < details->first_index ||
page->index > details->last_index))
continue;
+
+ /*
+ * MADV_FREE is used to lazily recycle
+ * anon memory. The process no longer
+ * needs the data and wants to avoid IO.
+ */
+ if (details->madv_free && PageAnon(page)) {
+ if (unlikely(PageSwapCache(page)) &&
+ !TestSetPageLocked(page)) {
+ remove_exclusive_swap_page(page);
+ unlock_page(page);
+ }
+ ptep_clear_flush_dirty(vma, addr, pte);
+ ptep_clear_flush_young(vma, addr, pte);
+ SetPageLazyFree(page);
+ if (PageActive(page))
+ deactivate_tail_page(page);
+ continue;
+ }
}
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
@@ -689,7 +713,8 @@ static unsigned long zap_pte_range(struc
* If details->check_mapping, we leave swap entries;
* if details->nonlinear_vma, we leave file entries.
*/
- if (unlikely(details))
+ if (unlikely(details && (details->check_mapping ||
+ details->nonlinear_vma)))
continue;
if (!pte_file(ptent))
free_swap_and_cache(pte_to_swp_entry(ptent));
@@ -755,7 +780,8 @@ static unsigned long unmap_page_range(st
pgd_t *pgd;
unsigned long next;
- if (details && !details->check_mapping && !details->nonlinear_vma)
+ if (details && !details->check_mapping && !details->nonlinear_vma
+ && !details->madv_free)
details = NULL;
BUG_ON(addr >= end);
--- linux-2.6.21-rc6-mm1/mm/page_alloc.c.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/page_alloc.c 2007-04-17 02:22:46.000000000 -0400
@@ -266,6 +266,7 @@ static void bad_page(struct page *page)
1 << PG_slab |
1 << PG_swapcache |
1 << PG_writeback |
+ 1 << PG_lazyfree |
1 << PG_buddy );
set_page_count(page, 0);
reset_page_mapcount(page);
@@ -514,6 +515,8 @@ static inline int free_pages_check(struc
bad_page(page);
if (PageDirty(page))
__ClearPageDirty(page);
+ if (PageLazyFree(page))
+ __ClearPageLazyFree(page);
/*
* For now, we report if PG_reserved was found set, but do not
* clear it, and do not free the page. But we shall soon need
@@ -661,6 +664,7 @@ static int prep_new_page(struct page *pa
1 << PG_swapcache |
1 << PG_writeback |
1 << PG_reserved |
+ 1 << PG_lazyfree |
1 << PG_buddy ))))
bad_page(page);
--- linux-2.6.21-rc6-mm1/mm/swap.c.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/swap.c 2007-04-17 02:22:46.000000000 -0400
@@ -152,6 +152,20 @@ void fastcall activate_page(struct page
spin_unlock_irq(&zone->lru_lock);
}
+void fastcall deactivate_tail_page(struct page *page)
+{
+ struct zone *zone = page_zone(page);
+
+ spin_lock_irq(&zone->lru_lock);
+ if (PageLRU(page) && PageActive(page)) {
+ del_page_from_active_list(zone, page);
+ ClearPageActive(page);
+ add_page_to_inactive_list_tail(zone, page);
+ __count_vm_event(PGDEACTIVATE);
+ }
+ spin_unlock_irq(&zone->lru_lock);
+}
+
/*
* Mark a page as having seen activity.
*
--- linux-2.6.21-rc6-mm1/mm/vmscan.c.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/vmscan.c 2007-04-17 02:22:46.000000000 -0400
@@ -460,6 +460,24 @@ static unsigned long shrink_page_list(st
sc->nr_scanned++;
+ /*
+ * MADV_DONTNEED pages get reclaimed lazily, unless the
+ * process reuses it before we get to it.
+ */
+ if (PageLazyFree(page)) {
+ switch (try_to_unmap(page, 0)) {
+ case SWAP_FAIL:
+ ClearPageLazyFree(page);
+ goto activate_locked;
+ case SWAP_AGAIN:
+ ClearPageLazyFree(page);
+ goto keep_locked;
+ case SWAP_SUCCESS:
+ ClearPageLazyFree(page);
+ goto free_it;
+ }
+ }
+
if (!sc->may_swap && page_mapped(page))
goto keep_locked;
--- linux-2.6.21-rc6-mm1/mm/madvise.c.madv_free 2007-04-17 02:17:20.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/madvise.c 2007-04-17 02:22:46.000000000 -0400
@@ -142,8 +142,12 @@ static long madvise_dontneed(struct vm_a
.last_index = ULONG_MAX,
};
zap_page_range(vma, start, end - start, &details);
- } else
- zap_page_range(vma, start, end - start, NULL);
+ } else {
+ struct zap_details details = {
+ .madv_free = 1,
+ };
+ zap_page_range(vma, start, end - start, &details);
+ }
return 0;
}
@@ -215,7 +219,9 @@ madvise_vma(struct vm_area_struct *vma,
error = madvise_willneed(vma, prev, start, end);
break;
+ /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */
case MADV_DONTNEED:
+ case MADV_FREE:
error = madvise_dontneed(vma, prev, start, end);
break;
--- linux-2.6.21-rc6-mm1/mm/rmap.c.madv_free 2007-04-17 02:17:43.000000000 -0400
+++ linux-2.6.21-rc6-mm1/mm/rmap.c 2007-04-17 02:22:46.000000000 -0400
@@ -707,7 +707,17 @@ static int try_to_unmap_one(struct page
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
- if (PageAnon(page)) {
+ /* MADV_FREE is used to lazily free memory from userspace. */
+ if (PageLazyFree(page) && !migration) {
+ /* There is new data in the page. Reinstate it. */
+ if (unlikely(pte_dirty(pteval))) {
+ set_pte_at(mm, address, pte, pteval);
+ ret = SWAP_FAIL;
+ goto out_unmap;
+ }
+ /* Throw the page away. */
+ dec_mm_counter(mm, anon_rss);
+ } else if (PageAnon(page)) {
swp_entry_t entry = { .val = page_private(page) };
if (PageSwapCache(page)) {
* Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2
2007-04-17 7:15 [PATCH] lazy freeing of memory through MADV_FREE Rik van Riel
@ 2007-04-19 21:15 ` Rik van Riel
2007-04-20 21:03 ` Andrew Morton
2007-04-20 20:57 ` [PATCH] lazy freeing of memory through MADV_FREE Andrew Morton
2007-04-22 8:18 ` Andrew Morton
2 siblings, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-19 21:15 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Andrew Morton, linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 459 bytes --]
Restore MADV_DONTNEED to its original Linux behaviour. This is still
not the same behaviour as POSIX, but applications may be depending on
the Linux behaviour already. Besides, glibc catches POSIX_MADV_DONTNEED
and makes sure nothing is done...
Signed-off-by: Rik van Riel <riel@redhat.com>
---
This is to be applied on top of the original MADV_FREE patch.
It turns out that the current glibc patch already falls back
to MADV_DONTNEED if it gets an -EINVAL.
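
The fallback mentioned here can be sketched roughly as follows. This is
not the glibc code; release_range() is a hypothetical helper, and the
sticky no_madv_free flag merely borrows the name of the variable listed in
the glibc ChangeLog later in this thread. Note that userspace sees the
kernel's -EINVAL as a -1 return value with errno set to EINVAL.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Set once the kernel has rejected MADV_FREE, so we stop retrying it. */
static int no_madv_free;

/*
 * Hypothetical helper: release a page-aligned range, preferring
 * MADV_FREE.  Returns the advice value that was applied, or -1 on error.
 */
int release_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

#ifdef MADV_FREE
    if (!no_madv_free) {
        if (madvise(addr, len, MADV_FREE) == 0)
            return MADV_FREE;
        if (errno != EINVAL)
            return -1;          /* a real error, not "unsupported" */
        no_madv_free = 1;       /* remember and fall through */
    }
#endif
    if (madvise(addr, len, MADV_DONTNEED) == 0)
        return MADV_DONTNEED;
    return -1;
}
```

Caching the failure avoids paying for a doomed system call on every heap
shrink when running on a kernel without MADV_FREE.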
[-- Attachment #2: linux-2.6-madv-dontneed-restore.patch --]
[-- Type: text/x-patch, Size: 1317 bytes --]
--- linux-2.6.20.x86_64/mm/madvise.c.madv_free 2007-04-19 16:46:22.000000000 -0400
+++ linux-2.6.20.x86_64/mm/madvise.c 2007-04-19 16:52:19.000000000 -0400
@@ -130,7 +130,8 @@ static long madvise_willneed(struct vm_a
*/
static long madvise_dontneed(struct vm_area_struct * vma,
struct vm_area_struct ** prev,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ int behavior)
{
*prev = vma;
if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
@@ -142,12 +143,14 @@ static long madvise_dontneed(struct vm_a
.last_index = ULONG_MAX,
};
zap_page_range(vma, start, end - start, &details);
- } else {
+ } else if (behavior == MADV_FREE) {
struct zap_details details = {
.madv_free = 1,
};
zap_page_range(vma, start, end - start, &details);
- }
+ } else /* behavior == MADV_DONTNEED */
+ zap_page_range(vma, start, end - start, NULL);
+
return 0;
}
@@ -219,10 +222,9 @@ madvise_vma(struct vm_area_struct *vma,
error = madvise_willneed(vma, prev, start, end);
break;
- /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */
case MADV_DONTNEED:
case MADV_FREE:
- error = madvise_dontneed(vma, prev, start, end);
+ error = madvise_dontneed(vma, prev, start, end, behavior);
break;
default:
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-17 7:15 [PATCH] lazy freeing of memory through MADV_FREE Rik van Riel
2007-04-19 21:15 ` [PATCH] lazy freeing of memory through MADV_FREE 2/2 Rik van Riel
@ 2007-04-20 20:57 ` Andrew Morton
2007-04-20 21:38 ` Rik van Riel
2007-04-22 8:18 ` Andrew Morton
2 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-20 20:57 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel, linux-mm
On Tue, 17 Apr 2007 03:15:51 -0400
Rik van Riel <riel@redhat.com> wrote:
> Make it possible for applications to have the kernel free memory
> lazily. This reduces a repeated free/malloc cycle from freeing
> pages and allocating them, to just marking them freeable. If the
> application wants to reuse them before the kernel needs the memory,
> not even a page fault will happen.
>
> This patch, together with Ulrich's glibc change, increases
> MySQL sysbench performance by a factor of 2 on my quad core
> test system.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
>
> ---
> Ulrich Drepper has test glibc RPMS for this functionality at:
>
> http://people.redhat.com/drepper/rpms
>
> Andrew, I have stress tested this patch for a few days now and
> have not been able to find any more bugs. I believe it is ready
> to be merged in -mm, and upstream at the next merge window.
>
> When the patch goes upstream, I will submit a small follow-up
> patch to revert MADV_DONTNEED behaviour to what it did previously
> and have the new behaviour trigger only on MADV_FREE: at that
> point people will have to get new test RPMs of glibc.
>
>
I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
- Nick's patch also will help this problem. It could be that your patch
no longer offers a 2x speedup when combined with Nick's patch.
It could well be that the combination of the two is even better, but it
would be nice to firm that up a bit. Chewing a page flag is an expensive
thing to do.
I do go on about that. But we're adding page flags at about one per
year, and when we run out we're screwed - we'll need to grow the
pageframe.
- I need to update your patch for Nick's patch. Please confirm that
down_read(mmap_sem) is sufficient for MADV_FREE.
Stylistic nit:
> + if (PageLazyFree(page) && !migration) {
> + /* There is new data in the page. Reinstate it. */
> + if (unlikely(pte_dirty(pteval))) {
> + set_pte_at(mm, address, pte, pteval);
> + ret = SWAP_FAIL;
> + goto out_unmap;
> + }
The comment should be inside the second `if' statement. As it is, it
looks like we reinstate the page if (PageLazyFree(page) && !migration).
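
To make the control flow in question easier to follow, here is a toy
userspace model of the decision the quoted hunk makes. It is a
simplification, not kernel code: the toy_* names are invented for
illustration, and the real try_to_unmap_one() does considerably more
(rmap accounting, TLB flushing, swap handling). The comments sit where
the nit above suggests they belong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_result {
    TOY_SWAP_SUCCESS,   /* page can be thrown away */
    TOY_SWAP_FAIL,      /* page was reused, keep it */
    TOY_NOT_LAZYFREE    /* fall through to the normal anon/file paths */
};

struct toy_page {
    bool lazy_free;     /* models PageLazyFree(page) */
    bool pte_dirty;     /* models pte_dirty(pteval) */
};

enum toy_result toy_unmap_lazyfree(const struct toy_page *pg, bool migration)
{
    assert(pg != NULL);

    /* MADV_FREE is used to lazily free memory from userspace. */
    if (pg->lazy_free && !migration) {
        if (pg->pte_dirty) {
            /* There is new data in the page. Reinstate it. */
            return TOY_SWAP_FAIL;
        }
        /* Throw the page away. */
        return TOY_SWAP_SUCCESS;
    }
    return TOY_NOT_LAZYFREE;
}
```

Only a page that is both marked lazy-free and dirtied since the madvise()
call is reinstated; a clean lazy-free page is discarded without any I/O.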
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2
2007-04-19 21:15 ` [PATCH] lazy freeing of memory through MADV_FREE 2/2 Rik van Riel
@ 2007-04-20 21:03 ` Andrew Morton
2007-04-20 21:24 ` Ulrich Drepper
0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-20 21:03 UTC (permalink / raw)
To: Rik van Riel; +Cc: Jakub Jelinek, linux-kernel, linux-mm
On Thu, 19 Apr 2007 17:15:28 -0400
Rik van Riel <riel@redhat.com> wrote:
> Restore MADV_DONTNEED to its original Linux behaviour. This is still
> not the same behaviour as POSIX, but applications may be depending on
> the Linux behaviour already. Besides, glibc catches POSIX_MADV_DONTNEED
> and makes sure nothing is done...
OK, we need to flesh this out a lot please. People often get confused
about what our MADV_DONTNEED behaviour is. I regularly forget, then look
at the code, then get it wrong. That's for mainline, let alone older
kernels whose behaviour is gawd-knows-what.
So... For the changelog (and the manpage) could we please have a full
description of the 2.6.21 behaviour and the 2.6.21-post-rik behaviour (and
the 2.4 behaviour, if it differs at all)? Also some code comments to
demystify all of this once and for all?
Thanks.
* Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2
2007-04-20 21:03 ` Andrew Morton
@ 2007-04-20 21:24 ` Ulrich Drepper
2007-04-21 7:37 ` Hugh Dickins
0 siblings, 1 reply; 43+ messages in thread
From: Ulrich Drepper @ 2007-04-20 21:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, Jakub Jelinek, linux-kernel, linux-mm
On 4/20/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> OK, we need to flesh this out a lot please. People often get confused
> about what our MADV_DONTNEED behaviour is.
Well, there's not really much to flesh out. The current MADV_DONTNEED
is useful in some situations. The behavior cannot be changed, even
glibc will rely on it for the case when MADV_FREE is not supported.
What might be nice to have is to have a POSIX-compliant
POSIX_MADV_DONTNEED implementation. We currently do nothing which is
OK since no test suite can detect that. But some code might want to
use the real behavior and we're missing an optimization possibility.
Just for reference: the current MADV_DONTNEED behavior is to throw away data in
the range. The POSIX_MADV_DONTNEED behavior is to never lose data.
I.e., file backed data is written back, anon data is at most swapped
out.
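
The "throw away data" behaviour of the Linux MADV_DONTNEED can be observed
directly on a private anonymous mapping. The sketch below is only an
illustration (the function name is hypothetical) and is Linux-specific:
after the call, the next access to the range sees zero-filled pages rather
than the old contents.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS and madvise() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if MADV_DONTNEED discarded the data in a
 * private anonymous page, 0 if the data survived, -1 on setup failure.
 */
int dontneed_discards_anon(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    volatile unsigned char *p = mmap(NULL, (size_t)page,
                                     PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0x5A;                /* data the application "does not need" */

    if (madvise((void *)p, (size_t)page, MADV_DONTNEED) != 0) {
        munmap((void *)p, (size_t)page);
        return -1;
    }

    /* On Linux the page is gone; this read faults in a zero page. */
    int discarded = (p[0] == 0);

    munmap((void *)p, (size_t)page);
    return discarded;
}
```

A POSIX-conforming POSIX_MADV_DONTNEED, as described above, would have to
leave the 0x5A byte readable.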
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 20:57 ` [PATCH] lazy freeing of memory through MADV_FREE Andrew Morton
@ 2007-04-20 21:38 ` Rik van Riel
2007-04-20 22:06 ` Andrew Morton
2007-04-21 7:24 ` Hugh Dickins
0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-20 21:38 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
Andrew Morton wrote:
> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>
> - Nick's patch also will help this problem. It could be that your patch
> no longer offers a 2x speedup when combined with Nick's patch.
>
> It could well be that the combination of the two is even better, but it
> would be nice to firm that up a bit.
I'll test that.
> I do go on about that. But we're adding page flags at about one per
> year, and when we run out we're screwed - we'll need to grow the
> pageframe.
If you want, I can take a look at folding this into the
->mapping pointer. I can guarantee you it won't be
pretty, though :)
> - I need to update your patch for Nick's patch. Please confirm that
> down_read(mmap_sem) is sufficient for MADV_FREE.
It is. MADV_FREE needs no more protection than MADV_DONTNEED.
> Stylistic nit:
>
>> + if (PageLazyFree(page) && !migration) {
>> + /* There is new data in the page. Reinstate it. */
>> + if (unlikely(pte_dirty(pteval))) {
>> + set_pte_at(mm, address, pte, pteval);
>> + ret = SWAP_FAIL;
>> + goto out_unmap;
>> + }
>
> The comment should be inside the second `if' statement. As it is, it
> looks like we reinstate the page if (PageLazyFree(page) && !migration).
Want me to move it?
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 21:38 ` Rik van Riel
@ 2007-04-20 22:06 ` Andrew Morton
2007-04-20 23:52 ` Rik van Riel
2007-04-21 7:24 ` Hugh Dickins
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-20 22:06 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel, linux-mm
On Fri, 20 Apr 2007 17:38:06 -0400
Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
>
> > I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
> >
> > - Nick's patch also will help this problem. It could be that your patch
> > no longer offers a 2x speedup when combined with Nick's patch.
> >
> > It could well be that the combination of the two is even better, but it
> > would be nice to firm that up a bit.
>
> I'll test that.
Thanks.
> > I do go on about that. But we're adding page flags at about one per
> > year, and when we run out we're screwed - we'll need to grow the
> > pageframe.
>
> If you want, I can take a look at folding this into the
> ->mapping pointer. I can guarantee you it won't be
> pretty, though :)
Well, let's see how fugly it ends up looking?
> > - I need to update your patch for Nick's patch. Please confirm that
> > down_read(mmap_sem) is sufficient for MADV_FREE.
>
> It is. MADV_FREE needs no more protection than MADV_DONTNEED.
>
> > Stylistic nit:
> >
> >> + if (PageLazyFree(page) && !migration) {
> >> + /* There is new data in the page. Reinstate it. */
> >> + if (unlikely(pte_dirty(pteval))) {
> >> + set_pte_at(mm, address, pte, pteval);
> >> + ret = SWAP_FAIL;
> >> + goto out_unmap;
> >> + }
> >
> > The comment should be inside the second `if' statement. As it is, it
> > looks like we reinstate the page if (PageLazyFree(page) && !migration).
>
> Want me to move it?
I did that, thanks.
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 22:06 ` Andrew Morton
@ 2007-04-20 23:52 ` Rik van Riel
2007-04-21 0:48 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-20 23:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm, shak
Andrew Morton wrote:
> On Fri, 20 Apr 2007 17:38:06 -0400
> Rik van Riel <riel@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>
>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>
>>> - Nick's patch also will help this problem. It could be that your patch
>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>
>>> It could well be that the combination of the two is even better, but it
>>> would be nice to firm that up a bit.
>> I'll test that.
>
> Thanks.
Well, good news.
It turns out that Nick's patch does not improve peak
performance much, but it does prevent the decline when
running with 16 threads on my quad core CPU!
We _definitely_ want both patches, there's a huge benefit
in having them both.
Here are the transactions/seconds for each combination:
threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem
      1       610         609                596                    545
      2      1032        1136               1196                   1200
      4      1070        1128               2014                   2024
      8      1000        1088               1665                   2087
     16       779        1073               1310                   1999
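
The relative gains are easier to read as ratios. The following sketch
simply recomputes them from the numbers above; the struct and function
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the sysbench results (transactions/second). */
struct bench_row {
    int threads;
    int vanilla;
    int new_glibc;
    int madv_free;
    int madv_free_mmap_sem;
};

static const struct bench_row rows[] = {
    {  1,  610,  609,  596,  545 },
    {  2, 1032, 1136, 1196, 1200 },
    {  4, 1070, 1128, 2014, 2024 },
    {  8, 1000, 1088, 1665, 2087 },
    { 16,  779, 1073, 1310, 1999 },
};

/* Speedup of "madv_free + mmap_sem" over vanilla for row i. */
double speedup_vs_vanilla(size_t i)
{
    assert(i < sizeof rows / sizeof rows[0]);
    return (double)rows[i].madv_free_mmap_sem / rows[i].vanilla;
}
```

With both patches the 16-thread case comes out at roughly 2.57 times the
vanilla throughput, while the single-threaded case drops to about 0.89.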
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 23:52 ` Rik van Riel
@ 2007-04-21 0:48 ` Eric Dumazet
2007-04-21 3:58 ` Rik van Riel
2007-04-21 7:12 ` Jakub Jelinek
2007-04-22 2:36 ` Nick Piggin
2 siblings, 1 reply; 43+ messages in thread
From: Eric Dumazet @ 2007-04-21 0:48 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Rik van Riel wrote:
> Andrew Morton wrote:
>> On Fri, 20 Apr 2007 17:38:06 -0400
>> Rik van Riel <riel@redhat.com> wrote:
>>
>>> Andrew Morton wrote:
>>>
>>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>>
>>>> - Nick's patch also will help this problem. It could be that your
>>>> patch
>>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>>
>>>> It could well be that the combination of the two is even better,
>>>> but it
>>>> would be nice to firm that up a bit.
>>> I'll test that.
>>
>> Thanks.
>
> Well, good news.
>
> It turns out that Nick's patch does not improve peak
> performance much, but it does prevent the decline when
> running with 16 threads on my quad core CPU!
>
> We _definitely_ want both patches, there's a huge benefit
> in having them both.
>
> Here are the transactions/seconds for each combination:
>
> threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem
>       1       610         609                596                    545
545 tps versus 610 tps for one thread? That seems quite bad, no?
Could you please find an explanation for this?
>       2      1032        1136               1196                   1200
>       4      1070        1128               2014                   2024
>       8      1000        1088               1665                   2087
>      16       779        1073               1310                   1999
>
>
Thank you
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-21 0:48 ` Eric Dumazet
@ 2007-04-21 3:58 ` Rik van Riel
0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-21 3:58 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Eric Dumazet wrote:
> Rik van Riel wrote:
>> Andrew Morton wrote:
>>> On Fri, 20 Apr 2007 17:38:06 -0400
>>> Rik van Riel <riel@redhat.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>
>>>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>>>
>>>>> - Nick's patch also will help this problem. It could be that your
>>>>> patch
>>>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>>>
>>>>> It could well be that the combination of the two is even better,
>>>>> but it
>>>>> would be nice to firm that up a bit.
>>>> I'll test that.
>>>
>>> Thanks.
>>
>> Well, good news.
>>
>> It turns out that Nick's patch does not improve peak
>> performance much, but it does prevent the decline when
>> running with 16 threads on my quad core CPU!
>>
>> We _definitely_ want both patches, there's a huge benefit
>> in having them both.
>>
>> Here are the transactions/seconds for each combination:
>>
>> threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem
>>       1       610         609                596                    545
>
> 545 tps versus 610 tps for one thread? That seems quite bad, no?
>
> Could you please find an explanation for this?
I have no idea why this happens. Especially the last one,
going from a write lock to a read lock on the mmap_sem
should not make ANY difference whatsoever since we're
running single threaded!
>>       2      1032        1136               1196                   1200
>>       4      1070        1128               2014                   2024
>>       8      1000        1088               1665                   2087
>>      16       779        1073               1310                   1999
Performance with 2 database threads is way better though,
and performance with 4 or more threads more than doubles...
If you have an explanation on why single threaded performance
went down a little on my quad core system, please let me know.
Does performance suffer at all on a real UP system?
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 23:52 ` Rik van Riel
2007-04-21 0:48 ` Eric Dumazet
@ 2007-04-21 7:12 ` Jakub Jelinek
2007-04-23 4:36 ` Nick Piggin
2007-04-22 2:36 ` Nick Piggin
2 siblings, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2007-04-21 7:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
On Fri, Apr 20, 2007 at 07:52:44PM -0400, Rik van Riel wrote:
> It turns out that Nick's patch does not improve peak
> performance much, but it does prevent the decline when
> running with 16 threads on my quad core CPU!
>
> We _definitely_ want both patches, there's a huge benefit
> in having them both.
>
> Here are the transactions/seconds for each combination:
>
> threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem
>       1       610         609                596                    545
>       2      1032        1136               1196                   1200
>       4      1070        1128               2014                   2024
>       8      1000        1088               1665                   2087
>      16       779        1073               1310                   1999
FYI, I have uploaded a testing glibc that uses MADV_FREE and falls back
to MADV_DONTNEED if MADV_FREE is not available, to
http://people.redhat.com/jakub/glibc/2.5.90-21.1/
and I'm also attaching the glibc patch for those who want to build it
themselves:
2007-04-19 Ulrich Drepper <drepper@redhat.com>
Jakub Jelinek <jakub@redhat.com>
* malloc/arena.c (heap_info): Add mprotect_size field, adjust pad.
(new_heap): Initialize mprotect_size.
(no_madv_free): New variable.
(grow_heap): When growing, only mprotect from mprotect_size till
new_size if mprotect_size is smaller. When shrinking, use PROT_NONE
MMAP for __libc_enable_secure only, otherwise if MADV_FREE is
available use it and fall back to MADV_DONTNEED.
* sysdeps/unix/sysv/linux/alpha/bits/mman.h (MADV_FREE): Define.
* sysdeps/unix/sysv/linux/ia64/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/i386/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/s390/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/x86_64/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/sh/bits/mman.h (MADV_FREE): Likewise.
--- libc/malloc/arena.c.jj 2006-10-31 23:05:31.000000000 +0100
+++ libc/malloc/arena.c 2007-04-19 18:54:20.000000000 +0200
@@ -1,5 +1,6 @@
/* Malloc implementation for multiple threads without lock contention.
- Copyright (C) 2001,2002,2003,2004,2005,2006 Free Software Foundation, Inc.
+ Copyright (C) 2001,2002,2003,2004,2005,2006,2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Wolfram Gloger <wg@malloc.de>, 2001.
@@ -59,10 +60,12 @@ typedef struct _heap_info {
mstate ar_ptr; /* Arena for this heap. */
struct _heap_info *prev; /* Previous heap. */
size_t size; /* Current size in bytes. */
+ size_t mprotect_size; /* Size in bytes that has been mprotected
+ PROT_READ|PROT_WRITE. */
/* Make sure the following data is properly aligned, particularly
that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
- MALLOG_ALIGNMENT. */
- char pad[-5 * SIZE_SZ & MALLOC_ALIGN_MASK];
+ MALLOC_ALIGNMENT. */
+ char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;
/* Get a compile-time error if the heap_info padding is not correct
@@ -692,10 +695,15 @@ new_heap(size, top_pad) size_t size, top
}
h = (heap_info *)p2;
h->size = size;
+ h->mprotect_size = size;
THREAD_STAT(stat_n_heaps++);
return h;
}
+#if defined _LIBC && defined MADV_FREE
+static int no_madv_free;
+#endif
+
/* Grow or shrink a heap. size is automatically rounded up to a
multiple of the page size if it is positive. */
@@ -714,17 +722,49 @@ grow_heap(h, diff) heap_info *h; long di
new_size = (long)h->size + diff;
if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
return -1;
- if(mprotect((char *)h + h->size, diff, PROT_READ|PROT_WRITE) != 0)
- return -2;
+ if((unsigned long) new_size > h->mprotect_size) {
+ if (mprotect((char *)h + h->mprotect_size,
+ (unsigned long) new_size - h->mprotect_size,
+ PROT_READ|PROT_WRITE) != 0)
+ return -2;
+ h->mprotect_size = new_size;
+ }
} else {
new_size = (long)h->size + diff;
if(new_size < (long)sizeof(*h))
return -1;
/* Try to re-map the extra heap space freshly to save memory, and
make it inaccessible. */
- if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE,
- MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED)
- return -2;
+#ifdef _LIBC
+ if (__builtin_expect (__libc_enable_secure, 0))
+#else
+ if (1)
+#endif
+ {
+ if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE,
+ MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED)
+ return -2;
+ h->mprotect_size = new_size;
+ }
+#ifdef _LIBC
+ else
+ {
+# ifdef MADV_FREE
+ if (!__builtin_expect (no_madv_free, 0))
+ {
+ if (__builtin_expect (madvise ((char *)h + new_size,
+ -diff, MADV_FREE), 0) == -1
+ && errno == EINVAL)
+ {
+ no_madv_free = 1;
+ madvise ((char *)h + new_size, -diff, MADV_DONTNEED);
+ }
+ }
+ else
+# endif
+ madvise ((char *)h + new_size, -diff, MADV_DONTNEED);
+ }
+#endif
/*fprintf(stderr, "shrink %p %08lx\n", h, new_size);*/
}
h->size = new_size;
--- libc/sysdeps/unix/sysv/linux/alpha/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/alpha/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/Alpha version.
- Copyright (C) 1997, 1998, 2000, 2003, 2006 Free Software Foundation, Inc.
+ Copyright (C) 1997, 1998, 2000, 2003, 2006, 2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -96,6 +97,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 6 /* Don't need these pages. */
+# define MADV_FREE 7 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/ia64/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/ia64/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/ia64 version.
- Copyright (C) 1997,1998,2000,2003,2005,2006 Free Software Foundation, Inc.
+ Copyright (C) 1997,1998,2000,2003,2005,2006,2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -89,6 +90,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/i386/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/i386/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/i386 version.
- Copyright (C) 1997, 2000, 2003, 2005, 2006 Free Software Foundation, Inc.
+ Copyright (C) 1997, 2000, 2003, 2005, 2006, 2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -88,6 +89,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/s390/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/s390/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/s390 version.
- Copyright (C) 2000,2001,2002,2003,2005,2006 Free Software Foundation, Inc.
+ Copyright (C) 2000,2001,2002,2003,2005,2006,2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -89,6 +90,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/powerpc/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/powerpc/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/PowerPC version.
- Copyright (C) 1997, 2000, 2003, 2005, 2006 Free Software Foundation, Inc.
+ Copyright (C) 1997, 2000, 2003, 2005, 2006, 2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -89,6 +90,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/x86_64/bits/mman.h.jj 2006-05-02 16:33:46.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/x86_64/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,5 @@
/* Definitions for POSIX memory map interface. Linux/x86_64 version.
- Copyright (C) 2001, 2003, 2005, 2006 Free Software Foundation, Inc.
+ Copyright (C) 2001, 2003, 2005, 2006, 2007 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -89,6 +89,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/sparc/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/sparc/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/SPARC version.
- Copyright (C) 1997,1999,2000,2003,2005,2006 Free Software Foundation, Inc.
+ Copyright (C) 1997,1999,2000,2003,2005,2006,2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -90,7 +91,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
-# define MADV_FREE 5 /* Content can be freed (Solaris). */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
--- libc/sysdeps/unix/sysv/linux/sh/bits/mman.h.jj 2006-05-02 16:33:44.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/sh/bits/mman.h 2007-04-19 18:37:43.000000000 +0200
@@ -1,5 +1,6 @@
/* Definitions for POSIX memory map interface. Linux/SH version.
- Copyright (C) 1997,1999,2000,2003,2005,2006 Free Software Foundation, Inc.
+ Copyright (C) 1997,1999,2000,2003,2005,2006,2007
+ Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -88,6 +89,7 @@
# define MADV_SEQUENTIAL 2 /* Expect sequential page references. */
# define MADV_WILLNEED 3 /* Will need these pages. */
# define MADV_DONTNEED 4 /* Don't need these pages. */
+# define MADV_FREE 5 /* Content can be freed. */
# define MADV_REMOVE 9 /* Remove these pages and resources. */
# define MADV_DONTFORK 10 /* Do not inherit across fork. */
# define MADV_DOFORK 11 /* Do inherit across fork. */
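
As a standalone illustration of the fallback used in grow_heap above,
here is a minimal sketch, not part of the patch. The helper name
release_range() is hypothetical. Unlike the patch, it reports the
madvise() result to the caller and retries with MADV_DONTNEED on any
MADV_FREE failure, while only EINVAL disables MADV_FREE for later calls:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Set once the kernel has rejected MADV_FREE with EINVAL, so that later
   calls go straight to MADV_DONTNEED (mirrors no_madv_free in the patch). */
static int no_madv_free;

/* Tell the kernel that [addr, addr + len) is no longer needed.
   Returns 0 on success and -1 if both advice values failed. */
static int release_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#ifdef MADV_FREE
    if (!no_madv_free) {
        if (madvise(addr, len, MADV_FREE) == 0)
            return 0;
        if (errno == EINVAL)
            no_madv_free = 1;   /* kernel does not know MADV_FREE */
    }
#endif
    /* Older kernels or headers: drop the pages eagerly instead. */
    return madvise(addr, len, MADV_DONTNEED);
}
```

A caller would pass a page-aligned address inside an anonymous private
mapping, as the malloc heap code does when it shrinks a heap.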
Jakub
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 21:38 ` Rik van Riel
2007-04-20 22:06 ` Andrew Morton
@ 2007-04-21 7:24 ` Hugh Dickins
2007-04-21 18:06 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Hugh Dickins @ 2007-04-21 7:24 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm
On Fri, 20 Apr 2007, Rik van Riel wrote:
> Andrew Morton wrote:
>
> > I do go on about that. But we're adding page flags at about one per
> > year, and when we run out we're screwed - we'll need to grow the
> > pageframe.
>
> If you want, I can take a look at folding this into the
> ->mapping pointer. I can guarantee you it won't be
> pretty, though :)
Please don't. If we're going to stuff another pageflag into there,
let it be PageSwapCache the natural partner of PageAnon, rather than
whatever our latest pageflag happens to be. I'll look into it - but
do keep an eye on me, I've developed a dubious track record of
obstructing other people's attempts to save pageflags.
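
For readers unfamiliar with the trick being discussed: PageAnon is
encoded in the low bit of page->mapping, which is free because the
pointed-to objects are aligned. A toy sketch of that kind of pointer
tagging follows; struct toy_page and all helper names are hypothetical
and this is not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag bit, playing the role of the anon bit in ->mapping. */
#define TOY_MAPPING_ANON ((uintptr_t)1)

struct toy_page {
    uintptr_t mapping;   /* pointer value with a flag in bit 0 */
};

/* Store a pointer and mark the page as anonymous in the same word. */
static void toy_set_anon_mapping(struct toy_page *pg, void *anon_vma)
{
    /* The pointer must be at least 2-byte aligned so bit 0 is free. */
    assert(((uintptr_t)anon_vma & TOY_MAPPING_ANON) == 0);
    pg->mapping = (uintptr_t)anon_vma | TOY_MAPPING_ANON;
}

/* Test the flag without touching any separate flags word. */
static int toy_page_is_anon(const struct toy_page *pg)
{
    return (pg->mapping & TOY_MAPPING_ANON) != 0;
}

/* Recover the original pointer by masking the tag bit off. */
static void *toy_page_mapping(const struct toy_page *pg)
{
    return (void *)(pg->mapping & ~TOY_MAPPING_ANON);
}
```

Every extra flag stored this way costs one more alignment bit and a mask
on each pointer use, which is part of why it is not pretty.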
Hugh
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2
2007-04-20 21:24 ` Ulrich Drepper
@ 2007-04-21 7:37 ` Hugh Dickins
2007-04-21 16:32 ` Ulrich Drepper
0 siblings, 1 reply; 43+ messages in thread
From: Hugh Dickins @ 2007-04-21 7:37 UTC (permalink / raw)
To: Ulrich Drepper
Cc: Andrew Morton, Rik van Riel, Jakub Jelinek, linux-kernel, linux-mm
On Fri, 20 Apr 2007, Ulrich Drepper wrote:
>
> Just for reference: the current MADV_DONTNEED behavior is to throw away data in
> the range.
Not exactly. The Linux MADV_DONTNEED never throws away data from a
PROT_WRITE,MAP_SHARED mapping (or shm) - it propagates the dirty bit,
the page will eventually get written out to file, and can be retrieved
later by subsequent access. But the Linux MADV_DONTNEED does throw away
data from a PROT_WRITE,MAP_PRIVATE mapping (or brk or stack) - those
changes are discarded, and a subsequent access will revert to zeroes
or the underlying mapped file. Been like that since before 2.4.0.
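
The private-mapping case described above can be observed from user
space. A minimal sketch, assuming Linux semantics; the helper name
byte_after_dontneed() is made up for this illustration:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory with the given sharing flag
   (MAP_PRIVATE or MAP_SHARED), store `marker` in the first byte, apply
   MADV_DONTNEED and return the first byte as read afterwards.
   Returns -1 if mmap() or madvise() fails. */
static int byte_after_dontneed(int sharing, size_t len, unsigned char marker)
{
    assert(len > 0);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            sharing | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = marker;                        /* dirty the first page */

    int result = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0)
        result = p[0];                    /* re-fault and read it back */

    munmap(p, len);
    return result;
}
```

On Linux, byte_after_dontneed(MAP_PRIVATE, ...) is expected to return 0,
because the modified page was discarded and the access faults in a fresh
zero page. Passing MAP_SHARED exercises the other case described above.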
> The POSIX_MADV_DONTNEED behavior is to never lose data.
> I.e., file backed data is written back, anon data is at most swapped
> out.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2
2007-04-21 7:37 ` Hugh Dickins
@ 2007-04-21 16:32 ` Ulrich Drepper
0 siblings, 0 replies; 43+ messages in thread
From: Ulrich Drepper @ 2007-04-21 16:32 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Rik van Riel, Jakub Jelinek, linux-kernel, linux-mm
On 4/21/07, Hugh Dickins <hugh@veritas.com> wrote:
> But the Linux MADV_DONTNEED does throw away
> data from a PROT_WRITE,MAP_PRIVATE mapping (or brk or stack) - those
> changes are discarded, and a subsequent access will revert to zeroes
> or the underlying mapped file. Been like that since before 2.4.0.
I didn't say it changed. I am just saying that there is a hole in the
current implementation: it does not allow POSIX_MADV_DONTNEED to be
implemented as anything but a no-op. The POSIX_MADV_DONTNEED behavior
is useful, and IMO something should be added to allow implementing it.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-21 7:24 ` Hugh Dickins
@ 2007-04-21 18:06 ` Rik van Riel
0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-21 18:06 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm
Hugh Dickins wrote:
> On Fri, 20 Apr 2007, Rik van Riel wrote:
>> Andrew Morton wrote:
>>
>>> I do go on about that. But we're adding page flags at about one per
>>> year, and when we run out we're screwed - we'll need to grow the
>>> pageframe.
>> If you want, I can take a look at folding this into the
>> ->mapping pointer. I can guarantee you it won't be
>> pretty, though :)
>
> Please don't. If we're going to stuff another pageflag into there,
> let it be PageSwapCache the natural partner of PageAnon, rather than
> whatever our latest pageflag happens to be.
I looked at doing what Andrew wanted, and it did indeed not
look like the right thing to do. The locking on page->mapping
is the kind of locking we want to avoid during zap_page_range
and in the pageout code.
I like your suggestion better.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-20 23:52 ` Rik van Riel
2007-04-21 0:48 ` Eric Dumazet
2007-04-21 7:12 ` Jakub Jelinek
@ 2007-04-22 2:36 ` Nick Piggin
2007-04-22 2:50 ` Nick Piggin
` (2 more replies)
2 siblings, 3 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-22 2:36 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Rik van Riel wrote:
> Andrew Morton wrote:
>
>> On Fri, 20 Apr 2007 17:38:06 -0400
>> Rik van Riel <riel@redhat.com> wrote:
>>
>>> Andrew Morton wrote:
>>>
>>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>>
>>>> - Nick's patch also will help this problem. It could be that your
>>>> patch
>>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>>
>>>> It could well be that the combination of the two is even better,
>>>> but it
>>>> would be nice to firm that up a bit.
>>>
>>> I'll test that.
>>
>>
>> Thanks.
>
>
> Well, good news.
>
> It turns out that Nick's patch does not improve peak
> performance much, but it does prevent the decline when
> running with 16 threads on my quad core CPU!
>
> We _definitely_ want both patches, there's a huge benefit
> in having them both.
>
> Here are the transactions/seconds for each combination:
>
> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
>
>       1      610        609               596                   545
>       2     1032       1136              1196                  1200
>       4     1070       1128              2014                  2024
>       8     1000       1088              1665                  2087
>      16      779       1073              1310                  1999
Is "new glibc" meaning MADV_DONTNEED + kernel with mmap_sem patch?
The strange thing with your madv_free kernel is that it doesn't
help single-threaded performance at all. So that work to avoid
zeroing the new page is not a win at all there (maybe due to the
cache effects I was worried about?).
However MADV_FREE does improve scalability, which is interesting.
The most likely reason I can see why that may be the case is that
it avoids mmap_sem when faulting pages back in (I doubt it is due
to avoiding the page allocator, but maybe?).
So where is the down_write coming from in this workload, I wonder?
Heap management? What syscalls?
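
To make the read-versus-write distinction concrete: a reader/writer lock
admits any number of readers at once, while a writer needs exclusive
access. The sketch below uses a POSIX rwlock in user space as a stand-in
for mmap_sem. It is only an analogy, and toy_mmap_sem and the helper
names are invented for the example:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* User-space stand-in for mmap_sem; not the kernel's rwsem. */
static pthread_rwlock_t toy_mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Returns 1 if a second read lock is granted while one is already held
   (the down_read case: readers do not exclude each other). */
static int second_reader_admitted(void)
{
    int admitted = 0;

    pthread_rwlock_rdlock(&toy_mmap_sem);
    if (pthread_rwlock_tryrdlock(&toy_mmap_sem) == 0) {
        admitted = 1;
        pthread_rwlock_unlock(&toy_mmap_sem);   /* drop second read lock */
    }
    pthread_rwlock_unlock(&toy_mmap_sem);       /* drop first read lock */
    return admitted;
}

/* Returns 1 if a write lock is refused with EBUSY while a read lock is
   held (the down_write case: a writer must wait for all readers). */
static int writer_blocked_by_reader(void)
{
    pthread_rwlock_rdlock(&toy_mmap_sem);
    int rc = pthread_rwlock_trywrlock(&toy_mmap_sem);
    if (rc == 0)
        pthread_rwlock_unlock(&toy_mmap_sem);   /* not expected */
    pthread_rwlock_unlock(&toy_mmap_sem);
    return rc == EBUSY;
}
```

Any path that takes the lock for writing serializes against every
concurrent page fault, which is why the source of the down_write calls
matters for scalability.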
x86_64's rwsems are crap under heavy parallelism (even read-only),
as I fixed in my recent generic rwsems patch. I don't expect MySQL
to be such a mmap_sem microbenchmark, but I wonder how much this
would help?
What if we ran the private futexes patch to further cut down
mmap_sem contention?
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 2:36 ` Nick Piggin
@ 2007-04-22 2:50 ` Nick Piggin
2007-04-22 6:31 ` Rik van Riel
2007-04-23 4:28 ` Rik van Riel
2 siblings, 0 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-22 2:50 UTC (permalink / raw)
To: Nick Piggin; +Cc: Rik van Riel, Andrew Morton, linux-kernel, linux-mm, shak
Nick Piggin wrote:
> Rik van Riel wrote:
>
>> Andrew Morton wrote:
>>
>>> On Fri, 20 Apr 2007 17:38:06 -0400
>>> Rik van Riel <riel@redhat.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>
>>>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>>>
>>>>> - Nick's patch also will help this problem. It could be that your
>>>>> patch
>>>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>>>
>>>>> It could well be that the combination of the two is even better,
>>>>> but it
>>>>> would be nice to firm that up a bit.
>>>>
>>>>
>>>> I'll test that.
>>>
>>>
>>>
>>> Thanks.
>>
>>
>>
>> Well, good news.
>>
>> It turns out that Nick's patch does not improve peak
>> performance much, but it does prevent the decline when
>> running with 16 threads on my quad core CPU!
>>
>> We _definitely_ want both patches, there's a huge benefit
>> in having them both.
>>
>> Here are the transactions/seconds for each combination:
>>
>> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
>>
>>       1      610        609               596                   545
>>       2     1032       1136              1196                  1200
>>       4     1070       1128              2014                  2024
>>       8     1000       1088              1665                  2087
>>      16      779       1073              1310                  1999
>
>
>
> Is "new glibc" meaning MADV_DONTNEED + kernel with mmap_sem patch?
>
> The strange thing with your madv_free kernel is that it doesn't
> help single-threaded performance at all. So that work to avoid
> zeroing the new page is not a win at all there (maybe due to the
> cache effects I was worried about?).
>
> However MADV_FREE does improve scalability, which is interesting.
> The most likely reason I can see why that may be the case is that
> it avoids mmap_sem when faulting pages back in (I doubt it is due
> to avoiding the page allocator, but maybe?).
>
> So where is the down_write coming from in this workload, I wonder?
> Heap management? What syscalls?
>
> x86_64's rwsems are crap under heavy parallelism (even read-only),
> as I fixed in my recent generic rwsems patch. I don't expect MySQL
> to be such a mmap_sem microbenchmark, but I wonder how much this
> would help?
>
> What if we ran the private futexes patch to further cut down
> mmap_sem contention?
Hmm, without the MADV_FREE patch, I wonder if it isn't doing something
silly like read-faulting in a ZERO_PAGE then write faulting a new page
straight afterwards.. I'll have to try a few tests.
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 2:36 ` Nick Piggin
2007-04-22 2:50 ` Nick Piggin
@ 2007-04-22 6:31 ` Rik van Riel
2007-04-23 0:16 ` Nick Piggin
2007-04-23 4:28 ` Rik van Riel
2 siblings, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-22 6:31 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Nick Piggin wrote:
> Rik van Riel wrote:
>> Andrew Morton wrote:
>>
>>> On Fri, 20 Apr 2007 17:38:06 -0400
>>> Rik van Riel <riel@redhat.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>
>>>>> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>>>>>
>>>>> - Nick's patch also will help this problem. It could be that your
>>>>> patch
>>>>> no longer offers a 2x speedup when combined with Nick's patch.
>>>>>
>>>>> It could well be that the combination of the two is even better,
>>>>> but it
>>>>> would be nice to firm that up a bit.
>>>>
>>>> I'll test that.
>>>
>>>
>>> Thanks.
>>
>>
>> Well, good news.
>>
>> It turns out that Nick's patch does not improve peak
>> performance much, but it does prevent the decline when
>> running with 16 threads on my quad core CPU!
>>
>> We _definitely_ want both patches, there's a huge benefit
>> in having them both.
>>
>> Here are the transactions/seconds for each combination:
>>
>> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
>>
>>       1      610        609               596                   545
>>       2     1032       1136              1196                  1200
>>       4     1070       1128              2014                  2024
>>       8     1000       1088              1665                  2087
>>      16      779       1073              1310                  1999
>
>
> Is "new glibc" meaning MADV_DONTNEED + kernel with mmap_sem patch?
No, that's just the glibc change, with a vanilla kernel.
The third column is glibc change + madv_free patch.
The fourth column has your patch in it, too.
> The strange thing with your madv_free kernel is that it doesn't
> help single-threaded performance at all. So that work to avoid
> zeroing the new page is not a win at all there (maybe due to the
> cache effects I was worried about?).
Well, your patch causes the performance to drop from
596 transactions/second to 545. Your patch is the only
difference between the third and the fourth column.
> However MADV_FREE does improve scalability, which is interesting.
> The most likely reason I can see why that may be the case is that
> it avoids mmap_sem when faulting pages back in (I doubt it is due
> to avoiding the page allocator, but maybe?).
>
> So where is the down_write coming from in this workload, I wonder?
> Heap management? What syscalls?
I wonder if the increased parallelism simply caused
more cache line bouncing, with bounces happening in
some inner loop instead of an outer loop.
Btw, it is quite possible that the MySQL sysbench
thing gives different results on your system. It
would be good to know what it does on a real SMP
system, vs. a single quad-core chip :)
Other architectures would be interesting to know,
too.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-17 7:15 [PATCH] lazy freeing of memory through MADV_FREE Rik van Riel
2007-04-19 21:15 ` [PATCH] lazy freeing of memory through MADV_FREE 2/2 Rik van Riel
2007-04-20 20:57 ` [PATCH] lazy freeing of memory through MADV_FREE Andrew Morton
@ 2007-04-22 8:18 ` Andrew Morton
2007-04-22 9:16 ` Christoph Hellwig
2 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2007-04-22 8:18 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel, linux-mm, David S. Miller
On Tue, 17 Apr 2007 03:15:51 -0400 Rik van Riel <riel@redhat.com> wrote:
> Make it possible for applications to have the kernel free memory
> lazily. This reduces a repeated free/malloc cycle from freeing
> pages and allocating them, to just marking them freeable. If the
> application wants to reuse them before the kernel needs the memory,
> not even a page fault will happen.
>
> This patch, together with Ulrich's glibc change, increases
> MySQL sysbench performance by a factor of 2 on my quad core
> test system.
>
In file included from include/linux/mman.h:4,
from arch/sparc64/kernel/sys_sparc.c:19:
include/asm/mman.h:36:1: "MADV_FREE" redefined
In file included from include/asm/mman.h:5,
from include/linux/mman.h:4,
from arch/sparc64/kernel/sys_sparc.c:19:
include/asm-generic/mman.h:32:1: this is the location of the previous definition
sparc32 and sparc64 already defined MADV_FREE:
#define MADV_FREE 0x5 /* (Solaris) contents can be freed */
I'll remove the sparc definitions for now, but we need to work out what
we're going to do here. Your patch changes the values of MADV_FREE on
sparc.
Perhaps this should be renamed to MADV_FREE_LINUX and given a different
number. It depends on how close your proposed behaviour is to Solaris's.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 8:18 ` Andrew Morton
@ 2007-04-22 9:16 ` Christoph Hellwig
2007-04-22 16:55 ` Ulrich Drepper
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2007-04-22 9:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, linux-mm, David S. Miller
On Sun, Apr 22, 2007 at 01:18:10AM -0700, Andrew Morton wrote:
> On Tue, 17 Apr 2007 03:15:51 -0400 Rik van Riel <riel@redhat.com> wrote:
>
> > Make it possible for applications to have the kernel free memory
> > lazily. This reduces a repeated free/malloc cycle from freeing
> > pages and allocating them, to just marking them freeable. If the
> > application wants to reuse them before the kernel needs the memory,
> > not even a page fault will happen.
> >
> > This patch, together with Ulrich's glibc change, increases
> > MySQL sysbench performance by a factor of 2 on my quad core
> > test system.
> >
>
> In file included from include/linux/mman.h:4,
> from arch/sparc64/kernel/sys_sparc.c:19:
> include/asm/mman.h:36:1: "MADV_FREE" redefined
> In file included from include/asm/mman.h:5,
> from include/linux/mman.h:4,
> from arch/sparc64/kernel/sys_sparc.c:19:
> include/asm-generic/mman.h:32:1: this is the location of the previous definition
>
> sparc32 and sparc64 already defined MADV_FREE:
>
>
> #define MADV_FREE 0x5 /* (Solaris) contents can be freed */
>
> I'll remove the sparc definitions for now, but we need to work out what
> we're going to do here. Your patch changes the values of MADV_FREE on
> sparc.
>
> Perhaps this should be renamed to MADV_FREE_LINUX and given a different
> number. It depends on how close your proposed behaviour is to Solaris's.
Why isn't MADV_FREE defined to 5 for linux? It's our first free madv
value? Also the behaviour should better match the one in solaris or BSD,
the last thing we need is slightly different behaviour from operating
systems supporting this for ages.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 9:16 ` Christoph Hellwig
@ 2007-04-22 16:55 ` Ulrich Drepper
0 siblings, 0 replies; 43+ messages in thread
From: Ulrich Drepper @ 2007-04-22 16:55 UTC (permalink / raw)
To: Christoph Hellwig, Andrew Morton, Rik van Riel, linux-kernel,
linux-mm, David S. Miller
On 4/22/07, Christoph Hellwig <hch@infradead.org> wrote:
> Why isn't MADV_FREE defined to 5 for linux? It's our first free madv
> value? Also the behaviour should better match the one in solaris or BSD,
> the last thing we need is slightly different behaviour from operating
> systems supporting this for ages.
The behavior should indeed be identical. Both implementations
restrict MADV_FREE to work on anonymous memory and it is unspecified
whether a renewed access yields a freshly zeroed page or
whether the old content is still there. So, just use 0x5 for both the
Linux and Solaris version on sparc.
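
What "unspecified" means for a caller can be sketched as follows: after
MADV_FREE, a read of the range may legitimately see either the old bytes
or zeroes, so code must not depend on which. This is an illustration
only; the enum and the probe_after_free() helper are invented names, and
the code falls back to MADV_DONTNEED where MADV_FREE is unavailable:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

enum after_free { AF_ERROR = -1, AF_OLD_CONTENT, AF_ZEROED, AF_OTHER };

/* Fill an anonymous private mapping with a nonzero `marker`, mark it
   freeable, then report what the first byte looks like afterwards. */
static enum after_free probe_after_free(size_t len, unsigned char marker)
{
    assert(len > 0 && marker != 0);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return AF_ERROR;

    memset(p, marker, len);

    int advice = MADV_DONTNEED;
#ifdef MADV_FREE
    advice = MADV_FREE;               /* lazy variant, if headers have it */
#endif
    if (madvise(p, len, advice) != 0 &&
        madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return AF_ERROR;
    }

    unsigned char seen = p[0];        /* old data or zero, both are valid */
    munmap(p, len);

    if (seen == marker)
        return AF_OLD_CONTENT;
    return seen == 0 ? AF_ZEROED : AF_OTHER;
}
```

Both AF_OLD_CONTENT and AF_ZEROED are acceptable outcomes; only a caller
that rewrites the range before reading it is portable.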
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 6:31 ` Rik van Riel
@ 2007-04-23 0:16 ` Nick Piggin
2007-04-23 3:53 ` Rik van Riel
0 siblings, 1 reply; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 0:16 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Rik van Riel wrote:
> Nick Piggin wrote:
>
>> Rik van Riel wrote:
>>> Here are the transactions/seconds for each combination:
>>>
>>> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
>>>
>>> 1        610      609        596               545
>>> 2        1032     1136       1196              1200
>>> 4        1070     1128       2014              2024
>>> 8        1000     1088       1665              2087
>>> 16       779      1073       1310              1999
>>
>>
>>
>> Is "new glibc" meaning MADV_DONTNEED + kernel with mmap_sem patch?
>
>
> No, that's just the glibc change, with a vanilla kernel.
OK. That would be interesting to see with the mmap_sem change,
because that should increase scalability.
> The third column is glibc change + madv_free patch.
>
> The fourth column has your patch in it, too.
>
>> The strange thing with your madv_free kernel is that it doesn't
>> help single-threaded performance at all. So that work to avoid
>> zeroing the new page is not a win at all there (maybe due to the
>> cache effects I was worried about?).
>
>
> Well, your patch causes the performance to drop from
> 596 transactions/second to 545. Your patch is the only
> difference between the third and the fourth column.
Yeah. That's funny, because it means either there is some
contention on the mmap_sem (or ptl) at 1 thread, or that my
patch alters the uncontended performance.
>> However MADV_FREE does improve scalability, which is interesting.
>> The most likely reason I can see why that may be the case is that
>> it avoids mmap_sem when faulting pages back in (I doubt it is due
>> to avoiding the page allocator, but maybe?).
>>
>> So where is the down_write coming from in this workload, I wonder?
>> Heap management? What syscalls?
>
>
> I wonder if the increased parallelism simply caused
> more cache line bouncing, with bounces happening in
> some inner loop instead of an outer loop.
>
> Btw, it is quite possible that the MySQL sysbench
> thing gives different results on your system. It
> would be good to know what it does on a real SMP
> system, vs. a single quad-core chip :)
>
> Other architectures would be interesting to know,
> too.
I don't see why parallelism should come into it at 1 thread, unless
MySQL is parallelising individual transactions. Anyway, I'll try to do
some more digging.
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 0:16 ` Nick Piggin
@ 2007-04-23 3:53 ` Rik van Riel
2007-04-23 3:58 ` Nick Piggin
2007-04-23 3:59 ` Rik van Riel
0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 3:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Nick Piggin wrote:
> Rik van Riel wrote:
>> Nick Piggin wrote:
>>
>>> Rik van Riel wrote:
>
>>>> Here are the transactions/seconds for each combination:
I've added a 5th column, with just your mmap_sem patch and
without my madv_free patch. It is run with the glibc patch,
which should make it fall back to MADV_DONTNEED after the
first MADV_FREE call fails.
>>>> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem  mmap_sem
>>>>
>>>> 1        610      609        596               545                   534
>>>> 2        1032     1136       1196              1200                  1180
>>>> 4        1070     1128       2014              2024                  2027
>>>> 8        1000     1088       1665              2087                  2089
>>>> 16       779      1073       1310              1999                  2012
Not doing the mprotect calls is the big one I guess, especially
the fact that we don't need to take the mmap_sem for writing.
With both our patches, single and two thread performance with
MySQL sysbench is somewhat better than with just your patch,
4 and 8 thread performance are basically the same and just
your patch gives a slight benefit with 16 threads.
I guess I should benchmark up to 64 or 128 threads tomorrow,
to see if this is just luck or if the cache benefit of doing
the page faults and reusing hot pages is faster than not
having page faults at all.
I should run some benchmarks on other systems, too. Some of
these results could be an artifact of my quad core CPU. The
results could be very different on other systems...
> Yeah. That's funny, because it means either there is some
> contention on the mmap_sem (or ptl) at 1 thread, or that my
> patch alters the uncontended performance.
Maybe MySQL has various different threads to do
different tasks. Something to look into...
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 3:53 ` Rik van Riel
@ 2007-04-23 3:58 ` Nick Piggin
2007-04-23 10:07 ` Nick Piggin
2007-04-23 3:59 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 3:58 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> I've added a 5th column, with just your mmap_sem patch and
> without my madv_free patch. It is run with the glibc patch,
> which should make it fall back to MADV_DONTNEED after the
> first MADV_FREE call fails.
Thanks! (I edited slightly so it doesn't wrap)
> threads  vanilla  new glibc  madv_free  mmap_sem  both
>
> 1        610      609        596        534       545
> 2        1032     1136       1196       1180      1200
> 4        1070     1128       2014       2027      2024
> 8        1000     1088       1665       2089      2087
> 16       779      1073       1310       2012      1999
>
>
> Not doing the mprotect calls is the big one I guess, especially
> the fact that we don't need to take the mmap_sem for writing.
Yes.
> With both our patches, single and two thread performance with
> MySQL sysbench is somewhat better than with just your patch,
> 4 and 8 thread performance are basically the same and just
> your patch gives a slight benefit with 16 threads.
>
> I guess I should benchmark up to 64 or 128 threads tomorrow,
> to see if this is just luck or if the cache benefit of doing
> the page faults and reusing hot pages is faster than not
> having page faults at all.
>
> I should run some benchmarks on other systems, too. Some of
> these results could be an artifact of my quad core CPU. The
> results could be very different on other systems...
I'm getting the 16 core box out of retirement as we speak :)
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 3:53 ` Rik van Riel
2007-04-23 3:58 ` Nick Piggin
@ 2007-04-23 3:59 ` Rik van Riel
2007-04-23 9:20 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 3:59 UTC (permalink / raw)
To: Rik van Riel
Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> Nick Piggin wrote:
>> Rik van Riel wrote:
>>> Nick Piggin wrote:
>>>
>>>> Rik van Riel wrote:
>>
>>>>> Here are the transactions/seconds for each combination:
>
> I've added a 5th column, with just your mmap_sem patch and
> without my madv_free patch. It is run with the glibc patch,
> which should make it fall back to MADV_DONTNEED after the
> first MADV_FREE call fails.
>
>>>>> threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem  mmap_sem
>>>>>
>>>>> 1        610      609        596               545                   534
>>>>> 2        1032     1136       1196              1200                  1180
>>>>> 4        1070     1128       2014              2024                  2027
>>>>> 8        1000     1088       1665              2087                  2089
>>>>> 16       779      1073       1310              1999                  2012
Now that I think about it - this is all with the rawhide kernel
configuration, which has an ungodly number of debug config
options enabled.
I should try this with a more normal kernel, on various different
systems.
It would also be helpful if other people tried this same benchmark,
and others, on their systems.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-22 2:36 ` Nick Piggin
2007-04-22 2:50 ` Nick Piggin
2007-04-22 6:31 ` Rik van Riel
@ 2007-04-23 4:28 ` Rik van Riel
2 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 4:28 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak
Nick Piggin wrote:
> So where is the down_write coming from in this workload, I wonder?
> Heap management? What syscalls?
Trying to answer this question, I straced the mysql threads that
showed up in top when running a single threaded sysbench workload.
There were no mmap, munmap, brk, mprotect or madvise system calls
in the trace.
MySQL has me puzzled, but it seems to have some other people
interested too.
I think I'll go play a bit with ebizzy now, to see how other
workloads are affected by our kernel changes.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-21 7:12 ` Jakub Jelinek
@ 2007-04-23 4:36 ` Nick Piggin
0 siblings, 0 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 4:36 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Rik van Riel, Andrew Morton, linux-kernel, linux-mm, shak
Jakub Jelinek wrote:
> On Fri, Apr 20, 2007 at 07:52:44PM -0400, Rik van Riel wrote:
>
>>It turns out that Nick's patch does not improve peak
>>performance much, but it does prevent the decline when
>>running with 16 threads on my quad core CPU!
>>
>>We _definitely_ want both patches, there's a huge benefit
>>in having them both.
>>
>>Here are the transactions/seconds for each combination:
>>
>>threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
>>
>>1        610      609        596               545
>>2        1032     1136       1196              1200
>>4        1070     1128       2014              2024
>>8        1000     1088       1665              2087
>>16       779      1073       1310              1999
>
>
> FYI, I have uploaded a testing glibc that uses MADV_FREE and falls back
> to MADV_DONTNEED if MADV_FREE is not available, to
> http://people.redhat.com/jakub/glibc/2.5.90-21.1/
Hmm, I wonder how glibc malloc stacks up to tcmalloc on this test
(after the mmap_sem patch as well).
I'll try running that as well!
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 3:59 ` Rik van Riel
@ 2007-04-23 9:20 ` Rik van Riel
2007-04-23 10:21 ` Nick Piggin
2007-04-23 11:45 ` Rik van Riel
0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 9:20 UTC (permalink / raw)
To: Rik van Riel
Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
[-- Attachment #1: Type: text/plain, Size: 1961 bytes --]
Use TLB batching for MADV_FREE. Adds another 10-15% extra performance
to the MySQL sysbench results on my quad core system.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
Rik van Riel wrote:
>> I've added a 5th column, with just your mmap_sem patch and
>> without my madv_free patch. It is run with the glibc patch,
>> which should make it fall back to MADV_DONTNEED after the
>> first MADV_FREE call fails.
With the attached patch to make MADV_FREE use tlb batching, not
only do we gain an additional 10-15% performance but Nick's
mmap_sem patch also shows the performance increase that we
expected to see.
It looks like the tlb flushes (and IPIs) from zap_pte_range()
could have been the problem. They're gone now.
The second column from the right has Nick's patch and my own
two patches. Performance with 16 threads is almost triple what
it used to be...
threads  vanilla  glibc  glibc      glibc      glibc     glibc      glibc
                         madv_free  madv_free            madv_free  madv_free
                                    mmap_sem   mmap_sem  mmap_sem
                                                         tlb batch  tlb_batch

1        610      609    596        545        534       547        537
2        1032     1136   1196       1200       1180      1293       1194
4        1070     1128   2014       2024       2027      2248       2040
8        1000     1088   1665       2087       2089      2314       1869
16       779      1073   1310       1999       2012      2214       1557
> Now that I think about it - this is all with the rawhide kernel
> configuration, which has an ungodly number of debug config
> options enabled.
>
> I should try this with a more normal kernel, on various different
> systems.
This is for another day. :)
First some ebizzy runs...
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
[-- Attachment #2: linux-2.6-madv_free-lazytlb.patch --]
[-- Type: text/x-patch, Size: 690 bytes --]
--- linux-2.6.20.x86_64/mm/memory.c.orig 2007-04-23 02:48:36.000000000 -0400
+++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 02:54:42.000000000 -0400
@@ -677,11 +677,15 @@ static unsigned long zap_pte_range(struc
remove_exclusive_swap_page(page);
unlock_page(page);
}
- ptep_clear_flush_dirty(vma, addr, pte);
- ptep_clear_flush_young(vma, addr, pte);
SetPageLazyFree(page);
if (PageActive(page))
deactivate_tail_page(page);
+ ptent = *pte;
+ set_pte_at(mm, addr, pte,
+ pte_mkclean(pte_mkold(ptent)));
+ /* tlb_remove_page frees it again */
+ get_page(page);
+ tlb_remove_page(tlb, page);
continue;
}
}
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 3:58 ` Nick Piggin
@ 2007-04-23 10:07 ` Nick Piggin
2007-04-23 10:12 ` Rik van Riel
0 siblings, 1 reply; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 10:07 UTC (permalink / raw)
To: Nick Piggin
Cc: Rik van Riel, Andrew Morton, linux-kernel, linux-mm, shak, jakub,
drepper
Nick Piggin wrote:
> Rik van Riel wrote:
>
>> I've added a 5th column, with just your mmap_sem patch and
>> without my madv_free patch. It is run with the glibc patch,
>> which should make it fall back to MADV_DONTNEED after the
>> first MADV_FREE call fails.
>
>
> Thanks! (I edited slightly so it doesn't wrap)
>
>
>> threads  vanilla  new glibc  madv_free  mmap_sem  both
>>
>> 1        610      609        596        534       545
>> 2        1032     1136       1196       1180      1200
>> 4        1070     1128       2014       2027      2024
>> 8        1000     1088       1665       2089      2087
>> 16       779      1073       1310       2012      1999
>>
>>
>> Not doing the mprotect calls is the big one I guess, especially
>> the fact that we don't need to take the mmap_sem for writing.
>
>
> Yes.
>
>
>> With both our patches, single and two thread performance with
>> MySQL sysbench is somewhat better than with just your patch,
>> 4 and 8 thread performance are basically the same and just
>> your patch gives a slight benefit with 16 threads.
>>
>> I guess I should benchmark up to 64 or 128 threads tomorrow,
>> to see if this is just luck or if the cache benefit of doing
>> the page faults and reusing hot pages is faster than not
>> having page faults at all.
>>
>> I should run some benchmarks on other systems, too. Some of
>> these results could be an artifact of my quad core CPU. The
>> results could be very different on other systems...
>
>
> I'm getting the 16 core box out of retirement as we speak :)
>
OK, 10 runs at 1 client, 2.6.21-rc6, MySQL version 5.33, and Jakub's new
glibc give a 99.9% confidence interval of:
vanilla: 467.2 +/- 7.9 (tps)
mmap_sem: 470.5 +/- 9.3 (tps)
However, it seems those means jump around a bit from boot to boot,
so there could be some memory placement luck for cache and/or
NUMA goodness involved.
So I think it is safe to say that the mmap_sem patch doesn't hurt
single threaded performance (from looking at the numbers and the
patch). And that's the most important thing for that patch.
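
For readers who want to reproduce this kind of figure: one common way to get a
"mean +/- x" interval from 10 runs is a two-sided Student's t interval. The
message does not say how the numbers above were computed, so the following C
sketch is only an assumption about the method. The sample values are invented
for illustration (they are not sysbench results), and 4.781 is the two-sided
99.9% critical value of Student's t for 9 degrees of freedom.

```c
#include <stddef.h>

/* Newton's method square root, used here only to avoid depending on libm. */
static double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

struct interval {
    double mean;        /* sample mean */
    double half_width;  /* the "+/-" part */
};

/* Two-sided t interval for n >= 2 samples; t_crit must match n - 1
 * degrees of freedom and the desired confidence level. */
static struct interval confidence_interval(const double *x, size_t n,
                                           double t_crit)
{
    struct interval ci = { 0.0, 0.0 };
    if (n < 2)
        return ci;

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    ci.mean = sum / (double)n;

    double ss = 0.0;                      /* sum of squared deviations */
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - ci.mean;
        ss += d * d;
    }
    double stddev = sqrt_newton(ss / (double)(n - 1));
    ci.half_width = t_crit * stddev / sqrt_newton((double)n);
    return ci;
}

/* Hypothetical transactions/second from 10 runs, for illustration only. */
static const double example_tps[10] = {
    460, 462, 464, 466, 468, 470, 472, 474, 476, 478
};
```

Calling confidence_interval(example_tps, 10, 4.781) yields the mean and the
half-width to print as "mean +/- half_width". Two configurations whose
intervals overlap heavily, as in the numbers above, cannot be told apart at
that confidence level.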
I'll post some scalability results tomorrow. From my first round
of tests, after new glibc and the mmap_sem patch, it doesn't seem
like rwsem improvements, private futexes, or avoiding zero_page
make any significant differences.
I haven't tested your MADV_FREE patch yet.
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:07 ` Nick Piggin
@ 2007-04-23 10:12 ` Rik van Riel
0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 10:12 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Nick Piggin wrote:
> I haven't tested your MADV_FREE patch yet.
Good. It turned out that one behaved a bit strangely without tlb batching
anyway.
I'm now running ebizzy across the whole set of kernels I tested before,
and will post the results in a bit.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 9:20 ` Rik van Riel
@ 2007-04-23 10:21 ` Nick Piggin
2007-04-23 10:31 ` Rik van Riel
2007-04-23 10:44 ` Jakub Jelinek
2007-04-23 11:45 ` Rik van Riel
1 sibling, 2 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 10:21 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> Use TLB batching for MADV_FREE. Adds another 10-15% extra performance
> to the MySQL sysbench results on my quad core system.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> Rik van Riel wrote:
>
>>> I've added a 5th column, with just your mmap_sem patch and
>>> without my madv_free patch. It is run with the glibc patch,
>>> which should make it fall back to MADV_DONTNEED after the
>>> first MADV_FREE call fails.
>
>
> With the attached patch to make MADV_FREE use tlb batching, not
> only do we gain an additional 10-15% performance but Nick's
> mmap_sem patch also shows the performance increase that we
> expected to see.
>
> It looks like the tlb flushes (and IPIs) from zap_pte_range()
> could have been the problem. They're gone now.
I guess it is a good idea to batch these things. But can you
do that on all architectures? What happens if your tlb flush
happens after another thread already accesses it again, or
after it subsequently gets removed from the address space via
another CPU?
>
> The second column from the right has Nick's patch and my own
> two patches. Performance with 16 threads is almost triple what
> it used to be...
>
> threads  vanilla  glibc  glibc      glibc      glibc     glibc      glibc
>                          madv_free  madv_free            madv_free  madv_free
>                                     mmap_sem   mmap_sem  mmap_sem
>                                                          tlb batch  tlb_batch
>
> 1        610      609    596        545        534       547        537
> 2        1032     1136   1196       1200       1180      1293       1194
> 4        1070     1128   2014       2024       2027      2248       2040
> 8        1000     1088   1665       2087       2089      2314       1869
> 16       779      1073   1310       1999       2012      2214       1557
>
>
>> Now that I think about it - this is all with the rawhide kernel
>> configuration, which has an ungodly number of debug config
>> options enabled.
>>
>> I should try this with a more normal kernel, on various different
>> systems.
>
>
> This is for another day. :)
>
> First some ebizzy runs...
>
>
> ------------------------------------------------------------------------
>
> --- linux-2.6.20.x86_64/mm/memory.c.orig 2007-04-23 02:48:36.000000000 -0400
> +++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 02:54:42.000000000 -0400
> @@ -677,11 +677,15 @@ static unsigned long zap_pte_range(struc
> remove_exclusive_swap_page(page);
> unlock_page(page);
> }
> - ptep_clear_flush_dirty(vma, addr, pte);
> - ptep_clear_flush_young(vma, addr, pte);
> SetPageLazyFree(page);
> if (PageActive(page))
> deactivate_tail_page(page);
> + ptent = *pte;
> + set_pte_at(mm, addr, pte,
> + pte_mkclean(pte_mkold(ptent)));
> + /* tlb_remove_page frees it again */
> + get_page(page);
> + tlb_remove_page(tlb, page);
> continue;
> }
> }
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:21 ` Nick Piggin
@ 2007-04-23 10:31 ` Rik van Riel
2007-04-23 10:35 ` Nick Piggin
2007-04-23 10:44 ` Jakub Jelinek
1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 10:31 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Nick Piggin wrote:
>> It looks like the tlb flushes (and IPIs) from zap_pte_range()
>> could have been the problem. They're gone now.
>
> I guess it is a good idea to batch these things. But can you
> do that on all architectures? What happens if your tlb flush
> happens after another thread already accesses it again, or
> after it subsequently gets removed from the address space via
> another CPU?
I have thought about this a lot tonight, and have come to the conclusion
that they are ok.
The reason is simple:
1) we do the TLB flush before we return from the
madvise(MADV_FREE) syscall.
2) anything that accesses the pages between the start
and end of the MADV_FREE procedure does not know in
which order we go through the pages, so it could hit
a page either before or after we get to processing
it
3) because of this, we can treat any such accesses as
happening simultaneously with the MADV_FREE and
as illegal, aka undefined behaviour territory and
we do not need to worry about them
4) because we flush the tlb before releasing the page
table lock, other CPUs cannot remove this page from
the address space - they will block on the page
table lock before looking at this pte
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:31 ` Rik van Riel
@ 2007-04-23 10:35 ` Nick Piggin
2007-04-23 10:44 ` Rik van Riel
2007-04-24 2:53 ` Rik van Riel
0 siblings, 2 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-23 10:35 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> Nick Piggin wrote:
>
>>> It looks like the tlb flushes (and IPIs) from zap_pte_range()
>>> could have been the problem. They're gone now.
>>
>>
>> I guess it is a good idea to batch these things. But can you
>> do that on all architectures? What happens if your tlb flush
>> happens after another thread already accesses it again, or
>> after it subsequently gets removed from the address space via
>> another CPU?
>
>
> I have thought about this a lot tonight, and have come to the conclusion
> that they are ok.
>
> The reason is simple:
>
> 1) we do the TLB flush before we return from the
> madvise(MADV_FREE) syscall.
>
> 2) anything that accesses the pages between the start
> and end of the MADV_FREE procedure does not know in
> which order we go through the pages, so it could hit
> a page either before or after we get to processing
> it
>
> 3) because of this, we can treat any such accesses as
> happening simultaneously with the MADV_FREE and
> as illegal, aka undefined behaviour territory and
> we do not need to worry about them
Yes, but I'm wondering if it is legal in all architectures.
>
> 4) because we flush the tlb before releasing the page
> table lock, other CPUs cannot remove this page from
> the address space - they will block on the page
> table lock before looking at this pte
We don't when the ptl is split.
What the tlb flush used to be able to assume is that the page
has been removed from the pagetables when they are put in the
tlb flush batch.
I'm not saying there are any bugs, but just suggesting there
might be.
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:21 ` Nick Piggin
2007-04-23 10:31 ` Rik van Riel
@ 2007-04-23 10:44 ` Jakub Jelinek
1 sibling, 0 replies; 43+ messages in thread
From: Jakub Jelinek @ 2007-04-23 10:44 UTC (permalink / raw)
To: Nick Piggin
Cc: Rik van Riel, Andrew Morton, linux-kernel, linux-mm, shak, drepper
On Mon, Apr 23, 2007 at 08:21:37PM +1000, Nick Piggin wrote:
> I guess it is a good idea to batch these things. But can you
> do that on all architectures? What happens if your tlb flush
> happens after another thread already accesses it again, or
> after it subsequently gets removed from the address space via
> another CPU?
Accessing the page by another thread before madvise (MADV_FREE)
returns is undefined behavior, it can act as if that access happened
right before the madvise (MADV_FREE) call or right after it.
That's ok for glibc and supposedly any other malloc implementation,
madvise (MADV_FREE) is called while holding the containing arena's lock
and for whatever malloc implementation, madvise (MADV_FREE) would be
part of free operations and you definitely need some synchronization
between one thread freeing some memory and other thread deciding
to reuse that memory and return it from malloc/realloc/calloc/etc.
My only concern is whether using non-atomic update of the pte is
ok or not.
The ptep_test_and_clear_young/ptep_test_and_clear_dirty operations that
Rik's patch was doing before are done using atomic instructions, at least
on x86_64.
The operation we want for MADV_FREE is, clear young/dirty bits if they
have been set on entry to the MADV_FREE madvise call, undefined values
for these 2 bits if some other task modifies the young/dirty bits
concurrently with this MADV_FREE zap_page_range, but I'd say other
bits need to be unmodified.
Now, is there some kernel code which while either not holding corresponding
mmap_sem at all or holding it just down_read modifies other bits
in the pte? If yes, we need to do this clearing atomically, basically
do a cmpxchg loop until we succeed in clearing the 2 bits and then flush
the tlb if any of them was set before (ptep_test_and_clear_dirty_and_young?),
if not, set_pte_at is ok and faster than a lock prefixed insn.
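
The cmpxchg loop described above could look roughly like the following
userspace sketch using C11 atomics. It is not kernel code: the pte is modelled
as a plain 64-bit atomic, the bit positions are the x86 accessed/dirty bits,
and the function name clear_young_and_dirty is hypothetical. The return value
tells the caller whether a TLB flush would be needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_ACCESSED (UINT64_C(1) << 5)   /* "young" bit on x86 */
#define PTE_DIRTY    (UINT64_C(1) << 6)   /* dirty bit on x86 */

/* Atomically clear the accessed and dirty bits while leaving every other
 * bit exactly as a concurrent updater last wrote it. Returns true if
 * either bit was set in the value that was finally replaced. */
static bool clear_young_and_dirty(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_load(pte);
    uint64_t cleared;

    do {
        cleared = old & ~(PTE_ACCESSED | PTE_DIRTY);
        /* On failure, 'old' is reloaded with the current value and the
         * loop recomputes 'cleared' from it. */
    } while (!atomic_compare_exchange_weak(pte, &old, cleared));

    return (old & (PTE_ACCESSED | PTE_DIRTY)) != 0;
}
```

A plain store of a precomputed value, which is what set_pte_at amounts to
here, would skip the retry and could overwrite a bit that another CPU changed
between the load and the store. That is the case the question above is about.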
Jakub
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:35 ` Nick Piggin
@ 2007-04-23 10:44 ` Rik van Riel
2007-04-24 1:15 ` Nick Piggin
2007-04-24 2:53 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 10:44 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]
Use TLB batching for MADV_FREE. Adds another 10-15% extra performance
to the MySQL sysbench results on my quad core system.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
Nick Piggin wrote:
>> 3) because of this, we can treat any such accesses as
>> happening simultaneously with the MADV_FREE and
>> as illegal, aka undefined behaviour territory and
>> we do not need to worry about them
>
> Yes, but I'm wondering if it is legal in all architectures.
It's similar to trying to access memory during an munmap.
You may be able to for a short time, but it'll come back to
haunt you.
>> 4) because we flush the tlb before releasing the page
>> table lock, other CPUs cannot remove this page from
>> the address space - they will block on the page
>> table lock before looking at this pte
>
> We don't when the ptl is split.
Even then we do. Each invocation of zap_pte_range() only touches
one page table page, and it flushes the TLB before releasing the
page table lock.
> What the tlb flush used to be able to assume is that the page
> has been removed from the pagetables when they are put in the
> tlb flush batch.
All the tlb flush code seems to assume is that the tlb entries
should be invalidated.
> I'm not saying there are any bugs, but just suggesting there
> might be.
Jakub found a potential bug, in that I did not use an atomic
operation to clear the page table entries. I've attached a
new patch which simply uses ptep_test_and_clear_dirty/young
to get rid of the dirty and accessed bits.
It uses the same atomic accesses we use elsewhere in the VM
and the code is a line shorter than before.
Andrew, please use this one.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
[-- Attachment #2: linux-2.6-madv_free-lazytlb.patch --]
[-- Type: text/x-patch, Size: 697 bytes --]
--- linux-2.6.20.x86_64/mm/memory.c.orig 2007-04-23 02:48:36.000000000 -0400
+++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 02:54:42.000000000 -0400
@@ -677,11 +677,14 @@ static unsigned long zap_pte_range(struc
remove_exclusive_swap_page(page);
unlock_page(page);
}
- ptep_clear_flush_dirty(vma, addr, pte);
- ptep_clear_flush_young(vma, addr, pte);
+ ptep_test_and_clear_dirty(vma, addr, pte);
+ ptep_test_and_clear_young(vma, addr, pte);
SetPageLazyFree(page);
if (PageActive(page))
deactivate_tail_page(page);
+ /* tlb_remove_page frees it again */
+ get_page(page);
+ tlb_remove_page(tlb, page);
continue;
}
}
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 9:20 ` Rik van Riel
2007-04-23 10:21 ` Nick Piggin
@ 2007-04-23 11:45 ` Rik van Riel
1 sibling, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-23 11:45 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> First some ebizzy runs...
This is interesting. Ginormous speedups in ebizzy[1] on my quad core
test system. The following numbers are the average of 10 runs, since
ebizzy shows some variability.
You can see a big influence from the tlb batching and from Nick's
mmap_sem patch. The reduction in system time from 100 seconds to
3 seconds is way more than I had expected, but I'm not complaining.
The 4 fold reduction in wall clock time is a nice bonus.
According to Val, ebizzy shows the weaknesses of Linux with a real
workload, so this could be a useful result.
kernel            user  system  wall clock  %CPU
vanilla           186s  101s    123s        230%
madv_free (madv)  175s  96s     120s        230%
mmap_sem (sem)    100s  40s     40s         370%
madv+sem          200s  140s    100s        393%
madv+sem+tlb      118s  3s      30s         395%
madv+tlb          150s  10s     50s         310%
[1] http://www.ussg.iu.edu/hypermail/linux/kernel/0604.2/1699.html
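The ratios quoted in the text can be checked against the vanilla and madv+sem+tlb rows of the table. This is a minimal C sketch of that arithmetic; the function and macro names are illustrative and do not come from the thread.

```c
#include <assert.h>

/* Figures copied from the vanilla and madv+sem+tlb rows, in seconds. */
#define VANILLA_WALL_S 123.0
#define BEST_WALL_S     30.0
#define VANILLA_SYS_S  101.0
#define BEST_SYS_S       3.0

/* Ratio of a baseline time to an improved time. */
double speedup(double baseline_s, double improved_s)
{
    assert(improved_s > 0.0);
    return baseline_s / improved_s;
}
```

speedup(VANILLA_WALL_S, BEST_WALL_S) comes out a little above 4, matching the "4 fold" wall clock figure, while speedup(VANILLA_SYS_S, BEST_SYS_S) is above 33 for system time.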
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:44 ` Rik van Riel
@ 2007-04-24 1:15 ` Nick Piggin
2007-04-24 1:58 ` Rik van Riel
0 siblings, 1 reply; 43+ messages in thread
From: Nick Piggin @ 2007-04-24 1:15 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> Use TLB batching for MADV_FREE. Adds another 10-15% extra performance
> to the MySQL sysbench results on my quad core system.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>
> Nick Piggin wrote:
>
>>> 3) because of this, we can treat any such accesses as
>>> happening simultaneously with the MADV_FREE and
>>> as illegal, aka undefined behaviour territory and
>>> we do not need to worry about them
>>
>>
>> Yes, but I'm wondering if it is legal in all architectures.
>
>
> It's similar to trying to access memory during an munmap.
>
> You may be able to for a short time, but it'll come back to
> haunt you.
The question is whether the architecture specific tlb
flushing code will break or not.
>>> 4) because we flush the tlb before releasing the page
>>> table lock, other CPUs cannot remove this page from
>>> the address space - they will block on the page
>>> table lock before looking at this pte
>>
>>
>> We don't when the ptl is split.
>
>
> Even then we do. Each invocation of zap_pte_range() only touches
> one page table page, and it flushes the TLB before releasing the
> page table lock.
What kernel are you looking at? -rc7 and rc6-mm1 don't, AFAIKS.
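For readers following point 3 above (accesses racing with MADV_FREE are treated as undefined), this sketch shows the intended userspace pattern: mark a buffer freeable, then write to it again before reading. It assumes the madvise(2) interface as found in later mainline Linux, where <sys/mman.h> may define MADV_FREE; the interface in this patch series may differ. The function name and the MADV_DONTNEED fallback are assumptions for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hand a buffer's pages back lazily, then reuse the buffer.
 * Returns 0 on success, -1 on failure.
 */
int lazy_free_and_reuse(size_t len)
{
    assert(len > 0);

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAA, len);                     /* dirty the pages */

    int rc = -1;
#ifdef MADV_FREE
    rc = madvise(buf, len, MADV_FREE);          /* contents now disposable */
#endif
    if (rc != 0)
        rc = madvise(buf, len, MADV_DONTNEED);  /* assumed fallback */
    if (rc != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Reuse: write first; the old contents must not be relied on. */
    memset(buf, 0x55, len);
    int ok = (buf[0] == 0x55 && buf[len - 1] == 0x55);

    munmap(buf, len);
    return ok ? 0 : -1;
}
```

The caller never reads the buffer between the madvise() call and the next write, which is the usage the undefined-behaviour argument relies on.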
--
SUSE Labs, Novell Inc.
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-24 1:15 ` Nick Piggin
@ 2007-04-24 1:58 ` Rik van Riel
2007-04-24 2:16 ` Nick Piggin
2007-04-24 4:42 ` Paul Mackerras
0 siblings, 2 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-24 1:58 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]
This should fix the MADV_FREE code for PPC's hashed tlb.
Signed-off-by: Rik van Riel <riel@redhat.com>
---
Nick Piggin wrote:
>> Nick Piggin wrote:
>>
>>>> 3) because of this, we can treat any such accesses as
>>>> happening simultaneously with the MADV_FREE and
>>>> as illegal, aka undefined behaviour territory and
>>>> we do not need to worry about them
>>>
>>>
>>> Yes, but I'm wondering if it is legal in all architectures.
>>
>>
>> It's similar to trying to access memory during an munmap.
>>
>> You may be able to for a short time, but it'll come back to
>> haunt you.
>
> The question is whether the architecture specific tlb
> flushing code will break or not.
I guess we'll need to call tlb_remove_tlb_entry() inside the
MADV_FREE code to keep powerpc happy.
Thanks for pointing this one out.
>> Even then we do. Each invocation of zap_pte_range() only touches
>> one page table page, and it flushes the TLB before releasing the
>> page table lock.
>
> What kernel are you looking at? -rc7 and rc6-mm1 don't, AFAIKS.
Oh dear. I see it now...
The tlb end things inside zap_pte_range() are actually
noops and the actual tlb flush only happens inside
zap_page_range().
I guess the fact that munmap gets the mmap_sem for
writing should save us, though...
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
[-- Attachment #2: linux-2.6-madv-ppcfix.patch --]
[-- Type: text/x-patch, Size: 453 bytes --]
--- linux-2.6.20.x86_64/mm/memory.c.noppc 2007-04-23 21:50:09.000000000 -0400
+++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 21:48:59.000000000 -0400
@@ -679,6 +679,7 @@ static unsigned long zap_pte_range(struc
}
ptep_test_and_clear_dirty(vma, addr, pte);
ptep_test_and_clear_young(vma, addr, pte);
+ tlb_remove_tlb_entry(tlb, pte, addr);
SetPageLazyFree(page);
if (PageActive(page))
deactivate_tail_page(page);
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-24 1:58 ` Rik van Riel
@ 2007-04-24 2:16 ` Nick Piggin
2007-04-24 4:42 ` Paul Mackerras
1 sibling, 0 replies; 43+ messages in thread
From: Nick Piggin @ 2007-04-24 2:16 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel wrote:
> This should fix the MADV_FREE code for PPC's hashed tlb.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>
> Nick Piggin wrote:
>
>>> Nick Piggin wrote:
>>>
>>>>> 3) because of this, we can treat any such accesses as
>>>>> happening simultaneously with the MADV_FREE and
>>>>> as illegal, aka undefined behaviour territory and
>>>>> we do not need to worry about them
>>>>
>>>>
>>>>
>>>> Yes, but I'm wondering if it is legal in all architectures.
>>>
>>>
>>>
>>> It's similar to trying to access memory during an munmap.
>>>
>>> You may be able to for a short time, but it'll come back to
>>> haunt you.
>>
>>
>> The question is whether the architecture specific tlb
>> flushing code will break or not.
>
>
> I guess we'll need to call tlb_remove_tlb_entry() inside the
> MADV_FREE code to keep powerpc happy.
>
> Thanks for pointing this one out.
>
>>> Even then we do. Each invocation of zap_pte_range() only touches
>>> one page table page, and it flushes the TLB before releasing the
>>> page table lock.
>>
>>
>> What kernel are you looking at? -rc7 and rc6-mm1 don't, AFAIKS.
>
>
> Oh dear. I see it now...
>
> The tlb end things inside zap_pte_range() are actually
> noops and the actual tlb flush only happens inside
> zap_page_range().
>
> I guess the fact that munmap gets the mmap_sem for
> writing should save us, though...
What about an unmap_mapping_range, or another MADV_FREE or
MADV_DONTNEED?
>
>
> ------------------------------------------------------------------------
>
> --- linux-2.6.20.x86_64/mm/memory.c.noppc 2007-04-23 21:50:09.000000000 -0400
> +++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 21:48:59.000000000 -0400
> @@ -679,6 +679,7 @@ static unsigned long zap_pte_range(struc
> }
> ptep_test_and_clear_dirty(vma, addr, pte);
> ptep_test_and_clear_young(vma, addr, pte);
> + tlb_remove_tlb_entry(tlb, pte, addr);
> SetPageLazyFree(page);
> if (PageActive(page))
> deactivate_tail_page(page);
--
SUSE Labs, Novell Inc.
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-23 10:35 ` Nick Piggin
2007-04-23 10:44 ` Rik van Riel
@ 2007-04-24 2:53 ` Rik van Riel
2007-04-24 3:08 ` Andrew Morton
1 sibling, 1 reply; 43+ messages in thread
From: Rik van Riel @ 2007-04-24 2:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
[-- Attachment #1: Type: text/plain, Size: 838 bytes --]
Nick Piggin wrote:
> What the tlb flush used to be able to assume is that the page
> has been removed from the pagetables when they are put in the
> tlb flush batch.
I think this is still the case, to a degree. There should be
no harm in removing the TLB entries after the page table has
been unlocked, right?
Or is something like the attached really needed?
From what I can see, the page table lock should be enough
synchronization between unmap_mapping_range, MADV_FREE and
MADV_DONTNEED.
I don't see why we need the attached, but in case you find
a good reason, here's my signed-off-by line for Andrew :)
Signed-off-by: Rik van Riel <riel@redhat.com>
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
[-- Attachment #2: linux-2.6-madv_free-flushme.patch --]
[-- Type: text/x-patch, Size: 750 bytes --]
--- linux-2.6.20.x86_64/mm/memory.c.flushme 2007-04-23 22:26:06.000000000 -0400
+++ linux-2.6.20.x86_64/mm/memory.c 2007-04-23 22:42:06.000000000 -0400
@@ -628,6 +628,7 @@ static unsigned long zap_pte_range(struc
long *zap_work, struct zap_details *details)
{
struct mm_struct *mm = tlb->mm;
+ unsigned long start_addr = addr;
pte_t *pte;
spinlock_t *ptl;
int file_rss = 0;
@@ -726,6 +727,11 @@ static unsigned long zap_pte_range(struc
add_mm_rss(mm, file_rss, anon_rss);
arch_leave_lazy_mmu_mode();
+ if (details && details->madv_free) {
+ /* Protect against MADV_DONTNEED or unmap_mapping_range */
+ tlb_finish_mmu(tlb, start_addr, addr);
+ tlb = tlb_gather_mmu(mm, 0);
+ }
pte_unmap_unlock(pte - 1, ptl);
return addr;
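With the patch above, every zap_pte_range() call on the MADV_FREE path ends in a tlb_finish_mmu(), so the number of flushes grows with the number of page table pages the range covers. A rough count is sketched here, assuming x86_64 with 4 KiB pages, where one PTE page maps 2 MiB, and ignoring early exits through zap_work. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PMD_SHIFT 21   /* one PTE page maps 2 MiB (assumed x86_64, 4 KiB pages) */

/*
 * Number of PTE pages covering [start, end), which is a lower bound on
 * the number of zap_pte_range() calls and hence on the flushes.
 */
uint64_t toy_pte_page_count(uint64_t start, uint64_t end)
{
    assert(start <= end);
    if (start == end)
        return 0;

    uint64_t first = start >> TOY_PMD_SHIFT;
    uint64_t last = (end - 1) >> TOY_PMD_SHIFT;

    return last - first + 1;
}
```

Under these assumptions, a 1 GiB aligned range covers 512 PTE pages, so the per-invocation flush would run at least 512 times where the batched path flushes far less often.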
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-24 2:53 ` Rik van Riel
@ 2007-04-24 3:08 ` Andrew Morton
0 siblings, 0 replies; 43+ messages in thread
From: Andrew Morton @ 2007-04-24 3:08 UTC (permalink / raw)
To: Rik van Riel; +Cc: Nick Piggin, linux-kernel, linux-mm, shak, jakub, drepper
On Mon, 23 Apr 2007 22:53:49 -0400 Rik van Riel <riel@redhat.com> wrote:
> I don't see why we need the attached, but in case you find
> a good reason, here's my signed-off-by line for Andrew :)
Andrew is in a defensive crouch trying to work his way through all the bugs
he's been sent. After I've managed to release 2.6.21-rc7-mm1 (say, December)
I expect I'll drop the MADV_FREE stuff, give you a run at creating a new
patch series.
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-24 1:58 ` Rik van Riel
2007-04-24 2:16 ` Nick Piggin
@ 2007-04-24 4:42 ` Paul Mackerras
2007-04-24 5:13 ` Rik van Riel
1 sibling, 1 reply; 43+ messages in thread
From: Paul Mackerras @ 2007-04-24 4:42 UTC (permalink / raw)
To: Rik van Riel
Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Rik van Riel writes:
> I guess we'll need to call tlb_remove_tlb_entry() inside the
> MADV_FREE code to keep powerpc happy.
I don't see why; once ptep_test_and_clear_young has returned, the
entry in the hash table has already been removed. Adding the
tlb_remove_tlb_entry call certainly won't do anything on 64-bit
powerpc, since it expands to do {} while (0) there, and in fact it
won't do anything on 32-bit powerpc either.
Paul.
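Paul notes that the hook expands to do {} while (0) on 64-bit powerpc. As a small illustration of why a no-op macro is written that way, the sketch below uses a hypothetical stand-in macro (toy_tlb_remove_tlb_entry is not the kernel definition) in an unbraced if/else. The macro still forms a single statement and has no effect.

```c
#include <assert.h>

/* Hypothetical stand-in for an arch hook that has nothing to do. */
#define toy_tlb_remove_tlb_entry(tlb, pte, addr) do { } while (0)

/*
 * Walk nr_entries entries. When call_hook is set the no-op hook is
 * invoked; otherwise *skipped is incremented. Returns entries visited.
 */
int toy_walk(int nr_entries, int call_hook, int *skipped)
{
    int visited = 0;

    for (int i = 0; i < nr_entries; i++) {
        if (call_hook)
            toy_tlb_remove_tlb_entry(0, 0, i);  /* one statement, so else binds correctly */
        else
            (*skipped)++;
        visited++;
    }
    return visited;
}
```

Calling the hook or not makes no difference to the walk result, which mirrors the point that adding the call would not change behaviour on that architecture.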
* Re: [PATCH] lazy freeing of memory through MADV_FREE
2007-04-24 4:42 ` Paul Mackerras
@ 2007-04-24 5:13 ` Rik van Riel
0 siblings, 0 replies; 43+ messages in thread
From: Rik van Riel @ 2007-04-24 5:13 UTC (permalink / raw)
To: Paul Mackerras
Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, shak, jakub, drepper
Paul Mackerras wrote:
> Rik van Riel writes:
>
>> I guess we'll need to call tlb_remove_tlb_entry() inside the
>> MADV_FREE code to keep powerpc happy.
>
> I don't see why; once ptep_test_and_clear_young has returned, the
> entry in the hash table has already been removed.
OK, so this one won't be necessary. Good to know that.
Andrew, it looks like things won't be that bad :)
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
end of thread, other threads:[~2007-04-24 5:13 UTC | newest]
Thread overview: 43+ messages
2007-04-17 7:15 [PATCH] lazy freeing of memory through MADV_FREE Rik van Riel
2007-04-19 21:15 ` [PATCH] lazy freeing of memory through MADV_FREE 2/2 Rik van Riel
2007-04-20 21:03 ` Andrew Morton
2007-04-20 21:24 ` Ulrich Drepper
2007-04-21 7:37 ` Hugh Dickins
2007-04-21 16:32 ` Ulrich Drepper
2007-04-20 20:57 ` [PATCH] lazy freeing of memory through MADV_FREE Andrew Morton
2007-04-20 21:38 ` Rik van Riel
2007-04-20 22:06 ` Andrew Morton
2007-04-20 23:52 ` Rik van Riel
2007-04-21 0:48 ` Eric Dumazet
2007-04-21 3:58 ` Rik van Riel
2007-04-21 7:12 ` Jakub Jelinek
2007-04-23 4:36 ` Nick Piggin
2007-04-22 2:36 ` Nick Piggin
2007-04-22 2:50 ` Nick Piggin
2007-04-22 6:31 ` Rik van Riel
2007-04-23 0:16 ` Nick Piggin
2007-04-23 3:53 ` Rik van Riel
2007-04-23 3:58 ` Nick Piggin
2007-04-23 10:07 ` Nick Piggin
2007-04-23 10:12 ` Rik van Riel
2007-04-23 3:59 ` Rik van Riel
2007-04-23 9:20 ` Rik van Riel
2007-04-23 10:21 ` Nick Piggin
2007-04-23 10:31 ` Rik van Riel
2007-04-23 10:35 ` Nick Piggin
2007-04-23 10:44 ` Rik van Riel
2007-04-24 1:15 ` Nick Piggin
2007-04-24 1:58 ` Rik van Riel
2007-04-24 2:16 ` Nick Piggin
2007-04-24 4:42 ` Paul Mackerras
2007-04-24 5:13 ` Rik van Riel
2007-04-24 2:53 ` Rik van Riel
2007-04-24 3:08 ` Andrew Morton
2007-04-23 10:44 ` Jakub Jelinek
2007-04-23 11:45 ` Rik van Riel
2007-04-23 4:28 ` Rik van Riel
2007-04-21 7:24 ` Hugh Dickins
2007-04-21 18:06 ` Rik van Riel
2007-04-22 8:18 ` Andrew Morton
2007-04-22 9:16 ` Christoph Hellwig
2007-04-22 16:55 ` Ulrich Drepper