* [PATCH] Avoiding mmap fragmentation - clean rev
From: Chen, Kenneth W @ 2005-05-17 22:28 UTC
To: 'Wolfgang Wander', 'Andrew Morton'
Cc: mingo, arjanv, linux-mm
This patch tries to solve the address space fragmentation issue
brought up by Wolfgang, where fragmentation is so severe that
applications fail on 2.6 kernels. Looking a bit deeper into the
issue, we found that a lot of the fragmentation was caused by a
suboptimal algorithm in the munmap code path. For example, as people
have pointed out, when a series of munmaps occurs, the free_area_cache
ends up pointing to the last vma that was freed, ignoring its
surroundings and not performing any coalescing at all, thus
artificially creating more holes in the virtual address space than
necessary. However, all the information needed to perform coalescing
is actually already there. This patch puts that data to use, so we
prevent artificial fragmentation.

This patch covers both the bottom-up and top-down layouts. For the
bottom-up layout, free_area_cache points to prev->vm_end; for
top-down, free_area_cache points to next->vm_start. The results are
very promising: it passes the test case that Wolfgang posted, and I
have tested it on a variety of x86, x86_64, and ia64 machines.

Please note that this patch completely obsoletes the previous patch
that Wolfgang posted. It should completely retain the performance
benefit of free_area_cache while at the same time keeping
fragmentation to a minimum.
Andrew, please consider for -mm testing. Thanks.
- Ken Chen
mmap.c | 18 +++++++++++++-----
1 files changed, 13 insertions(+), 5 deletions(-)
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- linux-2.6.11/mm/mmap.c.orig 2005-05-17 15:05:02.487937407 -0700
+++ linux-2.6.11/mm/mmap.c 2005-05-17 15:05:13.292624775 -0700
@@ -1208,9 +1208,10 @@ void arch_unmap_area(struct vm_area_stru
 	/*
 	 * Is this a new hole at the lowest possible address?
 	 */
-	if (area->vm_start >= TASK_UNMAPPED_BASE &&
-	    area->vm_start < area->vm_mm->free_area_cache)
-		area->vm_mm->free_area_cache = area->vm_start;
+	unsigned long addr = (unsigned long) area->vm_private_data;
+
+	if (addr >= TASK_UNMAPPED_BASE && addr < area->vm_mm->free_area_cache)
+		area->vm_mm->free_area_cache = addr;
 }
 
 /*
@@ -1290,8 +1291,10 @@ void arch_unmap_area_topdown(struct vm_a
 	/*
 	 * Is this a new hole at the highest possible address?
 	 */
-	if (area->vm_end > area->vm_mm->free_area_cache)
-		area->vm_mm->free_area_cache = area->vm_end;
+	unsigned long addr = (unsigned long) area->vm_private_data;
+
+	if (addr > area->vm_mm->free_area_cache)
+		area->vm_mm->free_area_cache = addr;
 
 	/* dont allow allocations above current base */
 	if (area->vm_mm->free_area_cache > area->vm_mm->mmap_base)
@@ -1656,6 +1659,11 @@ detach_vmas_to_be_unmapped(struct mm_str
 	} while (vma && vma->vm_start < end);
 	*insertion_point = vma;
 	tail_vma->vm_next = NULL;
+	if (mm->unmap_area == arch_unmap_area)
+		tail_vma->vm_private_data = (void*) prev->vm_end;
+	else
+		tail_vma->vm_private_data = vma ?
+			(void*) vma->vm_start : (void*) mm->mmap_base;
 	mm->mmap_cache = NULL;		/* Kill the cache. */
 }
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Arjan van de Ven @ 2005-05-18 7:28 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', 'Andrew Morton', mingo, linux-mm
On Tue, May 17, 2005 at 03:28:46PM -0700, Chen, Kenneth W wrote:
> This patch tries to solve the address space fragmentation issue
> brought up by Wolfgang, where fragmentation is so severe that
> applications fail on 2.6 kernels. Looking a bit deeper into the
> issue, we found that a lot of the fragmentation was caused by a
> suboptimal algorithm in the munmap code path. For example, as people
> have pointed out, when a series of munmaps occurs, the free_area_cache
> ends up pointing to the last vma that was freed, ignoring its
> surroundings and not performing any coalescing at all, thus
> artificially creating more holes in the virtual address space than
> necessary. However, all the information needed to perform coalescing
> is actually already there. This patch puts that data to use, so we
> prevent artificial fragmentation.
>
> This patch covers both the bottom-up and top-down layouts. For the
> bottom-up layout, free_area_cache points to prev->vm_end; for
> top-down, free_area_cache points to next->vm_start. The results are
> very promising: it passes the test case that Wolfgang posted, and I
> have tested it on a variety of x86, x86_64, and ia64 machines.
>
> Please note that this patch completely obsoletes the previous patch
> that Wolfgang posted. It should completely retain the performance
> benefit of free_area_cache while at the same time keeping
> fragmentation to a minimum.
This has one downside (other than that I like it due to its
simplicity): we've seen situations where there was a 4Kb gap at the
start of the mmaps, and then all future mmaps were bigger (say, stack
sized). That 4Kb gap would entirely void the advantage of the cache
if the cache stuck to it. (Personally I favor correctness above all,
but this does hurt performance really badly.)
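
A minimal user-space sketch of that scenario (the sizes and the
bottom-up layout are assumptions, purely illustrative):

#include <stdio.h>
#include <sys/mman.h>

#define MB (1024 * 1024)

int main(void)
{
        /* A 4Kb mapping in front of a big one; freeing the small one
         * can park free_area_cache at the 4Kb gap. */
        void *small = mmap(0, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        void *big = mmap(0, 8 * MB, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        munmap(small, 4096);

        /* Every later request is stack sized, so none fits the gap;
         * if the cache keeps being reset to the gap, each allocation
         * walks the whole vma list before finding room above 'big'. */
        void *later = mmap(0, 8 * MB, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        printf("%p %p %p\n", small, big, later);
        return 0;
}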
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Ingo Molnar @ 2005-05-18 7:43 UTC
To: Arjan van de Ven
Cc: Chen, Kenneth W, 'Wolfgang Wander', 'Andrew Morton', linux-mm
* Arjan van de Ven <arjanv@redhat.com> wrote:
> > Please note that this patch completely obsoletes the previous patch
> > that Wolfgang posted. It should completely retain the performance
> > benefit of free_area_cache while at the same time keeping
> > fragmentation to a minimum.
>
> This has one downside (other than that I like it due to its
> simplicity): we've seen situations where there was a 4Kb gap at the
> start of the mmaps, and then all future mmaps were bigger (say, stack
> sized). That 4Kb gap would entirely void the advantage of the cache
> if the cache stuck to it. (Personally I favor correctness above all,
> but this does hurt performance really badly.)
hm, does the cache get permanently stuck at a small hole with Ken's
patch? An unmap may reset the cache to the hole once, but subsequent
unmaps (or mmaps) ought to move it to a larger hole again.
Ingo
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Ingo Molnar @ 2005-05-18 7:37 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', 'Andrew Morton', arjanv, linux-mm
* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:
> Please note that this patch completely obsoletes the previous patch
> that Wolfgang posted. It should completely retain the performance
> benefit of free_area_cache while at the same time keeping
> fragmentation to a minimum.
>
> Andrew, please consider for -mm testing. Thanks.
very nice patch!
Acked-by: Ingo Molnar <mingo@elte.hu>
Ingo
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-18 13:05 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', 'Andrew Morton', mingo, arjanv, linux-mm
Chen, Kenneth W writes:
> This patch tries to solve the address space fragmentation issue
> brought up by Wolfgang, where fragmentation is so severe that
> applications fail on 2.6 kernels. Looking a bit deeper into the
> issue, we found that a lot of the fragmentation was caused by a
> suboptimal algorithm in the munmap code path. For example, as people
> have pointed out, when a series of munmaps occurs, the free_area_cache
> ends up pointing to the last vma that was freed, ignoring its
> surroundings and not performing any coalescing at all, thus
> artificially creating more holes in the virtual address space than
> necessary. However, all the information needed to perform coalescing
> is actually already there. This patch puts that data to use, so we
> prevent artificial fragmentation.
>
> This patch covers both the bottom-up and top-down layouts. For the
> bottom-up layout, free_area_cache points to prev->vm_end; for
> top-down, free_area_cache points to next->vm_start. The results are
> very promising: it passes the test case that Wolfgang posted, and I
> have tested it on a variety of x86, x86_64, and ia64 machines.
>
> Please note that this patch completely obsoletes the previous patch
> that Wolfgang posted. It should completely retain the performance
> benefit of free_area_cache while at the same time keeping
> fragmentation to a minimum.
>
> Andrew, please consider for -mm testing. Thanks.
>
I do like it for its simplicity. My test case is perfectly happy
with it, and I'm all for including this one rather than my patch
as a fix.

Please note though that this patch only seems to address the issue of
fragmentation due to unmapping.

The other issue, namely that the old (2.4) code and my patch tend to
fill in holes near the base with small requests, thus leaving larger
holes far from the base uncluttered, is not addressed. Here, as in
the original 2.6 code, we distribute small requests equally over all
available holes, which closes larger holes unnecessarily.
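
To illustrate the difference, a toy model of the two placement
policies (hypothetical code, not taken from any kernel; 'struct hole'
and both functions are made up for this sketch):

struct hole { unsigned long start; unsigned long len; };

/* 2.6-style: first fit starting at the cache index, so a small
 * request can land in, and split, a large hole far from the base. */
unsigned long place_cached(struct hole *h, int n, int *cache,
                           unsigned long len)
{
        int i;

        for (i = *cache; i < n; i++)
                if (h[i].len >= len) {
                        *cache = i;
                        return h[i].start;      /* splits hole i */
                }
        return 0;       /* no hole fits: extend the mapped area */
}

/* 2.4-style: always scan from the base, so small requests cluster in
 * the lowest holes and large holes far from the base survive longer. */
unsigned long place_from_base(struct hole *h, int n, unsigned long len)
{
        int i;

        for (i = 0; i < n; i++)
                if (h[i].len >= len)
                        return h[i].start;
        return 0;
}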
I'll rerun the large scale applications that caused us to detect the
fragmentation issues in the first place. If they fail (which I don't
believe they will) we could maybe combine the two approaches to get a
better cache pointer for the unmap case and a way to unclutter the
address space via cached_hole_size.
Wolfgang
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-18 15:47 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
Chen, Kenneth W writes:
> This patch tries to solve the address space fragmentation issue
> brought up by Wolfgang, where fragmentation is so severe that
> applications fail on 2.6 kernels. Looking a bit deeper into the
> issue, we found that a lot of the fragmentation was caused by a
> suboptimal algorithm in the munmap code path. For example, as people
> have pointed out, when a series of munmaps occurs, the free_area_cache
> ends up pointing to the last vma that was freed, ignoring its
> surroundings and not performing any coalescing at all, thus
> artificially creating more holes in the virtual address space than
> necessary. However, all the information needed to perform coalescing
> is actually already there. This patch puts that data to use, so we
> prevent artificial fragmentation.
>
> This patch covers both the bottom-up and top-down layouts. For the
> bottom-up layout, free_area_cache points to prev->vm_end; for
> top-down, free_area_cache points to next->vm_start. The results are
> very promising: it passes the test case that Wolfgang posted, and I
> have tested it on a variety of x86, x86_64, and ia64 machines.
>
Hi Ken,
I have to partially retract my earlier statement. While this patch
does address the problems with munmap's tendency to fragment the
mapped areas, the issue it does not address, namely the lack of
concentration of smaller requests towards the base, is indeed
important to us.

With your patch the two large applications that triggered the
fragmentation issue still fail. So we still have a regression from
2.4 kernels to 2.6 with this fix.

So I'd vote (hope it counts ;-) to either include your munmap
improvements into my earlier avoiding-fragmentation fix or use
my (admittedly more complex) patch instead.
I will append both a test case and the (nearly) final
/proc/self/maps status of our failing application (cleansed slightly).
First the test case:
Compile via
gcc -static leakme4.c -o leakme4
----------------------------------------------------------------------
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>

/* logging helper function:
 * print the request type (add or remove mmaps)
 * and dump /proc/self/maps while counting the
 * mapped areas between 0x10000000 and 0xf0000000
 */
void
dumpselfmaps(char w, void* p, size_t l, int printmaps)
{
        int f,i;
        int c;
        int crs = 0;
        static int count = 0;
        static size_t totsize = 0;
        char buf[65536];

        if( w == '-' )
                totsize -= l;
        else if( w == '+' )
                totsize += l;
        if( w ) {
                printf(" ------ %d %c %p-%p (%p / %p) -------", count++,
                       w, p, ((char*)p)+l, (char*)l, (char*)totsize);
        } else {
                printf(" ------ %d -------", count++);
        }
        if( printmaps )
                putchar('\n');
        fflush(stdout);
        f = open( "/proc/self/maps", O_RDONLY );
        while( (c = read( f, buf, sizeof(buf))) > 0 ) {
                if( printmaps )
                        write( 1, buf, c);
                for( i = 0; i < c-1; ++i )
                        if( buf[i] == '\n' &&
                            buf[i+1] != '0' &&
                            buf[i+1] != 'f')
                                ++crs;
        }
        printf( "Total allocated areas: %d\n", crs );
        fflush(stdout);
        close(f);
}

/* mmap helper function - map and log request */
void* mymmap( size_t len) {
        void *m1 = mmap(0, len, PROT_READ|PROT_WRITE,
                        MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        dumpselfmaps('+', m1, len, 0);
        if( m1 == (void*)-1)
                printf("ERROR allocating %p bytes\n", (void*)len);
        return m1;
}

/* munmap helper function - unmap and log request */
void mymunmap( void* m1, size_t len )
{
        munmap( m1, len);
        dumpselfmaps('-', m1, len, 0);
}

void
aLLocator()
{
#define k1 128
#define k2 64
        char *maps[k1];
        char *naps[k2];
        int i;

        /* allocate k1 maps of size 0x100000 */
        for( i = 0; i < k1; ++i )
                maps[i] = mymmap(0x100000);

        /* free every second of them - creating k1/2 holes */
        for( i = 0; i < k1; i += 2 )
                mymunmap( maps[i], 0x100000);

        /* now fill the holes with alternating 0x1000/0x100000 maps */
        for( i = 0; i < k1; i += 2 )
                maps[i] = mymmap((i & 4) ? 0x100000 : 0x1000);

        /* request some more memory of size 0x100000 */
        for( i = 0; i < k2; ++i)
                naps[i] = mymmap(0x100000);
}

int main() {
        aLLocator();
        dumpselfmaps(0,0,0,1);
        /* don't clean up ;-) */
        return 0;
}
----------------------------------------------------------------------
The output for various kernels looks like
2.4.21
[...]
08048000-080a3000 r-xp 00000000 00:82 15992029 /home/wwc/tmp/leakme4
080a3000-080a5000 rwxp 0005b000 00:82 15992029 /home/wwc/tmp/leakme4
080a5000-080c7000 rwxp 00000000 00:00 0
55555000-55575000 rwxp 00000000 00:00 0
55655000-5f656000 rwxp 00100000 00:00 0
fffec000-ffffe000 rwxp fffffffffffef000 00:00 0
Total allocated areas: 2
-------------------------
2.6.12-rc4-no-caching (removing all references to free-area-cache)
08048000-080a3000 r-xp 00000000 00:42 15992029 /home/wwc/tmp/leakme4
080a3000-080a5000 rwxp 0005b000 00:42 15992029 /home/wwc/tmp/leakme4
080a5000-080c7000 rwxp 080a5000 00:00 0
55555000-55575000 rwxp 55555000 00:00 0
55655000-5f656000 rwxp 55655000 00:00 0
fffec000-ffffe000 rwxp fffec000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Total allocated areas: 2
-------------------------
2.6.11.10
[...]
08048000-080a3000 r-xp 00000000 00:5a 15992029 /home/wwc/tmp/leakme4
080a3000-080a5000 rwxp 0005b000 00:5a 15992029 /home/wwc/tmp/leakme4
080a5000-080c7000 rwxp 080a5000 00:00 0
55555000-55557000 rwxp 55555000 00:00 0
55655000-55b58000 rwxp 55655000 00:00 0
55c56000-56158000 rwxp 55c56000 00:00 0
56256000-56758000 rwxp 56256000 00:00 0
56856000-56d58000 rwxp 56856000 00:00 0
56e56000-57358000 rwxp 56e56000 00:00 0
57456000-57958000 rwxp 57456000 00:00 0
57a56000-57f58000 rwxp 57a56000 00:00 0
58056000-58558000 rwxp 58056000 00:00 0
58656000-58b58000 rwxp 58656000 00:00 0
58c56000-59158000 rwxp 58c56000 00:00 0
59256000-59758000 rwxp 59256000 00:00 0
59856000-59d58000 rwxp 59856000 00:00 0
59e56000-5a358000 rwxp 59e56000 00:00 0
5a456000-5a958000 rwxp 5a456000 00:00 0
5aa56000-5af58000 rwxp 5aa56000 00:00 0
5b056000-60556000 rwxp 5b056000 00:00 0
fffec000-ffffe000 rwxp fffec000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Total allocated areas: 17
-------------------------
2.6.12-rc4-mm2
[...]
08048000-080a3000 r-xp 00000000 00:18 15992029 /home/wwc/tmp/leakme4
080a3000-080a5000 rwxp 0005b000 00:18 15992029 /home/wwc/tmp/leakme4
080a5000-080c7000 rwxp 080a5000 00:00 0 [heap]
55555000-55575000 rwxp 55555000 00:00 0
55655000-5f656000 rwxp 55655000 00:00 0
fffeb000-ffffe000 rwxp fffeb000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0
Total allocated areas: 2
-------------------------
2.6.12-rc4-ken
[...]
08048000-080a3000 r-xp 00000000 00:18 15992029 /home/wwc/tmp/leakme4
080a3000-080a5000 rwxp 0005b000 00:18 15992029 /home/wwc/tmp/leakme4
080a5000-080c7000 rwxp 080a5000 00:00 0 [heap]
55655000-55758000 rwxp 55655000 00:00 0
55856000-55d58000 rwxp 55856000 00:00 0
55e56000-56358000 rwxp 55e56000 00:00 0
56456000-56958000 rwxp 56456000 00:00 0
56a56000-56f58000 rwxp 56a56000 00:00 0
57056000-57558000 rwxp 57056000 00:00 0
57656000-57b58000 rwxp 57656000 00:00 0
57c56000-58158000 rwxp 57c56000 00:00 0
58256000-58758000 rwxp 58256000 00:00 0
58856000-58d58000 rwxp 58856000 00:00 0
58e56000-59358000 rwxp 58e56000 00:00 0
59456000-59958000 rwxp 59456000 00:00 0
59a56000-59f58000 rwxp 59a56000 00:00 0
5a056000-5a558000 rwxp 5a056000 00:00 0
5a656000-5ab58000 rwxp 5a656000 00:00 0
5ac56000-5b158000 rwxp 5ac56000 00:00 0
5b256000-60656000 rwxp 5b256000 00:00 0
fffe8000-ffffe000 rwxp fffe8000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0
Total allocated areas: 17
-------------------------
Now the promised /proc/self/maps of our failing application.
08048000-08150000 r-xp 00000000 00:1a 16956478 /path/to/executable
08150000-0817a000 rwxp 00107000 00:1a 16956478 /path/to/executable
0817a000-55554000 rwxp 0817a000 00:00 0 [heap]
55555000-5556b000 r-xp 00000000 08:03 239049 /path/to/a/shared/library.so
5556b000-5556d000 rwxp 00015000 08:03 239049 /path/to/a/shared/library.so
5556d000-5556e000 rwxp 5556d000 00:00 0
5556e000-55692000 r-xp 00000000 00:1a 2128297 /path/to/a/shared/library.so
55692000-556c5000 rwxp 00123000 00:1a 2128297 /path/to/a/shared/library.so
556c5000-556c9000 rwxp 556c5000 00:00 0
556c9000-55783000 r-xp 00000000 00:1a 21710615 /path/to/a/shared/library.so
55783000-557ac000 rwxp 000b9000 00:1a 21710615 /path/to/a/shared/library.so
557ac000-557b1000 rwxp 557ac000 00:00 0
557b1000-55841000 r-xp 00000000 00:1a 2128296 /path/to/a/shared/library.so
55841000-55866000 rwxp 0008f000 00:1a 2128296 /path/to/a/shared/library.so
55866000-5586a000 rwxp 55866000 00:00 0
5586a000-5593e000 r-xp 00000000 00:1a 2128288 /path/to/a/shared/library.so
5593e000-55967000 rwxp 000d3000 00:1a 2128288 /path/to/a/shared/library.so
55967000-5596b000 rwxp 55967000 00:00 0
5596b000-55d91000 r-xp 00000000 00:1a 2128303 /path/to/a/shared/library.so
55d91000-55e3b000 rwxp 00425000 00:1a 2128303 /path/to/a/shared/library.so
55e3b000-55e49000 rwxp 55e3b000 00:00 0
55e49000-5633d000 r-xp 00000000 00:1a 7798634 /path/to/a/shared/library.so
5633d000-56432000 rwxp 004f3000 00:1a 7798634 /path/to/a/shared/library.so
56432000-56440000 rwxp 56432000 00:00 0
56440000-565fb000 r-xp 00000000 00:1a 7798638 /path/to/a/shared/library.so
565fb000-56659000 rwxp 001ba000 00:1a 7798638 /path/to/a/shared/library.so
56659000-56664000 rwxp 56659000 00:00 0
56664000-567a8000 r-xp 00000000 00:1a 7798640 /path/to/a/shared/library.so
567a8000-567f3000 rwxp 00143000 00:1a 7798640 /path/to/a/shared/library.so
567f3000-567fc000 rwxp 567f3000 00:00 0
567fc000-568c0000 r-xp 00000000 00:1a 7798636 /path/to/a/shared/library.so
568c0000-568ed000 rwxp 000c3000 00:1a 7798636 /path/to/a/shared/library.so
568ed000-568f0000 rwxp 568ed000 00:00 0
568f0000-570a1000 r-xp 00000000 00:1a 7798632 /path/to/a/shared/library.so
570a1000-571ee000 rwxp 007b0000 00:1a 7798632 /path/to/a/shared/library.so
571ee000-5720f000 rwxp 571ee000 00:00 0
5720f000-57349000 r-xp 00000000 00:1a 7798642 /path/to/a/shared/library.so
57349000-57390000 rwxp 00139000 00:1a 7798642 /path/to/a/shared/library.so
57390000-57395000 rwxp 57390000 00:00 0
57395000-576b9000 r-xp 00000000 00:1a 7798628 /path/to/a/shared/library.so
576b9000-57771000 rwxp 00323000 00:1a 7798628 /path/to/a/shared/library.so
57771000-57779000 rwxp 57771000 00:00 0
57779000-57c1c000 r-xp 00000000 00:1a 7798630 /path/to/a/shared/library.so
57c1c000-57d4b000 rwxp 004a2000 00:1a 7798630 /path/to/a/shared/library.so
57d4b000-57d59000 rwxp 57d4b000 00:00 0
57d59000-57de4000 r-xp 00000000 00:1a 2128293 /path/to/a/shared/library.so
57de4000-57e02000 rwxp 0008a000 00:1a 2128293 /path/to/a/shared/library.so
57e02000-57e04000 rwxp 57e02000 00:00 0
57e04000-57e22000 r-xp 00000000 00:1a 21710611 /path/to/a/shared/library.so
57e22000-57e2a000 rwxp 0001d000 00:1a 21710611 /path/to/a/shared/library.so
57e2a000-57f07000 r-xp 00000000 00:1a 7798626 /path/to/a/shared/library.so
57f07000-57f30000 rwxp 000dc000 00:1a 7798626 /path/to/a/shared/library.so
57f30000-57f35000 rwxp 57f30000 00:00 0
57f35000-57f43000 r-xp 00000000 00:1a 31756329 /path/to/a/shared/library.so
57f43000-57f44000 rwxp 0000d000 00:1a 31756329 /path/to/a/shared/library.so
57f44000-57f48000 r-xp 00000000 00:1a 31756327 /path/to/a/shared/library.so
57f48000-57f49000 rwxp 00003000 00:1a 31756327 /path/to/a/shared/library.so
57f49000-580da000 r-xp 00000000 00:1a 11310621 /path/to/a/shared/library.so
580da000-58128000 rwxp 00190000 00:1a 11310621 /path/to/a/shared/library.so
58128000-58185000 rwxp 58128000 00:00 0
58185000-581fc000 r-xp 00000000 00:1a 29224039 /path/to/a/shared/library.so
581fc000-58215000 rwxp 00076000 00:1a 29224039 /path/to/a/shared/library.so
58215000-58216000 rwxp 58215000 00:00 0
58216000-5826a000 r-xp 00000000 00:1a 3169412 /path/to/a/shared/library.so
5826a000-5827b000 rwxp 00053000 00:1a 3169412 /path/to/a/shared/library.so
5827b000-5827c000 rwxp 5827b000 00:00 0
5827c000-58332000 r-xp 00000000 00:1a 3169410 /path/to/a/shared/library.so
58332000-58359000 rwxp 000b5000 00:1a 3169410 /path/to/a/shared/library.so
58359000-58392000 rwxp 58359000 00:00 0
58392000-583a1000 r-xp 00000000 00:1a 12088170 /path/to/a/shared/library.so
583a1000-583a2000 rwxp 0000e000 00:1a 12088170 /path/to/a/shared/library.so
583a2000-583f8000 rwxp 583a2000 00:00 0
583f8000-58475000 r-xp 00000000 00:1a 12088172 /path/to/a/shared/library.so
58475000-58489000 rwxp 0007c000 00:1a 12088172 /path/to/a/shared/library.so
58489000-584c5000 r-xp 00000000 00:1a 16147984 /path/to/a/shared/library.so
584c5000-584d3000 rwxp 0003b000 00:1a 16147984 /path/to/a/shared/library.so
584d3000-58679000 r-xp 00000000 00:1a 16147982 /path/to/a/shared/library.so
58679000-586ea000 rwxp 001a5000 00:1a 16147982 /path/to/a/shared/library.so
586ea000-586fa000 rwxp 586ea000 00:00 0
586fa000-587db000 r-xp 00000000 00:1a 16147990 /path/to/a/shared/library.so
587db000-5880a000 rwxp 000e0000 00:1a 16147990 /path/to/a/shared/library.so
5880a000-5946b000 rwxp 5880a000 00:00 0
5946b000-59527000 r-xp 00000000 00:1a 16147988 /path/to/a/shared/library.so
59527000-5954f000 rwxp 000bb000 00:1a 16147988 /path/to/a/shared/library.so
5954f000-59554000 rwxp 5954f000 00:00 0
59554000-596ec000 r-xp 00000000 00:1a 13771393 /path/to/a/shared/library.so
596ec000-59746000 rwxp 00197000 00:1a 13771393 /path/to/a/shared/library.so
59746000-59752000 rwxp 59746000 00:00 0
59752000-598b5000 r-xp 00000000 00:1a 13771391 /path/to/a/shared/library.so
598b5000-5991e000 rwxp 00162000 00:1a 13771391 /path/to/a/shared/library.so
5991e000-59931000 rwxp 5991e000 00:00 0
59931000-59abe000 r-xp 00000000 00:1a 13771389 /path/to/a/shared/library.so
59abe000-59b34000 rwxp 0018c000 00:1a 13771389 /path/to/a/shared/library.so
59b34000-59b47000 rwxp 59b34000 00:00 0
59b47000-59bd7000 r-xp 00000000 00:1a 16147978 /path/to/a/shared/library.so
59bd7000-59bf0000 rwxp 0008f000 00:1a 16147978 /path/to/a/shared/library.so
59bf0000-59bf1000 rwxp 59bf0000 00:00 0
59bf1000-59d07000 r-xp 00000000 00:1a 16147976 /path/to/a/shared/library.so
59d07000-59d3a000 rwxp 00115000 00:1a 16147976 /path/to/a/shared/library.so
59d3a000-59d3e000 rwxp 59d3a000 00:00 0
59d3e000-5a06f000 r-xp 00000000 00:1a 16147974 /path/to/a/shared/library.so
5a06f000-5a136000 rwxp 00330000 00:1a 16147974 /path/to/a/shared/library.so
5a136000-5a153000 rwxp 5a136000 00:00 0
5a153000-5a34e000 r-xp 00000000 00:1a 13771395 /path/to/a/shared/library.so
5a34e000-5a3ce000 rwxp 001fa000 00:1a 13771395 /path/to/a/shared/library.so
5a3ce000-5a3e3000 rwxp 5a3ce000 00:00 0
5a3e3000-5a474000 r-xp 00000000 00:1a 15113829 /path/to/a/shared/library.so
5a474000-5a48f000 rwxp 00090000 00:1a 15113829 /path/to/a/shared/library.so
5a48f000-5a491000 rwxp 5a48f000 00:00 0
5a491000-5a590000 r-xp 00000000 00:1a 23488398 /path/to/a/shared/library.so
5a590000-5a5d1000 rwxp 000fe000 00:1a 23488398 /path/to/a/shared/library.so
5a5d1000-5a5f5000 rwxp 5a5d1000 00:00 0
5a5f5000-5a616000 r-xp 00000000 00:1a 23488402 /path/to/a/shared/library.so
5a616000-5a621000 rwxp 00020000 00:1a 23488402 /path/to/a/shared/library.so
5a621000-5a622000 rwxp 5a621000 00:00 0
5a642000-5a67b000 r-xp 00000000 08:03 239102 /path/to/a/shared/library.so
5a67b000-5a687000 rwxp 00038000 08:03 239102 /path/to/a/shared/library.so
5a687000-5a689000 r-xp 00000000 08:03 239058 /path/to/a/shared/library.so
5a689000-5a68b000 rwxp 00001000 08:03 239058 /path/to/a/shared/library.so
5a68b000-5a6ac000 r-xp 00000000 08:03 239076 /path/to/a/shared/library.so
5a6ac000-5a6ae000 rwxp 00020000 08:03 239076 /path/to/a/shared/library.so
5a6ae000-5a7bd000 r-xp 00000000 08:03 239075 /path/to/a/shared/library.so
5a7bd000-5a7be000 ---p 0010f000 08:03 239075 /path/to/a/shared/library.so
5a7be000-5a7bf000 r-xp 0010f000 08:03 239075 /path/to/a/shared/library.so
5a7bf000-5a7c2000 rwxp 00110000 08:03 239075 /path/to/a/shared/library.so
5a7c2000-5a7c6000 rwxp 5a7c2000 00:00 0
5a7c6000-5e163000 r-xs 00000000 00:1f 9351804 /path/to/a/shared/library.so
5e163000-5fdb4000 rwxp 5e163000 00:00 0
60061000-6475d000 rwxp 60061000 00:00 0
6475f000-64861000 rwxp 6475f000 00:00 0
64880000-64a80000 rwxp 64880000 00:00 0
64aa3000-64ea3000 rwxp 64aa3000 00:00 0
64f0c000-660ba000 rwxp 64f0c000 00:00 0
66101000-6643f000 rwxp 66101000 00:00 0
66528000-66728000 rwxp 66528000 00:00 0
667f5000-669f5000 rwxp 667f5000 00:00 0
66acf000-66fcf000 rwxp 66acf000 00:00 0
67085000-67585000 rwxp 67085000 00:00 0
67639000-67939000 rwxp 67639000 00:00 0
67bf1000-67df1000 rwxp 67bf1000 00:00 0
67ecd000-6c878000 rwxp 67ecd000 00:00 0
6c953000-6cf53000 rwxp 6c953000 00:00 0
6d033000-6da33000 rwxp 6d033000 00:00 0
6da4d000-6df36000 rwxp 6da4d000 00:00 0
6e005000-6e205000 rwxp 6e005000 00:00 0
6e2e1000-6e6e1000 rwxp 6e2e1000 00:00 0
6e7ca000-6f4ec000 rwxp 6e7ca000 00:00 0
6f94a000-6fb4a000 rwxp 6f94a000 00:00 0
6fc25000-70125000 rwxp 6fc25000 00:00 0
701db000-705db000 rwxp 701db000 00:00 0
708bb000-709bb000 rwxp 708bb000 00:00 0
70a47000-71347000 rwxp 70a47000 00:00 0
71569000-71769000 rwxp 71569000 00:00 0
71a55000-71b55000 rwxp 71a55000 00:00 0
720b0000-72841000 rwxp 720b0000 00:00 0
7291c000-72b1c000 rwxp 7291c000 00:00 0
72ead000-733a2000 rwxp 72ead000 00:00 0
73697000-7398c000 rwxp 73697000 00:00 0
739f4000-7456f000 rwxp 739f4000 00:00 0
747f1000-74ff1000 rwxp 747f1000 00:00 0
75582000-75dff000 rwxp 75582000 00:00 0
76135000-76c14000 rwxp 76135000 00:00 0
76f09000-77a2b000 rwxp 76f09000 00:00 0
77ac3000-77cc3000 rwxp 77ac3000 00:00 0
77fac000-7858d000 rwxp 77fac000 00:00 0
7860a000-793ff000 rwxp 7860a000 00:00 0
79407000-798fc000 rwxp 79407000 00:00 0
799bd000-7a2bd000 rwxp 799bd000 00:00 0
7a5a9000-7a895000 rwxp 7a5a9000 00:00 0
7ae02000-7bc10000 rwxp 7ae02000 00:00 0
7befb000-7c48c000 rwxp 7befb000 00:00 0
7c781000-7cd12000 rwxp 7c781000 00:00 0
7d274000-7d84c000 rwxp 7d274000 00:00 0
7daad000-7e327000 rwxp 7daad000 00:00 0
7e672000-7f4d6000 rwxp 7e672000 00:00 0
7f6cd000-7fcb7000 rwxp 7f6cd000 00:00 0
7ff7a000-8050b000 rwxp 7ff7a000 00:00 0
807f6000-80aeb000 rwxp 807f6000 00:00 0
80de0000-810d5000 rwxp 80de0000 00:00 0
8131f000-818f1000 rwxp 8131f000 00:00 0
81be6000-81edb000 rwxp 81be6000 00:00 0
820e9000-831f5000 rwxp 820e9000 00:00 0
834ea000-8400c000 rwxp 834ea000 00:00 0
84294000-84b73000 rwxp 84294000 00:00 0
850a3000-869dc000 rwxp 850a3000 00:00 0
86c78000-86f6d000 rwxp 86c78000 00:00 0
86f8d000-87282000 rwxp 86f8d000 00:00 0
87809000-87afe000 rwxp 87809000 00:00 0
8801b000-8af85000 rwxp 8801b000 00:00 0
8b27a000-8b80b000 rwxp 8b27a000 00:00 0
8baec000-8bde1000 rwxp 8baec000 00:00 0
8bde4000-8e475000 rwxp 8bde4000 00:00 0
8e76a000-8ecfb000 rwxp 8e76a000 00:00 0
8f021000-8f8a7000 rwxp 8f021000 00:00 0
8fb9c000-906be000 rwxp 8fb9c000 00:00 0
906d5000-90cbf000 rwxp 906d5000 00:00 0
90f8a000-91810000 rwxp 90f8a000 00:00 0
91b05000-91dfa000 rwxp 91b05000 00:00 0
92107000-923fc000 rwxp 92107000 00:00 0
93146000-9343b000 rwxp 93146000 00:00 0
936d7000-93cc1000 rwxp 936d7000 00:00 0
93cf0000-942da000 rwxp 93cf0000 00:00 0
94526000-94b10000 rwxp 94526000 00:00 0
94da4000-9591f000 rwxp 94da4000 00:00 0
969bf000-9729e000 rwxp 969bf000 00:00 0
9782f000-97b24000 rwxp 9782f000 00:00 0
97cd6000-9b0d6000 rwxp 97cd6000 00:00 0
9b944000-9ba44000 rwxp 9b944000 00:00 0
9bd30000-9c605000 rwxp 9bd30000 00:00 0
9cb83000-9d16d000 rwxp 9cb83000 00:00 0
9d448000-9d733000 rwxp 9d448000 00:00 0
9da20000-9e2ff000 rwxp 9da20000 00:00 0
9e5f4000-9e8e9000 rwxp 9e5f4000 00:00 0
9ebde000-9f16f000 rwxp 9ebde000 00:00 0
9f464000-9f9f5000 rwxp 9f464000 00:00 0
9fcea000-a080c000 rwxp 9fcea000 00:00 0
a9466000-a99f7000 rwxp a9466000 00:00 0
aa573000-aa85e000 rwxp aa573000 00:00 0
ad2d8000-ad5cd000 rwxp ad2d8000 00:00 0
ad864000-ae143000 rwxp ad864000 00:00 0
b762a000-b7bbb000 rwxp b762a000 00:00 0
ba610000-bb729000 rwxp ba610000 00:00 0
c108b000-c1380000 rwxp c108b000 00:00 0
c1974000-c1f05000 rwxp c1974000 00:00 0
c378f000-c3d20000 rwxp c378f000 00:00 0
c4015000-c45a6000 rwxp c4015000 00:00 0
c5125000-c56b6000 rwxp c5125000 00:00 0
cfadd000-d006e000 rwxp cfadd000 00:00 0
d2b3c000-d30cd000 rwxp d2b3c000 00:00 0
d365e000-d3f3d000 rwxp d365e000 00:00 0
da180000-dace7000 rwxp da180000 00:00 0
db00c000-db38f000 rwxp db00c000 00:00 0
db922000-dc203000 rwxp db922000 00:00 0
dc4f8000-dca89000 rwxp dc4f8000 00:00 0
e6eb4000-e7445000 rwxp e6eb4000 00:00 0
e7447000-f1e01000 rwxp e7447000 00:00 0
f486f000-f5da7000 rwxp f486f000 00:00 0
fffd5000-ffffe000 rwxp fffd5000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0
The application fails with a request for 250MB but still has more
than 1GB of memory distributed over the various holes. All maps are
allocated via standard malloc/free calls, which glibc translates into
brk/mmap calls.
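
For reference, a small hypothetical helper (not part of the kernel or
of our application) that tallies the free gaps in a saved maps dump
like the one above:

#include <stdio.h>

/* Sum the gaps between consecutive mappings in a /proc/<pid>/maps
 * dump fed on stdin, e.g.:  ./holes < maps.txt */
int main(void)
{
        unsigned long start, end, prev_end = 0, holes = 0;
        char line[512];

        while (fgets(line, sizeof(line), stdin)) {
                if (sscanf(line, "%lx-%lx", &start, &end) != 2)
                        continue;       /* skip non-mapping lines */
                if (prev_end && start > prev_end)
                        holes += start - prev_end;
                prev_end = end;
        }
        printf("total hole space: %lu bytes\n", holes);
        return 0;
}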
Wolfgang
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Chen, Kenneth W @ 2005-05-18 16:18 UTC
To: 'Wolfgang Wander'
Cc: Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
Wolfgang Wander wrote on Wednesday, May 18, 2005 8:47 AM
> I have to partially retract my earlier statement. While this patch
> does address the problems with munmap's tendency to fragment the
> mapped areas, the issue it does not address, namely the lack of
> concentration of smaller requests towards the base, is indeed
> important to us.
>
> With your patch the two large applications that triggered the
> fragmentation issue still fail. So we still have a regression from
> 2.4 kernels to 2.6 with this fix.
>
> So I'd vote (hope it counts ;-) to either include your munmap
> improvements into my earlier avoiding-fragmentation fix or use
> my (admittedly more complex) patch instead.
>
> I will append both a test case and the (nearly) final
> /proc/self/maps status of our failing application (cleansed slightly).
>
> The application fails with a request for 250MB but still has more
> than 1GB of memory distributed over the various holes. All maps are
> allocated via standard malloc/free calls, which glibc translates into
> brk/mmap calls.
Yeah, it's going to be a challenge to satisfy a very large mmap mixed
with tons of small mmaps/munmaps. I would think a truly random
mmap/munmap size distribution would coalesce the mappings nicely, but
apparently not in your case. I will keep digging.
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-18 17:16 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
I don't think that a truly random mmap/munmap pattern works well with
the current cache approach. By filling any request into the first
fitting hole ('first' meaning the first from the current cache
pointer), all large holes are going to be filled inefficiently.(*)

Ideally you would want to keep a sorted list of holes and fit new
requests in there on a best-match basis. But such a patch would be
even more complex than mine.
My goal was to place small requests close to the base while leaving
larger holes open as long as possible and far from the base. 2.4
kernels did this inadvertently by always starting the search from the
base; my patch starts searching from the base (upward or downward)
if the new request is known to fit between the base and the current
cache pointer. It thus retains the 2.4 quality of mixing small and
large requests while keeping the huge speedups Ingo introduced with
the cache pointer.
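
Roughly, the decision my patch adds to arch_get_unmapped_area looks
like this simplified sketch (not a literal excerpt):

        if (len <= mm->cached_hole_size) {
                /* a hole at least this large is known to exist below
                 * the cache pointer: restart the search at the base */
                addr = TASK_UNMAPPED_BASE;
                mm->cached_hole_size = 0;
        } else {
                /* otherwise continue from the cache pointer, keeping
                 * the fast path */
                addr = mm->free_area_cache;
        }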
One way or another I believe we need to address the issue of mixed
small and large map requests, either via my earlier approach or by
maintaining an index of holes. If you feel the latter is needed I'd
certainly volunteer to provide a patch for that as well...
Likewise, if there are issues with my earlier patch, I hope I can
address them as well.
Wolfgang
(*) Side note: in 2.4 (and with my approach) large holes close to the
    base are still not going to be cluttered with smaller requests;
    the large ones far from the base, however, are going to stay
    there until they are needed.
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Chen, Kenneth W @ 2005-05-18 17:57 UTC
To: 'Wolfgang Wander'
Cc: Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
Wolfgang Wander wrote on Wednesday, May 18, 2005 10:16 AM
> My goal was to place small requests close to the base while leaving
> larger holes open as long as possible and far from the base. 2.4
> kernels did this inadvertently by always starting the search from the
> base; my patch starts searching from the base (upward or downward)
> if the new request is known to fit between the base and the current
> cache pointer. It thus retains the 2.4 quality of mixing small and
> large requests while keeping the huge speedups Ingo introduced with
> the cache pointer.
This algorithm tends to penalize small requests, since they trigger a
linear search from the beginning. It would also penalize large
requests, since the cache pointer keeps getting reset to a lower
address, making a subsequent large request search forward. In your
case, since all mappings are anonymous mmaps with the same page
protection, you won't notice the performance problem, because the
mapped areas coalesce. But other apps, like the Apache web server,
which mmap thousands of different files, will degrade. The probability
of a linear search is a lot higher with this proposal.

The nice thing about the current *broken* cache pointer is that it is
almost O(1) to fulfill a request, since it moves in one direction.
The new proposal would reduce that O(1) probability.
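
For context, the fast path being defended is essentially this
simplified sketch of the generic arch_get_unmapped_area loop (not a
literal excerpt):

        addr = mm->free_area_cache;
        for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
                if (TASK_SIZE - len < addr)
                        return -ENOMEM;         /* ran off the top */
                if (!vma || addr + len <= vma->vm_start) {
                        mm->free_area_cache = addr + len;
                        return addr;            /* cache only moves up */
                }
                addr = vma->vm_end;             /* never scans backward */
        }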
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-19 18:38 UTC
To: Chen, Kenneth W
Cc: 'Wolfgang Wander', Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
Chen, Kenneth W writes:
> Wolfgang Wander wrote on Wednesday, May 18, 2005 10:16 AM
> > My goal was to place small requests close to the base while leaving
> > larger holes open as long as possible and far from the base. 2.4
> > kernels did this inadvertently by always starting the search from the
> > base; my patch starts searching from the base (upward or downward)
> > if the new request is known to fit between the base and the current
> > cache pointer. It thus retains the 2.4 quality of mixing small and
> > large requests while keeping the huge speedups Ingo introduced with
> > the cache pointer.
>
> This algorithm tends to penalize small requests, since they trigger a
> linear search from the beginning. It would also penalize large
> requests, since the cache pointer keeps getting reset to a lower
> address, making a subsequent large request search forward. In your
> case, since all mappings are anonymous mmaps with the same page
> protection, you won't notice the performance problem, because the
> mapped areas coalesce. But other apps, like the Apache web server,
> which mmap thousands of different files, will degrade. The probability
> of a linear search is a lot higher with this proposal.
>
> The nice thing about the current *broken* cache pointer is that it is
> almost O(1) to fulfill a request, since it moves in one direction.
> The new proposal would reduce that O(1) probability.
I do certainly see that the algorithm isn't perfect in every case;
however, for the test case Ingo sent me (Ingo, did you verify the
timing?) my patch performed as well as Ingo's original solution. I
assume that Ingo's test was requesting the same map sizes for every
thread, so the results would be a bit biased in my favour... ;-)

That leaves us with two scenarios for a new mmap request:

* the new request is greater than or equal to cached_hole_size ->
  no change in behaviour
* otherwise we start the search at a position where we know the
  new request will fit, which could eventually even be faster
  than the required wrap.
So I don't necessarily see that the probability is reduced in all
circumstances. Clearly mixed-size requests do tend to keep the
free_area_cache pointer low and thus will likely extend the search
length.

Are there test cases/benchmarks which would simulate the behaviour of
an Apache-like application under the various schemes?

Clearly one has to weigh the performance issues against the memory
efficiency, but since we demonstrably throw away 25% (or 1GB) of the
available address space in the various accumulated holes a long
running application can generate, I hope that for the time being we
can stick with my first solution, preferably extended by your munmap
fix? Please? ;-)
Wolfgang
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Andrew Morton @ 2005-05-19 22:54 UTC
To: Wolfgang Wander; +Cc: kenneth.w.chen, herve, mingo, arjanv, linux-mm
Wolfgang Wander <wwc@rentec.com> wrote:
>
> Clearly one has to weigh the performance issues against the memory
> efficiency, but since we demonstrably throw away 25% (or 1GB) of the
> available address space in the various accumulated holes a long
> running application can generate
That sounds pretty bad.
> I hope that for the time being we can
> stick with my first solution,
I'm inclined to do this.
> preferably extended by your munmap fix?
And this, if someone has a patch?
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Chen, Kenneth W @ 2005-05-20 2:02 UTC
To: 'Andrew Morton', Wolfgang Wander; +Cc: herve, mingo, arjanv, linux-mm
Andrew Morton wrote on Thursday, May 19, 2005 3:55 PM
> Wolfgang Wander <wwc@rentec.com> wrote:
> >
> > Clearly one has to weigh the performance issues against the memory
> > efficiency, but since we demonstrably throw away 25% (or 1GB) of the
> > available address space in the various accumulated holes a long
> > running application can generate
>
> That sounds pretty bad.
>
> > I hope that for the time being we can
> > stick with my first solution,
>
> I'm inclined to do this.
Oh well, I guess we have to take a performance hit here in favor of
functionality. This problem is specific to 32-bit address spaces,
though, so please don't unnecessarily penalize 64-bit arches. If
Andrew is going to take Wolfgang's patch, then we should minimally
take the following patch, which reverts the changes made in arch/ia64
and makes x86_64 use the alternate cache algorithm only for 32-bit
apps.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- linux-2.6.11/arch/ia64/kernel/sys_ia64.c.orig 2005-05-19 18:35:31.468087777 -0700
+++ linux-2.6.11/arch/ia64/kernel/sys_ia64.c 2005-05-19 18:35:46.521798000 -0700
@@ -38,14 +38,8 @@ arch_get_unmapped_area (struct file *fil
 	if (REGION_NUMBER(addr) == REGION_HPAGE)
 		addr = 0;
 #endif
-	if (!addr) {
-		if (len > mm->cached_hole_size) {
-			addr = mm->free_area_cache;
-		} else {
-			addr = TASK_UNMAPPED_BASE;
-			mm->cached_hole_size = 0;
-		}
-	}
+	if (!addr)
+		addr = mm->free_area_cache;
 
 	if (map_shared && (TASK_SIZE > 0xfffffffful))
 		/*
@@ -65,7 +59,6 @@ arch_get_unmapped_area (struct file *fil
 		if (start_addr != TASK_UNMAPPED_BASE) {
 			/* Start a new search --- just in case we missed some holes. */
 			addr = TASK_UNMAPPED_BASE;
-			mm->cached_hole_size = 0;
 			goto full_search;
 		}
 		return -ENOMEM;
@@ -75,8 +68,6 @@ arch_get_unmapped_area (struct file *fil
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-			mm->cached_hole_size = vma->vm_start - addr;
 		addr = (vma->vm_end + align_mask) & ~align_mask;
 	}
 }
--- linux-2.6.11/arch/x86_64/kernel/sys_x86_64.c.orig 2005-05-19 18:37:32.202461298 -0700
+++ linux-2.6.11/arch/x86_64/kernel/sys_x86_64.c 2005-05-19 18:39:03.110663309 -0700
@@ -111,7 +111,7 @@ arch_get_unmapped_area(struct file *filp
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (len <= mm->cached_hole_size) {
+	if (begin != TASK_UNMAPPED_64 && len <= mm->cached_hole_size) {
 		mm->cached_hole_size = 0;
 		mm->free_area_cache = begin;
 	}
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
From: Chen, Kenneth W @ 2005-05-20 23:51 UTC
To: 'Andrew Morton', Wolfgang Wander; +Cc: herve, mingo, arjanv, linux-mm
Andrew Morton wrote on Thursday, May 19, 2005 3:55 PM
> Wolfgang Wander <wwc@rentec.com> wrote:
> >
> > Clearly one has to weigh the performance issues against the memory
> > efficiency, but since we demonstrably throw away 25% (or 1GB) of the
> > available address space in the various accumulated holes a long
> > running application can generate
>
> That sounds pretty bad.
>
> > I hope that for the time being we can
> > stick with my first solution,
>
> I'm inclined to do this.
>
> > preferably extended by your munmap fix?
>
> And this, if someone has a patch?
Here's a 2nd patch, on top of Wolfgang's patch. It is a complement to
Wolfgang's initial attempt to solve the fragmentation problem. The
code path in munmap is suboptimal and potentially worsens the
fragmentation, because with a series of munmaps the free_area_cache
would point to the last vma that was freed, ignoring its surroundings
and not performing any coalescing at all, thus artificially creating
more holes in the virtual address space than necessary. Since all the
information needed to perform coalescing is actually already there,
this patch puts that data to use, so we prevent artificial
fragmentation.

It covers both bottom-up and top-down layouts. For bottom-up,
free_area_cache points to prev->vm_end; for top-down, free_area_cache
points to next->vm_start.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
--- linux-2.6.12-rc4-mm2/mm/mmap.c.orig 2005-05-20 15:54:45.082381920 -0700
+++ linux-2.6.12-rc4-mm2/mm/mmap.c 2005-05-20 16:31:04.832355218 -0700
@@ -1217,14 +1217,11 @@ void arch_unmap_area(struct vm_area_stru
 	/*
 	 * Is this a new hole at the lowest possible address?
 	 */
-	if (area->vm_start >= TASK_UNMAPPED_BASE &&
-	    area->vm_start < area->vm_mm->free_area_cache) {
-		unsigned area_size = area->vm_end-area->vm_start;
-
-		if (area->vm_mm->cached_hole_size < area_size)
-			area->vm_mm->cached_hole_size = area_size;
-		else
-			area->vm_mm->cached_hole_size = ~0UL;
+	unsigned long addr = (unsigned long) area->vm_private_data;
+
+	if (addr >= TASK_UNMAPPED_BASE && addr < area->vm_mm->free_area_cache) {
+		area->vm_mm->free_area_cache = addr;
+		area->vm_mm->cached_hole_size = ~0UL;
 	}
 }
@@ -1317,8 +1314,10 @@ void arch_unmap_area_topdown(struct vm_a
 	/*
 	 * Is this a new hole at the highest possible address?
 	 */
-	if (area->vm_end > area->vm_mm->free_area_cache)
-		area->vm_mm->free_area_cache = area->vm_end;
+	unsigned long addr = (unsigned long) area->vm_private_data;
+
+	if (addr > area->vm_mm->free_area_cache)
+		area->vm_mm->free_area_cache = addr;
 
 	/* dont allow allocations above current base */
 	if (area->vm_mm->free_area_cache > area->vm_mm->mmap_base)
@@ -1683,6 +1682,11 @@ detach_vmas_to_be_unmapped(struct mm_str
 	} while (vma && vma->vm_start < end);
 	*insertion_point = vma;
 	tail_vma->vm_next = NULL;
+	if (mm->unmap_area == arch_unmap_area)
+		tail_vma->vm_private_data = (void*) prev->vm_end;
+	else
+		tail_vma->vm_private_data = vma ?
+			(void*) vma->vm_start : (void*) mm->mmap_base;
 	mm->mmap_cache = NULL;		/* Kill the cache. */
 }
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-23 18:25 UTC
To: Chen, Kenneth W; +Cc: 'Andrew Morton', herve, mingo, arjanv, linux-mm
Chen, Kenneth W wrote:
> Andrew Morton wrote on Thursday, May 19, 2005 3:55 PM
>
>>Wolfgang Wander <wwc@rentec.com> wrote:
>>
>>>Clearly one has to weigh the performance issues against the memory
>>> efficiency, but since we demonstrably throw away 25% (or 1GB) of the
>>> available address space in the various accumulated holes a long
>>> running application can generate
>>
>>That sounds pretty bad.
>>
>>
>>>I hope that for the time being we can
>>> stick with my first solution,
>>
>>I'm inclined to do this.
>>
>>
>>>preferably extended by your munmap fix?
>>
>>And this, if someone has a patch?
>
>
>
> Here's a 2nd patch, on top of Wolfgang's patch. It is a complement to
> Wolfgang's initial attempt to solve the fragmentation problem. The
> code path in munmap is suboptimal and potentially worsens the
> fragmentation, because with a series of munmaps the free_area_cache
> would point to the last vma that was freed, ignoring its surroundings
> and not performing any coalescing at all, thus artificially creating
> more holes in the virtual address space than necessary. Since all the
> information needed to perform coalescing is actually already there,
> this patch puts that data to use, so we prevent artificial
> fragmentation.
>
> It covers both bottom-up and top-down layouts. For bottom-up,
> free_area_cache points to prev->vm_end; for top-down, free_area_cache
> points to next->vm_start.
Works perfectly fine here. All my tests pass and our large applications
are happy with this patch.
Thanks, Ken, for your patience with my lack of it ;-)
Wolfgang
* Re: [PATCH] Avoiding mmap fragmentation - clean rev
From: Wolfgang Wander @ 2005-05-26 17:32 UTC
To: Chen, Kenneth W
Cc: 'Andrew Morton', herve, mingo, arjanv, linux-mm, colin.harrison
Chen, Kenneth W wrote:
> Andrew Morton wrote on Thursday, May 19, 2005 3:55 PM
>
>>Wolfgang Wander <wwc@rentec.com> wrote:
>>
>>>Clearly one has to weigh the performance issues against the memory
>>> efficiency, but since we demonstrably throw away 25% (or 1GB) of the
>>> available address space in the various accumulated holes a long
>>> running application can generate
>>
>>That sounds pretty bad.
>>
>>
>>>I hope that for the time being we can
>>> stick with my first solution,
>>
>>I'm inclined to do this.
>>
>>
>>>preferably extended by your munmap fix?
>>
>>And this, if someone has a patch?
>
>
>
> Here's a 2nd patch, on top of Wolfgang's patch. It is a complement to
> Wolfgang's initial attempt to solve the fragmentation problem. The
> code path in munmap is suboptimal and potentially worsens the
> fragmentation, because with a series of munmaps the free_area_cache
> would point to the last vma that was freed, ignoring its surroundings
> and not performing any coalescing at all, thus artificially creating
> more holes in the virtual address space than necessary. Since all the
> information needed to perform coalescing is actually already there,
> this patch puts that data to use, so we prevent artificial
> fragmentation.
>
This one seems to have already triggered a second bug report on lkml.
Is it possible that, in
static void
detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
        struct vm_area_struct *prev, unsigned long end)
{
        struct vm_area_struct **insertion_point;
        struct vm_area_struct *tail_vma = NULL;

        insertion_point = (prev ? &prev->vm_next : &mm->mmap);
        do {
                rb_erase(&vma->vm_rb, &mm->mm_rb);
                mm->map_count--;
                tail_vma = vma;
                vma = vma->vm_next;
        } while (vma && vma->vm_start < end);
        *insertion_point = vma;
        tail_vma->vm_next = NULL;
        if (mm->unmap_area == arch_unmap_area)
                tail_vma->vm_private_data = (void*) prev->vm_end;
        else
                tail_vma->vm_private_data = vma ?
                        (void*) vma->vm_start : (void*) mm->mmap_base;
        mm->mmap_cache = NULL;          /* Kill the cache. */
}
'prev' can be NULL? The assignment
        tail_vma->vm_private_data = (void*) prev->vm_end;
which fix-2 adds does not check for that.
That potential problem does not seem to match the stack trace
below, however...
Wolfgang
Colin Harrison wrote:
> Hi
>
> I'm using kernel 2.6.12-rc5-git1
> with patches from -mm
> avoiding-mmap-fragmentation.patch
> avoiding-mmap-fragmentation-tidy.patch
> avoiding-mmap-fragmentation-fix.patch
> avoiding-mmap-fragmentation-revert-unneeded-64-bit-changes.patch
> avoiding-mmap-fragmentation-fix-2.patch
>
> I get an oops when exiting from mplayer playing (dfbmga framebuffer):-
>
> xxxxxxxx.xxxxxxxxxxxxxx.com login: Unable to handle kernel paging request at v0
> printing eip:
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT
> Modules linked in: parport_pc lp parport floppy natsemi nls_iso8859_15 ntfs mgai
> CPU: 0
> EIP: 0060:[<e29245cc>] Not tainted VLI
> EFLAGS: 00210286 (2.6.12-rc5-git1)
> EIP is at snd_pcm_mmap_data_close+0x6/0xd [snd_pcm]
> eax: 0000863c ebx: d6073000 ecx: d4dc356c edx: e29245c6
> esi: d50756fc edi: d58a7180 ebp: d66b9800 esp: d6073f6c
> ds: 007b es: 007b ss: 0068
> Process mplayer (pid: 1634, threadinfo=d6073000 task=d72e6020)
> Stack: c013ddbc 00000000 d66b9800 d50756fc c013f532 b747e000 b746e000 c013f8ad
>        b746e000 b747e000 d4dc3a14 d66b9800 d66b9830 ffff0001 d6073000 c013f928
>        b746e000 00000002 00000002 c01026ff b746e000 00010000 b7ec143c 00000002
> Call Trace:
> [<c013ddbc>] remove_vm_struct+0x78/0x81
> [<c013f532>] unmap_vma_list+0xe/0x17
> [<c013f8ad>] do_munmap+0xf1/0x12c
> [<c013f928>] sys_munmap+0x40/0x63
> [<c01026ff>] sysenter_past_esp+0x54/0x75
> Code: 81 dd 89 f8 8b 5c 24 04 8b 74 24 08 8b 7c 24 0c 8b 6c 24 10 83 c4 14 c3 8
>
> (sorry didn't have linewrap on my minicom!)
>
> Without the last patch, avoiding-mmap-fragmentation-fix-2, it works
> fine doing the same stuff with mplayer.
>
> More information/testing can be supplied/performed as required.
>
> Thanks
> Colin Harrison
>
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
2005-05-26 17:32 ` Wolfgang Wander
@ 2005-05-26 17:44 ` Chen, Kenneth W
0 siblings, 0 replies; 22+ messages in thread
From: Chen, Kenneth W @ 2005-05-26 17:44 UTC (permalink / raw)
To: 'Wolfgang Wander'
Cc: 'Andrew Morton', herve, mingo, arjanv, linux-mm, colin.harrison
Wolfgang Wander wrote on Thursday, May 26, 2005 10:32 AM
> This one seems to have triggered already the second bug report on lkm.
>
> Is it possible that in
>
> static void
> detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
> 	struct vm_area_struct *prev, unsigned long end)
> {
> 	struct vm_area_struct **insertion_point;
> 	struct vm_area_struct *tail_vma = NULL;
>
> 	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
> 	do {
> 		rb_erase(&vma->vm_rb, &mm->mm_rb);
> 		mm->map_count--;
> 		tail_vma = vma;
> 		vma = vma->vm_next;
> 	} while (vma && vma->vm_start < end);
> 	*insertion_point = vma;
> 	tail_vma->vm_next = NULL;
> 	if (mm->unmap_area == arch_unmap_area)
> 		tail_vma->vm_private_data = (void*) prev->vm_end;
> 	else
> 		tail_vma->vm_private_data = vma ?
> 			(void*) vma->vm_start : (void*) mm->mmap_base;
> 	mm->mmap_cache = NULL;		/* Kill the cache. */
> }
>
> 'prev' can apparently be NULL, and the assignment of
> 	tail_vma->vm_private_data = (void*) prev->vm_end;
> which fix-2 adds does not check for that.
> That potential problem does not seem to match the stack trace
> below, however...
It sure looks like 'prev' can be null. It needs a check similar to
the one in the top-down case. I will double check on it. Thanks
for catching the bug.
- Ken
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
2005-05-19 18:38 ` Wolfgang Wander
2005-05-19 22:54 ` Andrew Morton
@ 2005-05-20 3:10 ` Chen, Kenneth W
2005-05-20 12:39 ` Wolfgang Wander
1 sibling, 1 reply; 22+ messages in thread
From: Chen, Kenneth W @ 2005-05-20 3:10 UTC (permalink / raw)
To: 'Wolfgang Wander'
Cc: Hervé Piedvache, 'Andrew Morton', mingo, arjanv, linux-mm
Wolfgang Wander wrote on Thursday, May 19, 2005 11:39 AM
> I certainly see that the algorithm isn't perfect in every case;
> however, for the test case Ingo sent me (Ingo, did you verify the
> timing?) my patch performed as well as Ingo's original solution. I
> assume that Ingo's test was requesting the same map sizes for every
> thread, so the results would be a bit biased in my favour... ;-)
While porting the munmap free-area coalescing patch on top of the
2.6.12-rc4-mm2 kernel, this change of Wolfgang's looked very strange:
> @@ -1209,8 +1218,14 @@ void arch_unmap_area(struct vm_area_stru
> * Is this a new hole at the lowest possible address?
> */
> if (area->vm_start >= TASK_UNMAPPED_BASE &&
> - area->vm_start < area->vm_mm->free_area_cache)
> - area->vm_mm->free_area_cache = area->vm_start;
> + area->vm_start < area->vm_mm->free_area_cache) {
> + unsigned area_size = area->vm_end-area->vm_start;
> +
> + if (area->vm_mm->cached_hole_size < area_size)
> + area->vm_mm->cached_hole_size = area_size;
> + else
> + area->vm_mm->cached_hole_size = ~0UL;
> + }
> }
First, free_area_cache won't get moved on munmap. OK, fine. Secondly,
if the area that we just unmapped is smaller than cached_hole_size,
then instead of doing nothing (the invariant -- cached_hole_size being
the largest known hole below the current cache pointer -- still holds
at this point), the new code resets the hole size to ~0UL, which will
trigger a full scan on the next mmap request.

Wolfgang, did you tweak this area, or is this just a typo? As far as
I can see, this patch will trigger a lot more needless full scans
than people claim.
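In other words, simply not touching cached_hole_size in that branch
would preserve the invariant. A sketch of what I would have expected
(untested, just to illustrate the point):

	if (area->vm_start >= TASK_UNMAPPED_BASE &&
	    area->vm_start < area->vm_mm->free_area_cache) {
		unsigned long area_size = area->vm_end - area->vm_start;

		/*
		 * The hole opening up at area->vm_start is at least
		 * area_size bytes.  If the largest known hole below
		 * free_area_cache is already bigger, that fact still
		 * holds and nothing needs to change.
		 */
		if (area->vm_mm->cached_hole_size < area_size)
			area->vm_mm->cached_hole_size = area_size;
	}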
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
2005-05-20 3:10 ` Chen, Kenneth W
@ 2005-05-20 12:39 ` Wolfgang Wander
0 siblings, 0 replies; 22+ messages in thread
From: Wolfgang Wander @ 2005-05-20 12:39 UTC (permalink / raw)
To: Chen, Kenneth W
Cc: 'Wolfgang Wander',
Hervé Piedvache, 'Andrew Morton',
mingo, arjanv, linux-mm
Chen, Kenneth W writes:
> While working on porting the munmap free area coalescing patch on top of
> 2.6.12-rc4-mm2 Kernel, this change from wolfgang looked very strange:
>
> > @@ -1209,8 +1218,14 @@ void arch_unmap_area(struct vm_area_stru
> > * Is this a new hole at the lowest possible address?
> > */
> > if (area->vm_start >= TASK_UNMAPPED_BASE &&
> > - area->vm_start < area->vm_mm->free_area_cache)
> > - area->vm_mm->free_area_cache = area->vm_start;
> > + area->vm_start < area->vm_mm->free_area_cache) {
> > + unsigned area_size = area->vm_end-area->vm_start;
> > +
> > + if (area->vm_mm->cached_hole_size < area_size)
> > + area->vm_mm->cached_hole_size = area_size;
> > + else
> > + area->vm_mm->cached_hole_size = ~0UL;
> > + }
> > }
>
>
> First, free_area_cache won't get moved on munmap. OK, fine. Secondly,
> if the area that we just unmapped is smaller than cached_hole_size,
> then instead of doing nothing (the invariant -- cached_hole_size being
> the largest known hole below the current cache pointer -- still holds
> at this point), the new code resets the hole size to ~0UL, which will
> trigger a full scan on the next mmap request.
>
> Wolfgang, did you tweak this area, or is this just a typo? As far as
> I can see, this patch will trigger a lot more needless full scans
> than people claim.
Thanks for checking, Ken. I believe this logic becomes mostly
obsolete with your munmap patch anyhow.

You are perfectly right that the reset to ~0UL is not strictly
required. However, since the munmapped area can be joined with
neighbouring holes to form something larger (whose size *I* cannot
determine at this point), I wanted to be safe and restart any
search from base.

If the unmapped area sits between base and free_area_cache, we
can then increase cached_hole_size to area_size if that is indeed
larger than the current cached_hole_size.

In both cases it would be nice to just calculate the real hole
size with a few find_vma() calls instead...
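Something along these lines, perhaps (a completely untested sketch;
it assumes mmap_sem is held and that the vma being unmapped is
already detached, so find_vma_prev() only sees its live neighbours):

	static unsigned long real_hole_size(struct mm_struct *mm,
					    struct vm_area_struct *area)
	{
		struct vm_area_struct *next, *prev;
		unsigned long hole_start, hole_end;

		/* Neighbours of the hole the unmapped area merged into. */
		next = find_vma_prev(mm, area->vm_start, &prev);
		hole_start = prev ? prev->vm_end : TASK_UNMAPPED_BASE;
		hole_end = next ? next->vm_start : TASK_SIZE;
		return hole_end - hole_start;
	}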
Wolfgang
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
@ 2005-05-20 2:14 Chen, Kenneth W
2005-05-20 12:47 ` Wolfgang Wander
0 siblings, 1 reply; 22+ messages in thread
From: Chen, Kenneth W @ 2005-05-20 2:14 UTC (permalink / raw)
To: 'Andrew Morton', 'Wolfgang Wander'
Cc: herve, mingo, arjanv, linux-mm
Chen, Kenneth W wrote on Thursday, May 19, 2005 7:02 PM
> Oh well, I guess we have to take a performance hit here in favor of
> functionality. Though this is a problem specific to 32-bit address
> space, please don't unnecessarily penalize 64-bit arches. If Andrew is
> going to take Wolfgang's patch, then we should minimally take the
> following patch. This patch reverts the changes made in arch/ia64 and
> makes x86_64 use the alternate cache algorithm for 32-bit apps.
Oh, crap, there is a typo in my patch and it won't compile on x86_64.
Here is an updated version; please use this one instead.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- linux-2.6.11/arch/ia64/kernel/sys_ia64.c.orig 2005-05-19 18:35:31.468087777 -0700
+++ linux-2.6.11/arch/ia64/kernel/sys_ia64.c 2005-05-19 18:35:46.521798000 -0700
@@ -38,14 +38,8 @@ arch_get_unmapped_area (struct file *fil
if (REGION_NUMBER(addr) == REGION_HPAGE)
addr = 0;
#endif
- if (!addr) {
- if (len > mm->cached_hole_size) {
- addr = mm->free_area_cache;
- } else {
- addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
- }
- }
+ if (!addr)
+ addr = mm->free_area_cache;
if (map_shared && (TASK_SIZE > 0xfffffffful))
/*
@@ -65,7 +59,6 @@ arch_get_unmapped_area (struct file *fil
if (start_addr != TASK_UNMAPPED_BASE) {
/* Start a new search --- just in case we missed some holes. */
addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
goto full_search;
}
return -ENOMEM;
@@ -75,8 +68,6 @@ arch_get_unmapped_area (struct file *fil
mm->free_area_cache = addr + len;
return addr;
}
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
addr = (vma->vm_end + align_mask) & ~align_mask;
}
}
--- linux-2.6.11/arch/x86_64/kernel/sys_x86_64.c.orig 2005-05-19 18:37:32.202461298 -0700
+++ linux-2.6.11/arch/x86_64/kernel/sys_x86_64.c 2005-05-19 18:39:03.110663309 -0700
@@ -111,7 +111,7 @@ arch_get_unmapped_area(struct file *filp
(!vma || addr + len <= vma->vm_start))
return addr;
}
- if (len <= mm->cached_hole_size) {
+ if (begin != TASK_UNMAPPED_64 && len <= mm->cached_hole_size) {
mm->cached_hole_size = 0;
mm->free_area_cache = begin;
}
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
2005-05-20 2:14 Chen, Kenneth W
@ 2005-05-20 12:47 ` Wolfgang Wander
0 siblings, 0 replies; 22+ messages in thread
From: Wolfgang Wander @ 2005-05-20 12:47 UTC (permalink / raw)
To: Chen, Kenneth W
Cc: 'Andrew Morton', 'Wolfgang Wander',
herve, mingo, arjanv, linux-mm
Chen, Kenneth W writes:
> Chen, Kenneth W wrote on Thursday, May 19, 2005 7:02 PM
> > Oh well, I guess we have to take a performance hit here in favor of
> > functionality. Though this is a problem specific to 32-bit address
> > space, please don't unnecessarily penalize 64-bit arches. If Andrew is
> > going to take Wolfgang's patch, then we should minimally take the
> > following patch. This patch reverts the changes made in arch/ia64 and
> > makes x86_64 use the alternate cache algorithm for 32-bit apps.
Great! Makes more than perfect sense in the 64-bit world.
Wolfgang
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
@ 2005-05-25 7:30 linux
0 siblings, 0 replies; 22+ messages in thread
From: linux @ 2005-05-25 7:30 UTC (permalink / raw)
To: linux-mm
If you want to minimize fragmentation, you could do worse than
study Doug Lea's malloc, commonly known as dlmalloc.

It's basically straight best-fit, but with one important heuristic:
FIFO use of free blocks. This contributes enormously to minimizing
fragmentation.

Any time a free block is created, it goes on the *end* of the
available list for that size. This means that every chunk of free
memory gets an approximately equal chance to be merged with more
free chunks. The chunks that make it to the front of the available
list are the ones with stable neighbors, which are the best ones to
use up.
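To make the heuristic concrete, here is a toy sketch (nothing like
dlmalloc's real code; a single size class, no splitting or merging
shown): frees append at the tail, allocations pop from the head, so
each chunk spends as long as possible in the list coalescing with
its neighbours before it is reused:

	#include <stddef.h>

	struct free_chunk {
		size_t size;
		struct free_chunk *next;
	};

	static struct free_chunk *head, *tail;

	/* A freed chunk goes on the *end* of the available list... */
	static void fifo_release(struct free_chunk *c)
	{
		c->next = NULL;
		if (tail)
			tail->next = c;
		else
			head = c;
		tail = c;
	}

	/* ...and allocation takes from the front, so the chunks we
	 * hand out are the ones that had the longest time to merge
	 * with neighbouring frees -- the "stable" ones. */
	static struct free_chunk *fifo_acquire(void)
	{
		struct free_chunk *c = head;

		if (c) {
			head = c->next;
			if (!head)
				tail = NULL;
		}
		return c;
	}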
* RE: [PATCH] Avoiding mmap fragmentation - clean rev
@ 2005-05-27 16:37 Chen, Kenneth W
0 siblings, 0 replies; 22+ messages in thread
From: Chen, Kenneth W @ 2005-05-27 16:37 UTC (permalink / raw)
To: 'Wolfgang Wander'
Cc: 'Andrew Morton', herve, mingo, arjanv, linux-mm, colin.harrison
Wolfgang Wander wrote on Thursday, May 26, 2005 10:32 AM
> 'prev' can apparently be NULL, and the assignment of
> 	tail_vma->vm_private_data = (void*) prev->vm_end;
> which fix-2 adds does not check for that.
> That potential problem does not seem to match the stack trace
> below, however...
Chen, Kenneth W wrote on Thursday, May 26, 2005 10:44 AM
> It sure looks like 'prev' can be null. It needs a check similar to
> the one in the top-down case. I will double check on it.
Sorry, I was sidetracked for a while yesterday. Oh boy, there is a
major clash here: avoiding-mmap-fragmentation-fix-2.patch corrupts
vm_private_data, which drivers still use.

To fix this, I think we can either pass the hint address down the
call stack, or pull the mm->unmap_area() call site in unmap_vma() up
two function levels into detach_vmas_to_be_unmapped(). It also makes
logical sense to me that at the end of unlinking all the vmas, we fix
up free_area_cache in one shot instead of fixing it up on each
iteration of vma removal.

Here is a patch. The majority of it changes function prototypes: to
use an mm_struct pointer instead of a vm_area_struct, and to add the
hint addr to the mm->unmap_area() arguments. The last 3 hunks are
the real ones that fix the conflict over vm_private_data. This patch
also fixes the bug where 'prev' is dereferenced without proper
checking.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- ./include/linux/sched.h.orig 2005-05-27 02:04:17.601950694 -0700
+++ ./include/linux/sched.h 2005-05-27 03:06:31.389014330 -0700
@@ -201,8 +201,8 @@ extern unsigned long
arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
-extern void arch_unmap_area(struct vm_area_struct *area);
-extern void arch_unmap_area_topdown(struct vm_area_struct *area);
+extern void arch_unmap_area(struct mm_struct *, unsigned long);
+extern void arch_unmap_area_topdown(struct mm_struct *, unsigned long);
#define set_mm_counter(mm, member, value) (mm)->_##member = (value)
#define get_mm_counter(mm, member) ((mm)->_##member)
@@ -218,7 +218,7 @@ struct mm_struct {
unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
- void (*unmap_area) (struct vm_area_struct *area);
+ void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
unsigned long mmap_base; /* base of mmap area */
unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */
unsigned long free_area_cache; /* first hole of size cached_hole_size or larger */
--- ./mm/nommu.c.orig 2005-05-27 02:05:47.380269907 -0700
+++ ./mm/nommu.c 2005-05-27 03:07:31.773779216 -0700
@@ -1067,7 +1067,7 @@ unsigned long arch_get_unmapped_area(str
return -ENOMEM;
}
-void arch_unmap_area(struct vm_area_struct *area)
+void arch_unmap_area(struct mm_struct *mm, unsigned long addr)
{
}
--- ./mm/mmap.c.orig 2005-05-27 02:03:22.152732623 -0700
+++ ./mm/mmap.c 2005-05-27 03:10:36.865573823 -0700
@@ -1212,16 +1212,14 @@ full_search:
}
#endif
-void arch_unmap_area(struct vm_area_struct *area)
+void arch_unmap_area(struct mm_struct *mm, unsigned long addr)
{
/*
* Is this a new hole at the lowest possible address?
*/
- unsigned long addr = (unsigned long) area->vm_private_data;
-
- if (addr >= TASK_UNMAPPED_BASE && addr < area->vm_mm->free_area_cache) {
- area->vm_mm->free_area_cache = addr;
- area->vm_mm->cached_hole_size = ~0UL;
+ if (addr >= TASK_UNMAPPED_BASE && addr < mm->free_area_cache) {
+ mm->free_area_cache = addr;
+ mm->cached_hole_size = ~0UL;
}
}
@@ -1309,19 +1307,17 @@ arch_get_unmapped_area_topdown(struct fi
}
#endif
-void arch_unmap_area_topdown(struct vm_area_struct *area)
+void arch_unmap_area_topdown(struct mm_struct *mm, unsigned long addr)
{
/*
* Is this a new hole at the highest possible address?
*/
- unsigned long addr = (unsigned long) area->vm_private_data;
-
- if (addr > area->vm_mm->free_area_cache)
- area->vm_mm->free_area_cache = addr;
+ if (addr > mm->free_area_cache)
+ mm->free_area_cache = addr;
/* dont allow allocations above current base */
- if (area->vm_mm->free_area_cache > area->vm_mm->mmap_base)
- area->vm_mm->free_area_cache = area->vm_mm->mmap_base;
+ if (mm->free_area_cache > mm->mmap_base)
+ mm->free_area_cache = mm->mmap_base;
}
unsigned long
@@ -1621,7 +1617,6 @@ static void unmap_vma(struct mm_struct *
if (area->vm_flags & VM_LOCKED)
area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
vm_stat_unaccount(area);
- area->vm_mm->unmap_area(area);
remove_vm_struct(area);
}
@@ -1675,6 +1670,7 @@ detach_vmas_to_be_unmapped(struct mm_str
{
struct vm_area_struct **insertion_point;
struct vm_area_struct *tail_vma = NULL;
+ unsigned long addr;
insertion_point = (prev ? &prev->vm_next : &mm->mmap);
do {
@@ -1686,10 +1682,10 @@ detach_vmas_to_be_unmapped(struct mm_str
*insertion_point = vma;
tail_vma->vm_next = NULL;
if (mm->unmap_area == arch_unmap_area)
- tail_vma->vm_private_data = (void*) prev->vm_end;
+ addr = prev ? prev->vm_end : mm->mmap_base;
else
- tail_vma->vm_private_data = vma ?
- (void*) vma->vm_start : (void*) mm->mmap_base;
+ addr = vma ? vma->vm_start : mm->mmap_base;
+ mm->unmap_area(mm, addr);
mm->mmap_cache = NULL; /* Kill the cache. */
}
end of thread
Thread overview: 22+ messages
[not found] <E4BA51C8E4E9634993418831223F0A49291F06E1@scsmsx401.amr.corp.intel.com>
2005-05-17 22:28 ` [PATCH] Avoiding mmap fragmentation - clean rev Chen, Kenneth W
2005-05-18 7:28 ` Arjan van de Ven
2005-05-18 7:43 ` Ingo Molnar
2005-05-18 7:37 ` Ingo Molnar
2005-05-18 13:05 ` Wolfgang Wander
2005-05-18 15:47 ` Wolfgang Wander
2005-05-18 16:18 ` Chen, Kenneth W
2005-05-18 17:16 ` Wolfgang Wander
2005-05-18 17:57 ` Chen, Kenneth W
2005-05-19 18:38 ` Wolfgang Wander
2005-05-19 22:54 ` Andrew Morton
2005-05-20 2:02 ` Chen, Kenneth W
2005-05-20 23:51 ` Chen, Kenneth W
2005-05-23 18:25 ` Wolfgang Wander
2005-05-26 17:32 ` Wolfgang Wander
2005-05-26 17:44 ` Chen, Kenneth W
2005-05-20 3:10 ` Chen, Kenneth W
2005-05-20 12:39 ` Wolfgang Wander
2005-05-20 2:14 Chen, Kenneth W
2005-05-20 12:47 ` Wolfgang Wander
2005-05-25 7:30 linux
2005-05-27 16:37 Chen, Kenneth W