* The VFS cache is not freed when there is not enough free memory to allocate
@ 2006-11-22 7:51 Aubrey
2006-11-22 8:43 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Aubrey @ 2006-11-22 7:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 6817 bytes --]
Hi all,
We are working on the Blackfin uClinux platform and have run into the
following problem.
The attached patch works around the issue; I am posting it here in the
hope of finding a better solution.
Here is a test application:
---------------------------------------------------------------------------------------------
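#include <stdio.h>
#include <stdlib.h>

#define N 8	/* number of 1 MB blocks to allocate */

int main(void)
{
	void *p[N];
	int i, failed = 0;

	printf("Alloc %d MB !\n", N);
	for (i = 0; i < N; i++) {
		p[i] = malloc(1024 * 1024);
		if (p[i] == NULL) {
			printf("alloc failed\n");
			failed = 1;
		}
	}
	/* Report success only if every block was allocated. */
	if (!failed)
		printf("alloc successful \n");
	for (i = 0; i < N; i++)
		free(p[i]);	/* free(NULL) is a no-op */
	return failed;
}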
When there is not enough free memory to allocate:
==============================
root:/mnt> cat /proc/meminfo
MemTotal: 54196 kB
MemFree: 5520 kB <== only 5M free
Buffers: 76 kB
Cached: 44696 kB <== cache eat 40MB
SwapCached: 0 kB
Active: 21092 kB
Inactive: 23680 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 54196 kB
LowFree: 5520 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 0 kB
Mapped: 0 kB
Slab: 3720 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 27096 kB
Committed_AS: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
==========================================
I run the test application and get the following message:
---------------------------------------
root:/mnt> ./t
Alloc 8 MB !
t: page allocation failure. order:9, mode:0x40d0
Hardware Trace:
0 Target : <0x00004de0> { _dump_stack + 0x0 }
Source : <0x0003054a> { ___alloc_pages + 0x17e }
1 Target : <0x0003054a> { ___alloc_pages + 0x17e }
Source : <0x0000dbc2> { _printk + 0x16 }
2 Target : <0x0000dbbe> { _printk + 0x12 }
Source : <0x0000da4e> { _vprintk + 0x1a2 }
3 Target : <0x0000da42> { _vprintk + 0x196 }
Source : <0xffa001ea> { __common_int_entry + 0xd8 }
4 Target : <0xffa00188> { __common_int_entry + 0x76 }
Source : <0x000089bc> { _return_from_int + 0x58 }
5 Target : <0x000089bc> { _return_from_int + 0x58 }
Source : <0x00008992> { _return_from_int + 0x2e }
6 Target : <0x00008964> { _return_from_int + 0x0 }
Source : <0xffa00184> { __common_int_entry + 0x72 }
7 Target : <0xffa00182> { __common_int_entry + 0x70 }
Source : <0x00012682> { __local_bh_enable + 0x56 }
8 Target : <0x0001266c> { __local_bh_enable + 0x40 }
Source : <0x0001265c> { __local_bh_enable + 0x30 }
9 Target : <0x00012654> { __local_bh_enable + 0x28 }
Source : <0x00012644> { __local_bh_enable + 0x18 }
10 Target : <0x0001262c> { __local_bh_enable + 0x0 }
Source : <0x000128e0> { ___do_softirq + 0x94 }
11 Target : <0x000128d8> { ___do_softirq + 0x8c }
Source : <0x000128b8> { ___do_softirq + 0x6c }
12 Target : <0x000128aa> { ___do_softirq + 0x5e }
Source : <0x0001666a> { _run_timer_softirq + 0x82 }
13 Target : <0x000165fc> { _run_timer_softirq + 0x14 }
Source : <0x00023eb8> { _hrtimer_run_queues + 0xe8 }
14 Target : <0x00023ea6> { _hrtimer_run_queues + 0xd6 }
Source : <0x00023e70> { _hrtimer_run_queues + 0xa0 }
15 Target : <0x00023e68> { _hrtimer_run_queues + 0x98 }
Source : <0x00023eae> { _hrtimer_run_queues + 0xde }
Stack from 015a7dcc:
00000001 0003054e 00000000 00000001 000040d0 0013c70c 00000009 000040d0
00000000 00000080 00000000 000240d0 00000000 015a6000 015a6000 015a6000
00000010 00000000 00000001 00036e12 00000000 0023f8e0 00000073 00191e40
00000020 0023e9a0 000040d0 015afea9 015afe94 00101fff 000040d0 0023e9a0
00000010 00101fff 000370de 00000000 0363d3e0 00000073 0000ffff 04000021
00000000 00101000 00187af0 00035b44 00000000 00035e40 00000000 00000000
Call Trace:
[<0000fffe>] _do_exit+0x12e/0x7cc
[<00004118>] _sys_mmap+0x54/0x98
[<00101000>] _fib_create_info+0x670/0x780
[<00008828>] _system_call+0x68/0xba
[<000040c4>] _sys_mmap+0x0/0x98
[<0000fffe>] _do_exit+0x12e/0x7cc
[<00008000>] _cplb_mgr+0x8/0x2e8
[<00101000>] _fib_create_info+0x670/0x780
[<00101000>] _fib_create_info+0x670/0x780
Mem-info:
DMA per-cpu:
cpu 0 hot: high 18, batch 3 used:5
cpu 0 cold: high 6, batch 1 used:5
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 21028kB (0kB HighMem)
Active:2549 inactive:3856 dirty:0 writeback:0 unstable:0 free:5257
slab:1833 mapped:0 pagetables:0
DMA free:21028kB min:948kB low:1184kB high:1420kB active:10196kB
inactive:15424kB present:56320kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 43*4kB 35*8kB 28*16kB 17*32kB 18*64kB 20*128kB 16*256kB 11*512kB
6*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 21028kB
DMA32: empty
Normal: empty
HighMem: empty
14080 pages of RAM
5285 free pages
531 reserved pages
11 pages shared
0 pages swap cached
Allocation of length 1052672 from process 57 failed
DMA per-cpu:
cpu 0 hot: high 18, batch 3 used:5
cpu 0 cold: high 6, batch 1 used:5
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 21028kB (0kB HighMem)
Active:2549 inactive:3856 dirty:0 writeback:0 unstable:0 free:5257
slab:1833 mapped:0 pagetables:0
DMA free:21028kB min:948kB low:1184kB high:1420kB active:10196kB
inactive:15424kB present:56320kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB
inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 43*4kB 35*8kB 28*16kB 17*32kB 18*64kB 20*128kB 16*256kB 11*512kB
6*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB = 21028kB
DMA32: empty
Normal: empty
HighMem: empty
-----------------------------
When there is not enough free memory, the kernel crashes instead of freeing
the VFS cache, no matter how high /proc/sys/vm/vfs_cache_pressure is set.
Here is my patch,
=====================================
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 7:51 The VFS cache is not freed when there is not enough free memory to allocate Aubrey
@ 2006-11-22 8:43 ` Peter Zijlstra
2006-11-22 10:02 ` Aubrey
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2006-11-22 8:43 UTC (permalink / raw)
To: Aubrey; +Cc: linux-kernel, linux-mm
On Wed, 2006-11-22 at 15:51 +0800, Aubrey wrote:
> Hi all,
>
> We are working on the blackfin uClinux platform and we encountered the
> following problem.
> The attached patch can work around this issue and I post it here to
> find better solution.
> root:/mnt> ./t
> Alloc 8 MB !
> t: page allocation failure. order:9, mode:0x40d0
^^^^^^^
Such high order allocs rarely succeed after bootup. The proposed patch
will hardly help that more than lumpy reclaim will. Please see the
threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
the linux-mm archives.
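For reference, the order:9 in the log can be reproduced from the request
length the kernel prints ("Allocation of length 1052672"). The sketch below
is only an illustration, assuming 4 KB pages; order_for_length() and
SKETCH_PAGE_SIZE are hypothetical names, not kernel symbols. It rounds the
length up to whole pages and then up to the next power of two, which is how
a 1 MB malloc plus allocator overhead ends up needing 512 contiguous pages.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */

/* Smallest order n such that (SKETCH_PAGE_SIZE << n) >= len. */
unsigned int order_for_length(size_t len)
{
	/* Round the byte length up to a whole number of pages. */
	size_t pages = (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int order = 0;

	/* Round the page count up to the next power of two. */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

With these assumptions, 1052672 bytes is 257 pages, so the request rounds up
to order 9 (512 pages, 2 MB), while an exact 1 MB request would be order 8.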
> From: Aubrey.Li <aubrey.li@analog.com>
> Date: Wed, 22 Nov 2006 15:10:18 +0800
> Subject: [PATCH] Drop VFS cache when there is not enough free memory to allocate
>
> Signed-off-by: Aubrey.Li <aubrey.li@analog.com>
> ---
> mm/page_alloc.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf2f6cf..62559fd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1039,6 +1039,11 @@ restart:
> if (page)
> goto got_pg;
>
> +#if defined(CONFIG_EMBEDDED) && !defined(CONFIG_MMU)
> + drop_pagecache();
> + drop_slab();
> +#endif
> +
> /* This allocation should allow future memory freeing. */
>
> if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> --
> The patch drops the page cache and slab and then gives the allocator
> another chance to get more free pages. With this patch applied, my test
> application can allocate memory successfully, and the cache and slab are
> dropped as well. See below:
> ================================
> root:/mnt> ./t
> Alloc 8 MB !
> alloc successful
Pure luck, there are workloads where there just would not have been any
order 9 contiguous block freeable (think where each 9th order block
would contain at least one active inode).
> I know performance is important for Linux, and the VFS cache obviously
> improves performance for file operations. But on an embedded system,
> we would rather keep the application runnable than hang the system
> in order to preserve system performance.
>
> Any suggestions and solutions are really appreciated!
Try Mel's patches and wait for the next Lumpy reclaim posting.
The lack of a MMU on your system makes it very hard not to rely on
higher order allocations, because even user-space allocs need to be
physically contiguous. But please take that into consideration when
writing software.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 8:43 ` Peter Zijlstra
@ 2006-11-22 10:02 ` Aubrey
2006-11-22 10:42 ` Peter Zijlstra
2006-11-27 7:39 ` Nick Piggin
0 siblings, 2 replies; 15+ messages in thread
From: Aubrey @ 2006-11-22 10:02 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, linux-mm
On 11/22/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Please see the
> threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
> the linux-mm archives.
>
Thanks for pointing this out. Is it already included in Linus' git tree?
> > The patch drop the page cache and slab and then give a new chance to
> > get more free pages. Applied this patch, my test application can
> > allocate memory successfully and drop the cache and slab as well. See
> > below:
> > ================================
> > root:/mnt> ./t
> > Alloc 8 MB !
> > alloc successful
>
> Pure luck, there are workloads where there just would not have been any
> order 9 contiguous block freeable (think where each 9th order block
> would contain at least one active inode).
>
> > I know performance is important for linux, and VFS cache obviously
> > improve the performance when implement file operation. But for
> > embedded system, we'll try our best to make the application executable
> > rather than hanging system to guarantee the system performance.
> >
> > Any suggestions and solutions are really appreciated!
>
> Try Mel's patches and wait for the next Lumpy reclaim posting.
>
> The lack of a MMU on your system makes it very hard not to rely on
> higher order allocations, because even user-space allocs need to be
> physically contiguous. But please take that into consideration when
> writing software.
Well, the test application just uses an exaggerated way to replicate the issue.
Actually, in real-world use, applications such as mplayer, asterisk,
etc. will run into the above problem when they are run a second time.
I think I have no reason to modify those kinds of applications.
My patch lets the kernel drop the VFS cache in a low-memory situation when
the application requests more memory; I don't think it's
luck. You know, the application just wants to allocate 8
1 MB blocks (order = 9), and by releasing the VFS cache we can get almost
50 MB of free memory.
The patch did indeed fix many failing test cases on our side. But
yes, I don't think it's the final solution. I'll try Mel's patches and
report the results.
Thanks,
-Aubrey
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 10:02 ` Aubrey
@ 2006-11-22 10:42 ` Peter Zijlstra
2006-11-22 11:09 ` Aubrey
2006-11-27 1:34 ` Mike Frysinger
2006-11-27 7:39 ` Nick Piggin
1 sibling, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2006-11-22 10:42 UTC (permalink / raw)
To: Aubrey; +Cc: linux-kernel, linux-mm, mel, Andy Whitcroft
On Wed, 2006-11-22 at 18:02 +0800, Aubrey wrote:
> On 11/22/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > Please see the
> > threads on Mel Gorman's Anti-Fragmentation and Linear/Lumpy reclaim in
> > the linux-mm archives.
> >
>
> Thanks for pointing this out. Is it already included in Linus' git tree?
No it is not.
> Well, the test application just use an exaggerated way to replicate the issue.
>
> Actually, In the real work, the application such as mplayer, asterisk,
> etc will run into
> the above problem when run them at the second time. I think I have no
> reason to modify those kind of applications.
It comes from the choice of architecture, I'd not run general purpose
code like that on MMU-less hardware. But yeah, I see your point.
> My patch let kernel drop VFS cache in the low memory situation when
> the application requests more memory allocation, I don't think it's
> luck. You know, the application just wants to allocate 8
> 1Mbyte-blocks(order =9) and releasing VFS cache we can get almost
> 50Mbyte free memory.
Yes it does that, but there is no guarantee that those 50MB have a
single 1M contiguous region amongst them.
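The distinction between "plenty of free memory" and "a free contiguous
region" can be illustrated with a toy page map. This is only a sketch: the
byte-per-page map and the names count_free() and longest_free_run() are
made up for the example and do not correspond to kernel data structures.

```c
#include <stddef.h>

/* used[i] != 0 means page i is in use. Returns the number of free pages. */
size_t count_free(const unsigned char *used, size_t npages)
{
	size_t i, nfree = 0;

	for (i = 0; i < npages; i++)
		if (!used[i])
			nfree++;
	return nfree;
}

/* Returns the length, in pages, of the longest run of free pages. */
size_t longest_free_run(const unsigned char *used, size_t npages)
{
	size_t best = 0, run = 0, i;

	for (i = 0; i < npages; i++) {
		run = used[i] ? 0 : run + 1;	/* a used page ends the run */
		if (run > best)
			best = run;
	}
	return best;
}
```

If a 1024-page map has a single pinned page every 128 pages, count_free()
reports 1016 free pages, yet longest_free_run() is only 127, so a request
for 256 contiguous pages (1 MB with 4 KB pages) still cannot be satisfied.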
> The patch indeedly enabled many failed test cases on our side. But
> yes, I don't think it's the final solution. I'll try Mel's patch and
> update the results.
Mel's patches alone aren't quite enough, you also need some reclaim
modifications, I'll ping Andy to see how far he's on that.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 10:42 ` Peter Zijlstra
@ 2006-11-22 11:09 ` Aubrey
2006-11-27 1:34 ` Mike Frysinger
1 sibling, 0 replies; 15+ messages in thread
From: Aubrey @ 2006-11-22 11:09 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, linux-mm, mel, Andy Whitcroft
On 11/22/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Mel's patches alone aren't quite enough, you also need some reclaim
> modifications, I'll ping Andy to see how far he's on that.
>
I think so. After a quick look at Mel's patch, I found that it can't help our case.
The current situation is that the application needs 8 MB of memory, but
there is only 5 MB free, while cached memory eats almost 40 MB. When
the application requests the memory, the kernel just reports failure;
it does not attempt to release the VFS cache and try again.
==============================
root:/mnt> cat /proc/meminfo
MemTotal: 54196 kB
MemFree: 5520 kB <== only 5M free
Buffers: 76 kB
Cached: 44696 kB <== cache eat 40MB
SwapCached: 0 kB
Active: 21092 kB
Inactive: 23680 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 54196 kB
LowFree: 5520 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 0 kB
Mapped: 0 kB
Slab: 3720 kB
PageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 27096 kB
Committed_AS: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
==========================================
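The "5M free, about 40MB cached" figures can be pulled out of a
/proc/meminfo dump programmatically. The helper below is a minimal sketch;
meminfo_kb() is a hypothetical name, and it works on a text buffer rather
than opening /proc/meminfo so that it can be tried on any machine. It
matches the key only at the start of a line, so "Cached" is not confused
with "SwapCached".

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "key:" at the start of a line in a /proc/meminfo style buffer
 * and return its value in kB, or -1 if the key is missing or malformed.
 */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long value;

			if (sscanf(line + klen + 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```

For the dump above, MemFree plus Cached comes to 5520 + 44696 = 50216 kB,
which is where the "almost 50 MB" estimate earlier in the thread comes from.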
-Aubrey
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 10:42 ` Peter Zijlstra
2006-11-22 11:09 ` Aubrey
@ 2006-11-27 1:34 ` Mike Frysinger
1 sibling, 0 replies; 15+ messages in thread
From: Mike Frysinger @ 2006-11-27 1:34 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Aubrey, linux-kernel, linux-mm, mel, Andy Whitcroft
On 11/22/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Yes it does that, but there is no guarantee that those 50MB have a
> single 1M contiguous region amongst them.
right ... the testcase posted is more to quickly illustrate the
problem ... the requested size doesn't really matter, what does matter
is that we can't seem to reclaim memory from the VFS cache in scenarios
where the VFS cache is eating a ton of memory and we need some more.
another scenario is where an application is constantly reading data
from a cd, re-encoding it to mp3, and then writing it to disk. the
VFS cache here quickly eats up the available memory.
-mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-22 10:02 ` Aubrey
2006-11-22 10:42 ` Peter Zijlstra
@ 2006-11-27 7:39 ` Nick Piggin
2006-11-29 7:17 ` Sonic Zhang
1 sibling, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2006-11-27 7:39 UTC (permalink / raw)
To: Aubrey; +Cc: Peter Zijlstra, linux-kernel, linux-mm
Aubrey wrote:
> On 11/22/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> The lack of a MMU on your system makes it very hard not to rely on
>> higher order allocations, because even user-space allocs need to be
>> physically contiguous. But please take that into consideration when
>> writing software.
>
>
> Well, the test application just use an exaggerated way to replicate the
> issue.
>
> Actually, In the real work, the application such as mplayer, asterisk,
> etc will run into
> the above problem when run them at the second time. I think I have no
> reason to modify those kind of applications.
No that's wrong. And your patch is just a hack that happens to mask the
issue in the case you tested, and it will probably blow up in production
at some stage (consider the case where the VFS cache page is not freeable
or that page is being used for something else).
With the nommu kernel, you actually *do* have a huge reason to write
special code: large anonymous memory allocations have to use higher order
allocations!
I haven't actually written any nommu userspace code, but it is obvious
that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
even 3 allocations seem to be reasonable, from process context)... Then
you would use something a bit more advanced than a linear array to store
data (a pagetable-like radix tree would be a nice, easy idea).
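A rough sketch of the kind of structure described above, for illustration
only: a two-level table in which every allocation is at most CHUNK_SIZE
bytes, so a large logical buffer never needs a large contiguous block. All
names here (chunked_buf, cb_at, cb_free, CHUNK_SIZE, ROOT_SLOTS) are
hypothetical, and CHUNK_SIZE is assumed to equal PAGE_SIZE. On a real nommu
libc the allocator's own overhead may push a CHUNK_SIZE request slightly
over one page, so a somewhat smaller chunk might be the better choice.

```c
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u	/* assumption: equals PAGE_SIZE */
#define DIR_SLOTS  (CHUNK_SIZE / sizeof(unsigned char *)) /* leaves per directory */
#define ROOT_SLOTS 16u		/* hypothetical cap on directories */

struct chunked_buf {
	unsigned char **dirs[ROOT_SLOTS];	/* each directory is one chunk */
};

/* Return a pointer to byte `off`, allocating on demand; NULL on failure. */
unsigned char *cb_at(struct chunked_buf *cb, size_t off)
{
	size_t leaf = off / CHUNK_SIZE;
	size_t d = leaf / DIR_SLOTS;
	size_t s = leaf % DIR_SLOTS;

	if (d >= ROOT_SLOTS)
		return NULL;	/* beyond the capacity of this sketch */
	if (!cb->dirs[d]) {
		cb->dirs[d] = calloc(DIR_SLOTS, sizeof(unsigned char *));
		if (!cb->dirs[d])
			return NULL;
	}
	if (!cb->dirs[d][s]) {
		cb->dirs[d][s] = calloc(1, CHUNK_SIZE);
		if (!cb->dirs[d][s])
			return NULL;
	}
	return &cb->dirs[d][s][off % CHUNK_SIZE];
}

/* Release every leaf chunk and directory. */
void cb_free(struct chunked_buf *cb)
{
	size_t d, s;

	for (d = 0; d < ROOT_SLOTS; d++) {
		if (!cb->dirs[d])
			continue;
		for (s = 0; s < DIR_SLOTS; s++)
			free(cb->dirs[d][s]);
		free(cb->dirs[d]);
		cb->dirs[d] = NULL;
	}
}
```

A caller would start from a zero-initialised struct chunked_buf and use
cb_at(&cb, off) wherever it would otherwise index one big malloc'ed array.
The cost is an extra pointer lookup per access and the loss of a single
flat pointer, which is the trade-off the message above is arguing for.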
You are of course free to put that patch into your product's kernel
(although I would advise against it, because it has a lot of deadlock
issues)... but the reality is that if you want a robust system, you
cannot just take a unix program and run it unmodified on a nommu kernel
AFAIKS.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-27 7:39 ` Nick Piggin
@ 2006-11-29 7:17 ` Sonic Zhang
2006-11-29 9:27 ` Aubrey
0 siblings, 1 reply; 15+ messages in thread
From: Sonic Zhang @ 2006-11-29 7:17 UTC (permalink / raw)
To: Nick Piggin; +Cc: Aubrey, Peter Zijlstra, linux-kernel, linux-mm
Forward to the mailing list.
Sonic Zhang wrote:
> On 11/27/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>> I haven't actually written any nommu userspace code, but it is obvious
>> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
>> even 3 allocations seem to be reasonable, from process context)... Then
>> you would use something a bit more advanced than a linear array to store
>> data (a pagetable-like radix tree would be a nice, easy idea).
>>
>
> But even if we split the 8M of memory into 2048 x 4k blocks, we still face
> this failure. The key problem is that the available memory is smaller than
> 2048 x 4k, while there is still a lot of VFS cache. The VFS cache can
> be freed, but the kernel allocation function ignores it. See the new test
> application.
Which kernel allocation function? If you can provide more details I'd
like to get to the bottom of this.
Because the anonymous memory allocation in mm/nommu.c is all allocated
with GFP_KERNEL from process context, and in that case, the allocator
should not fail but call into page reclaim which in turn will free VFS
caches.
> What's a better way to free the VFS cache in memory allocator?
It should be freeing it for you, so I'm not quite sure what is going
on. Can you send over the kernel messages you see when the allocation
fails?
Also, do you happen to know of a reasonable toolchain + emulator setup
that I could test the nommu kernel with?
Thanks,
Nick
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-29 7:17 ` Sonic Zhang
@ 2006-11-29 9:27 ` Aubrey
2006-11-29 9:30 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Aubrey @ 2006-11-29 9:27 UTC (permalink / raw)
To: Nick Piggin
Cc: Sonic Zhang, Peter Zijlstra, linux-kernel, linux-mm, vapier.adi
On 11/29/06, Sonic Zhang <sonic.adi@gmail.com> wrote:
> Forward to the mailing list.
>
> > On 11/27/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
> >> I haven't actually written any nommu userspace code, but it is obvious
> >> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
> >> even 3 allocations seem to be reasonable, from process context)... Then
> >> you would use something a bit more advanced than a linear array to store
> >> data (a pagetable-like radix tree would be a nice, easy idea).
> >>
> >
> > But, even we split the 8M memory into 2048 x 4k blocks, we still face
> > this failure. The key problem is that available memory is small than
> > 2048 x 4k, while there are still a lot of VFS cache. The VFS cache can
> > be freed, but kernel allocation function ignores it. See the new test
> > application.
>
>
> Which kernel allocation function? If you can provide more details I'd
> like to get to the bottom of this.
I posted it here; I think you missed it, so I forwarded it to you.
>
> Because the anonymous memory allocation in mm/nommu.c is all allocated
> with GFP_KERNEL from process context, and in that case, the allocator
> should not fail but call into page reclaim which in turn will free VFS
> caches.
>
>
>
> > What's a better way to free the VFS cache in memory allocator?
>
>
> It should be freeing it for you, so I'm not quite sure what is going
> on. Can you send over the kernel messages you see when the allocation
> fails?
I don't think so. The kernel doesn't attempt to free it. The log is
included in the mail I forwarded to you.
>
> Also, do you happen to know of a reasonable toolchain + emulator setup
> that I could test the nommu kernel with?
A project named skyeye.
http://www.skyeye.org/index.shtml
-Aubrey
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-29 9:27 ` Aubrey
@ 2006-11-29 9:30 ` Nick Piggin
2006-11-30 12:54 ` Aubrey
0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2006-11-29 9:30 UTC (permalink / raw)
To: Aubrey; +Cc: Sonic Zhang, Peter Zijlstra, linux-kernel, linux-mm, vapier.adi
Aubrey wrote:
> On 11/29/06, Sonic Zhang <sonic.adi@gmail.com> wrote:
>
>> Forward to the mailing list.
>>
>> > On 11/27/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>
>>
>> >> I haven't actually written any nommu userspace code, but it is obvious
>> >> that you must try to keep malloc to <= PAGE_SIZE (although order 2 and
>> >> even 3 allocations seem to be reasonable, from process context)... Then
>> >> you would use something a bit more advanced than a linear array to store
>> >> data (a pagetable-like radix tree would be a nice, easy idea).
>> >>
>> >
>> > But, even we split the 8M memory into 2048 x 4k blocks, we still face
>> > this failure. The key problem is that available memory is small than
>> > 2048 x 4k, while there are still a lot of VFS cache. The VFS cache can
>> > be freed, but kernel allocation function ignores it. See the new test
>> > application.
>>
>>
>> Which kernel allocation function? If you can provide more details I'd
>> like to get to the bottom of this.
>
>
> I posted it here, I think you missed it. So forwarded it to you.
That was the order-9 allocation failure. Which is not going to be
solved properly by just dropping caches.
But Sonic apparently saw failures with 4K allocations, where the
caches weren't getting shrunk properly. This would be more interesting
because it would indicate a real problem with the kernel.
>>
>> Also, do you happen to know of a reasonable toolchain + emulator setup
>> that I could test the nommu kernel with?
>
>
> A project named skyeye.
> http://www.skyeye.org/index.shtml
Thanks, I'll give that one a try.
Nick
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-29 9:30 ` Nick Piggin
@ 2006-11-30 12:54 ` Aubrey
2006-11-30 21:18 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Aubrey @ 2006-11-30 12:54 UTC (permalink / raw)
To: Nick Piggin
Cc: Sonic Zhang, Peter Zijlstra, linux-kernel, linux-mm, vapier.adi
On 11/29/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> That was the order-9 allocation failure. Which is not going to be
> solved properly by just dropping caches.
>
> But Sonic apparently saw failures with 4K allocations, where the
> caches weren't getting shrunk properly. This would be more interesting
> because it would indicate a real problem with the kernel.
>
I have run several test cases with /proc/meminfo showing MemFree < 8192KB:
1) malloc(1024 * 4), 256 times = 8MB, allocation successful.
2) malloc(1024 * 16), 64 times = 8MB, allocation successful.
3) malloc(1024 * 64), 16 times = 8MB, allocation successful.
4) malloc(1024 * 128), 8 times = 8MB, allocation failed.
5) malloc(1024 * 256), 4 times = 8MB, allocation failed.
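A parameterised variant of the test program makes runs like these easy to
repeat with different block sizes and counts. This is a sketch only;
try_alloc_blocks() is a hypothetical helper, not the program that produced
the results above. Note that the pointer table is itself a heap allocation,
which matters on a nommu system if the count is large.

```c
#include <stdlib.h>

/*
 * Try to allocate `count` blocks of `size` bytes each, then free them all.
 * Returns the number of blocks that were successfully allocated.
 */
size_t try_alloc_blocks(size_t size, size_t count)
{
	void **p = calloc(count, sizeof(*p));
	size_t i, got = 0;

	if (!p)
		return 0;
	for (i = 0; i < count; i++) {
		p[i] = malloc(size);
		if (p[i])
			got++;
	}
	for (i = 0; i < count; i++)
		free(p[i]);	/* free(NULL) is a no-op */
	free(p);
	return got;
}
```

A run counts as successful when the return value equals count, for example
try_alloc_blocks(1024 * 64, 16) == 16. On a kernel with an MMU and
overcommit, malloc may succeed without the pages being backed, so the
result is mainly meaningful on the nommu target.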
From those results we know that when the allocation is <=64K, the cache
can be shrunk properly.
That means the malloc size of an application on nommu should be
<=64KB. That's exactly our problem. Some video programs need a big
block with contiguous physical addresses. But yes, as you said, we
must keep malloc from allocating big blocks to make the current kernel
work robustly on nommu.
So, my question is: can we improve this? Why is malloc(64K) OK
but malloc(128K) not? Are there any existing parameters for this?
Why doesn't the kernel attempt to shrink the cache no matter how big
the requested allocation is?
Any thoughts?
Thanks,
-Aubrey
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-30 12:54 ` Aubrey
@ 2006-11-30 21:18 ` Nick Piggin
2006-12-01 10:00 ` Aubrey
0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2006-11-30 21:18 UTC (permalink / raw)
To: Aubrey; +Cc: Sonic Zhang, Peter Zijlstra, linux-kernel, linux-mm, vapier.adi
Aubrey wrote:
> On 11/29/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>> That was the order-9 allocation failure. Which is not going to be
>> solved properly by just dropping caches.
>>
>> But Sonic apparently saw failures with 4K allocations, where the
>> caches weren't getting shrunk properly. This would be more interesting
>> because it would indicate a real problem with the kernel.
>>
> I have done several test cases. when cat /proc/meminfo show MemFree <
> 8192KB,
>
> 1) malloc(1024 * 4), 256 times = 8MB, allocation successful.
> 2) malloc(1024 * 16), 64 times = 8MB, allocation successful.
> 3) malloc(1024 * 64), 16 times = 8MB, allocation successful.
> 4) malloc(1024 * 128), 8 times = 8MB, allocation failed.
> 5) malloc(1024 * 256), 4 times = 8MB, allocation failed.
>
> From those results, we know, when allocation <=64K, cache can be
> shrunk properly.
> That means the malloc size of an application on nommu should be
> <=64KB. That's exactly our problem. Some video programmes need a big
> block which has contiguous physical address. But yes, as you said, we
> must keep malloc not to alloc a big block to make the current kernel
> working robust on nommu.
>
> So, my question is, Can we improve this issue? why malloc(64K) is ok
> but malloc(128K) not? Is there any existing parameters about this
> issue? why not kernel attempt to shrunk cache no matter how big memory
> allocation is requested?
>
> Any thoughts?
The pattern you are seeing here is probably due to the page allocator
always retrying process context allocations which are <= order 3 (64K
with 4K pages).
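The behaviour described above can be modelled in a few lines. This is a
userspace model of the decision only, not kernel code: model_should_retry()
and its retry_limit parameter are hypothetical, with the limit passed in
explicitly so that the effect of raising it can be seen.

```c
/*
 * Model: a process-context allocation keeps retrying (giving reclaim
 * another chance) only while its order is at or below retry_limit.
 * Returns 1 to retry, 0 to give up and report failure.
 */
int model_should_retry(unsigned int order, unsigned int retry_limit)
{
	return order <= retry_limit;
}
```

With retry_limit set to 3, the limit mentioned above, an order-9 request
gives up after the first failed attempt, whereas raising retry_limit to 9
would let the same request keep retrying while reclaim frees pages.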
You might be able to increase this limit a bit for your system, but it
could easily cause problems. Especially fragmentation on nommu systems
where the anonymous memory cannot be paged out.
--
SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-30 21:18 ` Nick Piggin
@ 2006-12-01 10:00 ` Aubrey
0 siblings, 0 replies; 15+ messages in thread
From: Aubrey @ 2006-12-01 10:00 UTC (permalink / raw)
To: Nick Piggin
Cc: Sonic Zhang, Peter Zijlstra, linux-kernel, linux-mm, vapier.adi
On 12/1/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> The pattern you are seeing here is probably due to the page allocator
> always retrying process context allocations which are <= order 3 (64K
> with 4K pages).
>
> You might be able to increase this limit a bit for your system, but it
> could easily cause problems. Especially fragmentation on nommu systems
> where the anonymous memory cannot be paged out.
Thanks for the hint. I found that increasing this limit really helps
my test cases.
When MemFree < 8M and the test case requests 1M 8 times, the
allocation succeeds after 81 rebalance passes :). So far I
haven't found any issues.
If I write a patch that makes this parameter tunable through the proc
filesystem for the nommu case, would that be acceptable?
Thanks,
-Aubrey
* Re: The VFS cache is not freed when there is not enough free memory to allocate
@ 2006-11-28 13:29 Robin Getz
2006-11-28 14:41 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: Robin Getz @ 2006-11-28 13:29 UTC (permalink / raw)
To: Nick Piggin; +Cc: Peter Zijlstra, linux-kernel, linux-mm
Nick wrote:
>And your patch is just a hack that happens to mask the issue in the case
>you tested, and it will probably blow up in production at some stage
Ok - that would be bad - back to the drawing board.
Maybe we need to take a step back, and describe the original problem, and
someone can maybe point us in the correct direction, so we can figure out
the proper way to fix things.
As Aubrey stated:
>When there is no enough free memory, the kernel kprints an OOM, and kills
>the application, instead of freeing VFS cache, no matter how big the value
>of /proc/sys/vm/vfs_cache_pressure is set to.
This seems to happen with application allocations as small as one page.
Larger allocations just make this happen faster.
Doing a periodic "echo 3 > /proc/sys/vm/drop_caches" in a different
terminal seems to make the problem go away.
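The periodic drop_caches workaround could also be driven from a small
helper instead of a second terminal. This is only a sketch: the function
names drop_caches_once and drop_caches_periodically are hypothetical, the
path is passed in by the caller, and writing to the real
/proc/sys/vm/drop_caches needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write `mode` (1, 2 or 3) to a drop_caches-style file. Returns 0 on success. */
int drop_caches_once(const char *path, int mode)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    if (mode < 1 || mode > 3)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", mode) > 0;
    /* The write may only reach the file at close time, so check fclose too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Repeat the write `rounds` times, sleeping `interval_s` seconds in between. */
int drop_caches_periodically(const char *path, int mode,
                             unsigned int interval_s, unsigned int rounds)
{
    unsigned int i;

    for (i = 0; i < rounds; i++) {
        if (drop_caches_once(path, mode) != 0)
            return -1;
        if (i + 1 < rounds)
            sleep(interval_s);
    }
    return 0;
}
```

For example, drop_caches_periodically("/proc/sys/vm/drop_caches", 3, 10, 6)
would mirror the echo above once every 10 seconds for about a minute.
Only clean cache is dropped, so running sync first lets more be freed.
This masks the symptom and does not explain why reclaim fails.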
From what I understand, as documented in
./Documentation/filesystems/proc.txt, we should be able to control the size
of the vfs cache, but it does not seem to work. The vfs cache on noMMU seems
to grow, and grow, and grow, until a) you drop caches manually, or b) the
system hits an OOM.
Any pointers to the correct place to start investigating this would be
appreciated.
-Robin
* Re: The VFS cache is not freed when there is not enough free memory to allocate
2006-11-28 13:29 Robin Getz
@ 2006-11-28 14:41 ` Nick Piggin
0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-11-28 14:41 UTC (permalink / raw)
To: Robin Getz; +Cc: Peter Zijlstra, linux-kernel, linux-mm
Robin Getz wrote:
> Nick wrote:
>
>> And your patch is just a hack that happens to mask the issue in the
>> case you tested, and it will probably blow up in production at some stage
>
>
> Ok - that would be bad - back to the drawing board.
>
> Maybe we need to take a step back, and describe the original problem,
> and someone can maybe point us in the correct direction, so we can
> figure out the proper way to fix things.
>
> As Aubrey stated:
>
>> When there is no enough free memory, the kernel kprints an OOM, and
>> kills the application, instead of freeing VFS cache, no matter how big
>> the value of /proc/sys/vm/vfs_cache_pressure is set to.
>
>
> This seems to happen with application allocations as small as one page.
> Larger allocations just make this happen faster.
It might be caused by the fact that nommu uses slab, which can perform
higher order allocations for the slab, even if the object is smaller
than a page. Maybe it could fall back to using the page allocator in
this case? I don't know if the slab API gives a way to prevent this
higher-order packing.
If you get this problem via an actual order-0 allocation, then there must
be some bug or genuine OOM condition.
> By doing a periodic "echo 3 > /proc/sys/vm/drop_caches" in a different
> terminal, seems to make the problem go away.
>
> From what I understand, as documented in
> ./Documentation/filesystem/proc.txt we should be able to control the
> size of vfs cache, but it does not seem to work. vfs cache on noMMU
> seems to grow, and grow, and grow, until a) you drop caches manually, or
> b) the system does a OOM.
>
> Any pointers to the correct place to start investigating this would be
> appreciated.
The easiest might be to do it in kswapd, perhaps if kswapd_max_order is > 0,
and kswapd reclaim is unable to solve the shortage? This at least would get
you calling from a valid context, and so avoid deadlock problems of calling
drop_caches directly from the allocator.
Nick
--
SUSE Labs, Novell Inc.
Thread overview: 15+ messages
2006-11-22 7:51 The VFS cache is not freed when there is not enough free memory to allocate Aubrey
2006-11-22 8:43 ` Peter Zijlstra
2006-11-22 10:02 ` Aubrey
2006-11-22 10:42 ` Peter Zijlstra
2006-11-22 11:09 ` Aubrey
2006-11-27 1:34 ` Mike Frysinger
2006-11-27 7:39 ` Nick Piggin
2006-11-29 7:17 ` Sonic Zhang
2006-11-29 9:27 ` Aubrey
2006-11-29 9:30 ` Nick Piggin
2006-11-30 12:54 ` Aubrey
2006-11-30 21:18 ` Nick Piggin
2006-12-01 10:00 ` Aubrey
2006-11-28 13:29 Robin Getz
2006-11-28 14:41 ` Nick Piggin