linux-mm.kvack.org archive mirror
* Re: mmap() > phys mem problem
@ 2004-06-14 22:04 Ron Maeder
  2004-06-15  3:19 ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Ron Maeder @ 2004-06-14 22:04 UTC (permalink / raw)
  To: nickpiggin; +Cc: riel, akpm, linux-mm

I tried upping /proc/sys/vm/min_free_kbytes to 4096 as suggested below, 
with the same results (grinding to a halt, out of mem).

Any other suggestions?  Thanks for your help.

Ron

---------- Forwarded message ----------
Date: Sun, 06 Jun 2004 11:55:39 +1000
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Ron Maeder <rlm@orionmulti.com>
Cc: Rik van Riel <riel@surriel.com>, linux-mm@kvack.org,
     Andrew Morton <akpm@osdl.org>
Subject: Re: mmap() > phys mem problem

Ron Maeder wrote:
> Thanks very much for your response.  I have had some help trying out the 
> patch and running recent versions of the kernel.  The problem is not 
> fixed in 2.6.6+patch or in 2.6.7-rc2.  Any other suggestions?
> 

OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
waiting for some network IO. Unfortunately at this point, the system
is so clogged up that order 0 GFP_ATOMIC allocations are failing in
this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.

Sadly this seems to happen pretty easily here. I don't know the
network layer, so I don't know what might be required to fix it or if
it is even possible.

This doesn't happen so easily with swap enabled (still theoretically
possible), because freeing block device backed memory should be
deadlock free, so you have another avenue to free memory. I assume
you want diskless clients, so this isn't an option.

You could try working around it by upping /proc/sys/vm/min_free_kbytes
maybe to 2048 or 4096.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org


* Re: mmap() > phys mem problem
  2004-06-14 22:04 mmap() > phys mem problem Ron Maeder
@ 2004-06-15  3:19 ` Nick Piggin
  2004-06-16  3:08   ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2004-06-15  3:19 UTC (permalink / raw)
  To: Ron Maeder; +Cc: riel, akpm, linux-mm

Ron Maeder wrote:
> I tried upping /proc/sys/vm/min_free_kbytes to 4096 as suggested below, 
> with the same results (grinding to a halt, out of mem).
> 
> Any other suggestions?  Thanks for your help.
> 

Hmm. Maybe ask linux-net and/or the NFS guys?

You need to know the maximum amount of memory that your setup
might need in order to write out one page.

There might also be ways to reduce this, like reducing NFS
transfer sizes or network buffers... I dunno.
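NFS transfer sizes are set per mount via the rsize/wsize options; a sketch of what "reducing NFS transfer sizes" could look like in /etc/fstab (server name, export path, and values are illustrative, not from the thread):

```
# smaller NFS read/write transfer sizes -> smaller per-request allocations
server:/export  /mnt  nfs  rsize=4096,wsize=4096  0 0
```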

* Re: mmap() > phys mem problem
  2004-06-15  3:19 ` Nick Piggin
@ 2004-06-16  3:08   ` Nick Piggin
  2004-06-16  6:37     ` Ron Maeder
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2004-06-16  3:08 UTC (permalink / raw)
  To: Ron Maeder; +Cc: riel, akpm, linux-mm

Nick Piggin wrote:
> Ron Maeder wrote:
> 
>> I tried upping /proc/sys/vm/min_free_kbytes to 4096 as suggested 
>> below, with the same results (grinding to a halt, out of mem).
>>
>> Any other suggestions?  Thanks for your help.
>>
> 
> Hmm. Maybe ask linux-net and/or the NFS guys?
> 
> You need to know the maximum amount of memory that your setup
> might need in order to write out one page.
> 
> There might also be ways to reduce this, like reducing NFS
> transfer sizes or network buffers... I dunno.
> 

Actually no, I don't think that will help. I have an
idea that might help. Stay tuned :)

For the time being, would it be at all possible to
work around it using your msync hack, turning swap on,
or doing read/write IO?

* Re: mmap() > phys mem problem
  2004-06-16  3:08   ` Nick Piggin
@ 2004-06-16  6:37     ` Ron Maeder
  0 siblings, 0 replies; 15+ messages in thread
From: Ron Maeder @ 2004-06-16  6:37 UTC (permalink / raw)
  To: Nick Piggin; +Cc: riel, akpm, linux-mm

I can avoid this particular situation in the short term, but I can't avoid 
the general case in the long run.  Thanks again.  -Ron

On Wed, 16 Jun 2004, Nick Piggin wrote:

> Nick Piggin wrote:
>> Ron Maeder wrote:
>> 
>>> I tried upping /proc/sys/vm/min_free_kbytes to 4096 as suggested below, 
>>> with the same results (grinding to a halt, out of mem).
>>> 
>>> Any other suggestions?  Thanks for your help.
>>> 
>> 
>> Hmm. Maybe ask linux-net and/or the NFS guys?
>> 
>> You need to know the maximum amount of memory that your setup
>> might need in order to write out one page.
>> 
>> There might also be ways to reduce this, like reducing NFS
>> transfer sizes or network buffers... I dunno.
>> 
>
> Actually no, I don't think that will help. I have an
> idea that might help. Stay tuned :)
>
> For the time being, would it be at all possible to
> work around it using your msync hack, turning swap on,
> or doing read/write IO?
>

* Re: mmap() > phys mem problem
  2004-06-07 12:04               ` Rik van Riel
@ 2004-06-08  0:03                 ` Nick Piggin
  0 siblings, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2004-06-08  0:03 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ron Maeder, Rik van Riel, linux-mm, Andrew Morton

Rik van Riel wrote:
> On Mon, 7 Jun 2004, Nick Piggin wrote:
> 
> 
>>Well, no there isn't enough memory available: order 0 allocations
>>keep failing in the RX path (I assume each time the server retransmits)
>>and the machine is absolutely deadlocked.
> 
> 
> Yes, but did the memory get exhausted by the RX path itself,
> or by something else that's allocating the last system memory?
> 

I see what you mean. No I didn't dig that far although I
assume *most* of it would have been consumed by networking.

> If the memory exhaustion is because of something else, a
> mempool for the RX path might alleviate the situation.
> 
> 
>>>The theoretically perfect fix is to have a little mempool for
>>>every critical socket.  That is, every NFS mount, e/g/nbd block
>>>device, etc...
> 
> 
>>It would be cool if someone were able to come up with a formula
>>to capture that, and allow sockets to be marked as MEMALLOC to
>>enable mempool allocation.
> 
> 
> A per-socket mempool I guess.  At creation of a MEMALLOC
> socket you'd set up the mempool, and the same mempool
> would get destroyed when the socket is closed.
> 
> Then all memory allocations for that socket go via the
> mempool.
> 

That would be ideal, yes. I wonder how much work is involved.

* Re: mmap() > phys mem problem
  2004-06-07  3:59             ` Nick Piggin
@ 2004-06-07 12:04               ` Rik van Riel
  2004-06-08  0:03                 ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 2004-06-07 12:04 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ron Maeder, Rik van Riel, linux-mm, Andrew Morton

On Mon, 7 Jun 2004, Nick Piggin wrote:

> Well, no there isn't enough memory available: order 0 allocations
> keep failing in the RX path (I assume each time the server retransmits)
> and the machine is absolutely deadlocked.

Yes, but did the memory get exhausted by the RX path itself,
or by something else that's allocating the last system memory?

If the memory exhaustion is because of something else, a
mempool for the RX path might alleviate the situation.

> > The theoretically perfect fix is to have a little mempool for
> > every critical socket.  That is, every NFS mount, e/g/nbd block
> > device, etc...

> It would be cool if someone were able to come up with a formula
> to capture that, and allow sockets to be marked as MEMALLOC to
> enable mempool allocation.

A per-socket mempool I guess.  At creation of a MEMALLOC
socket you'd set up the mempool, and the same mempool
would get destroyed when the socket is closed.

Then all memory allocations for that socket go via the
mempool.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


* Re: mmap() > phys mem problem
  2004-06-06 23:51           ` Rik van Riel
@ 2004-06-07  3:59             ` Nick Piggin
  2004-06-07 12:04               ` Rik van Riel
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2004-06-07  3:59 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ron Maeder, Rik van Riel, linux-mm, Andrew Morton

Rik van Riel wrote:
> On Sun, 6 Jun 2004, Nick Piggin wrote:
> 
> 
>>OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
>>waiting for some network IO. Unfortunately at this point, the system
>>is so clogged up that order 0 GFP_ATOMIC allocations are failing in
>>this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.
> 
> 
> I wonder if there simply isn't enough memory available for
> GFP_ATOMIC network allocations, or if a mempool would alleviate
> the situation here.
> 

Well, no there isn't enough memory available: order 0 allocations
keep failing in the RX path (I assume each time the server retransmits)
and the machine is absolutely deadlocked.

> 
>>Sadly this seems to happen pretty easily here. I don't know the
>>network layer, so I don't know what might be required to fix it or if
>>it is even possible.
> 
> 
> The theoretically perfect fix is to have a little mempool for
> every critical socket.  That is, every NFS mount, e/g/nbd block
> device, etc...
> 

Yes. I assume there is some maximum amount of memory you might
have to allocate depending on things like fragmented and out of
order packets.

It would be cool if someone were able to come up with a formula
to capture that, and allow sockets to be marked as MEMALLOC to
enable mempool allocation.

> Of course, chances are that having one mempool for the network
> allocations might already do the trick for 95% of cases.
> 

* Re: mmap() > phys mem problem
  2004-06-06  1:55         ` Nick Piggin
@ 2004-06-06 23:51           ` Rik van Riel
  2004-06-07  3:59             ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 2004-06-06 23:51 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ron Maeder, Rik van Riel, linux-mm, Andrew Morton

On Sun, 6 Jun 2004, Nick Piggin wrote:

> OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
> waiting for some network IO. Unfortunately at this point, the system
> is so clogged up that order 0 GFP_ATOMIC allocations are failing in
> this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.

I wonder if there simply isn't enough memory available for
GFP_ATOMIC network allocations, or if a mempool would alleviate
the situation here.

> Sadly this seems to happen pretty easily here. I don't know the
> network layer, so I don't know what might be required to fix it or if
> it is even possible.

The theoretically perfect fix is to have a little mempool for
every critical socket.  That is, every NFS mount, e/g/nbd block
device, etc...

Of course, chances are that having one mempool for the network
allocations might already do the trick for 95% of cases.


* Re: mmap() > phys mem problem
  2004-06-05 19:21       ` Ron Maeder
@ 2004-06-06  1:55         ` Nick Piggin
  2004-06-06 23:51           ` Rik van Riel
  0 siblings, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2004-06-06  1:55 UTC (permalink / raw)
  To: Ron Maeder; +Cc: Rik van Riel, linux-mm, Andrew Morton

Ron Maeder wrote:
> Thanks very much for your response.  I have had some help trying out the 
> patch and running recent versions of the kernel.  The problem is not 
> fixed in 2.6.6+patch or in 2.6.7-rc2.  Any other suggestions?
> 

OK, NFS is getting stuck in nfs_flush_one => mempool_alloc presumably
waiting for some network IO. Unfortunately at this point, the system
is so clogged up that order 0 GFP_ATOMIC allocations are failing in
this path: netdev_rx => refill_rx => alloc_skb, i.e. deadlock.

Sadly this seems to happen pretty easily here. I don't know the
network layer, so I don't know what might be required to fix it or if
it is even possible.

This doesn't happen so easily with swap enabled (still theoretically
possible), because freeing block device backed memory should be
deadlock free, so you have another avenue to free memory. I assume
you want diskless clients, so this isn't an option.

You could try working around it by upping /proc/sys/vm/min_free_kbytes
maybe to 2048 or 4096.


* Re: mmap() > phys mem problem
  2004-05-30  9:24     ` Nick Piggin
  2004-05-30 10:15       ` Andrew Morton
@ 2004-06-05 19:21       ` Ron Maeder
  2004-06-06  1:55         ` Nick Piggin
  1 sibling, 1 reply; 15+ messages in thread
From: Ron Maeder @ 2004-06-05 19:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Rik van Riel, linux-mm

Thanks very much for your response.  I have had some help trying out the 
patch and running recent versions of the kernel.  The problem is not fixed 
in 2.6.6+patch or in 2.6.7-rc2.  Any other suggestions?

Ron

On Sun, 30 May 2004, Nick Piggin wrote:

> Ron Maeder wrote:
>> On Fri, 28 May 2004, Rik van Riel wrote:
>> 
>>> On Tue, 25 May 2004, Ron Maeder wrote:
>>> 
>>>> Is this an "undocumented feature" or is this a linux error?  I would
>>>> expect pages of the mmap()'d file would get paged back to the original
>>>> file. I know this won't be fast, but the performance is not an issue 
>>>> for
>>>> this application.
>>> 
>>> 
>>> It looks like a kernel bug.  Can you reproduce this problem
>>> with the latest 2.6 kernel or is it still there ?
>>> 
>>> Rik
>> 
>> 
>> I was able to reproduce the problem with the code that I posted on a 2.6.6
>> kernel.
>> 
>
> Can you give this NFS patch (from Trond) a try please?
>
> (I don't think it is a very good idea for NFS to be using
> WRITEPAGE_ACTIVATE here. If NFS needs to have good write
> clustering off the end of the LRU, we need to go about it
> some other way.)
>
>

* Re: mmap() > phys mem problem
  2004-05-30  9:24     ` Nick Piggin
@ 2004-05-30 10:15       ` Andrew Morton
  2004-06-05 19:21       ` Ron Maeder
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2004-05-30 10:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: rlm, riel, linux-mm

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>  -				err = WRITEPAGE_ACTIVATE;
>  +				nfs_flush_inode(inode, 0, 0, FLUSH_STABLE);

err, absolutely.  I thought we fixed that months ago...

* Re: mmap() > phys mem problem
  2004-05-30  4:47   ` Ron Maeder
@ 2004-05-30  9:24     ` Nick Piggin
  2004-05-30 10:15       ` Andrew Morton
  2004-06-05 19:21       ` Ron Maeder
  0 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2004-05-30  9:24 UTC (permalink / raw)
  To: Ron Maeder; +Cc: Rik van Riel, linux-mm


Ron Maeder wrote:
> On Fri, 28 May 2004, Rik van Riel wrote:
> 
>> On Tue, 25 May 2004, Ron Maeder wrote:
>>
>>> Is this an "undocumented feature" or is this a linux error?  I would
>>> expect pages of the mmap()'d file would get paged back to the original
>>> file. I know this won't be fast, but the performance is not an issue for
>>> this application.
>>
>>
>> It looks like a kernel bug.  Can you reproduce this problem
>> with the latest 2.6 kernel or is it still there ?
>>
>> Rik
> 
> 
> I was able to reproduce the problem with the code that I posted on a 2.6.6
> kernel.
> 

Can you give this NFS patch (from Trond) a try please?

(I don't think it is a very good idea for NFS to be using
WRITEPAGE_ACTIVATE here. If NFS needs to have good write
clustering off the end of the LRU, we need to go about it
some other way.)


[-- Attachment #2: nfs-writepage.patch --]
[-- Type: text/x-patch, Size: 725 bytes --]

 linux-2.6-npiggin/fs/nfs/write.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff -puN fs/nfs/write.c~nfs-writepage fs/nfs/write.c
--- linux-2.6/fs/nfs/write.c~nfs-writepage	2004-05-30 18:46:48.000000000 +1000
+++ linux-2.6-npiggin/fs/nfs/write.c	2004-05-30 18:46:48.000000000 +1000
@@ -320,7 +320,7 @@ do_it:
 		if (err >= 0) {
 			err = 0;
 			if (wbc->for_reclaim)
-				err = WRITEPAGE_ACTIVATE;
+				nfs_flush_inode(inode, 0, 0, FLUSH_STABLE);
 		}
 	} else {
 		err = nfs_writepage_sync(NULL, inode, page, 0,
@@ -333,8 +333,7 @@ do_it:
 	}
 	unlock_kernel();
 out:
-	if (err != WRITEPAGE_ACTIVATE)
-		unlock_page(page);
+	unlock_page(page);
 	if (inode_referenced)
 		iput(inode);
 	return err; 

_


* Re: mmap() > phys mem problem
  2004-05-29  2:08 ` Rik van Riel
@ 2004-05-30  4:47   ` Ron Maeder
  2004-05-30  9:24     ` Nick Piggin
  0 siblings, 1 reply; 15+ messages in thread
From: Ron Maeder @ 2004-05-30  4:47 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm

On Fri, 28 May 2004, Rik van Riel wrote:

> On Tue, 25 May 2004, Ron Maeder wrote:
>
>> Is this an "undocumented feature" or is this a linux error?  I would
>> expect pages of the mmap()'d file would get paged back to the original
>> file. I know this won't be fast, but the performance is not an issue for
>> this application.
>
> It looks like a kernel bug.  Can you reproduce this problem
> with the latest 2.6 kernel or is it still there ?
>
> Rik

I was able to reproduce the problem with the code that I posted on a 2.6.6
kernel.

Ron


* Re: mmap() > phys mem problem
  2004-05-25 22:40 Ron Maeder
@ 2004-05-29  2:08 ` Rik van Riel
  2004-05-30  4:47   ` Ron Maeder
  0 siblings, 1 reply; 15+ messages in thread
From: Rik van Riel @ 2004-05-29  2:08 UTC (permalink / raw)
  To: Ron Maeder; +Cc: linux-mm

On Tue, 25 May 2004, Ron Maeder wrote:

> Is this an "undocumented feature" or is this a linux error?  I would
> expect pages of the mmap()'d file would get paged back to the original
> file. I know this won't be fast, but the performance is not an issue for
> this application.

It looks like a kernel bug.  Can you reproduce this problem
with the latest 2.6 kernel, or is it still there?

Rik

* mmap() > phys mem problem
@ 2004-05-25 22:40 Ron Maeder
  2004-05-29  2:08 ` Rik van Riel
  0 siblings, 1 reply; 15+ messages in thread
From: Ron Maeder @ 2004-05-25 22:40 UTC (permalink / raw)
  To: linux-mm

I have a diskless x86 box running the 2.6.5-rc3 kernel.  I ran a program
which mmap()'d a file that was larger than physical memory over NFS and
then began to write values to it.  The process grew until it was near the
size of phys mem, and then ground to a halt, and other programs, including 
daemons, were exiting when they should have stayed running.

If I run the program on a system that has some swap space, it completes 
without any issue.

It seems as if the OS will not write any dirty pages back to the mmap()'d 
file, and then eventually runs out of memory.

Is this an "undocumented feature" or is this a linux error?  I would
expect pages of the mmap()'d file would get paged back to the original
file. I know this won't be fast, but the performance is not an issue for
this application.

Below is an example that reproduces the problem on a machine without swap.  
If I do an occasional synchronous msync(MS_SYNC) (compiling -DNEVER), the
test case completes fine, while if I use an msync(MS_ASYNC) then other
programs exit as if I did no msync().

Many thanks,

Ron
---------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sysinfo.h>

#define MAX_UNSIGNED	((unsigned) (~0))
#define	FILE_MODE	(S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

unsigned
total_ram()
{
    struct sysinfo	info;
    double		my_total_ram;

    if (sysinfo(&info) != 0) {
	perror("sysinfo");
	exit(1);
    }
    my_total_ram = ((double) info.totalram * (double) info.mem_unit);
    if (my_total_ram > (double) MAX_UNSIGNED) {
	fprintf(stderr, "'my_total_ram' too large for 'unsigned' type.\n");
	exit(1);
    }
    return((unsigned) my_total_ram);
}

int
main()
{
    unsigned	i;
    unsigned	addr_size;
    unsigned	mem_size;
    unsigned	*mem;
    char	swap_filename[20] = "thrash_swap";
    int		swap_filedes;

    mem_size = total_ram();
    mem_size -= (mem_size % sizeof(unsigned));	/* align to 'unsigned' size */
    /* highest index for 'unsigned'-sized accesses into the mapping */
    addr_size = ((mem_size / sizeof(unsigned)) - 1);

    (void) unlink(swap_filename);
    if ((swap_filedes = open(swap_filename, O_RDWR | O_CREAT | O_TRUNC,
			     FILE_MODE)) == -1) {
	perror("open: Can't open for writing");
	exit(1);
    }
    /* Set size of swap file */
    if (lseek(swap_filedes, (mem_size - 1), SEEK_SET) == (off_t) -1) {
	perror("lseek");
	exit(1);
    }
    if (write(swap_filedes, "", 1) != 1) {
	perror("write");
	exit(1);
    }
    if ((mem = (unsigned *) mmap(0, mem_size, PROT_READ | PROT_WRITE,
				 MAP_FILE | MAP_SHARED, swap_filedes, 0))
	== MAP_FAILED) {
	perror("mmap");
	exit(1);
    }
    /* for this example just dirty each page. */
    for (i = 0; i < addr_size; i += 1024) {
	mem[i] = 0;
	if ((i & 0xfffff) == 0) {
#ifdef NEVER
	    if (msync(mem, mem_size, MS_SYNC) != 0) {
		perror("msync");
		exit(1);
	    }
#endif
	    printf(".");
	    fflush(stdout);
	}
    }
    if (munmap(mem, mem_size) != 0) {
	perror("munmap");
	exit(1);
    }
    if (close(swap_filedes) != 0) {
	perror("close");
	exit(1);
    }
    if (unlink(swap_filename) != 0) {
	perror("unlink");
	exit(1);
    }
    printf("\n");
    fflush(stdout);
    return(0);
}




end of thread, other threads:[~2004-06-16  6:37 UTC | newest]

Thread overview: 15+ messages
2004-06-14 22:04 mmap() > phys mem problem Ron Maeder
2004-06-15  3:19 ` Nick Piggin
2004-06-16  3:08   ` Nick Piggin
2004-06-16  6:37     ` Ron Maeder
  -- strict thread matches above, loose matches on Subject: below --
2004-05-25 22:40 Ron Maeder
2004-05-29  2:08 ` Rik van Riel
2004-05-30  4:47   ` Ron Maeder
2004-05-30  9:24     ` Nick Piggin
2004-05-30 10:15       ` Andrew Morton
2004-06-05 19:21       ` Ron Maeder
2004-06-06  1:55         ` Nick Piggin
2004-06-06 23:51           ` Rik van Riel
2004-06-07  3:59             ` Nick Piggin
2004-06-07 12:04               ` Rik van Riel
2004-06-08  0:03                 ` Nick Piggin
