* weird calloc problem
@ 1999-03-10  1:51 marco saraniti
  1999-03-17 15:47 ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: marco saraniti @ 1999-03-10  1:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm

Hi there,

I'm having a calloc problem that made me waste three weeks, at this point
I'm out of options, and I was wondering if this can be a kernel- or
MM-related problem. Furthermore, the system is a relatively big machine and
I'd like to share my experience with other people who are interested in
using Linux for number crunching.

The problem is trivial: calloc returns a NULL, even if there is a lot
of free memory. Yes, both arguments of calloc are always > 0.

The code is a big Monte Carlo simulation program, developed by myself
(mostly) and by several other people. The whole thing is more than 60000
lines of C. What I can say is:

1) no use of sbrk, just malloc,calloc,realloc,free. There's no heavy
use of the calloc/free couple or of realloc.  The allocation occurs in
a cycle: data are computed in a statically allocated buffer, then the
right amount of memory is allocated, and the buffer is memcpyed into
the freshly allocated memory. Then the whole thing starts again for a
new set of data (using the same buffer). The dimension of the
allocated memory varies, from very small to several hundred KB. After
several thousand iterations (and five hours!)  calloc gives a
NULL. The problem doesn't occur if the program is forced to start a
few iterations before the critical one.

2) the process size increases exactly as the memory allocation counter
implemented in the program. This is the last vmstat report *before* the
calloc failure:

 procs                  memory    swap        io    system         cpu
 r b w  swpd  free  buff cache  si  so   bi   bo   in   cs  us  sy  id
 2 0 0     0 914960 387020 78600   0   0    0    0  117   53  49   3  48

3) when used, dmalloc (which uses sbrk) reports no errors; it just exits,
complaining that it cannot increase the heap size.

4) the error is reproduced perfectly in subsequent runs, but it occurs at
different iterations if dmalloc is used, or if the compiler is changed, or
if the kernel is changed.

5) I tried kernel 2.2.1 and 2.2.3, compilers gcc2.7.2.3, pgcc1.1, egcs1.1.

6) I tried to swap the memory banks on the motherboard, no change.

The machine is a dual PII (400MHz) on a mboard supermicro P6DGE with
2GB of *non* ECC SDRAM (PC100). No IDE disks, everything is SCSI
(controller AHA 2940UW). The kernel has been compiled enabling SMP,
and after changing the value 0xC0000000 to 0x80000000 in
/usr/src/linux/arch/i386/vmlinux.lds and
/usr/src/linux/include/asm-i386/page.h. No "mem=" directive is used in
lilo.conf. I have four swap partitions, 128MB each. The netcard is a
SMC Etherpower II (10/100 PCI). There's also an AWE64 sound card. The
videocard is a matrox millennium G200 AGP.  The Linux distribution is
a plain RH5.2, Xserver Accelerated X and CDE, both from X-inside.

The question is even more trivial than the problem: what's wrong? Why
does calloc refuse to allocate memory when there's a full GB of
(apparently) free RAM?

thanks a lot for your help, *any* suggestion will be appreciated.

                                              marco

PS please reply also to my email address, I'm not a subscriber of the
   mail list.



====================================================================
Marco Saraniti
Assistant Professor of Electrical and Computer Engineering

Department of Electrical and Computer Engineering #SH329
Illinois Institute of Technology - Main Campus
3301 South Dearborn
Chicago, IL 60616

Tel:   (312) 567 8813
Fax:   (312) 567 8976
email: saraniti@ece.iit.edu
www:   www.ece.iit.edu/Faculty/marco.html   
--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/


* Re: weird calloc problem
  1999-03-10  1:51 weird calloc problem marco saraniti
@ 1999-03-17 15:47 ` Stephen C. Tweedie
  1999-03-17 23:12   ` Richard B. Johnson
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 1999-03-17 15:47 UTC (permalink / raw)
  To: saraniti; +Cc: linux-kernel, linux-mm

Hi,

On Tue, 9 Mar 1999 19:51:32 -0600 (EST), marco saraniti
<saraniti@neumann.ece.iit.edu> said:

> I'm having a calloc problem that made me waste three weeks, at this point
> I'm out of options, and I was wondering if this can be a kernel- or
> MM-related problem. Furthermore, the system is a relatively big machine and
> I'd like to share my experience with other people who are interested in
> using Linux for number crunching.

> The problem is trivial: calloc returns a NULL, even if there is a lot
> of free memory. Yes, both arguments of calloc are always > 0.

Do you have any evidence that this is a kernel problem as opposed to a
user-space problem?  A "ps -m" listing of the process concerned when
the fault happens would be useful in pinning this down, as would a
"strace" output.

--Stephen

* Re: weird calloc problem
  1999-03-17 15:47 ` Stephen C. Tweedie
@ 1999-03-17 23:12   ` Richard B. Johnson
  1999-03-18 13:00     ` Manfred Spraul
  0 siblings, 1 reply; 4+ messages in thread
From: Richard B. Johnson @ 1999-03-17 23:12 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: saraniti, linux-kernel, linux-mm

On Wed, 17 Mar 1999, Stephen C. Tweedie wrote:

> Hi,
> 
> On Tue, 9 Mar 1999 19:51:32 -0600 (EST), marco saraniti
> <saraniti@neumann.ece.iit.edu> said:
> 
> > I'm having a calloc problem that made me waste three weeks, at this point
> > I'm out of options, and I was wondering if this can be a kernel- or
> > MM-related problem. Furthermore, the system is a relatively big machine and
> > I'd like to share my experience with other people who are interested in
> > using Linux for number crunching.
> 
> > The problem is trivial: calloc returns a NULL, even if there is a lot
> > of free memory. Yes, both arguments of calloc are always > 0.
> 

Here is a simple program and its output that, on my system, clearly
shows that if I want more array-space I just need to increase the size
of my swap file.

Script started on Wed Mar 17 18:05:00 1999
# show
         Crunching 1048576 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 17907712 312451072  2613248  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 1572864 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 20004864 310353920  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 2359296 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 23150592 307208192  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 3538944 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 27869184 302489600  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 5308416 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 34947072 295411712  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 7962624 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 45563904 284794880  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 11943936 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 61489152 268869632  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 17915904 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 85381120 244977664  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 26873856 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 121245696 209113088  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 40310784 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 175046656 155312128  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 60466176 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 255746048 74612736  2621440  5787648  2535424
SwapTotal:   248996 kB
SwapFree:    248996 kB

         Crunching 90699264 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 327876608  2482176  2293760  5787648  7868416
SwapTotal:   248996 kB
SwapFree:    196036 kB

         Crunching 136048896 elements
        total:    used:    free:  shared: buffers:  cached:
Mem:  330358784 328028160  2330624   569344  5787648  4739072
SwapTotal:   248996 kB
SwapFree:     21924 kB

calloc(816293376) failed
# exit
exit

Script done on Wed Mar 17 18:06:20 1999



Here is the test program.

#include <stdio.h>
#include <stdlib.h>

#define ARRAY 0x100000
#define BUF_LEN 0x100

int main(void)
{
    char buf[BUF_LEN];
    size_t len, i;
    size_t *pf;
    FILE *file;
    int line;

    len = ARRAY;
    for(;;)
    {
        if((pf = (size_t *) calloc(len, sizeof(size_t))) == NULL)
        {
            fprintf(stderr, "calloc(%lu) failed\n",
                    (unsigned long) (len * sizeof(size_t)));
            exit(EXIT_FAILURE);
        }
        fprintf(stdout, "         Crunching %lu elements\n",
                (unsigned long) len);
        /* touch every element so the pages are really used */
        for(i=0; i< len; i++)
            pf[i] = i;

/*
 * seek and rewind don't work on proc (files have no size)
 */
        if((file = fopen ("/proc/meminfo", "r")) == NULL)
        {
            fprintf(stderr, "You need the /proc file system mounted\n");
            exit(EXIT_FAILURE);
        }
        /* print lines 1, 2, 9 and 10 of /proc/meminfo; skip lines 3 to 8 */
        for(line = 1; line <= 10; line++)
        {
            if(fgets(buf, BUF_LEN, file) == NULL)
                break;
            if(line <= 2 || line >= 9)
                fputs(buf, stdout);
        }
        puts("");
        fclose(file);
        free(pf);
        len += (len / 2);
    }
    return 0;
}



Cheers,
Dick Johnson
                 ***** FILE SYSTEM WAS MODIFIED *****
Penguin : Linux version 2.2.3 on an i686 machine (400.59 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.


* Re: weird calloc problem
  1999-03-17 23:12   ` Richard B. Johnson
@ 1999-03-18 13:00     ` Manfred Spraul
  0 siblings, 0 replies; 4+ messages in thread
From: Manfred Spraul @ 1999-03-18 13:00 UTC (permalink / raw)
  To: saraniti; +Cc: linux-kernel, linux-mm

On Tue, 9 Mar 1999 19:51:32 -0600 (EST), marco saraniti
<saraniti@neumann.ece.iit.edu> said:
> I'm having a calloc problem that made me waste three weeks, at this point
> I'm out of options, and I was wondering if this can be a kernel- or
> MM-related problem. Furthermore, the system is a relatively big machine and
> I'd like to share my experience with other people who are interested in
> using Linux for number crunching.
>
> The problem is trivial: calloc returns a NULL, even if there is a lot
> of free memory. Yes, both arguments of calloc are always > 0.

you wrote 'the system is a relatively big machine'.
Perhaps you have run out of virtual memory.

How much memory do you try to allocate? (more than 1 Gigabyte?)
How much physical memory do you have?

You could also pause the process as soon as calloc returns NULL
(e.g. if(ptr==NULL) while(1) { printf("error!!\n");} )
and look at the information in /proc/<pid>. The file formats are
described in 'man proc'.

Regards,
	Manfred
