memory problems: mlockall() w/ pthreads on 2.4

* memory problems:  mlockall() w/ pthreads on 2.4
@ 2001-06-24 19:03 Koni
  2001-06-25 16:22 ` Pete Wyckoff
  0 siblings, 1 reply; 2+ messages in thread
From: Koni @ 2001-06-24 19:03 UTC
  To: linux-mm; +Cc: wireless

[-- Attachment #1: Type: text/plain, Size: 3556 bytes --]

Hi Folks,

My name is Koni. I have just joined the Linux-MM mailing list. 

I am looking into a curious little problem involving mlockall() in threaded
programs, running on Linux 2.4 kernels.

The program (http://slan.sourceforge.net) uses mlockall() to keep cryptographic
keys and state information away from the disk.

Under 2.2 kernels, the client program uses about 700 KB of memory. The server
uses just over a megabyte, plus a little more for each active session. Under
2.4, however, the client may take several megabytes, and the server, with no
active connections, takes 11 MB just to start up.

After a whole day of head scratching, I tracked this down to the combination of
using mlockall() and pthread_create(). Any combination bleeds a little over 2M
(as reported by top or ps) per thread created. It is not shown in a profiling
tool such as memprof.

Attached is a program which will demonstrate the problem. It takes two
arguments on the command line: the first is how many threads to create, the
second is the amount of memory to allocate in each thread explicitly. 0 for the
second argument prevents calls to malloc. 0 for the first argument prevents
threads from being started. Not running it as root stops the mlockall() from
succeeding but the program will run anyway. It runs forever (sleeping) until it
is stopped by ctrl-c or whatever, so that the core size can be observed.

I've played with various ordering of mlockall() and pthread_create() as well as
thread attributes, such as not using the PTHREAD_CREATE_DETACHED attribute.
That is a real kicker -- in that case, I saw 8 megs bleed per call to
pthread_create()! It doesn't matter when mlockall() or pthread_create() is
called. Calling mlockall(MCL_CURRENT|MCL_FUTURE) after pthread_create() still
results in significant memory bleed per running thread.

However, calling just mlockall(MCL_FUTURE) after pthread_create() does NOT
bleed memory. Calling mlockall(MCL_CURRENT) does.

My interpretation of that: mlockall(MCL_CURRENT) is locking the entire
possible stack space of every running thread (and if MCL_FUTURE is also given,
then the entire stack of every new thread created as well).

Questions on that:

This action could be argued as correct, except: why does a single-threaded
process (no pthread_create() calls) not have a locked 8 meg stack? How does
the kernel know to lock only the in-use portion of the stack? Or rather, does
it lock the main stack of a process, but only the used pages? Is this likely
to be a pthread library problem, such as pthreads (or maybe clone() -- I don't
know how it works exactly) allocating some (large) chunk of memory to be used
as the stack for each thread it starts? If that is the case, why is mlockall()
needed to observe this?

Any ideas? I'll have to be a bit more clever I guess to keep the memory size
down for the SLAN programs running on 2.4, while still having pages locked. It
was certainly nice (from the development point of view) to just call mlockall()
at program startup and then forget about it. Trying to pick and choose which
pages to lock looks very difficult since the public key stuff is all done with
gmp and I haven't control over how those functions allocate (stack vs. heap)
memory and pass parameters to internal functions.

Cheers,
Koni

-- 
mhw6@cornell.edu
Koni (Mark Wright)
Solanaceae Genome Network	250 Emerson Hall - Cornell University
Strategic Forecasting		242 Langmuir Laboratory
Lightlink Internet		http://www.lightlink.com/

"If I'm right 90% of the time, why quibble about the other 3%?"

[-- Attachment #2: suckup_memory.c --]
[-- Type: application/octet-stream, Size: 1880 bytes --]

#include <stdio.h>
#include <stdlib.h>

#include <string.h>
#include <errno.h>

#include <unistd.h>

#include <pthread.h>
#include <sys/mman.h>

#define MAX_THREADS (256)
#define MAX_BLEED   (1024*1024*20)

static void usage(char *argv[], int exit_code) {

  fprintf(stderr,"\n\n%s: usage\n",argv[0]);
  fprintf(stderr,"%s <number of threads> <bytes allocated per thread>\n\n",
	  argv[0]); 
  exit(exit_code);
}

static void *memory_sucker(void *arg) {
  int suck_bytes;
  void *p;

  suck_bytes = *((int *) arg);
  if (suck_bytes > 0) {
    /* Allocate and write to it so that it's really allocated */
    p = malloc(suck_bytes);
    if (p == NULL) {
      fprintf(stderr,"Unable to allocate memory in thread #%lu (%s)\n",
	      (unsigned long) pthread_self(), strerror(errno));
    } else {
      memset(p, 0, suck_bytes);
    }
  }

  /* Sit on it */
  while(1) sleep(3600);
  return NULL;
}

int main(int argc, char *argv[]) {

  pthread_t suck_threads[MAX_THREADS];
  pthread_attr_t thread_attributes;
  int i;
  int n_threads, bleed_size;

  if (argc!=3) usage(argv, 1);
  n_threads = atoi(argv[1]);
  bleed_size = atoi(argv[2]);

  if (bleed_size < 0) usage(argv, 1);
  if (n_threads < 0) usage(argv, 1);

  if (n_threads > MAX_THREADS) {
    n_threads = MAX_THREADS;
    fprintf(stderr,"Number of threads limited to %d\n",n_threads);
  }

  if (bleed_size > MAX_BLEED) {
    bleed_size = MAX_BLEED;
    fprintf(stderr,"Bytes allocated per thread limited to %d\n",bleed_size);
  }

  if (mlockall(MCL_CURRENT|MCL_FUTURE)) {
    fprintf(stderr,"Unable to lock memory pages for this process (%s)\n",
	    strerror(errno));
    fprintf(stderr,"Running as root? (Continuing without pages locked)\n");
  }

  pthread_attr_init(&thread_attributes);
  pthread_attr_setdetachstate(&thread_attributes,PTHREAD_CREATE_DETACHED);
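  for(i=0;i<n_threads;i++) {
    /* pthread_create() returns an error number instead of setting errno */
    int err = pthread_create(suck_threads + i, &thread_attributes,
			     memory_sucker, &bleed_size);
    if (err != 0) {
      fprintf(stderr,"Unable to create thread %d (%s)\n", i, strerror(err));
      break;
    }
  }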

  while(1) sleep(7200);
  return 0;
}

* Re: memory problems:  mlockall() w/ pthreads on 2.4
  2001-06-24 19:03 memory problems: mlockall() w/ pthreads on 2.4 Koni
@ 2001-06-25 16:22 ` Pete Wyckoff
  0 siblings, 0 replies; 2+ messages in thread
From: Pete Wyckoff @ 2001-06-25 16:22 UTC
  To: Koni; +Cc: linux-mm, wireless

mhw6@cornell.edu said:
> After a whole day of head scratching, I tracked this down to the
> combination of using mlockall() and pthread_create(). Any combination
> bleeds a little over 2M (as reported by top or ps) per thread created.
> It is not shown in a profiling tool such as memprof.
[..]
> However, calling after pthread_create() with just mlockall(MCL_FUTURE), does
> NOT bleed memory. calling with mlockall(MCL_CURRENT) does. 
> 
> My interpretation of that: mlockall(MCL_CURRENT) is locking the entire
> possible stack space of every running thread (and if MCL_FUTURE is also given,
> then the entire stack of every new thread created as well).

All cloned processes share the same memory space, but each thread is
allocated its own stack area in which to play.  Look at /proc/<pid>/maps
to see these:  1 page of guard, then about 2 MB of stack per thread.
(Not sure why you get 8 MB without DETACHED.)
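
A minimal sketch of inspecting those mappings from inside the process,
assuming a Linux /proc filesystem.  The helper name dump_maps() is
hypothetical and not part of the attached program; it only parses the
address range and permission columns of /proc/self/maps and prints the
size of each mapping.

```c
#include <assert.h>
#include <stdio.h>

/* Print every mapping of the calling process with its size in kB.
 * Returns the number of mappings printed, or -1 if /proc/self/maps
 * cannot be opened.  Assumes each maps line fits in the buffer; an
 * overlong line is skipped if its remainder does not parse. */
int dump_maps(FILE *out)
{
    FILE *maps;
    char line[512];
    char perms[5];
    unsigned long start, end;
    int count = 0;

    assert(out != NULL);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line starts with "start-end perms", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        fprintf(out, "%08lx-%08lx %s %8lu kB\n",
                start, end, perms, (end - start) / 1024);
        count++;
    }
    fclose(maps);
    return count;
}
```

Calling dump_maps(stderr) once before and once after the
pthread_create() loop should make the per-thread stack regions stand
out as new entries of roughly the size described above.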

The way mlockall(MCL_CURRENT) works is to go through the current memory
space and ensure that each page is available.  When you do this, only
the currently used stack (of a non-threaded process) is locked down.
Future stack (and heap) growth will be locked as it is used, if you use
MCL_FUTURE.

In the case of threads, though, each thread stack is allocated using
mmap before the clone() to create the thread.  The mmap system call does
not know you will be using the area as a "stack", and thus locks in the
entire region immediately.
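
The effect can be observed without threads at all.  The sketch below is
an illustration under assumptions, not how pthreads does it: it turns
on MCL_FUTURE, maps an anonymous region, and reports how much the
"VmLck:" figure in /proc/self/status grew.  The function names are
hypothetical, and the caller needs permission to lock memory (root, or
a sufficient RLIMIT_MEMLOCK on later kernels); otherwise -1 is
returned.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>

/* Read the "VmLck:" value (locked memory, in kB) from /proc/self/status.
 * Returns -1 if the file or the field is unavailable. */
static long read_vmlck_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmLck: %ld", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Map len bytes anonymously while MCL_FUTURE is in effect and return
 * the growth of locked memory in kB, or -1 if locking or mapping is
 * not permitted.  The region is never touched, so any growth comes
 * from mmap() itself.  Locking is undone before returning. */
long locked_growth_kb(size_t len)
{
    long before, after;
    void *p;

    assert(len > 0);
    if (mlockall(MCL_FUTURE) != 0)
        return -1;

    before = read_vmlck_kb();
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        munlockall();
        return -1;
    }
    after = read_vmlck_kb();

    munmap(p, len);
    munlockall();
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

If locking is permitted, locked_growth_kb(2 * 1024 * 1024) should
report growth close to the full 2 MB even though no byte of the region
was written, which matches the per-thread cost reported above.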

> Any ideas? I'll have to be a bit more clever I guess to keep the
> memory size down for the SLAN programs running on 2.4, while still
> having pages locked. It was certainly nice (from the development point
> of view) to just call mlockall() at program startup and then forget
> about it. Trying to pick and choose which pages to lock looks very
> difficult since the public key stuff is all done with gmp and I
> haven't control over how those functions allocate (stack vs. heap)
> memory and pass parameters to internal functions.

You might start each thread with an explicit stack which is much
smaller than 2MB, if you can get away with that.  You might investigate
changing pthreads to mmap() just a single stack page at a calculated
offset, but with the MAP_GROWSDOWN flag, and see if the kernel will take
care of mapping/locking pages as the thread stacks grow.

		-- Pete
