Hi Folks,

My name is Koni. I have just joined the Linux-MM mailing list. I am looking into a curious little problem involving mlockall() in threaded programs running on Linux 2.4 kernels.

The program (http://slan.sourceforge.net) uses mlockall() to keep cryptographic keys and state information away from the disk. Under 2.2 kernels, the client program uses about 700KB of memory; the server uses just over a meg, plus a little more for each active session. Under 2.4, however, the client may take several megabytes, and the server, with no active connections, takes 11 megs just to start up.

After a whole day of head scratching, I tracked this down to the combination of mlockall() and pthread_create(). Any combination of the two bleeds a little over 2MB (as reported by top or ps) per thread created. The extra memory does not show up in a profiling tool such as memprof.

Attached is a program that demonstrates the problem (a rough version is also pasted below my signature, in case the attachment gets stripped). It takes two arguments on the command line: the first is how many threads to create, the second is the amount of memory to allocate explicitly in each thread. 0 for the second argument prevents calls to malloc(); 0 for the first argument prevents threads from being started. Not running it as root stops the mlockall() from succeeding, but the program will run anyway. It runs forever (sleeping) until it is stopped by ctrl-c or whatever, so that the core size can be observed.

I've played with various orderings of mlockall() and pthread_create(), as well as thread attributes, such as not using the PTHREAD_CREATE_DETACHED attribute. That one is a real kicker -- in that case I saw 8 megs bleed per call to pthread_create()!

The relative order of mlockall() and pthread_create() mostly doesn't matter: calling mlockall(MCL_CURRENT|MCL_FUTURE) after pthread_create() still results in significant memory bleed per running thread. However, calling mlockall(MCL_FUTURE) alone after pthread_create() does NOT bleed memory; mlockall(MCL_CURRENT) does. (The second listing below shows the variants I tried.)

My interpretation of that: mlockall(MCL_CURRENT) is locking the entire possible stack space of every running thread (and if MCL_FUTURE is also given, the entire stack of every new thread created as well).

Questions on that:

- This behavior could be argued as correct, except: why does a single-threaded process (no pthread_create()s) not end up with a locked 8 meg stack? How does the kernel know to lock only the in-use portion of the stack? Or rather, does it lock only the main stack of a process, and only its used pages?

- Is this likely to be a pthread library problem -- pthreads (or maybe clone(); I don't know exactly how it works) allocating some large chunk of memory to be used as the stack for each thread it starts? If that is the case, why is mlockall() needed to observe it?

Any ideas? I'll have to be a bit more clever, I guess, to keep the memory size down for the SLAN programs running on 2.4 while still having pages locked. It was certainly nice (from the development point of view) to just call mlockall() at program startup and then forget about it. Trying to pick and choose which pages to lock looks very difficult, since the public key stuff is all done with gmp and I have no control over how those functions allocate memory (stack vs. heap) and pass parameters to internal functions. (A sketch of what selective locking might look like is at the very bottom of this mail.)

Cheers,
Koni

--
mhw6@cornell.edu            Koni (Mark Wright)
Solanaceae Genome Network   250 Emerson Hall - Cornell University
Strategic Forecasting       242 Langmuir Laboratory
Lightlink Internet          http://www.lightlink.com/

"If I'm right 90% of the time, why quibble about the other 3%?"
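
In case the attachment doesn't make it through, here is a minimal sketch of the test program -- not the exact attachment, just the shape of it, with error handling trimmed:

--------8<--------
/*
 * Usage: ./mlocktest <nthreads> <bytes-per-thread>
 *   nthreads = 0 -> no threads are created
 *   bytes    = 0 -> no malloc() calls are made
 *
 * Build: gcc -o mlocktest mlocktest.c -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/mman.h>

static size_t bytes_per_thread;

static void *worker(void *arg)
{
    if (bytes_per_thread > 0) {
        char *p = malloc(bytes_per_thread);
        if (p)
            memset(p, 0xaa, bytes_per_thread);  /* touch the pages */
    }
    for (;;)
        sleep(60);  /* park so the size can be watched in top/ps */
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads, i;
    pthread_t tid;
    pthread_attr_t attr;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <nthreads> <bytes-per-thread>\n",
                argv[0]);
        return 1;
    }
    nthreads = atoi(argv[1]);
    bytes_per_thread = (size_t) atol(argv[2]);

    /* Not running as root makes this fail; carry on anyway. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid, &attr, worker, NULL) != 0)
            perror("pthread_create");
    }

    for (;;)
        sleep(60);  /* run until killed with ctrl-c */
    return 0;
}
--------8<--------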
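
The ordering variants I tried look roughly like this (a fragment, not a complete program -- worker() is the same as in the listing above):

/* Threads first, then lock.  What I observed:
 *   flags == MCL_FUTURE             -> no bleed
 *   flags == MCL_CURRENT            -> ~2MB bleed per running thread
 *   flags == MCL_CURRENT|MCL_FUTURE -> bleeds as well
 */
static void run_variant(int nthreads, int flags)
{
    pthread_t tid;
    int i;

    for (i = 0; i < nthreads; i++)
        pthread_create(&tid, NULL, worker, NULL);

    if (mlockall(flags) != 0)
        perror("mlockall");
}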
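
And for completeness, my understanding of what "picking and choosing" would look like. key_buf here is a hypothetical buffer holding key material; the gmp temporaries on the stack and heap are exactly what this approach can't reach:

#include <stdio.h>
#include <sys/mman.h>

/* Lock just the pages holding sensitive material.
 * mlock() rounds the range out to whole pages. */
static unsigned char key_buf[4096];

static int lock_key_pages(void)
{
    if (mlock(key_buf, sizeof(key_buf)) != 0) {
        perror("mlock");
        return -1;
    }
    return 0;
}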