At 06:49 PM 5/14/2009 +0100, Mel Gorman wrote:
>Ok, I just tried that there - parent writing 30% of the shared memory
>before forking but still did not reproduce the problem :(

Maybe it makes a difference to have lots of RAM (16GB on this server) and about 1.5 GB of hugepage shared memory allocated in the forking process, in about four segments. All free memory is often consumed by the file cache, but I don't believe this is necessary to produce the problem, as it will happen even right after a reboot.

[RHEL5 meminfo attached]

Other possible factors:

daemon is non-root but has explicit CAP_IPC_LOCK, CAP_NET_RAW, CAP_SYS_NICE set via 'setcap cap_net_raw,cap_ipc_lock,cap_sys_nice+ep daemon'
ulimit -Hl and -Sl are set to
the process's group is set in /proc/sys/vm/hugetlb_shm_group
/proc/sys/vm/nr_hugepages is set to 2048
daemon has 200 threads at the time of fork()
shared memory segments are explicitly located [RHEL5 pmap -x attached]
between fork & exec these syscalls are issued:
  sched_getscheduler/sched_setscheduler
  getpriority/setpriority
  seteuid(getuid())
  setegid(getgid())
with the vfork() work-around, no syscalls are made before exec()

I don't think it's anything specific to the DL160 (Intel E5430) we have, because the DL165 (Opteron 2354) also exhibits the problem.

Will run the provided test cases this weekend for certain and will let you know if the bug is reproduced. Have to go silent on this till the weekend.
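For anyone trying to line a test case up with the setup above, the parent/child sequence can be sketched roughly as below. This is a minimal, hypothetical C sketch, not the daemon's actual code: the function names `run_fork_exec_reproducer` and `child_before_exec`, the 4 MB segment size and the exec target are assumptions. It attaches one SHM_HUGETLB segment, writes 30% of it, forks, and has the child issue the syscalls listed above before exec(). If huge pages are not available it falls back to ordinary pages so it still runs, but then the hugetlb path is not exercised. It also does not recreate the 200 threads, the four segments, or the 1.5 GB of the real daemon.

```c
#define _GNU_SOURCE            /* for SHM_HUGETLB in <sys/shm.h> */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/resource.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: 4 MB, a multiple of the common 2 MB x86 huge page size. */
#define SEG_SIZE (4UL * 1024 * 1024)

/* Hypothetical helper: child side of fork()+exec(), mirroring the syscalls
 * listed in the report. This sketch is single-threaded, so stdio (perror)
 * is tolerable here; with 200 threads only async-signal-safe calls would be. */
static void child_before_exec(const char *path)
{
    struct sched_param param;
    int policy = sched_getscheduler(0);
    /* Re-apply the current policy and parameters unchanged. */
    if (policy == -1 || sched_getparam(0, &param) == -1 ||
        sched_setscheduler(0, policy, &param) == -1)
        perror("sched_*scheduler");

    errno = 0;                 /* getpriority() may legitimately return -1 */
    int prio = getpriority(PRIO_PROCESS, 0);
    if (errno == 0 && setpriority(PRIO_PROCESS, 0, prio) == -1)
        perror("setpriority");

    /* Same order as in the report; no-ops when euid == uid and egid == gid. */
    if (seteuid(getuid()) == -1)
        perror("seteuid");
    if (setegid(getgid()) == -1)
        perror("setegid");

    execl(path, path, (char *)NULL);
    perror("execl");
    _exit(127);
}

/* Returns the child's exit status, or -1 if setup fails. */
int run_fork_exec_reproducer(const char *path)
{
    int used_hugetlb = 1;
    int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id == -1) {
        /* No huge pages configured or not permitted: use normal pages. */
        used_hugetlb = 0;
        id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
        if (id == -1) {
            perror("shmget");
            return -1;
        }
    }

    char *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        perror("shmat");
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    /* Mark for removal now; it is destroyed after the last detach. */
    shmctl(id, IPC_RMID, NULL);

    /* Parent writes ~30% of the segment before forking. */
    size_t touched = SEG_SIZE * 3 / 10;
    assert(touched > 0 && touched <= SEG_SIZE);
    memset(addr, 0x5a, touched);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        shmdt(addr);
        return -1;
    }
    if (pid == 0)
        child_before_exec(path);   /* does not return */

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        shmdt(addr);
        return -1;
    }
    fprintf(stderr, "hugetlb=%d\n", used_hugetlb);
    shmdt(addr);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_fork_exec_reproducer("/bin/true")` from a small main, then comparing HugePages_Free in /proc/meminfo and the kernel log before and after, would be one way to look for the problem; the vfork() work-around would correspond to skipping everything in `child_before_exec` except the exec.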