linux-mm.kvack.org archive mirror
* Re: huge mem mmap eats all CPU when multiple processes
       [not found] <8FDBF172-AAA8-4737-A6C6-50B468CA0CBF@thehive.com>
@ 2009-06-09  0:41 ` KAMEZAWA Hiroyuki
  2009-06-09 14:16   ` Matthew Von Maszewski
  0 siblings, 1 reply; 3+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-09  0:41 UTC (permalink / raw)
  To: Matthew Von Maszewski; +Cc: linux-kernel, linux-mm

On Mon, 8 Jun 2009 10:27:49 -0400
Matthew Von Maszewski <matthew@thehive.com> wrote:

> [note: not on kernel mailing list, please cc author]
> 
> Symptom: 9 processes mmap the same 2 Gig memory section for a shared C
> heap (lots of random access). All processes show extreme CPU load in
> top.
>
> - The same code works well when only a single process accesses huge mem.
Does this "huge mem" mean HugeTLB (2MB/4MB) pages?

> - Code works well with standard vm based mmap file and 9 processes.
> 

What is the sys/user ratio in top? Are almost all CPUs used by "sys"?

> Environment:
> 
> - Intel x86_64:  Dual core Xeon with hyperthreading (4 logical  
> processors)
> - 6 Gig ram, 2.5G allocated to huge mem
By a boot option?

> - tried with kernels 2.6.29.4 and 2.6.30-rc8
> - following mmap() call has base address as NULL on first process,  
> then returned address passed to subsequent processes (not threads,  
> processes)
> 
>             m_MemSize=((m_MemSize/(2048*1024))+1)*2048*1024;
>              m_BaseAddr=mmap(m_File->GetFixedBase(), m_MemSize,
>                              (PROT_READ | PROT_WRITE),
>                              MAP_SHARED, m_File->GetFileId(), m_Offset);
> 
> 
> I am not a kernel hacker so I have not attempted to debug.  Will be  
> able to spend time on a sample program for sharing later today or  
> tomorrow.  Sending this note now in case this is already known.
> 

IIUC, all page faults on hugetlb pages are serialized by a system-wide mutex,
so touching pages in parallel is not fast. So I wonder whether, in general, it
is better to have one thread touch all the necessary mappings first.
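
A minimal sketch of that idea (illustrative only, not from the reported code;
it assumes a hugetlbfs-backed descriptor fd and 2MB huge pages): read-fault
every huge page once from a single task before the other processes attach.

    #include <stddef.h>
    #include <sys/mman.h>

    #define HUGE_PAGE_SIZE ((size_t)2048 * 1024)   /* 2MB huge pages */

    /* Map the shared hugetlbfs file and fault every huge page in once,
     * sequentially, so later processes only hit already-present pages. */
    static void *premap_and_touch(int fd, size_t len)
    {
        unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
        volatile unsigned char sink = 0;
        size_t off;

        if (base == MAP_FAILED)
            return NULL;
        for (off = 0; off < len; off += HUGE_PAGE_SIZE)
            sink ^= base[off];        /* read fault populates the huge page */
        (void)sink;
        return base;
    }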



> Don't suppose this is as simple as a Copy-On-Write flag being set wrong?
> 
I don't think so.

> Please send notes as to things I need to capture to better describe  
> this bug.  Happy to do the work.
> 
Adding cc to linux-mm.

Thanks,
-Kame


> Thanks,
> Matthew



* Re: huge mem mmap eats all CPU when multiple processes
  2009-06-09  0:41 ` huge mem mmap eats all CPU when multiple processes KAMEZAWA Hiroyuki
@ 2009-06-09 14:16   ` Matthew Von Maszewski
  2009-06-09 19:14     ` Matthew Von Maszewski
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Von Maszewski @ 2009-06-09 14:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, linux-mm

My apologies for the lack of clarity in the original email. I am working
on a test program to send out later today. Here are my responses to
the questions asked:


On Jun 8, 2009, at 8:41 PM, KAMEZAWA Hiroyuki wrote:

> On Mon, 8 Jun 2009 10:27:49 -0400
> Matthew Von Maszewski <matthew@thehive.com> wrote:
>
>> [note: not on kernel mailing list, please cc author]
>>
>> Symptom: 9 processes mmap the same 2 Gig memory section for a shared C
>> heap (lots of random access). All processes show extreme CPU load in
>> top.
>>
>> - The same code works well when only a single process accesses huge mem.
> Does this "huge mem" mean HugeTLB (2MB/4MB) pages?

Yes. My Debian x86_64 kernel build uses 2MB pages. The test with one
process is really fast. The test with multiple processes against the same
mmap() file is really slow.

>
>
>> - Code works well with standard vm based mmap file and 9 processes.
>>
>
> What is the sys/user ratio in top? Are almost all CPUs used by "sys"?


Tasks:  94 total,   3 running,  91 sleeping,   0 stopped,   0 zombie
Cpu0  :  5.6%us, 86.4%sy,  0.0%ni,  1.3%id,  5.3%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu1  :  1.0%us, 92.4%sy,  0.0%ni,  0.0%id,  5.6%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu2  :  1.7%us, 90.4%sy,  0.0%ni,  0.0%id,  7.3%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu3  :  0.0%us, 70.4%sy,  0.0%ni, 25.1%id,  4.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   6103960k total,  2650044k used,  3453916k free,     6068k buffers
Swap:  5871716k total,        0k used,  5871716k free,    84504k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  3681 proxy     20   0 2638m 1596 1312 S   43  0.0   0:07.87 tentacle.e.prof
  3687 proxy     20   0 2656m 1592 1312 S   43  0.0   0:07.69 tentacle.e.prof
  3689 proxy     20   0 2662m 1600 1312 S   42  0.0   0:07.82 tentacle.e.prof
  3683 proxy     20   0 2652m 1596 1312 S   41  0.0   0:07.75 tentacle.e.prof
  3684 proxy     20   0 2650m 1596 1312 S   41  0.0   0:07.89 tentacle.e.prof
  3686 proxy     20   0 2644m 1596 1312 S   40  0.0   0:07.80 tentacle.e.prof
  3685 proxy     20   0 2664m 1592 1312 S   40  0.0   0:07.82 tentacle.e.prof
  3682 proxy     20   0 2646m 1616 1328 S   38  0.0   0:07.73 tentacle.e.prof
  3664 proxy     20   0 2620m 1320  988 R   36  0.0   0:01.08 tentacle.e
  3678 proxy     20   0 72352  35m 1684 R   11  0.6   0:01.79 squid

tentacle.e and tentacle.e.prof are copies of the same executable file,  
started with different command line options.  tentacle.e is started by  
an init.d script.  tentacle.e.prof processes are started by squid.

I am creating a simplified program to duplicate the scenario.  Will  
send it along later today.

>
>
>> Environment:
>>
>> - Intel x86_64:  Dual core Xeon with hyperthreading (4 logical
>> processors)
>> - 6 Gig ram, 2.5G allocated to huge mem
> By a boot option?

Huge mem initialization:

1.  sysctl.conf allocates the desired number of 2M pages:

system:/mnt$ tail -n 3 /etc/sysctl.conf
#huge
vm.nr_hugepages=1200


2. init.d script for tentacle.e mounts the file system and  
preallocates space

(from init.d file starting tentacle.e)

     umount /mnt/hugefs
     mount -t hugetlbfs -o uid=proxy,size=2300M none /mnt/hugefs

system:/mnt$ df -kP
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1            135601864  32634960  96078636      26% /
tmpfs                  3051980         0   3051980       0% /lib/init/rw
udev                     10240        68     10172       1% /dev
tmpfs                  3051980         0   3051980       0% /dev/shm
none                   2355200   2117632    237568      90% /mnt/hugefs


>
>
>> - tried with kernels 2.6.29.4 and 2.6.30-rc8
>> - following mmap() call has base address as NULL on first process,
>> then returned address passed to subsequent processes (not threads,
>> processes)
>>
>>            m_MemSize=((m_MemSize/(2048*1024))+1)*2048*1024;
>>             m_BaseAddr=mmap(m_File->GetFixedBase(), m_MemSize,
>>                             (PROT_READ | PROT_WRITE),
>>                             MAP_SHARED, m_File->GetFileId(), m_Offset);
>>
>>
>> I am not a kernel hacker so I have not attempted to debug.  Will be
>> able to spend time on a sample program for sharing later today or
>> tomorrow.  Sending this note now in case this is already known.
>>
>
> IIUC, all page faults on hugetlb pages are serialized by a system-wide mutex,
> so touching pages in parallel is not fast. So I wonder whether, in general,
> it is better to have one thread touch all the necessary mappings first.
>
>
>
>> Don't suppose this is as simple as a Copy-On-Write flag being set  
>> wrong?
>>
> I don't think so.
>
>> Please send notes as to things I need to capture to better describe
>> this bug.  Happy to do the work.
>>
> Adding cc to linux-mm.
>
> Thanks,
> -Kame
>
>
>> Thanks,
>> Matthew



* Re: huge mem mmap eats all CPU when multiple processes
  2009-06-09 14:16   ` Matthew Von Maszewski
@ 2009-06-09 19:14     ` Matthew Von Maszewski
  0 siblings, 0 replies; 3+ messages in thread
From: Matthew Von Maszewski @ 2009-06-09 19:14 UTC (permalink / raw)
  To: Matthew Von Maszewski; +Cc: KAMEZAWA Hiroyuki, linux-kernel, linux-mm

Re test program:

I am not yet able to create a simple test program that:

1.  matches the huge mem performance problem seen in the "top" sample in
my previous message, and
2.  has clean execution when switched to a non-hugemem file/mmap.

Using process-shared pthread_mutex_t objects inside tight loops creates
something similar, but then both the huge mem file and the standard VM file
show the problem. Maybe this slightly supports Kame's comment about activity
being serialized on a system mutex for huge mem? I am not qualified to judge.
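
For reference, a minimal sketch of the kind of process-shared mutex used in
that test; it assumes the mutex sits at the start of the shared mapping, and
the names are placeholders.

    #include <pthread.h>

    /* First process only: initialize one mutex at the start of the
     * shared mapping so every process that maps the file can lock it. */
    static void init_shared_mutex(void *shared_base)
    {
        pthread_mutex_t *mtx = (pthread_mutex_t *)shared_base;
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(mtx, &attr);
        pthread_mutexattr_destroy(&attr);
    }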

Open to any suggestions for tests / measurements.

Matthew




