* Question about OOM-Killer
  From: James Washer @ 2005-07-18 19:21 UTC
  To: linux-mm

I'm chasing down a system problem where the DMA memory (x86-64; god knows why it is using DMA memory) drops below the minimum, and the OOM-killer is fired off.

It just strikes me as odd that the OOM-killer would be called at all for DMA memory. What's the chance of regaining DMA memory by killing userland processes?

I'll admit I know very little about the Linux VM, so perhaps I'm missing how OOM killing can be helpful here.

 - jim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/
* Re: Question about OOM-Killer
  From: James Washer @ 2005-07-18 19:36 UTC
  To: James Washer; Cc: linux-mm

Sorry, I should have added:

  2.6.11.10, x86-64 dual proc (Intel Xeon 3.4GHz), 6GiB RAM
  Intel Corporation 82801EB (ICH5) SATA Controller (rev 0)
  Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: ATA   Model: Maxtor 6Y160M0   Rev: YAR5
    Type: Direct-Access   ANSI SCSI revision: 05
  Host: scsi0 Channel: 00 Id: 01 Lun: 00
    Vendor: ATA   Model: Maxtor 7Y250M0   Rev: YAR5
    Type: Direct-Access   ANSI SCSI revision: 05

On Mon, 18 Jul 2005 12:21:01 -0700, James Washer <washer@trlp.com> wrote:

> I'm chasing down a system problem where the DMA memory (x86-64; god knows why it is using DMA memory) drops below the minimum, and the OOM-killer is fired off.
>
> It just strikes me as odd that the OOM-killer would be called at all for DMA memory. What's the chance of regaining DMA memory by killing userland processes?
>
> I'll admit I know very little about the Linux VM, so perhaps I'm missing how OOM killing can be helpful here.
* Re: Question about OOM-Killer
  From: Marcelo Tosatti @ 2005-07-23 13:00 UTC
  To: James Washer; Cc: linux-mm, ak

James,

Can you send the OOM killer output?

I don't know which devices in an x86-64 system should be limited to 16MB of physical addressing. Andi? I don't think that any devices should have a 16MB limitation.

On Mon, Jul 18, 2005 at 12:36:50PM -0700, James Washer wrote:
> Sorry, I should have added:
> 2.6.11.10, x86-64 dual proc (Intel Xeon 3.4GHz), 6GiB RAM
> Intel Corporation 82801EB (ICH5) SATA Controller (rev 0)
> [...]
* Re: Question about OOM-Killer
  From: James Washer @ 2005-07-25 19:11 UTC
  To: Marcelo Tosatti; Cc: linux-mm, ak

Pretty typical message here:

Jul 6 17:31:27 p6 kernel: oom-killer: gfp_mask=0xd1
Jul 6 17:31:27 p6 kernel: Node 0 DMA per-cpu:
Jul 6 17:31:27 p6 kernel: cpu 0 hot: low 2, high 6, batch 1
Jul 6 17:31:27 p6 kernel: cpu 0 cold: low 0, high 2, batch 1
Jul 6 17:31:27 p6 kernel: cpu 1 hot: low 2, high 6, batch 1
Jul 6 17:31:27 p6 kernel: cpu 1 cold: low 0, high 2, batch 1
Jul 6 17:31:27 p6 kernel: Node 0 Normal per-cpu:
Jul 6 17:31:27 p6 kernel: cpu 0 hot: low 32, high 96, batch 16
Jul 6 17:31:27 p6 kernel: cpu 0 cold: low 0, high 32, batch 16
Jul 6 17:31:27 p6 kernel: cpu 1 hot: low 32, high 96, batch 16
Jul 6 17:31:27 p6 kernel: cpu 1 cold: low 0, high 32, batch 16
Jul 6 17:31:27 p6 kernel: Node 0 HighMem per-cpu: empty
Jul 6 17:31:31 p6 gconfd (washer-7174): SIGHUP received, reloading all databases
Jul 6 17:31:37 p6 kernel: Free pages: 16236kB (0kB HighMem)
Jul 6 17:31:38 p6 su(pam_unix)[9041]: session closed for user root
Jul 6 17:31:38 p6 su(pam_unix)[10645]: session closed for user root
Jul 6 17:31:38 p6 su(pam_unix)[8044]: session closed for user root
Jul 6 17:31:38 p6 su(pam_unix)[7228]: session closed for user root
Jul 6 17:31:38 p6 su(pam_unix)[16136]: session closed for user root
Jul 6 17:31:48 p6 gconfd (washer-7174): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Jul 6 17:31:49 p6 kernel: Active:596167 inactive:854867 dirty:624740 writeback:0 unstable:0 free:4059 slab:52688 mapped:595231 pagetables:4862
Jul 6 17:32:00 p6 gconfd (washer-7174): Resolved address "xml:readwrite:/home/washer/.gconf" to a writable configuration source at position 1
Jul 6 17:32:02 p6 kernel: Node 0 DMA free:20kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? yes
Jul 6 17:32:04 p6 gconfd (washer-7174): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Jul 6 17:32:06 p6 kernel: lowmem_reserve[]: 0 7152 7152
Jul 6 17:32:11 p6 kernel: Node 0 Normal free:16216kB min:10808kB low:13508kB high:16212kB active:2384668kB inactive:3419468kB present:7323648kB pages_scanned:0 all_unreclaimable? no
Jul 6 17:32:13 p6 kernel: lowmem_reserve[]: 0 0 0
Jul 6 17:32:13 p6 kernel: Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jul 6 17:32:13 p6 kernel: lowmem_reserve[]: 0 0 0
Jul 6 17:32:13 p6 kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
Jul 6 17:32:13 p6 kernel: Node 0 Normal: 34*4kB 192*8kB 53*16kB 92*32kB 2*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 2*4096kB = 16216kB
Jul 6 17:32:13 p6 kernel: Node 0 HighMem: empty
Jul 6 17:32:13 p6 kernel: Swap cache: add 48, delete 48, find 0/0, race 0+0
Jul 6 17:32:13 p6 kernel: Free swap = 8385728kB
Jul 6 17:32:13 p6 kernel: Total swap = 8385920kB
Jul 6 17:32:13 p6 kernel: Out of Memory: Killed process 10475 (firefox-bin).

On Sat, 23 Jul 2005 10:00:48 -0300, Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:

> James,
>
> Can you send the OOM killer output?
>
> I don't know which devices in an x86-64 system should be limited to 16MB of physical addressing. Andi? I don't think that any devices should have a 16MB limitation.
> [...]
* Re: Question about OOM-Killer
  From: Marcelo Tosatti @ 2005-07-25 12:27 UTC
  To: James Washer, ak; Cc: linux-mm

On Mon, Jul 25, 2005 at 12:11:30PM -0700, James Washer wrote:
> Pretty typical message here:
> Jul 6 17:31:27 p6 kernel: oom-killer: gfp_mask=0xd1

__GFP_FS|__GFP_IO|__GFP_WAIT|__GFP_DMA

> [...]
> Jul 6 17:31:37 p6 kernel: Free pages: 16236kB (0kB HighMem)
> Jul 6 17:31:49 p6 kernel: Active:596167 inactive:854867 dirty:624740 writeback:0 unstable:0 free:4059 slab:52688 mapped:595231 pagetables:4862
> Jul 6 17:32:02 p6 kernel: Node 0 DMA free:20kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? yes

Andi,

Zone DMA is exhausted, filled with non-LRU pages, and some allocator is requesting a GFP_DMA page.

Can you enlighten us as to what kind of devices are limited to <16MB on x86-64, and the reasoning for it?
* Re: Question about OOM-Killer
  From: Martin J. Bligh @ 2005-07-25 22:41 UTC
  To: James Washer, Marcelo Tosatti; Cc: linux-mm, ak

Jim, it does seem bloody silly to be shooting stuff here, and it is probably simple to fix. However, it would be useful to see where the DMA allocs are coming from as well; any chance you could dump a stack backtrace in __alloc_pages when we spec a mask for DMA alloc?

M.

On Monday, July 25, 2005 12:11:30 -0700, James Washer <washer@trlp.com> wrote:

> Pretty typical message here:
> Jul 6 17:31:27 p6 kernel: oom-killer: gfp_mask=0xd1
> [...]
> Jul 6 17:32:02 p6 kernel: Node 0 DMA free:20kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? yes
> [...]
> Jul 6 17:32:13 p6 kernel: Out of Memory: Killed process 10475 (firefox-bin).
* Re: Question about OOM-Killer
  From: Marcelo Tosatti @ 2005-07-25 15:46 UTC
  To: Martin J. Bligh; Cc: James Washer, linux-mm, ak

On Mon, Jul 25, 2005 at 03:41:27PM -0700, Martin J. Bligh wrote:
> Jim, does seem bloody silly to be shooting stuff here, and is probably simple to fix ... however, would be useful to see where the DMA allocs are coming from as well, any chance you could dump a stack backtrace in __alloc_pages when we spec a mask for DMA alloc?
>
> M.

The stacktrace should probably be in mainline, along with some sort of printk ratelimiting. v2.4 has:

    if (unlikely(vm_gfp_debug))
        dump_stack();
* Re: Question about OOM-Killer
  From: James Washer @ 2005-07-26 0:35 UTC
  To: Martin J. Bligh; Cc: marcelo.tosatti, linux-mm, ak

Already been done... but I've not had much time to chase it. This is my desktop/workstation, so it's difficult for me to refine the debugging. In any event, here's the backtrace; it is consistent across about 20 events.

kernel: Call Trace:<ffffffff80156ba8>{out_of_memory+275} <ffffffff80147fe5>{autoremove_wake_function+0}
kernel:        <ffffffff80157e8b>{__alloc_pages+793} <ffffffff8015a99b>{cache_grow+269}
kernel:        <ffffffff8015adf4>{cache_alloc_refill+442} <ffffffff8015a883>{kmem_cache_alloc+92}
kernel:        <ffffffff88000f0b>{:sd_mod:sd_revalidate_disk+155}
kernel:        <ffffffff8015d549>{pagevec_lookup+23} <ffffffff8015d9a0>{invalidate_mapping_pages+208}
kernel:        <ffffffff80176292>{invalidate_bh_lru+0} <ffffffff8011b9ef>{flat_send_IPI_allbutself+20}
kernel:        <ffffffff801198df>{smp_call_function+62} <ffffffff8017b466>{check_disk_change+89}
kernel:        <ffffffff88000625>{:sd_mod:sd_open+257} <ffffffff8017b73e>{do_open+190}
kernel:        <ffffffff8017bb02>{blkdev_open+33} <ffffffff80173768>{dentry_open+224}
kernel:        <ffffffff801738a2>{filp_open+63} <ffffffff8017398a>{get_unused_fd+220}
kernel:        <ffffffff80173a8b>{sys_open+62} <ffffffff8010e29e>{system_call+126}

On Mon, 25 Jul 2005 15:41:27 -0700, "Martin J. Bligh" <mbligh@mbligh.org> wrote:

> Jim, does seem bloody silly to be shooting stuff here, and is probably simple to fix ... however, would be useful to see where the DMA allocs are coming from as well, any chance you could dump a stack backtrace in __alloc_pages when we spec a mask for DMA alloc?
>
> M.
> [...]
* Re: Question about OOM-Killer
  From: Marcelo Tosatti @ 2005-07-25 17:10 UTC
  To: James Washer; Cc: Martin J. Bligh, linux-mm, ak, Andrea Arcangeli, James Bottomley

On Mon, Jul 25, 2005 at 05:35:13PM -0700, James Washer wrote:
> Already been done... but I've not had much time to chase it. This is my desktop/workstation, so it's difficult for me to refine the debugging. In any event, here's the backtrace; it is consistent across about 20 events.
>
> kernel: Call Trace:<ffffffff80156ba8>{out_of_memory+275} <ffffffff80147fe5>{autoremove_wake_function+0}
> kernel:        <ffffffff80157e8b>{__alloc_pages+793} <ffffffff8015a99b>{cache_grow+269}
> kernel:        <ffffffff8015adf4>{cache_alloc_refill+442} <ffffffff8015a883>{kmem_cache_alloc+92}
> kernel:        <ffffffff88000f0b>{:sd_mod:sd_revalidate_disk+155}
> [...]

/**
 * sd_revalidate_disk - called the first time a new disk is seen,
 * performs disk spin up, read_capacity, etc.
 * @disk: struct gendisk we care about
 **/
static int sd_revalidate_disk(struct gendisk *disk)
{
        ...
        buffer = kmalloc(512, GFP_KERNEL | __GFP_DMA);
        if (!buffer) {
                printk(KERN_WARNING "(sd_revalidate_disk:) Memory allocation "
                       "failure.\n");
                goto out_release_request;
        }

        sd_spinup_disk(sdkp, disk->disk_name, sreq, buffer);

I suppose it's playing safe to support 16MB-limited devices. Can't the gfp_mask be derived from lower-level device characteristics?

On the VM side, alloc_pages() should do better if conditioned to kill on lowmem exhaustion and to expect allocators to handle failure properly. Or maybe it should always return failure for in-kernel allocations, but one might argue that kernel allocations _might_ be crucial for system functionality (it's a case-by-case basis).

IIRC Andrea used to favour having failed DMA allocations trigger the kill, for reasons I don't recall. The OOM killer, what a delight.
* Re: Question about OOM-Killer 2005-07-26 0:35 ` James Washer 2005-07-25 17:10 ` Marcelo Tosatti @ 2005-07-26 13:29 ` Martin J. Bligh 2005-07-26 15:17 ` Andi Kleen 1 sibling, 1 reply; 14+ messages in thread From: Martin J. Bligh @ 2005-07-26 13:29 UTC (permalink / raw) To: James Washer; +Cc: marcelo.tosatti, linux-mm, ak, James Bottomley

> Already been done... but I'd not had much time to chase it... This is
> my desktop/workstation, so it is difficult for me to refine the debugging.
> In any event, here's the backtrace. This is consistent across about
> 20 events.

Humpf. Not sure why generic scsi code would decide it needed __GFP_DMA:

static int sd_revalidate_disk(struct gendisk *disk)
{
	...
	buffer = kmalloc(512, GFP_KERNEL | __GFP_DMA);
	if (!buffer) {
		printk(KERN_WARNING "(sd_revalidate_disk:) Memory allocation "
		       "failure.\n");
		goto out_release_request;
	}

Trouble is, ZONE_DMA seems to mean different things on different platforms. There's even a comment reflecting this crappiness above __kmalloc:

 * Additionally, the %GFP_DMA flag may be set to indicate the memory
 * must be suitable for DMA. This can mean different things on different
 * platforms. For example, on i386, it means that the memory must come
 * from the first 16MB.

But that's really for ISA DMA, which nobody uses any more apart from the floppy disk, and the stone-tablet adaptor. For now, I'm guessing that if you remove that __GFP_DMA, your machine will be happier, but it's not the right fix.

James, am I right in thinking you really want something else there, maybe dependent on the device (which I doubt you even know in sd.c)?

M. 
> kernel: Call Trace:<ffffffff80156ba8>{out_of_memory+275} <ffffffff80147fe5>{autoremove_wake_function+0} > kernel: <ffffffff80157e8b>{__alloc_pages+793} <ffffffff8015a99b>{cache_grow+269} > kernel: <ffffffff8015adf4>{cache_alloc_refill+442} <ffffffff8015a883>{kmem_cache_alloc+92} > kernel: <ffffffff88000f0b>{:sd_mod:sd_revalidate_disk+155} > kernel: <ffffffff8015d549>{pagevec_lookup+23} <ffffffff8015d9a0>{invalidate_mapping_pages+208} > kernel: <ffffffff80176292>{invalidate_bh_lru+0} <ffffffff8011b9ef>{flat_send_IPI_allbutself+20} > kernel: <ffffffff801198df>{smp_call_function+62} <ffffffff8017b466>{check_disk_change+89} > kernel: <ffffffff88000625>{:sd_mod:sd_open+257} <ffffffff8017b73e>{do_open+190} > kernel: <ffffffff8017bb02>{blkdev_open+33} <ffffffff80173768>{dentry_open+224} > kernel: <ffffffff801738a2>{filp_open+63} <ffffffff8017398a>{get_unused_fd+220} > kernel: <ffffffff80173a8b>{sys_open+62} <ffffffff8010e29e>{system_call+126} > kernel: > > > On Mon, 25 Jul 2005 15:41:27 -0700 > "Martin J. Bligh" <mbligh@mbligh.org> wrote: > >> Jim, does seem bloody silly to be shooting stuff here, and is >> probably simple to fix ... however, would be useful to see where >> the DMA allocs are coming from as well, any chance you could dump >> a stack backtrace in __alloc_pages when we spec a mask for DMA alloc? >> >> M. >> >> --On Monday, July 25, 2005 12:11:30 -0700 James Washer <washer@trlp.com> wrote: >> >> > Pretty typical message here... 
>> > Jul 6 17:31:27 p6 kernel: oom-killer: gfp_mask=0xd1 >> > Jul 6 17:31:27 p6 kernel: Node 0 DMA per-cpu: >> > Jul 6 17:31:27 p6 kernel: cpu 0 hot: low 2, high 6, batch 1 >> > Jul 6 17:31:27 p6 kernel: cpu 0 cold: low 0, high 2, batch 1 >> > Jul 6 17:31:27 p6 kernel: cpu 1 hot: low 2, high 6, batch 1 >> > Jul 6 17:31:27 p6 kernel: cpu 1 cold: low 0, high 2, batch 1 >> > Jul 6 17:31:27 p6 kernel: Node 0 Normal per-cpu: >> > Jul 6 17:31:27 p6 kernel: cpu 0 hot: low 32, high 96, batch 16 >> > Jul 6 17:31:27 p6 kernel: cpu 0 cold: low 0, high 32, batch 16 >> > Jul 6 17:31:27 p6 kernel: cpu 1 hot: low 32, high 96, batch 16 >> > Jul 6 17:31:27 p6 kernel: cpu 1 cold: low 0, high 32, batch 16 >> > Jul 6 17:31:27 p6 kernel: Node 0 HighMem per-cpu: empty >> > Jul 6 17:31:27 p6 kernel: >> > Jul 6 17:31:31 p6 gconfd (washer-7174): SIGHUP received, reloading all databases >> > Jul 6 17:31:37 p6 kernel: Free pages: 16236kB (0kB HighMem) >> > Jul 6 17:31:38 p6 su(pam_unix)[9041]: session closed for user root >> > Jul 6 17:31:38 p6 su(pam_unix)[10645]: session closed for user root >> > Jul 6 17:31:38 p6 su(pam_unix)[8044]: session closed for user root >> > Jul 6 17:31:38 p6 su(pam_unix)[7228]: session closed for user root >> > Jul 6 17:31:38 p6 su(pam_unix)[16136]: session closed for user root >> > Jul 6 17:31:48 p6 gconfd (washer-7174): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0 >> > Jul 6 17:31:49 p6 kernel: Active:596167 inactive:854867 dirty:624740 writeback:0 unstable:0 free:4059 slab:52688 mapped:595231 pagetables:4862 >> > Jul 6 17:32:00 p6 gconfd (washer-7174): Resolved address "xml:readwrite:/home/washer/.gconf" to a writable configuration source at position 1 >> > Jul 6 17:32:02 p6 kernel: Node 0 DMA free:20kB min:24kB low:28kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1 all_unreclaimable? 
yes >> > Jul 6 17:32:04 p6 gconfd (washer-7174): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2 >> > Jul 6 17:32:06 p6 kernel: lowmem_reserve[]: 0 7152 7152 >> > Jul 6 17:32:11 p6 kernel: Node 0 Normal free:16216kB min:10808kB low:13508kB high:16212kB active:2384668kB inactive:3419468kB present:7323648kB pages_scanned:0 all_unreclaimable? no >> > Jul 6 17:32:13 p6 kernel: lowmem_reserve[]: 0 0 0 >> > Jul 6 17:32:13 p6 kernel: Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no >> > Jul 6 17:32:13 p6 kernel: lowmem_reserve[]: 0 0 0 >> > Jul 6 17:32:13 p6 kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB >> > Jul 6 17:32:13 p6 kernel: Node 0 Normal: 34*4kB 192*8kB 53*16kB 92*32kB 2*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 2*4096kB = 16216kB >> > Jul 6 17:32:13 p6 kernel: Node 0 HighMem: empty >> > Jul 6 17:32:13 p6 kernel: Swap cache: add 48, delete 48, find 0/0, race 0+0 >> > Jul 6 17:32:13 p6 kernel: Free swap = 8385728kB >> > Jul 6 17:32:13 p6 kernel: Total swap = 8385920kB >> > Jul 6 17:32:13 p6 kernel: Out of Memory: Killed process 10475 (firefox-bin). >> > >> > >> > On Sat, 23 Jul 2005 10:00:48 -0300 >> > Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote: >> > >> >> >> >> James, >> >> >> >> Can you send the OOM killer output? >> >> >> >> I dont know which devices part of an x86-64 system should >> >> be limited to 16Mb of physical addressing. Andi? >> >> >> >> I don't think that any devices should have 16MB limitation >> >> >> >> On Mon, Jul 18, 2005 at 12:36:50PM -0700, James Washer wrote: >> >> > Sorry, I should have added... 
>> >> > 2.6.11.10, >> >> > x86-64 dual proc (Intel Xeon 3.4GHz) >> >> > 6GiB ram >> >> > Intel Corporation 82801EB (ICH5) SATA Controller (rev 0) >> >> > Host: scsi0 Channel: 00 Id: 00 Lun: 00 >> >> > Vendor: ATA Model: Maxtor 6Y160M0 Rev: YAR5 >> >> > Type: Direct-Access ANSI SCSI revision: 05 >> >> > Host: scsi0 Channel: 00 Id: 01 Lun: 00 >> >> > Vendor: ATA Model: Maxtor 7Y250M0 Rev: YAR5 >> >> > Type: Direct-Access ANSI SCSI revision: 05 >> >> > >> >> > >> >> > On Mon, 18 Jul 2005 12:21:01 -0700 >> >> > James Washer <washer@trlp.com> wrote: >> >> > >> >> > > I'm chasing down a system problem where the DMA memory (x86-64, god knows why it is using DMA memory) >> >> > drops below the minimum, and the OOM-Killer is fired off. >> >> > > >> >> > > It just strikes me odd that the OOM-Killer would be called at all for DMA memory. >> >> > What's the chance of regaining DMA memory by killing user land processes? >> >> > > >> >> > > I'll admit, I know very little about linux VM, so perhaps I'm missing how oom killing can be helpful here. >> > -- >> > To unsubscribe, send a message with 'unsubscribe linux-mm' in >> > the body to majordomo@kvack.org. For more info on Linux MM, >> > see: http://www.linux-mm.org/ . >> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> > >> > >> > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Question about OOM-Killer 2005-07-26 13:29 ` Martin J. Bligh @ 2005-07-26 15:17 ` Andi Kleen 2005-07-26 16:34 ` Martin J. Bligh 0 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2005-07-26 15:17 UTC (permalink / raw) To: Martin J. Bligh; +Cc: James Washer, marcelo.tosatti, linux-mm, James Bottomley

> But that's really for ISA DMA, which nobody uses any more apart from the
> floppy disk, and the stone-tablet adaptor. For now, I'm guessing that if
> you remove that __GFP_DMA, your machine will be happier, but it's not
> the right fix.

IIRC the reason for that was that someone could load an old ISA SCSI controller later as a module, and the code needs to handle that. Perhaps make it dependent on CONFIG_ISA? But even that would not help on distribution kernels. Another way would be, on PCI systems, to check whether there is an ISA bridge, and for other systems assume ISA is there.

-Andi
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Question about OOM-Killer 2005-07-26 15:17 ` Andi Kleen @ 2005-07-26 16:34 ` Martin J. Bligh 0 siblings, 0 replies; 14+ messages in thread From: Martin J. Bligh @ 2005-07-26 16:34 UTC (permalink / raw) To: Andi Kleen Cc: James Washer, marcelo.tosatti, linux-mm, James Bottomley, linux-kernel

--Andi Kleen <ak@muc.de> wrote (on Tuesday, July 26, 2005 17:17:54 +0200):

>> But that's really for ISA DMA, which nobody uses any more apart from the
>> floppy disk, and the stone-tablet adaptor. For now, I'm guessing that if
>> you remove that __GFP_DMA, your machine will be happier, but it's not
>> the right fix.
>
> IIRC the reason for that was that someone could load an old ISA SCSI controller
> later as a module, and the code needs to handle that. Perhaps make it dependent
> on CONFIG_ISA? But even that would not help on distribution kernels.
> Another way would be to check on PCI systems whether there is an ISA
> bridge, and for other systems assume ISA is there.

Yeah, the CONFIG_ISA thing makes a lot of sense to me ... however, it would be even better if we could work out (easily) what the disk is attached to. Pah, ISA is so shit.

Generically, it might be useful if we had a __GFP_DMA_IF_ISA or something defined in the header files, rather than just shoving ifdefs all over the place.

OTOH, Jim is right ... the OOM killer is being somewhat psychopathic. Seems we need 2 fixes ;-)

M.

PS. Warning, this is wholly untested, and generally a bit shit.
PPS. jejb ... your mail bounces. 
diff -aurpN -X /home/fletch/.diff.exclude virgin/drivers/scsi/sd.c isa_dma/drivers/scsi/sd.c
--- virgin/drivers/scsi/sd.c	2005-07-26 09:25:40.000000000 -0700
+++ isa_dma/drivers/scsi/sd.c	2005-07-26 09:32:05.000000000 -0700
@@ -1468,7 +1468,7 @@ static int sd_revalidate_disk(struct gen
 		goto out;
 	}
 
-	buffer = kmalloc(512, GFP_KERNEL | __GFP_DMA);
+	buffer = kmalloc(512, GFP_KERNEL | __GFP_DMA_IF_ISA);
 	if (!buffer) {
 		printk(KERN_WARNING "(sd_revalidate_disk:) Memory allocation "
 		       "failure.\n");
diff -aurpN -X /home/fletch/.diff.exclude virgin/include/linux/gfp.h isa_dma/include/linux/gfp.h
--- virgin/include/linux/gfp.h	2005-07-26 09:26:02.000000000 -0700
+++ isa_dma/include/linux/gfp.h	2005-07-26 09:29:38.000000000 -0700
@@ -13,6 +13,11 @@ struct vm_area_struct;
  */
 
 /* Zone modifiers in GFP_ZONEMASK (see linux/mmzone.h - low two bits) */
 #define __GFP_DMA	0x01
+#ifdef CONFIG_ISA
+ #define __GFP_DMA_IF_ISA __GFP_DMA
+#else
+ #define __GFP_DMA_IF_ISA 0
+#endif
 #define __GFP_HIGHMEM	0x02
 /*
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Question about OOM-Killer 2005-07-23 13:00 ` Marcelo Tosatti 2005-07-25 19:11 ` James Washer @ 2005-07-26 13:53 ` Andi Kleen 1 sibling, 0 replies; 14+ messages in thread From: Andi Kleen @ 2005-07-26 13:53 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: James Washer, linux-mm On Sat, Jul 23, 2005 at 10:00:48AM -0300, Marcelo Tosatti wrote: > > James, > > Can you send the OOM killer output? > > I dont know which devices part of an x86-64 system should > be limited to 16Mb of physical addressing. Andi? Could be old devices like the floppy (it does a single GFP_DMA allocation). Or a few devices that have >16MB limits (like aacraid or some old sound chips) but there is no other zone for them right now. -Andi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Question about oom-killer @ 2023-05-31 8:42 Gou Hao 0 siblings, 0 replies; 14+ messages in thread From: Gou Hao @ 2023-05-31 8:42 UTC (permalink / raw) To: linux-mm, linux-kernel

Hello everyone,

Recently, my kernel panicked and restarted while I was running ltp-oom02 (it allocates memory in an infinite loop, testing whether the oom-killer works properly).

log:
```
[480156.950100] Tasks state (memory values in pages):
[480156.950101] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[480156.950302] [ 2578] 81 2578 523 0 393216 6 -900 dbus-daemon
[480156.950309] [ 2648] 172 2596 2435 0 393216 5 0 rtkit-daemon
[480156.950322] [ 5256] 0 2826 25411 0 589824 0 0 DetectThread
[480156.950328] [ 5404] 0 5404 412 2 393216 64 -1000 sshd
[480156.950357] [ 10518] 0 10518 2586 0 393216 10 0 at-spi2-registr
[480156.950361] [ 10553] 0 10551 10543 0 458752 9 0 QXcbEventQueue
[480156.950365] [ 10867] 0 10567 17579 0 589824 16 0 QXcbEventQueue
[480156.950370] [ 10928] 0 10921 6999 0 458752 17 0 QXcbEventQueue
[480156.950390] [ 11882] 0 11811 7377 0 458752 10 0 QXcbEventQueue
[480156.950394] [ 12052] 0 12052 5823 0 458752 21 0 fcitx
[480156.950404] [ 12115] 0 12114 11678 0 524288 21 0 QXcbEventQueue
[480156.950408] [ 101558] 0 101558 3549 0 393216 0 0 runltp
[480156.950486] [1068864] 0 1068864 771 6 327680 85 -1000 systemd-udevd
[480156.950552] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
[480156.950556] [1035640] 0 1035640 52 0 393216 23 -1000 oom02
[480156.950561] [1036065] 0 1036065 493 60 393216 0 -250 systemd-journal
[480156.950565] [1036087] 0 1036073 6258739 3543942 37814272 0 0 oom02
[480156.950572] Out of memory and no killable processes...
[480156.950575] Kernel panic - not syncing: System is deadlocked on memory
```

oom02 (pid 1036073) had already been killed before the crash. 
log:
```
[480152.242506] [1035177] 0 1035177 4773 20 393216 115 0 sssd_nss
[480152.242510] [1035376] 0 1035376 25500 391 589824 602 0 tuned
[480152.242514] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
[480152.242517] [1035640] 0 1035640 52 0 393216 19 -1000 oom02
[480152.242522] [1036065] 0 1036065 493 114 393216 62 -250 systemd-journal
[480152.242525] [1036073] 0 1036073 6258739 3540314 37814272 104 0 oom02
[480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
[480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
[480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
```

But its memory could not be reclaimed. I added trace logging to the oom_reaper code in the kernel and found a large-range vma that cannot be reclaimed: the vma has the `VM_LOCKED` flag, so it is skipped rather than reclaimed immediately.

```log
oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
oom_reaper-57 [007] .... 126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
oom_reaper-57 [007] .... 126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
oom_reaper-57 [007] .... 126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
```

`vma continue: 1056883, range:3221225472` is the memory that cannot be reclaimed. 1056883 (0x102073) is vma->vm_flags; it has the `VM_LOCKED` flag.

oom02 created `nr_cpu` threads and used mmap to allocate memory. mmap merges contiguous vmas into one, so as long as one thread is still running, the entire vma will not be released. In extreme cases, a panic may occur due to the lack of memory reclamation.

I'm not sure if this is a kernel bug? 
-- thanks, Gou Hao <gouhao@uniontech.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-05-31 8:43 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-07-18 19:21 Question about OOM-Killer James Washer 2005-07-18 19:36 ` James Washer 2005-07-23 13:00 ` Marcelo Tosatti 2005-07-25 19:11 ` James Washer 2005-07-25 12:27 ` Marcelo Tosatti 2005-07-25 22:41 ` Martin J. Bligh 2005-07-25 15:46 ` Marcelo Tosatti 2005-07-26 0:35 ` James Washer 2005-07-25 17:10 ` Marcelo Tosatti 2005-07-26 13:29 ` Martin J. Bligh 2005-07-26 15:17 ` Andi Kleen 2005-07-26 16:34 ` Martin J. Bligh 2005-07-26 13:53 ` Andi Kleen 2023-05-31 8:42 Question about oom-killer Gou Hao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox