* Re: Memory overcommit
       [not found] ` <20091014135119.e1baa07f.kamezawa.hiroyu@jp.fujitsu.com>
@ 2009-10-20 21:52   ` Vedran Furač
  2009-10-26  1:55     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-20 21:52 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm

Hi and sorry for delay. Also, please CC me.

KAMEZAWA Hiroyuki wrote:

> On Tue, 13 Oct 2009 19:13:34 +0200
> Vedran Furač <vedranf@vedranf.mine.nu> wrote:
>
>>> Against random-kill, you may have 2 choices.
>>>
>>> 1. use /proc/<pid>/oom_adj
>>> 2. use memory cgroup.
>>>
>>> Something more easy-to-use method may be appreciated. We have above 2
>>> now.
>> These are just bad workarounds for bad OOM algorithm. I tested this
>> little program on multiple systems (including windows) without any
>> tweaking and linux behavior is, unfortunately *the worst*. :/
>>
> Yes, they are workaround. You can use /etc/sysctl.conf.
> But if making it default _now_, many threaded programs will not work.

Only Java ;) and only sometimes, at least from my experience

> But I agree, OOM killer should be sophisticated.
> Please give us a sample program/test case which causes problem.
> linux-mm@kvack.org may be a better place. lkml has too much traffic.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
  char *buf;
  while(1) {
    buf = malloc (1024*1024*100);
    if ( buf == NULL ) {
      perror("malloc");
      getchar();
      exit(EXIT_FAILURE);
    }
    sleep(1);
    memset(buf, 1, 1024*1024*100);
  }
  return 0;
}


After running this on a typical desktop with gnome or kde, OOM killer
will kill 5-10 innocent processes before killing this one. Tested
multiple times on multiple installations.

Regards,

Vedran

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
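The first workaround quoted above, /proc/<pid>/oom_adj, is a per-process file holding a small integer. A minimal sketch of how a launcher might set it, assuming the 2.6-era interface (values from -16 to 15, plus -17 to exempt the process entirely). The helper names oom_adj_path and write_oom_adj are invented for this illustration and are not part of any kernel or libc API; later kernels prefer oom_score_adj, and lowering the value generally needs privilege.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Build "/proc/<pid>/oom_adj" in buf.
   Returns 0 on success, -1 if buf is too small. */
int oom_adj_path(pid_t pid, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/proc/%ld/oom_adj", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an adjustment value to the given file (hypothetical helper).
   Assumed range: -17..15, where -17 disables OOM killing for the task.
   Returns 0 on success, -1 on a bad value or an I/O error. */
int write_oom_adj(const char *path, int value)
{
    assert(path != NULL);
    if (value < -17 || value > 15)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A wrapper could call oom_adj_path(getpid(), ...) followed by write_oom_adj(path, 15) just before exec'ing a program it considers expendable, so that program becomes the preferred victim. This is exactly the kind of per-process tuning Vedran calls a workaround.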
* Re: Memory overcommit
  2009-10-20 21:52   ` Memory overcommit Vedran Furač
@ 2009-10-26  1:55     ` KAMEZAWA Hiroyuki
  2009-10-26 16:16       ` Vedran Furač
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-26 1:55 UTC
To: vedran.furac; +Cc: linux-mm

On Tue, 20 Oct 2009 23:52:33 +0200
Vedran Furač <vedran.furac@gmail.com> wrote:

> Hi and sorry for delay. Also, please CC me.

> > But I agree, OOM killer should be sophisticated.
> > Please give us a sample program/test case which causes problem.
> > linux-mm@kvack.org may be a better place. lkml has too much traffic.
>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main()
> {
>   char *buf;
>   while(1) {
>     buf = malloc (1024*1024*100);
>     if ( buf == NULL ) {
>       perror("malloc");
>       getchar();
>       exit(EXIT_FAILURE);
>     }
>     sleep(1);
>     memset(buf, 1, 1024*1024*100);
>   }
>   return 0;
> }
>
>
> After running this on a typical desktop with gnome or kde, OOM killer
> will kill 5-10 innocent processes before killing this one. Tested
> multiple times on multiple installations.
>
> Regards,
>
Can I make more questions ?

 - What's cpu ?
 - How much memory ?
 - Do you have swap ?
 - What's the latest kernel version you tested?
 - Could you show me /var/log/dmesg and /var/log/messages at OOM ?

Thanks,
-Kame

> Vedran
* Re: Memory overcommit
  2009-10-26  1:55     ` KAMEZAWA Hiroyuki
@ 2009-10-26 16:16       ` Vedran Furač
  2009-10-27  3:22         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-26 16:16 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel

KAMEZAWA Hiroyuki wrote:

> Can I make more questions ?

Sure

>  - What's cpu ?

vendor_id       : AuthenticAMD
cpu family      : 16
model           : 4
model name      : AMD Phenom(tm) II X3 720 Processor
stepping        : 2
cpu MHz         : 3314.812
cache size      : 512 KB

>  - How much memory ?
>  - Do you have swap ?

             total       used       free     shared    buffers     cached
Mem:          3459       1452       2007          0         65        622
-/+ buffers/cache:        764       2695
Swap:            0          0          0

So, no swap. Don't need it.

>  - What's the latest kernel version you tested?

2.6.30-2-amd64 #1 SMP (on Debian)

>  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?

It was catastrophe. :) X crashed (or killed) with all the programs, but
my little program was alive for 20 minutes (see timestamps). And for
that time computer was completely unusable. Couldn't even get the
console via ssh. Really embarrassing for a modern OS to get destroyed by
a 5 lines of C run as an ordinary user. Luckily screen was still alive,
oomk usually kills it also. See for yourself:

dmesg: http://pastebin.com/f3f83738a
messages: http://pastebin.com/f2091110a

(CCing to lkml again... I just want people to see the logs.)

Regards,

Vedran
* Re: Memory overcommit
  2009-10-26 16:16       ` Vedran Furač
@ 2009-10-27  3:22         ` KAMEZAWA Hiroyuki
  2009-10-27  6:10           ` KOSAKI Motohiro
                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27 3:22 UTC
To: vedran.furac; +Cc: linux-mm, linux-kernel, kosaki.motohiro

[-- Attachment #1: Type: text/plain, Size: 1614 bytes --]

On Mon, 26 Oct 2009 17:16:14 +0100
Vedran Furač <vedran.furac@gmail.com> wrote:

> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>
> It was catastrophe. :) X crashed (or killed) with all the programs, but
> my little program was alive for 20 minutes (see timestamps). And for
> that time computer was completely unusable. Couldn't even get the
> console via ssh. Really embarrassing for a modern OS to get destroyed by
> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
> oomk usually kills it also. See for yourself:
>
> dmesg: http://pastebin.com/f3f83738a
> messages: http://pastebin.com/f2091110a
>
> (CCing to lkml again... I just want people to see the logs.)
>
Thank you for reporting and your patience. It seems something strange
that your KDE programs are killed. I agree.

I attached a script for checking oom_score of all existing process.
(oom_score is a value used for selecting "bad" processes.)
please run if you have time.

This is a result of my own desktop(on virtual machine.)
In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.

%check_badness.pl | sort -n | tail
--
89924   3938    mixer_applet2
90210   3942    tomboy
94753   3936    clock-applet
101994  3919    pulseaudio
113525  4028    gnome-terminal
127340  1       init
128177  3871    nautilus
151003  11515   bash
256944  11653   mmap
425561  3829    gnome-session
--
Sigh, gnome-session has twice value of mmap(1G).
Of course, gnome-session only uses 6M bytes of anon.
I wonder this is because gnome-session has many children..but need to
dig more. Does anyone has idea ?
(CCed kosaki)

Thanks,
-Kame

[-- Attachment #2: check_badness.pl --]
[-- Type: text/x-perl, Size: 313 bytes --]

#!/usr/bin/perl

open(LINE, "ps -A -o pid,comm | grep -v PID|") || die "can't ps";

while (<LINE>) {
	/^\s*([0-9]+)\s+(.*)$/;
	$PID=$1;
	$COMM=$2;
	open(SCORE, "/proc/$PID/oom_score") || next;
	$oom_score = <SCORE>;
	chomp($oom_score);
	close(SCORE);
	print $oom_score."\t".$PID . "\t",$COMM."\n";
}
close(LINE);
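The attached script boils down to one step per process: open /proc/<pid>/oom_score and read a single integer, skipping processes that have already exited. A rough C sketch of that step, for readers who want to embed the check in a program rather than shell out to ps. The function names read_score_file and read_oom_score are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a file such as /proc/<pid>/oom_score.
   Returns 0 on success, -1 if the file is missing or unparsable
   (a missing file usually means the process has exited, which the
   Perl script handles with "|| next"). */
int read_score_file(const char *path, long *score)
{
    assert(path != NULL && score != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int matched = fscanf(f, "%ld", score);
    fclose(f);
    return matched == 1 ? 0 : -1;
}

/* Convenience wrapper that builds the /proc path for a PID. */
int read_oom_score(long pid, long *score)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/oom_score", pid);
    return read_score_file(path, score);
}
```

Calling read_oom_score for every numeric directory under /proc and sorting the results would give the same ranking as `check_badness.pl | sort -n`; the directory walk is left out here to keep the sketch short.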
* Re: Memory overcommit
  2009-10-27  3:22         ` KAMEZAWA Hiroyuki
@ 2009-10-27  6:10           ` KOSAKI Motohiro
  2009-10-27  6:34             ` Minchan Kim
  2009-10-27 17:12           ` Vedran Furač
  2009-10-27 20:44           ` Hugh Dickins
  2 siblings, 1 reply; 77+ messages in thread
From: KOSAKI Motohiro @ 2009-10-27 6:10 UTC
To: KAMEZAWA Hiroyuki; +Cc: vedran.furac, linux-mm, linux-kernel

2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> On Mon, 26 Oct 2009 17:16:14 +0100
> Vedran Furač <vedran.furac@gmail.com> wrote:
>> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>>
>> It was catastrophe. :) X crashed (or killed) with all the programs, but
>> my little program was alive for 20 minutes (see timestamps). And for
>> that time computer was completely unusable. Couldn't even get the
>> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> oomk usually kills it also. See for yourself:
>>
>> dmesg: http://pastebin.com/f3f83738a
>> messages: http://pastebin.com/f2091110a
>>
>> (CCing to lkml again... I just want people to see the logs.)
>>
> Thank you for reporting and your patience. It seems something strange
> that your KDE programs are killed. I agree.
>
> I attached a script for checking oom_score of all existing process.
> (oom_score is a value used for selecting "bad" processes.)
> please run if you have time.
>
> This is a result of my own desktop(on virtual machine.)
> In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.
>
> %check_badness.pl | sort -n | tail
> --
> 89924   3938    mixer_applet2
> 90210   3942    tomboy
> 94753   3936    clock-applet
> 101994  3919    pulseaudio
> 113525  4028    gnome-terminal
> 127340  1       init
> 128177  3871    nautilus
> 151003  11515   bash
> 256944  11653   mmap
> 425561  3829    gnome-session
> --
> Sigh, gnome-session has twice value of mmap(1G).
> Of course, gnome-session only uses 6M bytes of anon.
> I wonder this is because gnome-session has many children..but need to
> dig more. Does anyone has idea ?
> (CCed kosaki)

Following output address the issue.
The fact is, modern desktop application linked pretty many library. it
makes bloat VSS size and increase OOM score.

Ideally, We shouldn't account evictable file-backed mappings for oom_score.

# cat /proc/`pidof gnome-session`/maps
00400000-00433000 r-xp 00000000 fd:00 100061 /usr/bin/gnome-session
00632000-00637000 rw-p 00032000 fd:00 100061 /usr/bin/gnome-session
00949000-00a10000 rw-p 00000000 00:00 0 [heap]
34cf600000-34cf61f000 r-xp 00000000 fd:00 1088 /lib64/ld-2.10.1.so
34cf81e000-34cf81f000 r--p 0001e000 fd:00 1088 /lib64/ld-2.10.1.so
34cf81f000-34cf820000 rw-p 0001f000 fd:00 1088 /lib64/ld-2.10.1.so
34cfa00000-34cfb64000 r-xp 00000000 fd:00 1089 /lib64/libc-2.10.1.so
34cfb64000-34cfd64000 ---p 00164000 fd:00 1089 /lib64/libc-2.10.1.so
34cfd64000-34cfd68000 r--p 00164000 fd:00 1089 /lib64/libc-2.10.1.so
34cfd68000-34cfd69000 rw-p 00168000 fd:00 1089 /lib64/libc-2.10.1.so
34cfd69000-34cfd6e000 rw-p 00000000 00:00 0
34cfe00000-34cfe82000 r-xp 00000000 fd:00 1104 /lib64/libm-2.10.1.so
34cfe82000-34d0082000 ---p 00082000 fd:00 1104 /lib64/libm-2.10.1.so
34d0082000-34d0083000 r--p 00082000 fd:00 1104 /lib64/libm-2.10.1.so
34d0083000-34d0084000 rw-p 00083000 fd:00 1104 /lib64/libm-2.10.1.so
34d0200000-34d0202000 r-xp 00000000 fd:00 1095 /lib64/libdl-2.10.1.so
34d0202000-34d0402000 ---p 00002000 fd:00 1095 /lib64/libdl-2.10.1.so
34d0402000-34d0403000 r--p 00002000 fd:00 1095 /lib64/libdl-2.10.1.so
34d0403000-34d0404000 rw-p 00003000 fd:00 1095 /lib64/libdl-2.10.1.so
34d0600000-34d0617000 r-xp 00000000 fd:00 1090 /lib64/libpthread-2.10.1.so
34d0617000-34d0816000 ---p 00017000 fd:00 1090 /lib64/libpthread-2.10.1.so
34d0816000-34d0817000 r--p 00016000 fd:00 1090 /lib64/libpthread-2.10.1.so
34d0817000-34d0818000 rw-p 00017000 fd:00 1090 /lib64/libpthread-2.10.1.so
34d0818000-34d081c000 rw-p 00000000 00:00 0
34d0a00000-34d0a15000 r-xp 00000000 fd:00 1113 /lib64/libz.so.1.2.3 34d0a15000-34d0c14000 ---p 00015000 fd:00 1113 /lib64/libz.so.1.2.3 34d0c14000-34d0c15000 rw-p 00014000 fd:00 1113 /lib64/libz.so.1.2.3 34d0e00000-34d0e07000 r-xp 00000000 fd:00 1091 /lib64/librt-2.10.1.so 34d0e07000-34d1006000 ---p 00007000 fd:00 1091 /lib64/librt-2.10.1.so 34d1006000-34d1007000 r--p 00006000 fd:00 1091 /lib64/librt-2.10.1.so 34d1007000-34d1008000 rw-p 00007000 fd:00 1091 /lib64/librt-2.10.1.so 34d1200000-34d121c000 r-xp 00000000 fd:00 1097 /lib64/libselinux.so.1 34d121c000-34d141b000 ---p 0001c000 fd:00 1097 /lib64/libselinux.so.1 34d141b000-34d141c000 r--p 0001b000 fd:00 1097 /lib64/libselinux.so.1 34d141c000-34d141d000 rw-p 0001c000 fd:00 1097 /lib64/libselinux.so.1 34d141d000-34d141e000 rw-p 00000000 00:00 0 34d1600000-34d16dd000 r-xp 00000000 fd:00 1092 /lib64/libglib-2.0.so.0.2000.4 34d16dd000-34d18dc000 ---p 000dd000 fd:00 1092 /lib64/libglib-2.0.so.0.2000.4 34d18dc000-34d18de000 rw-p 000dc000 fd:00 1092 /lib64/libglib-2.0.so.0.2000.4 34d1a00000-34d1a41000 r-xp 00000000 fd:00 1094 /lib64/libgobject-2.0.so.0.2000.4 34d1a41000-34d1c41000 ---p 00041000 fd:00 1094 /lib64/libgobject-2.0.so.0.2000.4 34d1c41000-34d1c43000 rw-p 00041000 fd:00 1094 /lib64/libgobject-2.0.so.0.2000.4 34d1e00000-34d1e02000 r-xp 00000000 fd:00 1115 /usr/lib64/libXau.so.6.0.0 34d1e02000-34d2001000 ---p 00002000 fd:00 1115 /usr/lib64/libXau.so.6.0.0 34d2001000-34d2002000 rw-p 00001000 fd:00 1115 /usr/lib64/libXau.so.6.0.0 34d2200000-34d2203000 r-xp 00000000 fd:00 1096 /lib64/libgmodule-2.0.so.0.2000.4 34d2203000-34d2402000 ---p 00003000 fd:00 1096 /lib64/libgmodule-2.0.so.0.2000.4 34d2402000-34d2403000 rw-p 00002000 fd:00 1096 /lib64/libgmodule-2.0.so.0.2000.4 34d2600000-34d261a000 r-xp 00000000 fd:00 1116 /usr/lib64/libxcb.so.1.1.0 34d261a000-34d281a000 ---p 0001a000 fd:00 1116 /usr/lib64/libxcb.so.1.1.0 34d281a000-34d281b000 rw-p 0001a000 fd:00 1116 /usr/lib64/libxcb.so.1.1.0 34d2a00000-34d2b34000 r-xp 
00000000 fd:00 1117 /usr/lib64/libX11.so.6.2.0 34d2b34000-34d2d33000 ---p 00134000 fd:00 1117 /usr/lib64/libX11.so.6.2.0 34d2d33000-34d2d39000 rw-p 00133000 fd:00 1117 /usr/lib64/libX11.so.6.2.0 34d2e00000-34d2e04000 r-xp 00000000 fd:00 1093 /lib64/libgthread-2.0.so.0.2000.4 34d2e04000-34d3003000 ---p 00004000 fd:00 1093 /lib64/libgthread-2.0.so.0.2000.4 34d3003000-34d3004000 rw-p 00003000 fd:00 1093 /lib64/libgthread-2.0.so.0.2000.4 34d3200000-34d3226000 r-xp 00000000 fd:00 1111 /lib64/libexpat.so.1.5.2 34d3226000-34d3425000 ---p 00026000 fd:00 1111 /lib64/libexpat.so.1.5.2 34d3425000-34d3428000 rw-p 00025000 fd:00 1111 /lib64/libexpat.so.1.5.2 34d3600000-34d3676000 r-xp 00000000 fd:00 1098 /lib64/libgio-2.0.so.0.2000.4 34d3676000-34d3875000 ---p 00076000 fd:00 1098 /lib64/libgio-2.0.so.0.2000.4 34d3875000-34d3877000 rw-p 00075000 fd:00 1098 /lib64/libgio-2.0.so.0.2000.4 34d3877000-34d3878000 rw-p 00000000 00:00 0 34d3a00000-34d3a93000 r-xp 00000000 fd:00 1110 /usr/lib64/libfreetype.so.6.3.20 34d3a93000-34d3c93000 ---p 00093000 fd:00 1110 /usr/lib64/libfreetype.so.6.3.20 34d3c93000-34d3c99000 rw-p 00093000 fd:00 1110 /usr/lib64/libfreetype.so.6.3.20 34d3e00000-34d3e04000 r-xp 00000000 fd:00 1141 /lib64/libattr.so.1.1.0 34d3e04000-34d4003000 ---p 00004000 fd:00 1141 /lib64/libattr.so.1.1.0 34d4003000-34d4004000 rw-p 00003000 fd:00 1141 /lib64/libattr.so.1.1.0 34d4200000-34d4211000 r-xp 00000000 fd:00 1123 /usr/lib64/libXext.so.6.4.0 34d4211000-34d4411000 ---p 00011000 fd:00 1123 /usr/lib64/libXext.so.6.4.0 34d4411000-34d4412000 rw-p 00011000 fd:00 1123 /usr/lib64/libXext.so.6.4.0 34d4600000-34d4604000 r-xp 00000000 fd:00 1142 /lib64/libcap.so.2.16 34d4604000-34d4803000 ---p 00004000 fd:00 1142 /lib64/libcap.so.2.16 34d4803000-34d4804000 rw-p 00003000 fd:00 1142 /lib64/libcap.so.2.16 34d4a00000-34d4a33000 r-xp 00000000 fd:00 1112 /usr/lib64/libfontconfig.so.1.4.1 34d4a33000-34d4c32000 ---p 00033000 fd:00 1112 /usr/lib64/libfontconfig.so.1.4.1 34d4c32000-34d4c34000 
rw-p 00032000 fd:00 1112 /usr/lib64/libfontconfig.so.1.4.1 34d4e00000-34d4e25000 r-xp 00000000 fd:00 1114 /usr/lib64/libpng12.so.0.37.0 34d4e25000-34d5024000 ---p 00025000 fd:00 1114 /usr/lib64/libpng12.so.0.37.0 34d5024000-34d5025000 rw-p 00024000 fd:00 1114 /usr/lib64/libpng12.so.0.37.0 34d5200000-34d523c000 r-xp 00000000 fd:00 1143 /lib64/libdbus-1.so.3.4.0 34d523c000-34d543c000 ---p 0003c000 fd:00 1143 /lib64/libdbus-1.so.3.4.0 34d543c000-34d543d000 r--p 0003c000 fd:00 1143 /lib64/libdbus-1.so.3.4.0 34d543d000-34d543e000 rw-p 0003d000 fd:00 1143 /lib64/libdbus-1.so.3.4.0 34d5600000-34d5609000 r-xp 00000000 fd:00 1118 /usr/lib64/libXrender.so.1.3.0 34d5609000-34d5808000 ---p 00009000 fd:00 1118 /usr/lib64/libXrender.so.1.3.0 34d5808000-34d5809000 rw-p 00008000 fd:00 1118 /usr/lib64/libXrender.so.1.3.0 34d5a00000-34d5a2c000 r-xp 00000000 fd:00 1121 /usr/lib64/libpangoft2-1.0.so.0.2400.5 34d5a2c000-34d5c2b000 ---p 0002c000 fd:00 1121 /usr/lib64/libpangoft2-1.0.so.0.2400.5 34d5c2b000-34d5c2d000 rw-p 0002b000 fd:00 1121 /usr/lib64/libpangoft2-1.0.so.0.2400.5 34d5e00000-34d5e46000 r-xp 00000000 fd:00 1120 /usr/lib64/libpango-1.0.so.0.2400.5 34d5e46000-34d6046000 ---p 00046000 fd:00 1120 /usr/lib64/libpango-1.0.so.0.2400.5 34d6046000-34d6049000 rw-p 00046000 fd:00 1120 /usr/lib64/libpango-1.0.so.0.2400.5 34d6200000-34d6209000 r-xp 00000000 fd:00 1128 /usr/lib64/libXcursor.so.1.0.2 34d6209000-34d6409000 ---p 00009000 fd:00 1128 /usr/lib64/libXcursor.so.1.0.2 34d6409000-34d640a000 rw-p 00009000 fd:00 1128 /usr/lib64/libXcursor.so.1.0.2 34d6600000-34d6674000 r-xp 00000000 fd:00 1119 /usr/lib64/libcairo.so.2.10800.8 34d6674000-34d6873000 ---p 00074000 fd:00 1119 /usr/lib64/libcairo.so.2.10800.8 34d6873000-34d6876000 rw-p 00073000 fd:00 1119 /usr/lib64/libcairo.so.2.10800.8 34d6a00000-34d6a02000 r-xp 00000000 fd:00 1129 /usr/lib64/libXcomposite.so.1.0.0 34d6a02000-34d6c01000 ---p 00002000 fd:00 1129 /usr/lib64/libXcomposite.so.1.0.0 34d6c01000-34d6c02000 rw-p 00001000 
fd:00 1129 /usr/lib64/libXcomposite.so.1.0.0 34d6e00000-34d6e99000 r-xp 00000000 fd:00 1132 /usr/lib64/libgdk-x11-2.0.so.0.1600.5 34d6e99000-34d7099000 ---p 00099000 fd:00 1132 /usr/lib64/libgdk-x11-2.0.so.0.1600.5 34d7099000-34d709e000 rw-p 00099000 fd:00 1132 /usr/lib64/libgdk-x11-2.0.so.0.1600.5 34d7200000-34d7243000 r-xp 00000000 fd:00 1109 /usr/lib64/libpixman-1.so.0.14.0 34d7243000-34d7442000 ---p 00043000 fd:00 1109 /usr/lib64/libpixman-1.so.0.14.0 34d7442000-34d7445000 rw-p 00042000 fd:00 1109 /usr/lib64/libpixman-1.so.0.14.0 34d7600000-34d761d000 r-xp 00000000 fd:00 1131 /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5 34d761d000-34d781c000 ---p 0001d000 fd:00 1131 /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5 34d781c000-34d781d000 rw-p 0001c000 fd:00 1131 /usr/lib64/libgdk_pixbuf-2.0.so.0.1600.5 34d7a00000-34d7a08000 r-xp 00000000 fd:00 1126 /usr/lib64/libXrandr.so.2.2.0 34d7a08000-34d7c07000 ---p 00008000 fd:00 1126 /usr/lib64/libXrandr.so.2.2.0 34d7c07000-34d7c08000 rw-p 00007000 fd:00 1126 /usr/lib64/libXrandr.so.2.2.0 34d7e00000-34d7e02000 r-xp 00000000 fd:00 1130 /usr/lib64/libXdamage.so.1.1.0 34d7e02000-34d8001000 ---p 00002000 fd:00 1130 /usr/lib64/libXdamage.so.1.1.0 34d8001000-34d8002000 rw-p 00001000 fd:00 1130 /usr/lib64/libXdamage.so.1.1.0 34d8200000-34d8209000 r-xp 00000000 fd:00 1125 /usr/lib64/libXi.so.6.0.0 34d8209000-34d8409000 ---p 00009000 fd:00 1125 /usr/lib64/libXi.so.6.0.0 34d8409000-34d840a000 rw-p 00009000 fd:00 1125 /usr/lib64/libXi.so.6.0.0 34d8600000-34d8602000 r-xp 00000000 fd:00 1124 /usr/lib64/libXinerama.so.1.0.0 34d8602000-34d8801000 ---p 00002000 fd:00 1124 /usr/lib64/libXinerama.so.1.0.0 34d8801000-34d8802000 rw-p 00001000 fd:00 1124 /usr/lib64/libXinerama.so.1.0.0 34d8a00000-34d8a05000 r-xp 00000000 fd:00 1127 /usr/lib64/libXfixes.so.3.1.0 34d8a05000-34d8c04000 ---p 00005000 fd:00 1127 /usr/lib64/libXfixes.so.3.1.0 34d8c04000-34d8c05000 rw-p 00004000 fd:00 1127 /usr/lib64/libXfixes.so.3.1.0 34d8e00000-34d91d6000 r-xp 00000000 fd:00 
1134 /usr/lib64/libgtk-x11-2.0.so.0.1600.5 34d91d6000-34d93d5000 ---p 003d6000 fd:00 1134 /usr/lib64/libgtk-x11-2.0.so.0.1600.5 34d93d5000-34d93e0000 rw-p 003d5000 fd:00 1134 /usr/lib64/libgtk-x11-2.0.so.0.1600.5 34d93e0000-34d93e2000 rw-p 00000000 00:00 0 34d9400000-34d941d000 r-xp 00000000 fd:00 1133 /usr/lib64/libatk-1.0.so.0.2511.1 34d941d000-34d961c000 ---p 0001d000 fd:00 1133 /usr/lib64/libatk-1.0.so.0.2511.1 34d961c000-34d961f000 rw-p 0001c000 fd:00 1133 /usr/lib64/libatk-1.0.so.0.2511.1 34d9800000-34d980b000 r-xp 00000000 fd:00 1122 /usr/lib64/libpangocairo-1.0.so.0.2400.5 34d980b000-34d9a0a000 ---p 0000b000 fd:00 1122 /usr/lib64/libpangocairo-1.0.so.0.2400.5 34d9a0a000-34d9a0b000 rw-p 0000a000 fd:00 1122 /usr/lib64/libpangocairo-1.0.so.0.2400.5 34d9c00000-34d9c20000 r-xp 00000000 fd:00 1144 /usr/lib64/libdbus-glib-1.so.2.1.0 34d9c20000-34d9e1f000 ---p 00020000 fd:00 1144 /usr/lib64/libdbus-glib-1.so.2.1.0 34d9e1f000-34d9e21000 rw-p 0001f000 fd:00 1144 /usr/lib64/libdbus-glib-1.so.2.1.0 34da000000-34da003000 r-xp 00000000 fd:00 16360 /lib64/libuuid.so.1.2 34da003000-34da203000 ---p 00003000 fd:00 16360 /lib64/libuuid.so.1.2 34da203000-34da204000 rw-p 00003000 fd:00 16360 /lib64/libuuid.so.1.2 34da800000-34da85d000 r-xp 00000000 fd:00 1145 /usr/lib64/libORBit-2.so.0.1.0 34da85d000-34daa5c000 ---p 0005d000 fd:00 1145 /usr/lib64/libORBit-2.so.0.1.0 34daa5c000-34daa6f000 rw-p 0005c000 fd:00 1145 /usr/lib64/libORBit-2.so.0.1.0 34db000000-34db039000 r-xp 00000000 fd:00 1146 /usr/lib64/libgconf-2.so.4.1.5 34db039000-34db239000 ---p 00039000 fd:00 1146 /usr/lib64/libgconf-2.so.4.1.5 34db239000-34db23e000 rw-p 00039000 fd:00 1146 /usr/lib64/libgconf-2.so.4.1.5 34db400000-34db407000 r-xp 00000000 fd:00 16361 /usr/lib64/libSM.so.6.0.0 34db407000-34db607000 ---p 00007000 fd:00 16361 /usr/lib64/libSM.so.6.0.0 34db607000-34db608000 rw-p 00007000 fd:00 16361 /usr/lib64/libSM.so.6.0.0 34db800000-34db817000 r-xp 00000000 fd:00 16359 /usr/lib64/libICE.so.6.3.0 
34db817000-34dba17000 ---p 00017000 fd:00 16359 /usr/lib64/libICE.so.6.3.0 34dba17000-34dba18000 rw-p 00017000 fd:00 16359 /usr/lib64/libICE.so.6.3.0 34dba18000-34dba1c000 rw-p 00000000 00:00 0 34dd000000-34dd019000 r-xp 00000000 fd:00 1139 /lib64/libgcc_s-4.4.1-20090729.so.1 34dd019000-34dd219000 ---p 00019000 fd:00 1139 /lib64/libgcc_s-4.4.1-20090729.so.1 34dd219000-34dd21a000 rw-p 00019000 fd:00 1139 /lib64/libgcc_s-4.4.1-20090729.so.1 34e0000000-34e0005000 r-xp 00000000 fd:00 26294 /usr/lib64/libXtst.so.6.1.0 34e0005000-34e0205000 ---p 00005000 fd:00 26294 /usr/lib64/libXtst.so.6.1.0 34e0205000-34e0206000 rw-p 00005000 fd:00 26294 /usr/lib64/libXtst.so.6.1.0 34e5000000-34e5018000 r-xp 00000000 fd:00 29867 /usr/lib64/libpolkit.so.2.0.0 34e5018000-34e5218000 ---p 00018000 fd:00 29867 /usr/lib64/libpolkit.so.2.0.0 34e5218000-34e5219000 rw-p 00018000 fd:00 29867 /usr/lib64/libpolkit.so.2.0.0 34e5800000-34e5805000 r-xp 00000000 fd:00 29887 /usr/lib64/libogg.so.0.5.3 34e5805000-34e5a04000 ---p 00005000 fd:00 29887 /usr/lib64/libogg.so.0.5.3 34e5a04000-34e5a05000 rw-p 00004000 fd:00 29887 /usr/lib64/libogg.so.0.5.3 34e6400000-34e6408000 r-xp 00000000 fd:00 1177 /usr/lib64/libltdl.so.7.2.0 34e6408000-34e6608000 ---p 00008000 fd:00 1177 /usr/lib64/libltdl.so.7.2.0 34e6608000-34e6609000 rw-p 00008000 fd:00 1177 /usr/lib64/libltdl.so.7.2.0 34e7400000-34e740c000 r-xp 00000000 fd:00 29868 /usr/lib64/libpolkit-dbus.so.2.0.0 34e740c000-34e760b000 ---p 0000c000 fd:00 29868 /usr/lib64/libpolkit-dbus.so.2.0.0 34e760b000-34e760c000 rw-p 0000b000 fd:00 29868 /usr/lib64/libpolkit-dbus.so.2.0.0 34e7800000-34e781f000 r-xp 00000000 fd:00 29888 /usr/lib64/libvorbis.so.0.4.0 34e781f000-34e7a1e000 ---p 0001f000 fd:00 29888 /usr/lib64/libvorbis.so.0.4.0 34e7a1e000-34e7a2d000 rw-p 0001e000 fd:00 29888 /usr/lib64/libvorbis.so.0.4.0 34e7c00000-34e7c0a000 r-xp 00000000 fd:00 29869 /usr/lib64/libpolkit-grant.so.2.0.0 34e7c0a000-34e7e09000 ---p 0000a000 fd:00 29869 
/usr/lib64/libpolkit-grant.so.2.0.0 34e7e09000-34e7e0a000 rw-p 00009000 fd:00 29869 /usr/lib64/libpolkit-grant.so.2.0.0 34e8000000-34e8003000 r-xp 00000000 fd:00 29892 /usr/lib64/libcanberra-gtk.so.0.0.5 34e8003000-34e8203000 ---p 00003000 fd:00 29892 /usr/lib64/libcanberra-gtk.so.0.0.5 34e8203000-34e8204000 rw-p 00003000 fd:00 29892 /usr/lib64/libcanberra-gtk.so.0.0.5 34e8800000-34e880f000 r-xp 00000000 fd:00 29891 /usr/lib64/libcanberra.so.0.1.5 34e880f000-34e8a0e000 ---p 0000f000 fd:00 29891 /usr/lib64/libcanberra.so.0.1.5 34e8a0e000-34e8a0f000 rw-p 0000e000 fd:00 29891 /usr/lib64/libcanberra.so.0.1.5 34e9000000-34e9007000 r-xp 00000000 fd:00 29889 /usr/lib64/libvorbisfile.so.3.2.0 34e9007000-34e9206000 ---p 00007000 fd:00 29889 /usr/lib64/libvorbisfile.so.3.2.0 34e9206000-34e9207000 rw-p 00006000 fd:00 29889 /usr/lib64/libvorbisfile.so.3.2.0 34e9400000-34e940d000 r-xp 00000000 fd:00 29890 /usr/lib64/libtdb.so.1.1.5 34e940d000-34e960c000 ---p 0000d000 fd:00 29890 /usr/lib64/libtdb.so.1.1.5 34e960c000-34e960d000 rw-p 0000c000 fd:00 29890 /usr/lib64/libtdb.so.1.1.5 34e9c00000-34e9c0a000 r-xp 00000000 fd:00 29870 /usr/lib64/libpolkit-gnome.so.0.0.0 34e9c0a000-34e9e0a000 ---p 0000a000 fd:00 29870 /usr/lib64/libpolkit-gnome.so.0.0.0 34e9e0a000-34e9e0b000 rw-p 0000a000 fd:00 29870 /usr/lib64/libpolkit-gnome.so.0.0.0 3d14400000-3d14541000 r-xp 00000000 fd:00 114 /usr/lib64/libxml2.so.2.7.6 3d14541000-3d14740000 ---p 00141000 fd:00 114 /usr/lib64/libxml2.so.2.7.6 3d14740000-3d1474a000 rw-p 00140000 fd:00 114 /usr/lib64/libxml2.so.2.7.6 3d1474a000-3d1474b000 rw-p 00000000 00:00 0 3d14c00000-3d14c18000 r-xp 00000000 fd:00 48785 /usr/lib64/libglade-2.0.so.0.0.7 3d14c18000-3d14e17000 ---p 00018000 fd:00 48785 /usr/lib64/libglade-2.0.so.0.0.7 3d14e17000-3d14e19000 rw-p 00017000 fd:00 48785 /usr/lib64/libglade-2.0.so.0.0.7 3d16800000-3d168ed000 r-xp 00000000 fd:00 22864 /usr/lib64/libstdc++.so.6.0.12 3d168ed000-3d16aec000 ---p 000ed000 fd:00 22864 
/usr/lib64/libstdc++.so.6.0.12 3d16aec000-3d16af3000 r--p 000ec000 fd:00 22864 /usr/lib64/libstdc++.so.6.0.12 3d16af3000-3d16af5000 rw-p 000f3000 fd:00 22864 /usr/lib64/libstdc++.so.6.0.12 3d16af5000-3d16b0a000 rw-p 00000000 00:00 0 7f05a3fae000-7f05a3fc1000 r-xp 00000000 fd:00 22909 /usr/lib64/libelf-0.142.so 7f05a3fc1000-7f05a41c0000 ---p 00013000 fd:00 22909 /usr/lib64/libelf-0.142.so 7f05a41c0000-7f05a41c1000 r--p 00012000 fd:00 22909 /usr/lib64/libelf-0.142.so 7f05a41c1000-7f05a41c2000 rw-p 00013000 fd:00 22909 /usr/lib64/libelf-0.142.so 7f05a41d4000-7f05a41d7000 r-xp 00000000 fd:00 116786 /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so 7f05a41d7000-7f05a43d6000 ---p 00003000 fd:00 116786 /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so 7f05a43d6000-7f05a43d7000 rw-p 00002000 fd:00 116786 /usr/lib64/gtk-2.0/modules/libgnomebreakpad.so 7f05a43d7000-7f05a43db000 r-xp 00000000 fd:00 40602 /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so 7f05a43db000-7f05a45db000 ---p 00004000 fd:00 40602 /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so 7f05a45db000-7f05a45dc000 rw-p 00004000 fd:00 40602 /usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so 7f05a45dc000-7f05a45df000 r-xp 00000000 fd:00 82244 /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so 7f05a45df000-7f05a47de000 ---p 00003000 fd:00 82244 /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so 7f05a47de000-7f05a47df000 rw-p 00002000 fd:00 82244 /usr/lib64/gtk-2.0/modules/libpk-gtk-module.so 7f05a47df000-7f05a47fb000 r--p 00000000 fd:00 14540 /usr/share/locale/ja/LC_MESSAGES/libc.mo 7f05a47fb000-7f05a480d000 r-xp 00000000 fd:00 53032 /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so 7f05a480d000-7f05a4a0d000 ---p 00012000 fd:00 53032 /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so 7f05a4a0d000-7f05a4a0e000 rw-p 00012000 fd:00 53032 /usr/lib64/gtk-2.0/2.10.0/engines/libnodoka.so 7f05a4a0e000-7f05a4a0f000 ---p 00000000 00:00 0 7f05a4a0f000-7f05a520f000 rw-p 00000000 00:00 0 7f05a520f000-7f05a521b000 r--p 00000000 fd:00 21639 
/usr/share/locale/ja/LC_MESSAGES/glib20.mo 7f05a521b000-7f05a5227000 r-xp 00000000 fd:00 12418 /lib64/libnss_files-2.10.1.so 7f05a5227000-7f05a5426000 ---p 0000c000 fd:00 12418 /lib64/libnss_files-2.10.1.so 7f05a5426000-7f05a5427000 r--p 0000b000 fd:00 12418 /lib64/libnss_files-2.10.1.so 7f05a5427000-7f05a5428000 rw-p 0000c000 fd:00 12418 /lib64/libnss_files-2.10.1.so 7f05a5428000-7f05a543a000 r--p 00000000 fd:00 25291 /usr/share/locale/ja/LC_MESSAGES/GConf2.mo 7f05a543a000-7f05a544e000 r--p 00000000 fd:00 40242 /usr/share/locale/ja/LC_MESSAGES/gtk20.mo 7f05a544e000-7f05aa520000 r--p 00000000 fd:00 14558 /usr/lib/locale/locale-archive 7f05aa520000-7f05aa538000 rw-p 00000000 00:00 0 7f05aa53f000-7f05aa546000 r--s 00000000 fd:00 12712 /usr/lib64/gconv/gconv-modules.cache 7f05aa546000-7f05aa54a000 r--p 00000000 fd:00 110980 /usr/share/locale/ja/LC_MESSAGES/gnome-session-2.0.mo 7f05aa54a000-7f05aa54c000 rw-p 00000000 00:00 0 7fff45b42000-7fff45b57000 rw-p 00000000 00:00 0 [stack] 7fff45be4000-7fff45be5000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 77+ messages in thread
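KOSAKI's point is that the maps listing is dominated by library mappings rather than by memory gnome-session actually dirtied. One way to check that from the listing itself is to sum mapping sizes per category. The sketch below parses a maps line and adds its size to one of three buckets; the struct and function names are invented for this illustration, and treating any line with a '/' path as "file-backed" is a simplifying assumption (it does not distinguish shared-memory objects, for instance).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps (only the fields this sketch needs). */
struct mapping {
    unsigned long long start, end; /* virtual address range [start, end) */
    char perms[5];                 /* permission column, e.g. "r-xp" */
    int file_backed;               /* 1 if the line carries a '/' path */
};

/* Running totals in bytes, split into three exclusive buckets. */
struct vss_totals {
    unsigned long long file_bytes;      /* accessible, backed by a file path */
    unsigned long long anon_bytes;      /* accessible, no path (heap, stack) */
    unsigned long long no_access_bytes; /* "---" mappings: address space only */
};

/* Parse one maps line. Returns 0 on success, -1 if the line is malformed. */
int parse_maps_line(const char *line, struct mapping *m)
{
    assert(line != NULL && m != NULL);
    if (sscanf(line, "%llx-%llx %4s", &m->start, &m->end, m->perms) != 3)
        return -1;
    if (m->end < m->start)
        return -1;
    /* Addresses, permissions and device numbers never contain '/',
       so a slash means a pathname column is present. */
    m->file_backed = strchr(line, '/') != NULL;
    return 0;
}

/* Add the mapping's size to exactly one bucket. */
void add_mapping(struct vss_totals *t, const struct mapping *m)
{
    unsigned long long size = m->end - m->start;
    if (strncmp(m->perms, "---", 3) == 0)
        t->no_access_bytes += size;
    else if (m->file_backed)
        t->file_bytes += size;
    else
        t->anon_bytes += size;
}
```

Applied to the libc entries in the listing, the r-xp text mapping is 0x164000 bytes and the ---p region right after it is 0x200000 bytes (2 MiB) of address space with no access permissions. Most libraries in the dump carry a similar gap, which would go some way toward explaining why a process linked against dozens of libraries shows a large virtual size while using little anonymous memory.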
* Re: Memory overcommit
  2009-10-27  6:10           ` KOSAKI Motohiro
@ 2009-10-27  6:34             ` Minchan Kim
  2009-10-27  6:36               ` KAMEZAWA Hiroyuki
  2009-10-27  6:46               ` KOSAKI Motohiro
  0 siblings, 2 replies; 77+ messages in thread
From: Minchan Kim @ 2009-10-27 6:34 UTC
To: KOSAKI Motohiro; +Cc: KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel

On Tue, 27 Oct 2009 15:10:52 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> > On Mon, 26 Oct 2009 17:16:14 +0100
> > Vedran Furač <vedran.furac@gmail.com> wrote:
> >> >  - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
> >>
> >> It was catastrophe. :) X crashed (or killed) with all the programs, but
> >> my little program was alive for 20 minutes (see timestamps). And for
> >> that time computer was completely unusable. Couldn't even get the
> >> console via ssh. Really embarrassing for a modern OS to get destroyed by
> >> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
> >> oomk usually kills it also. See for yourself:
> >>
> >> dmesg: http://pastebin.com/f3f83738a
> >> messages: http://pastebin.com/f2091110a
> >>
> >> (CCing to lkml again... I just want people to see the logs.)
> >>
> > Thank you for reporting and your patience. It seems something strange
> > that your KDE programs are killed. I agree.
> >
> > I attached a script for checking oom_score of all existing process.
> > (oom_score is a value used for selecting "bad" processes.)
> > please run if you have time.
> >
> > This is a result of my own desktop(on virtual machine.)
> > In this environ (Total memory is 1.6GBytes), mmap(1G) program is running.
> >
> > %check_badness.pl | sort -n | tail
> > --
> > 89924   3938    mixer_applet2
> > 90210   3942    tomboy
> > 94753   3936    clock-applet
> > 101994  3919    pulseaudio
> > 113525  4028    gnome-terminal
> > 127340  1       init
> > 128177  3871    nautilus
> > 151003  11515   bash
> > 256944  11653   mmap
> > 425561  3829    gnome-session
> > --
> > Sigh, gnome-session has twice value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder this is because gnome-session has many children..but need to
> > dig more. Does anyone has idea ?
> > (CCed kosaki)
>
> Following output address the issue.
> The fact is, modern desktop application linked pretty many library. it
> makes bloat VSS size and increase
> OOM score.
>
> Ideally, We shouldn't account evictable file-backed mappings for oom_score.
>
Hmm.
I wonder why we consider VM size for OOM killing.
How about RSS size?

--
Kind regards,
Minchan Kim
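Minchan's question contrasts two numbers that /proc/<pid>/status reports side by side: VmSize (total virtual size) and VmRSS (resident set), both in kB. A small sketch for pulling either field out of the status text, so the two can be compared for a process like gnome-session. The helper names status_field_kb and read_text_file are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a line starting with "<key>:" in status-style text and parse the
   number that follows (the kernel prints these values in kB).
   Returns 0 on success, -1 if the key is absent or has no number. */
int status_field_kb(const char *text, const char *key, unsigned long *kb)
{
    assert(text != NULL && key != NULL && kb != NULL);
    size_t klen = strlen(key);
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", kb) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* step past the newline to the next line */
    }
    return -1;
}

/* Read a small text file (e.g. /proc/<pid>/status) into buf, NUL-terminated.
   Content beyond len - 1 bytes is silently truncated. */
int read_text_file(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 0;
}
```

Reading /proc/self/status with read_text_file and then asking for "VmSize" and "VmRSS" gives the two figures under discussion. For a library-heavy desktop process the first would be expected to be far larger than the second, which is the gap Minchan is pointing at.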
* Re: Memory overcommit
  2009-10-27 6:34 ` Minchan Kim
@ 2009-10-27 6:36 ` KAMEZAWA Hiroyuki
  2009-10-27 6:55 ` Minchan Kim
  2009-10-27 6:46 ` KOSAKI Motohiro
  1 sibling, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27 6:36 UTC
To: Minchan Kim; +Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel

On Tue, 27 Oct 2009 15:34:29 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, 27 Oct 2009 15:10:52 +0900
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> > 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> > > On Mon, 26 Oct 2009 17:16:14 +0100
> > > Vedran Furač <vedran.furac@gmail.com> wrote:
> > >> > - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
> > >>
> > >> It was a catastrophe. :) X crashed (or was killed) with all the programs, but
> > >> my little program was alive for 20 minutes (see timestamps). And for
> > >> that time the computer was completely unusable. Couldn't even get the
> > >> console via ssh. Really embarrassing for a modern OS to get destroyed by
> > >> 5 lines of C run as an ordinary user. Luckily screen was still alive,
> > >> oomk usually kills it also. See for yourself:
> > >>
> > >> dmesg: http://pastebin.com/f3f83738a
> > >> messages: http://pastebin.com/f2091110a
> > >>
> > >> (CCing to lkml again... I just want people to see the logs.)
> > >>
> > > Thank you for reporting and for your patience. It seems strange
> > > that your KDE programs are killed. I agree.
> > >
> > > I attached a script for checking the oom_score of all existing processes.
> > > (oom_score is the value used for selecting "bad" processes.)
> > > Please run it if you have time.
> > >
> > > This is the result on my own desktop (on a virtual machine).
> > > In this environment (total memory is 1.6 GBytes), an mmap(1G) program is running.
> > >
> > > %check_badness.pl | sort -n | tail
> > > --
> > > 89924   3938    mixer_applet2
> > > 90210   3942    tomboy
> > > 94753   3936    clock-applet
> > > 101994  3919    pulseaudio
> > > 113525  4028    gnome-terminal
> > > 127340  1       init
> > > 128177  3871    nautilus
> > > 151003  11515   bash
> > > 256944  11653   mmap
> > > 425561  3829    gnome-session
> > > --
> > > Sigh, gnome-session has twice the value of mmap(1G).
> > > Of course, gnome-session only uses 6M bytes of anon.
> > > I wonder if this is because gnome-session has many children.. but I need to
> > > dig more. Does anyone have an idea?
> > > (CCed kosaki)
> >
> > The following output addresses the issue.
> > The fact is, modern desktop applications link against a great many libraries.
> > This bloats the VSS size and increases the OOM score.
> >
> > Ideally, we shouldn't account evictable file-backed mappings for oom_score.
> >
> Hmm.
> I wonder why we consider VM size for OOM killing.
> How about RSS size?
>

Maybe the current code assumes "Tons of swap have been generated, already" if
oom-kill is invoked. Then, just using mm->anon_rss will not be correct.

Hm, should we count # of swap entries referenced from mm ?....

Regards,
-Kame
* Re: Memory overcommit
  2009-10-27 6:36 ` KAMEZAWA Hiroyuki
@ 2009-10-27 6:55 ` Minchan Kim
  2009-10-27 7:45 ` [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: " KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Minchan Kim @ 2009-10-27 6:55 UTC
To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel

On Tue, Oct 27, 2009 at 3:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 27 Oct 2009 15:34:29 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Tue, 27 Oct 2009 15:10:52 +0900
>> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>>
>> > 2009/10/27 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
>> > > On Mon, 26 Oct 2009 17:16:14 +0100
>> > > Vedran Furač <vedran.furac@gmail.com> wrote:
>> > >> > - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>> > >>
>> > >> It was a catastrophe. :) X crashed (or was killed) with all the programs, but
>> > >> my little program was alive for 20 minutes (see timestamps). And for
>> > >> that time the computer was completely unusable. Couldn't even get the
>> > >> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> > >> 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> > >> oomk usually kills it also. See for yourself:
>> > >>
>> > >> dmesg: http://pastebin.com/f3f83738a
>> > >> messages: http://pastebin.com/f2091110a
>> > >>
>> > >> (CCing to lkml again... I just want people to see the logs.)
>> > >>
>> > > Thank you for reporting and for your patience. It seems strange
>> > > that your KDE programs are killed. I agree.
>> > >
>> > > I attached a script for checking the oom_score of all existing processes.
>> > > (oom_score is the value used for selecting "bad" processes.)
>> > > Please run it if you have time.
>> > >
>> > > This is the result on my own desktop (on a virtual machine).
>> > > In this environment (total memory is 1.6 GBytes), an mmap(1G) program is running.
>> > >
>> > > %check_badness.pl | sort -n | tail
>> > > --
>> > > 89924   3938    mixer_applet2
>> > > 90210   3942    tomboy
>> > > 94753   3936    clock-applet
>> > > 101994  3919    pulseaudio
>> > > 113525  4028    gnome-terminal
>> > > 127340  1       init
>> > > 128177  3871    nautilus
>> > > 151003  11515   bash
>> > > 256944  11653   mmap
>> > > 425561  3829    gnome-session
>> > > --
>> > > Sigh, gnome-session has twice the value of mmap(1G).
>> > > Of course, gnome-session only uses 6M bytes of anon.
>> > > I wonder if this is because gnome-session has many children.. but I need to
>> > > dig more. Does anyone have an idea?
>> > > (CCed kosaki)
>> >
>> > The following output addresses the issue.
>> > The fact is, modern desktop applications link against a great many libraries.
>> > This bloats the VSS size and increases the OOM score.
>> >
>> > Ideally, we shouldn't account evictable file-backed mappings for oom_score.
>> >
>> Hmm.
>> I wonder why we consider VM size for OOM killing.
>> How about RSS size?
>>
>
> Maybe the current code assumes "Tons of swap have been generated, already" if
> oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
>
> Hm, should we count # of swap entries referenced from mm ?....

In Vedran's case, he didn't use swap. So, considering only vm is the problem.
I think it would be better to consider both RSS + # of swap entries, as
Kosaki mentioned.

>
> Regards,
> -Kame
>
>
>

-- 
Kind regards,
Minchan Kim
* [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 6:55 ` Minchan Kim
@ 2009-10-27 7:45 ` KAMEZAWA Hiroyuki
  2009-10-27 7:56 ` Minchan Kim
  ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27 7:45 UTC
To: Minchan Kim
Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 15:55:26 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> >> Hmm.
> >> I wonder why we consider VM size for OOM killing.
> >> How about RSS size?
> >>
> >
> > Maybe the current code assumes "Tons of swap have been generated, already" if
> > oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
> >
> > Hm, should we count # of swap entries referenced from mm ?....
>
> In Vedran's case, he didn't use swap. So, considering only vm is the problem.
> I think it would be better to consider both RSS + # of swap entries, as
> Kosaki mentioned.
>
Then, maybe this kind of patch is necessary.
This is on 2.6.31... so I may have to rebase this onto mmotm.
Added more CCs.

Vedran, I'd be glad if you could test this patch.

==
Now, the oom-killer's score uses mm->total_vm as its base value.
But these days, applications like GUI programs tend to use
many shared libraries, and total_vm grows too high even when
pages are not fully mapped.

For example, running a program "mmap" which allocates 1 GByte of
anonymous memory, the oom_score top 10 on the system will be..

 score   PID     name
 89924   3938    mixer_applet2
 90210   3942    tomboy
 94753   3936    clock-applet
 101994  3919    pulseaudio
 113525  4028    gnome-terminal
 127340  1       init
 128177  3871    nautilus
 151003  11515   bash
 256944  11653   mmap <-----------------use 1G of anon
 425561  3829    gnome-session

No one believes gnome-session is more guilty than "mmap".

Instead of total_vm, we should use the anon/file/swap usage of a process, I think.
This patch adds mm->swap_usage and calculates oom_score based on
anon_rss + file_rss + swap_usage.
Considering usual applications, this will be much better information than
total_vm. After this patch, the score on my desktop is

 score   PID     name
 4033    3176    gnome-panel
 4077    3113    xinit
 4526    3190    python
 4820    3161    gnome-settings-
 4989    3289    gnome-terminal
 7105    3271    tomboy
 8427    3177    nautilus
 17549   3140    gnome-session
 128501  3299    bash
 256106  3383    mmap

This order is not bad, I think.

Note: This adds a new counter... so a new cost is added.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/mm_types.h |    1 +
 mm/memory.c              |   29 +++++++++++++++++++++--------
 mm/oom_kill.c            |   12 +++++++++---
 mm/rmap.c                |    1 +
 mm/swapfile.c            |    1 +
 5 files changed, 33 insertions(+), 11 deletions(-)

Index: linux-2.6.31/include/linux/mm_types.h
===================================================================
--- linux-2.6.31.orig/include/linux/mm_types.h
+++ linux-2.6.31/include/linux/mm_types.h
@@ -228,6 +228,7 @@ struct mm_struct {
 	 */
 	mm_counter_t _file_rss;
 	mm_counter_t _anon_rss;
+	mm_counter_t _swap_usage;
 
 	unsigned long hiwater_rss;	/* High-watermark of RSS usage */
 	unsigned long hiwater_vm;	/* High-water virtual memory usage */
Index: linux-2.6.31/mm/memory.c
===================================================================
--- linux-2.6.31.orig/mm/memory.c
+++ linux-2.6.31/mm/memory.c
@@ -361,12 +361,15 @@ int __pte_alloc_kernel(pmd_t *pmd, unsig
 	return 0;
 }
 
-static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
+static inline
+void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss, int swaps)
 {
 	if (file_rss)
 		add_mm_counter(mm, file_rss, file_rss);
 	if (anon_rss)
 		add_mm_counter(mm, anon_rss, anon_rss);
+	if (swaps)
+		add_mm_counter(mm, swap_usage, swaps);
 }
 
 /*
@@ -562,6 +565,8 @@ copy_one_pte(struct mm_struct *dst_mm, s
 						 &src_mm->mmlist);
 				spin_unlock(&mmlist_lock);
 			}
+			if (!is_migration_entry(entry))
+				rss[2]++;
 			if (is_write_migration_entry(entry) &&
 					is_cow_mapping(vm_flags)) {
 				/*
@@ -611,10 +616,10 @@ static int copy_pte_range(struct mm_stru
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
 	int progress = 0;
-	int rss[2];
+	int rss[3];
 
 again:
-	rss[1] = rss[0] = 0;
+	rss[2] = rss[1] = rss[0] = 0;
 	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
 	if (!dst_pte)
 		return -ENOMEM;
@@ -645,7 +650,7 @@ again:
 	arch_leave_lazy_mmu_mode();
 	spin_unlock(src_ptl);
 	pte_unmap_nested(src_pte - 1);
-	add_mm_rss(dst_mm, rss[0], rss[1]);
+	add_mm_rss(dst_mm, rss[0], rss[1], rss[2]);
 	pte_unmap_unlock(dst_pte - 1, dst_ptl);
 	cond_resched();
 	if (addr != end)
@@ -769,6 +774,7 @@ static unsigned long zap_pte_range(struc
 	spinlock_t *ptl;
 	int file_rss = 0;
 	int anon_rss = 0;
+	int swaps = 0;
 
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
@@ -838,13 +844,19 @@ static unsigned long zap_pte_range(struc
 		if (pte_file(ptent)) {
 			if (unlikely(!(vma->vm_flags & VM_NONLINEAR)))
 				print_bad_pte(vma, addr, ptent, NULL);
-		} else if
-		  (unlikely(!free_swap_and_cache(pte_to_swp_entry(ptent))))
-			print_bad_pte(vma, addr, ptent, NULL);
+		} else {
+			swp_entry_t entry = pte_to_swp_entry(ptent);
+
+			if (!is_migration_entry(entry))
+				swaps++;
+
+			if (unlikely(!free_swap_and_cache(entry)))
+				print_bad_pte(vma, addr, ptent, NULL);
+		}
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));
 
-	add_mm_rss(mm, file_rss, anon_rss);
+	add_mm_rss(mm, file_rss, anon_rss, swaps);
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte - 1, ptl);
 
@@ -2573,6 +2585,7 @@ static int do_swap_page(struct mm_struct
 	 */
 
 	inc_mm_counter(mm, anon_rss);
+	dec_mm_counter(mm, swap_usage);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
Index: linux-2.6.31/mm/rmap.c
===================================================================
--- linux-2.6.31.orig/mm/rmap.c
+++ linux-2.6.31/mm/rmap.c
@@ -834,6 +834,7 @@ static int try_to_unmap_one(struct page
 				spin_unlock(&mmlist_lock);
 			}
 			dec_mm_counter(mm, anon_rss);
+			inc_mm_counter(mm, swap_usage);
 		} else if (PAGE_MIGRATION) {
 			/*
 			 * Store the pfn of the page in a special migration
Index: linux-2.6.31/mm/swapfile.c
===================================================================
--- linux-2.6.31.orig/mm/swapfile.c
+++ linux-2.6.31/mm/swapfile.c
@@ -830,6 +830,7 @@ static int unuse_pte(struct vm_area_stru
 	}
 
 	inc_mm_counter(vma->vm_mm, anon_rss);
+	dec_mm_counter(vma->vm_mm, swap_usage);
 	get_page(page);
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
Index: linux-2.6.31/mm/oom_kill.c
===================================================================
--- linux-2.6.31.orig/mm/oom_kill.c
+++ linux-2.6.31/mm/oom_kill.c
@@ -69,7 +69,8 @@ unsigned long badness(struct task_struct
 	/*
 	 * The memory size of the process is the basis for the badness.
 	 */
-	points = mm->total_vm;
+	points = get_mm_counter(mm, anon_rss) + get_mm_counter(mm, file_rss)
+			+ get_mm_counter(mm, swap_usage);
 
 	/*
 	 * After this unlock we can no longer dereference local variable `mm'
@@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
 	 */
 	list_for_each_entry(child, &p->children, sibling) {
 		task_lock(child);
-		if (child->mm != mm && child->mm)
-			points += child->mm->total_vm/2 + 1;
+		if (child->mm != mm && child->mm) {
+			unsigned long cpoint;
+			/* At considering child, we don't count swap */
+			cpoint = get_mm_counter(child->mm, anon_rss) +
+				 get_mm_counter(child->mm, file_rss);
+			points += cpoint/2 + 1;
+		}
 		task_unlock(child);
 	}
 
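To make the effect of the scoring change in the patch above easier to see, here is a minimal userspace sketch, not kernel code. The struct mm_counters type and all function names are hypothetical stand-ins for the mm_struct counters the patch reads, and any page counts fed into it are made-up inputs rather than measurements from this thread. It only mirrors the arithmetic: the old base value is total_vm, the proposed one is anon_rss + file_rss + swap_usage, and each child adds half of its RSS plus one.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the counters the patch reads from mm_struct. */
struct mm_counters {
	unsigned long total_vm;   /* virtual size in pages (old base value) */
	unsigned long anon_rss;   /* resident anonymous pages */
	unsigned long file_rss;   /* resident file-backed pages */
	unsigned long swap_usage; /* swap entries (the counter the patch adds) */
};

/* Old base value: virtual size only. */
static unsigned long base_points_old(const struct mm_counters *mm)
{
	return mm->total_vm;
}

/* Proposed base value: anon_rss + file_rss + swap_usage. */
static unsigned long base_points_new(const struct mm_counters *mm)
{
	return mm->anon_rss + mm->file_rss + mm->swap_usage;
}

/*
 * Proposed base value plus the child term from the patch: each child adds
 * half of its RSS plus one, and the child's swap is not counted. The
 * kernel's check that a child has its own mm is omitted in this sketch.
 */
static unsigned long points_with_children(const struct mm_counters *parent,
					  const struct mm_counters *children,
					  size_t nr_children)
{
	unsigned long points = base_points_new(parent);
	size_t i;

	for (i = 0; i < nr_children; i++)
		points += (children[i].anon_rss + children[i].file_rss) / 2 + 1;
	return points;
}
```

With inputs shaped like the report (a session manager with a large virtual size but little resident memory, and a process holding about 1 GByte of anonymous pages), base_points_new() ranks the large allocator first, which is the ordering the second table in the patch description shows.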
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 7:45 ` [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: " KAMEZAWA Hiroyuki
@ 2009-10-27 7:56 ` Minchan Kim
  2009-10-27 12:38 ` Andrea Arcangeli
  2009-10-27 7:56 ` KAMEZAWA Hiroyuki
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Minchan Kim @ 2009-10-27 7:56 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 16:45:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 27 Oct 2009 15:55:26 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
> > >> Hmm.
> > >> I wonder why we consider VM size for OOM killing.
> > >> How about RSS size?
> > >>
> > >
> > > Maybe the current code assumes "Tons of swap have been generated, already" if
> > > oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
> > >
> > > Hm, should we count # of swap entries referenced from mm ?....
> >
> > In Vedran's case, he didn't use swap. So, considering only vm is the problem.
> > I think it would be better to consider both RSS + # of swap entries, as
> > Kosaki mentioned.
> >
> Then, maybe this kind of patch is necessary.
> This is on 2.6.31... so I may have to rebase this onto mmotm.
> Added more CCs.
>
> Vedran, I'd be glad if you could test this patch.
>
>
> ==
> Now, the oom-killer's score uses mm->total_vm as its base value.
> But these days, applications like GUI programs tend to use
> many shared libraries, and total_vm grows too high even when
> pages are not fully mapped.
>
> For example, running a program "mmap" which allocates 1 GByte of
> anonymous memory, the oom_score top 10 on the system will be..
>
>  score   PID     name
>  89924   3938    mixer_applet2
>  90210   3942    tomboy
>  94753   3936    clock-applet
>  101994  3919    pulseaudio
>  113525  4028    gnome-terminal
>  127340  1       init
>  128177  3871    nautilus
>  151003  11515   bash
>  256944  11653   mmap <-----------------use 1G of anon
>  425561  3829    gnome-session
>
> No one believes gnome-session is more guilty than "mmap".
>
> Instead of total_vm, we should use the anon/file/swap usage of a process, I think.
> This patch adds mm->swap_usage and calculates oom_score based on
> anon_rss + file_rss + swap_usage.
> Considering usual applications, this will be much better information than
> total_vm. After this patch, the score on my desktop is
>
>  score   PID     name
>  4033    3176    gnome-panel
>  4077    3113    xinit
>  4526    3190    python
>  4820    3161    gnome-settings-
>  4989    3289    gnome-terminal
>  7105    3271    tomboy
>  8427    3177    nautilus
>  17549   3140    gnome-session
>  128501  3299    bash
>  256106  3383    mmap
>
> This order is not bad, I think.
>
> Note: This adds a new counter... so a new cost is added.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks for making the patch.
Let's hear others' opinions. :)

-- 
Kind regards,
Minchan Kim
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 7:56 ` Minchan Kim
@ 2009-10-27 12:38 ` Andrea Arcangeli
  2009-10-28 0:22 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Andrea Arcangeli @ 2009-10-27 12:38 UTC
To: Minchan Kim
Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, Oct 27, 2009 at 04:56:12PM +0900, Minchan Kim wrote:
> Thanks for making the patch.
> Let's hear others' opinions. :)

total_vm is nearly meaningless, especially on 64bit that reduces the
mmap load on libs. I tried to change it to something "physical" (rss,
without adding swap) some time ago too, not sure why I didn't manage
to get it in. Trying again surely sounds good. Accounting swap isn't
necessarily good, we may be killing a task that isn't accessing memory
at all. So yes, we free swap, but if the task is the "bloater" it's
unlikely to be all in swap, as it did all the recent activity that led to
the oom. So I'm unsure if swap is good to account here, but surely I
ack replacing virtual with rss. I would include the whole rss, as the
file one may also be rendered unswappable if it is accessed in a loop
refreshing the young bit all the time.
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 12:38 ` Andrea Arcangeli
@ 2009-10-28 0:22 ` KAMEZAWA Hiroyuki
  2009-10-28 0:45 ` Vedran Furač
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28 0:22 UTC
To: Andrea Arcangeli
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 13:38:10 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Tue, Oct 27, 2009 at 04:56:12PM +0900, Minchan Kim wrote:
> > Thanks for making the patch.
> > Let's hear others' opinions. :)
>
> total_vm is nearly meaningless, especially on 64bit that reduces the
> mmap load on libs. I tried to change it to something "physical" (rss,
> without adding swap) some time ago too, not sure why I didn't manage
> to get it in. Trying again surely sounds good. Accounting swap isn't
> necessarily good, we may be killing a task that isn't accessing memory
> at all. So yes, we free swap, but if the task is the "bloater" it's
> unlikely to be all in swap, as it did all the recent activity that led to
> the oom. So I'm unsure if swap is good to account here, but surely I
> ack replacing virtual with rss. I would include the whole rss, as the
> file one may also be rendered unswappable if it is accessed in a loop
> refreshing the young bit all the time.
>
I think I'll account swap and export it via a /proc/<pid>/??? file.
So, I'll divide this patch into 2 parts: swap accounting and the oom patch.

Considering the amount of swap at oom isn't very bad, I think.
But using the same weight for rss and swap is not good, maybe.

Hmm, maybe
  anon_rss + file_rss/2 + swap_usage/4 + kosaki's time accounting change
can give us some better value. I'll consider again what number is logical and
technically correct.

I'll prepare a series of 2-4? patches.

Thanks,
-Kame
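The weighting floated in the message above can be written out as a small userspace sketch. This is only an illustration of the arithmetic anon_rss + file_rss/2 + swap_usage/4 with integer division; the function name weighted_points is hypothetical, the weights are the tentative ones from the mail rather than anything that was merged, and Kosaki's time accounting change is not modelled at all.

```c
/*
 * Hypothetical helper: tentative weighted score from the mail.
 * Resident anonymous pages count fully, resident file pages count half,
 * and swap entries count a quarter. All inputs are page counts.
 */
static unsigned long weighted_points(unsigned long anon_rss,
				     unsigned long file_rss,
				     unsigned long swap_usage)
{
	return anon_rss + file_rss / 2 + swap_usage / 4;
}
```

Under this weighting a task whose 1000 pages are all in swap would score 250, while a task with 1000 resident anonymous pages would score 1000, which matches the concern that a task sitting in swap is less likely to be the one causing the current pressure.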
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-28 0:22 ` KAMEZAWA Hiroyuki
@ 2009-10-28 0:45 ` Vedran Furač
  0 siblings, 0 replies; 77+ messages in thread
From: Vedran Furač @ 2009-10-28 0:45 UTC
To: KAMEZAWA Hiroyuki
Cc: Andrea Arcangeli, Minchan Kim, KOSAKI Motohiro, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

KAMEZAWA Hiroyuki wrote:

> Hmm, maybe
>   anon_rss + file_rss/2 + swap_usage/4 + kosaki's time accounting change
> can give us some better value. I'll consider again what number is logical and
> technically correct.

Although my vote doesn't count, from my experience this formula sounds
like an optimal solution. Thanks, hope it gets accepted!

Regards,

Vedran
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 7:45 ` [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: " KAMEZAWA Hiroyuki
  2009-10-27 7:56 ` Minchan Kim
@ 2009-10-27 7:56 ` KAMEZAWA Hiroyuki
  2009-10-27 8:14 ` Minchan Kim
  2009-10-27 17:41 ` Vedran Furač
  2009-10-27 18:39 ` Hugh Dickins
  3 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27 7:56 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 16:45:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
	/*
>  	 * After this unlock we can no longer dereference local variable `mm'
> @@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
>  	 */
>  	list_for_each_entry(child, &p->children, sibling) {
>  		task_lock(child);
> -		if (child->mm != mm && child->mm)
> -			points += child->mm->total_vm/2 + 1;
> +		if (child->mm != mm && child->mm) {
> +			unsigned long cpoint;
> +			/* At considering child, we don't count swap */
> +			cpoint = get_mm_counter(child->mm, anon_rss) +
> +				 get_mm_counter(child->mm, file_rss);
> +			points += cpoint/2 + 1;
> +		}
>  		task_unlock(child);

BTW, I'd like to get rid of this code.

Can't we use other techniques for detecting a fork-bomb ?

This check can't catch the following type, anyway.

  fork()
    -> fork()
      -> fork()
        -> fork()
          ....

but I have no good idea.
What is the difference between a task-launcher and a fork bomb...

Thanks,
-Kame
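The limitation raised in the message above, that summing over direct children misses a fork() -> fork() -> fork() chain, can be shown with a small userspace sketch. The function names are hypothetical and the model is simplified: every child is assumed to have the same RSS and its own mm, and only the child term from the quoted hunk (half of the child's RSS plus one) is computed.

```c
#include <stddef.h>

/* Child term from the quoted hunk: half of the child's RSS plus one. */
static unsigned long child_term(unsigned long child_rss)
{
	return child_rss / 2 + 1;
}

/*
 * Points a single process collects from its direct children only,
 * assuming nr_children children of equal RSS (a simplification).
 */
static unsigned long direct_child_points(size_t nr_children,
					 unsigned long child_rss)
{
	return (unsigned long)nr_children * child_term(child_rss);
}
```

A process that forks 100 children directly collects direct_child_points(100, rss), but in a chain each process has exactly one direct child, so every member collects only direct_child_points(1, rss) however long the chain grows. The same arithmetic also explains why a legitimate task-launcher with many children scores like a fork bomb.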
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 7:56 ` KAMEZAWA Hiroyuki
@ 2009-10-27 8:14 ` Minchan Kim
  2009-10-27 8:33 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Minchan Kim @ 2009-10-27 8:14 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 16:56:28 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 27 Oct 2009 16:45:26 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 	/*
> >  	 * After this unlock we can no longer dereference local variable `mm'
> > @@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
> >  	 */
> >  	list_for_each_entry(child, &p->children, sibling) {
> >  		task_lock(child);
> > -		if (child->mm != mm && child->mm)
> > -			points += child->mm->total_vm/2 + 1;
> > +		if (child->mm != mm && child->mm) {
> > +			unsigned long cpoint;
> > +			/* At considering child, we don't count swap */
> > +			cpoint = get_mm_counter(child->mm, anon_rss) +
> > +				 get_mm_counter(child->mm, file_rss);
> > +			points += cpoint/2 + 1;
> > +		}
> >  		task_unlock(child);
>
> BTW, I'd like to get rid of this code.
>
> Can't we use other techniques for detecting a fork-bomb ?
>
> This check can't catch the following type, anyway.
>
>   fork()
>     -> fork()
>       -> fork()
>         -> fork()
>           ....
>
> but I have no good idea.
> What is the difference between a task-launcher and a fork bomb...
>

I think it's good as-is.
It is hard for the kernel to know this by an efficient method.
It depends on applications. So doesn't a task-launcher
like gnome-session have to control its own oom_score?

Any ideas are welcome if the kernel can do it well.

> Thanks,
> -Kame
>

-- 
Kind regards,
Minchan Kim
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 8:14 ` Minchan Kim
@ 2009-10-27 8:33 ` KAMEZAWA Hiroyuki
  2009-10-27 8:52 ` Minchan Kim
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-27 8:33 UTC
To: Minchan Kim
Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 17:14:41 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, 27 Oct 2009 16:56:28 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > On Tue, 27 Oct 2009 16:45:26 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 	/*
> > >  	 * After this unlock we can no longer dereference local variable `mm'
> > > @@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
> > >  	 */
> > >  	list_for_each_entry(child, &p->children, sibling) {
> > >  		task_lock(child);
> > > -		if (child->mm != mm && child->mm)
> > > -			points += child->mm->total_vm/2 + 1;
> > > +		if (child->mm != mm && child->mm) {
> > > +			unsigned long cpoint;
> > > +			/* At considering child, we don't count swap */
> > > +			cpoint = get_mm_counter(child->mm, anon_rss) +
> > > +				 get_mm_counter(child->mm, file_rss);
> > > +			points += cpoint/2 + 1;
> > > +		}
> > >  		task_unlock(child);
> >
> > BTW, I'd like to get rid of this code.
> >
> > Can't we use other techniques for detecting a fork-bomb ?
> >
> > This check can't catch the following type, anyway.
> >
> >   fork()
> >     -> fork()
> >       -> fork()
> >         -> fork()
> >           ....
> >
> > but I have no good idea.
> > What is the difference between a task-launcher and a fork bomb...
> >
>
> I think it's good as-is.
> It is hard for the kernel to know this by an efficient method.
> It depends on applications. So doesn't a task-launcher
> like gnome-session have to control its own oom_score?
>
> Any ideas are welcome if the kernel can do it well.
>
Hmmm, check system-wide fork/sec and fork-depth ? Maybe not difficult to calculate..

Regards,
-Kame
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
  2009-10-27 8:33 ` KAMEZAWA Hiroyuki
@ 2009-10-27 8:52 ` Minchan Kim
  2009-10-27 8:56 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: Minchan Kim @ 2009-10-27 8:52 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 17:33:08 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 27 Oct 2009 17:14:41 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
> > On Tue, 27 Oct 2009 16:56:28 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > > On Tue, 27 Oct 2009 16:45:26 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > 	/*
> > > >  	 * After this unlock we can no longer dereference local variable `mm'
> > > > @@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
> > > >  	 */
> > > >  	list_for_each_entry(child, &p->children, sibling) {
> > > >  		task_lock(child);
> > > > -		if (child->mm != mm && child->mm)
> > > > -			points += child->mm->total_vm/2 + 1;
> > > > +		if (child->mm != mm && child->mm) {
> > > > +			unsigned long cpoint;
> > > > +			/* At considering child, we don't count swap */
> > > > +			cpoint = get_mm_counter(child->mm, anon_rss) +
> > > > +				 get_mm_counter(child->mm, file_rss);
> > > > +			points += cpoint/2 + 1;
> > > > +		}
> > > >  		task_unlock(child);
> > >
> > > BTW, I'd like to get rid of this code.
> > >
> > > Can't we use other techniques for detecting a fork-bomb ?
> > >
> > > This check can't catch the following type, anyway.
> > >
> > >   fork()
> > >     -> fork()
> > >       -> fork()
> > >         -> fork()
> > >           ....
> > >
> > > but I have no good idea.
> > > What is the difference between a task-launcher and a fork bomb...
> > >
> >
> > I think it's good as-is.
> > It is hard for the kernel to know this by an efficient method.
> > It depends on applications. So doesn't a task-launcher
> > like gnome-session have to control its own oom_score?
> >
> > Any ideas are welcome if the kernel can do it well.
> >
> Hmmm, check system-wide fork/sec and fork-depth ? Maybe not difficult to calculate..

Yes. We can do anything to achieve the goal in the kernel.
Maybe check the time or count the fork depth.
My concern is how we can do it nicely, if it is a serious
problem in the kernel. ;)

I think most programs that have many children become victims of OOM killing.
That makes sense to me. There are some cases where it does not make sense,
like a task-launcher. So I think if a task-launcher, which is a very rare and
special program, can change oom_adj by itself, that is better than adding a
new heuristic to the kernel.

It's just my opinion. :)

> Regards,
> -Kame
>

-- 
Kind regards,
Minchan Kim
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-27 8:56 UTC
To: Minchan Kim
Cc: KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 17:52:43 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, 27 Oct 2009 17:33:08 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> > On Tue, 27 Oct 2009 17:14:41 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> > > On Tue, 27 Oct 2009 16:56:28 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > >
> > > > On Tue, 27 Oct 2009 16:45:26 +0900
> > > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > >         /*
> > > > >          * After this unlock we can no longer dereference local variable `mm'
> > > > > @@ -92,8 +93,13 @@ unsigned long badness(struct task_struct
> > > > >          */
> > > > >         list_for_each_entry(child, &p->children, sibling) {
> > > > >                 task_lock(child);
> > > > > -               if (child->mm != mm && child->mm)
> > > > > -                       points += child->mm->total_vm/2 + 1;
> > > > > +               if (child->mm != mm && child->mm) {
> > > > > +                       unsigned long cpoint;
> > > > > +                       /* At considering child, we don't count swap */
> > > > > +                       cpoint = get_mm_counter(child->mm, anon_rss) +
> > > > > +                                get_mm_counter(child->mm, file_rss);
> > > > > +                       points += cpoint/2 + 1;
> > > > > +               }
> > > > >                 task_unlock(child);
> > > >
> > > > BTW, I'd like to get rid of this code.
> > > >
> > > > Can't we use other techniques for detecting fork-bomb ?
> > > >
> > > > This check can't catch following type, anyway.
> > > >
> > > >    fork()
> > > >      -> fork()
> > > >           -> fork()
> > > >                -> fork()
> > > >                     ....
> > > >
> > > > but I have no good idea.
> > > > What is the difference with task-launcher and fork bomb()...
> > > >
> > >
> > > I think it's good as-is.
> > > Kernel is hard to know it by efficient method.
> > > It depends on applications. So doesn't task-launcher
> > > like gnome-session have to control his oom_score?
> > >
> > > Welcome to any ideas if kernel can do it well.
> > >
> > Hmmm, check system-wide fork/sec and fork-depth ? Maybe not difficult to calculate..
>
> Yes. We can do anything to achieve the goal in kernel.
> Maybe check the time or fork-depth counting.
> What I have a concern is how we can do it nicely if it is a serious
> problem in kernel. ;)
>
yes...only the user knows whether user is wrong, finally.
Especially in case of memory leak.

> I think most of program which have many child are victims of OOM killing.
> It make sense to me. There is some cases to not make sense like task-launcher.
> So I think if task-launcher which is very rare and special program can change
> oom_adj by itself, it's better than adding a new heuristic in kernel.
>
> It's just my opinion. :)
>

I know KDE already adjusts oom_adj for their 3.5 release ;)

Okay, concentrate on avoiding total_vm issue for a while.

Thanks,
-Kame
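The idea that a task launcher adjusts its own oom_adj can be sketched in userspace as below. This is only a minimal sketch, not code from the thread or from KDE: the helper name set_oom_adj() is hypothetical, and the -17..15 range is the documented range of /proc/<pid>/oom_adj on kernels of this era (-17 disables OOM killing for the task). The path is a parameter so the helper can be exercised against an ordinary file.

```c
#include <stdio.h>

/* Documented oom_adj range: -17 (never kill) up to +15 (kill first). */
#define OOM_ADJ_MIN (-17)
#define OOM_ADJ_MAX 15

/*
 * Hypothetical helper: write `adj` to the file at `path`, which would
 * normally be "/proc/self/oom_adj". Returns 0 on success, -1 on error.
 */
static int set_oom_adj(const char *path, int adj)
{
    FILE *f;
    int ok;

    if (adj < OOM_ADJ_MIN || adj > OOM_ADJ_MAX)
        return -1;              /* reject values the kernel would refuse */

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%d\n", adj) > 0;
    if (fclose(f) != 0)
        ok = 0;                 /* a write error may only surface at close */

    return ok ? 0 : -1;
}
```

A launcher would call something like set_oom_adj("/proc/self/oom_adj", -5) early in startup. Lowering the value needs CAP_SYS_RESOURCE, so an unprivileged launcher can only raise it. Later kernels prefer /proc/<pid>/oom_score_adj with a different range, which this sketch does not cover.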
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: Vedran Furač @ 2009-10-27 17:41 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

KAMEZAWA Hiroyuki wrote:

> On Tue, 27 Oct 2009 15:55:26 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>>>> Hmm.
>>>> I wonder why we consider VM size for OOM killing.
>>>> How about RSS size?
>>>>
>>> Maybe the current code assumes "Tons of swap have been generated, already" if
>>> oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
>>>
>>> Hm, should we count # of swap entries reference from mm ?....
>> In Vedran case, he didn't use swap. So, Only considering vm is the problem.
>> I think it would be better to consider both RSS + # of swap entries as
>> Kosaki mentioned.
>>
> Then, maybe this kind of patch is necessary.
> This is on 2.6.31...then I may have to rebase this to mmotm.
> Added more CCs.
>
> Vedran, I'm glad if you can test this patch.

Thanks for the patch! I'll test it during this week and report after that.

> Instead of total_vm, we should use anon/file/swap usage of a process, I think.
> This patch adds mm->swap_usage and calculate oom_score based on
> anon_rss + file_rss + swap_usage.

Isn't file_rss shared between processes? Sorry, I'm newbie. :)

% pmap $(pidof test)
29049:   ./test
0000000000400000       4K r-x--  /home/vedranf/dev/tmp/test
0000000000600000       4K rw---  /home/vedranf/dev/tmp/test
00002ba362a80000     116K r-x--  /lib/ld-2.10.1.so
00002ba362a9d000      12K rw---    [ anon ]
00002ba362c9c000       4K r----  /lib/ld-2.10.1.so
00002ba362c9d000       4K rw---  /lib/ld-2.10.1.so
00002ba362c9e000    1320K r-x--  /lib/libc-2.10.1.so
00002ba362de8000    2044K -----  /lib/libc-2.10.1.so
00002ba362fe7000      16K r----  /lib/libc-2.10.1.so
00002ba362feb000       4K rw---  /lib/libc-2.10.1.so
00002ba362fec000 1024028K rw---    [ anon ]   // <-- This
00007ffff4618000      84K rw---    [ stack ]
00007ffff47b7000       4K r-x--    [ anon ]
ffffffffff600000       4K r-x--    [ anon ]
 total           1027648K

I would just look at anon if that's OK (or possible).

> Considering usual applications, this will be much better information than
> total_vm.

Agreed.

>  score   PID   name
>   4033   3176  gnome-panel
>   4077   3113  xinit
>   4526   3190  python
>   4820   3161  gnome-settings-
>   4989   3289  gnome-terminal
>   7105   3271  tomboy
>   8427   3177  nautilus
>  17549   3140  gnome-session
> 128501   3299  bash
> 256106   3383  mmap
>
> This order is not bad, I think.

Yes, this looks much better now. Bash is only having somewhat strangely
high score.

Regards,

Vedran
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-28 0:13 UTC
To: vedran.furac
Cc: Minchan Kim, KOSAKI Motohiro, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

On Tue, 27 Oct 2009 18:41:22 +0100
Vedran Furač <vedran.furac@gmail.com> wrote:

> KAMEZAWA Hiroyuki wrote:
>
> > On Tue, 27 Oct 2009 15:55:26 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >>>> Hmm.
> >>>> I wonder why we consider VM size for OOM killing.
> >>>> How about RSS size?
> >>>>
> >>> Maybe the current code assumes "Tons of swap have been generated, already" if
> >>> oom-kill is invoked. Then, just using mm->anon_rss will not be correct.
> >>>
> >>> Hm, should we count # of swap entries reference from mm ?....
> >> In Vedran case, he didn't use swap. So, Only considering vm is the problem.
> >> I think it would be better to consider both RSS + # of swap entries as
> >> Kosaki mentioned.
> >>
> > Then, maybe this kind of patch is necessary.
> > This is on 2.6.31...then I may have to rebase this to mmotm.
> > Added more CCs.
> >
> > Vedran, I'm glad if you can test this patch.
>
> Thanks for the patch! I'll test it during this week and report after that.
>
> > Instead of total_vm, we should use anon/file/swap usage of a process, I think.
> > This patch adds mm->swap_usage and calculate oom_score based on
> > anon_rss + file_rss + swap_usage.
>
> Isn't file_rss shared between processes? Sorry, I'm newbie. :)
>
It's shared. But in typical case, file_rss will be very small at OOM.

> % pmap $(pidof test)
> 29049:   ./test
> 0000000000400000       4K r-x--  /home/vedranf/dev/tmp/test
> 0000000000600000       4K rw---  /home/vedranf/dev/tmp/test
> 00002ba362a80000     116K r-x--  /lib/ld-2.10.1.so
> 00002ba362a9d000      12K rw---    [ anon ]
> 00002ba362c9c000       4K r----  /lib/ld-2.10.1.so
> 00002ba362c9d000       4K rw---  /lib/ld-2.10.1.so
> 00002ba362c9e000    1320K r-x--  /lib/libc-2.10.1.so
> 00002ba362de8000    2044K -----  /lib/libc-2.10.1.so
> 00002ba362fe7000      16K r----  /lib/libc-2.10.1.so
> 00002ba362feb000       4K rw---  /lib/libc-2.10.1.so
> 00002ba362fec000 1024028K rw---    [ anon ]   // <-- This
> 00007ffff4618000      84K rw---    [ stack ]
> 00007ffff47b7000       4K r-x--    [ anon ]
> ffffffffff600000       4K r-x--    [ anon ]
>  total           1027648K
>
> I would just look at anon if that's OK (or possible).
>
> > Considering usual applications, this will be much better information than
> > total_vm.
>
> Agreed.
>
> >  score   PID   name
> >   4033   3176  gnome-panel
> >   4077   3113  xinit
> >   4526   3190  python
> >   4820   3161  gnome-settings-
> >   4989   3289  gnome-terminal
> >   7105   3271  tomboy
> >   8427   3177  nautilus
> >  17549   3140  gnome-session
> > 128501   3299  bash
> > 256106   3383  mmap
> >
> > This order is not bad, I think.
>
> Yes, this looks much better now. Bash is only having somewhat strangely
> high score.
>
It gets half score of mmap....If mmap goes, bash's score will go down
dramatically.

I'll read others' comments and tweak this patch more.

Thanks,
-Kame

> Regards,
>
> Vedran
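The remark that bash "gets half score of mmap" follows from the child loop in the patched badness(): each child with a different mm adds cpoint/2 + 1 to the parent. Below is a small userspace model of just that memory term. It is an illustrative sketch with made-up names (base_points, child_rss_pages), and it leaves out the cpu_time, run_time, nice and oom_adj adjustments that the real function applies afterwards.

```c
#include <stddef.h>

/*
 * Model of the memory part of the patched badness():
 *   own_pages       ~ anon_rss + file_rss + swap_usage of the task itself
 *   child_rss_pages ~ anon_rss + file_rss of each child with a different mm
 * Every such child contributes child/2 + 1 points to its parent.
 */
static unsigned long base_points(unsigned long own_pages,
                                 const unsigned long *child_rss_pages,
                                 size_t nr_children)
{
    unsigned long points = own_pages;
    size_t i;

    for (i = 0; i < nr_children; i++)
        points += child_rss_pages[i] / 2 + 1;

    return points;
}
```

For instance, a hypothetical shell with 447 pages of its own and one child of 256106 pages gets 447 + 256106/2 + 1 = 128501 points, which is the kind of number bash shows in the table above. Treating the child's listed score as its page count is an assumption made only for this illustration. Once the child exits, the parent drops back to its own 447.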
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: Hugh Dickins @ 2009-10-27 18:39 UTC
To: KAMEZAWA Hiroyuki
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, akpm, rientjes, aarcange

On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> Now, oom-killer's score uses mm->total_vm as its base value.
> But, in these days, applications like GUI program tend to use
> much shared libraries and total_vm grows too high even when
> pages are not fully mapped.
>
> For example, running a program "mmap" which allocates 1 GByte of
> anonymous memory, oom_score top 10 on system will be..
>
>  score   PID    name
>  89924   3938   mixer_applet2
>  90210   3942   tomboy
>  94753   3936   clock-applet
> 101994   3919   pulseaudio
> 113525   4028   gnome-terminal
> 127340      1   init
> 128177   3871   nautilus
> 151003  11515   bash
> 256944  11653   mmap <-----------------use 1G of anon
> 425561   3829   gnome-session
>
> No one believes gnome-session is more guilty than "mmap".
>
> Instead of total_vm, we should use anon/file/swap usage of a process, I think.
> This patch adds mm->swap_usage and calculate oom_score based on
> anon_rss + file_rss + swap_usage.
> Considering usual applications, this will be much better information than
> total_vm. After this patch, the score on my desktop is
>
>  score   PID   name
>   4033   3176  gnome-panel
>   4077   3113  xinit
>   4526   3190  python
>   4820   3161  gnome-settings-
>   4989   3289  gnome-terminal
>   7105   3271  tomboy
>   8427   3177  nautilus
>  17549   3140  gnome-session
> 128501   3299  bash
> 256106   3383  mmap
>
> This order is not bad, I think.
>
> Note: This adds new counter...then new cost is added.

I've often thought we ought to supply such a swap_usage statistic;
and show it in /proc/pid/statsomething, presumably VmSwap in
/proc/pid/status, even an additional field on the end of statm.

A slight new cost, yes: doesn't matter at the swapping end, but
would slightly impact fork and exit - I do hope we can afford it,
because I think it should have been available all along.

I've not checked your patch in detail; but I do agree that basing
OOM (physical memory) decisions on total_vm (virtual memory) has
seemed weird, so it's well worth trying this approach.  Whether swap
should be included along with rss isn't quite clear to me: I'm not
saying you're wrong, not at all, just that it's not quite obvious.

I've several observations to make about bad OOM kill decisions,
but it's probably better that I make them in the original
"Memory overcommit" thread, rather than divert this thread.

Hugh
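If the per-process swap figure is exposed as a VmSwap line in /proc/pid/status, as suggested above, userspace could read it the same way it reads VmRSS. The following is a hedged sketch under that assumption: status_kb() is a hypothetical helper that works on the text of a status file already read into memory, and it relies only on the existing "Key:<whitespace><number> kB" layout of that file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of a "Key:   <n> kB" line found in the text of
 * /proc/<pid>/status, or -1 if the key is missing or malformed.
 */
static long status_kb(const char *status_text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        /* Match the whole key, not just a prefix of a longer one. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;             /* step past the newline */
    }
    return -1;
}
```

A monitoring tool could then compare status_kb(text, "VmRSS") alone against VmRSS plus VmSwap, which is exactly the choice the thread leaves open; a return of -1 for "VmSwap" would simply mean the running kernel does not provide the field.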
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: Andrea Arcangeli @ 2009-10-27 18:47 UTC
To: Hugh Dickins
Cc: KAMEZAWA Hiroyuki, Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, akpm, rientjes

On Tue, Oct 27, 2009 at 06:39:07PM +0000, Hugh Dickins wrote:
> OOM (physical memory) decisions on total_vm (virtual memory) has
> seemed weird, so it's well worth trying this approach.  Whether swap

It is weird and wrong, I strongly support fixing it once and for all.
The oom killing should be based on physical info, total_vm is a very
rough approximation of the real info we're interested about (real RAM
utilization of the task).

> should be included along with rss isn't quite clear to me: I'm not
> saying you're wrong, not at all, just that it's not quite obvious.

Agreed it's not obvious. Intuitively I think only including RSS and no
swap is best, but clearly I can't be entirely against including swap
too as there may be scenarios where including swap provides for a
better choice.

My argument for not including swap is that we kill tasks to free RAM
(we don't really care to free swap, system needs RAM at oom time).
Freeing swap won't immediately help because no RAM is freed when swap
is released (sure other tasks that sits huge in RAM can be moved to
swap after swap isn't full but if we immediately killed those tasks
that were huge in RAM in the first place we'd be better off).

> I've several observations to make about bad OOM kill decisions,
> but it's probably better that I make them in the original
> "Memory overcommit" thread, rather than divert this thread.

:)
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-28 0:32 UTC
To: Andrea Arcangeli
Cc: Hugh Dickins, Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, akpm, rientjes

On Tue, 27 Oct 2009 19:47:43 +0100
Andrea Arcangeli <aarcange@redhat.com> wrote:

> > should be included along with rss isn't quite clear to me: I'm not
> > saying you're wrong, not at all, just that it's not quite obvious.
>
> Agreed it's not obvious. Intuitively I think only including RSS and no
> swap is best, but clearly I can't be entirely against including swap
> too as there may be scenarios where including swap provides for a
> better choice.
>
> My argument for not including swap is that we kill tasks to free RAM
> (we don't really care to free swap, system needs RAM at oom time).
> Freeing swap won't immediately help because no RAM is freed when swap
> is released (sure other tasks that sits huge in RAM can be moved to
> swap after swap isn't full but if we immediately killed those tasks
> that were huge in RAM in the first place we'd be better off).
>
Okay.

As first step, I'll divide this into
 - replace total_vm with anon_rss/file_rss patch
 - swap accounting
 - a patch to consider whether swap amount should be included or not.

Then, necessary part will go early. And backport will be easy.

Thanks,
-Kame
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: Pavel Machek @ 2009-11-05 19:02 UTC
To: Andrea Arcangeli
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, akpm, rientjes

Hi!

> Agreed it's not obvious. Intuitively I think only including RSS and no
> swap is best, but clearly I can't be entirely against including swap
> too as there may be scenarios where including swap provides for a
> better choice.
>
> My argument for not including swap is that we kill tasks to free RAM
> (we don't really care to free swap, system needs RAM at oom time).

System should be out of _virtual_ memory at that point, so yes,
freeing swap should help, too.
								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-28 0:28 UTC
To: Hugh Dickins
Cc: Minchan Kim, KOSAKI Motohiro, vedran.furac, linux-mm, linux-kernel, akpm, rientjes, aarcange

On Tue, 27 Oct 2009 18:39:07 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Now, oom-killer's score uses mm->total_vm as its base value.
> > But, in these days, applications like GUI program tend to use
> > much shared libraries and total_vm grows too high even when
> > pages are not fully mapped.
> >
> > For example, running a program "mmap" which allocates 1 GByte of
> > anonymous memory, oom_score top 10 on system will be..
> >
> >  score   PID    name
> >  89924   3938   mixer_applet2
> >  90210   3942   tomboy
> >  94753   3936   clock-applet
> > 101994   3919   pulseaudio
> > 113525   4028   gnome-terminal
> > 127340      1   init
> > 128177   3871   nautilus
> > 151003  11515   bash
> > 256944  11653   mmap <-----------------use 1G of anon
> > 425561   3829   gnome-session
> >
> > No one believes gnome-session is more guilty than "mmap".
> >
> > Instead of total_vm, we should use anon/file/swap usage of a process, I think.
> > This patch adds mm->swap_usage and calculate oom_score based on
> > anon_rss + file_rss + swap_usage.
> > Considering usual applications, this will be much better information than
> > total_vm. After this patch, the score on my desktop is
> >
> >  score   PID   name
> >   4033   3176  gnome-panel
> >   4077   3113  xinit
> >   4526   3190  python
> >   4820   3161  gnome-settings-
> >   4989   3289  gnome-terminal
> >   7105   3271  tomboy
> >   8427   3177  nautilus
> >  17549   3140  gnome-session
> > 128501   3299  bash
> > 256106   3383  mmap
> >
> > This order is not bad, I think.
> >
> > Note: This adds new counter...then new cost is added.
>
> I've often thought we ought to supply such a swap_usage statistic;
> and show it in /proc/pid/statsomething, presumably VmSwap in
> /proc/pid/status, even an additional field on the end of statm.
>
Hm, ok. I'll divide this patch into
 - replace total_vm with anon_rss + file_rss (everyone will agree this.)
 - add swap usage accounting
 - show it via /proc (may need discuss about its style.)
 - use the value at oom calculation (need discuss)

> A slight new cost, yes: doesn't matter at the swapping end, but
> would slightly impact fork and exit - I do hope we can afford it,
> because I think it should have been available all along.
>
fork()/exit() uses batched counting. Then, we don't see overhead.

> I've not checked your patch in detail; but I do agree that basing
> OOM (physical memory) decisions on total_vm (virtual memory) has
> seemed weird, so it's well worth trying this approach.  Whether swap
> should be included along with rss isn't quite clear to me: I'm not
> saying you're wrong, not at all, just that it's not quite obvious.
>
yes. It just comes from heuristics. It will need discuss/investigation/theory.

> I've several observations to make about bad OOM kill decisions,
> but it's probably better that I make them in the original
> "Memory overcommit" thread, rather than divert this thread.
>
Thanks,
-Kame
* Re: Memory overcommit
From: KOSAKI Motohiro @ 2009-10-27 6:46 UTC
To: Minchan Kim
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel

> > > %check_badness.pl | sort -n | tail
> > > --
> > >  89924   3938   mixer_applet2
> > >  90210   3942   tomboy
> > >  94753   3936   clock-applet
> > > 101994   3919   pulseaudio
> > > 113525   4028   gnome-terminal
> > > 127340      1   init
> > > 128177   3871   nautilus
> > > 151003  11515   bash
> > > 256944  11653   mmap
> > > 425561   3829   gnome-session
> > > --
> > > Sigh, gnome-session has twice value of mmap(1G).
> > > Of course, gnome-session only uses 6M bytes of anon.
> > > I wonder this is because gnome-session has many children..but need to
> > > dig more. Does anyone has idea ?
> > > (CCed kosaki)
> >
> > Following output address the issue.
> > The fact is, modern desktop application linked pretty many library. it
> > makes bloat VSS size and increase
> > OOM score.
> >
> > Ideally, We shouldn't account evictable file-backed mappings for oom_score.
> >
> Hmm.
> I wonder why we consider VM size for OOM killing.
> How about RSS size?

Because, swap out-ed bad body (e.g. fork bomb process) still should
be killed by oom.
RSS + swap-entries is acceptable to me.
* Re: Memory overcommit
From: Minchan Kim @ 2009-10-27 6:56 UTC
To: KOSAKI Motohiro
Cc: Minchan Kim, KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel

On Tue, 27 Oct 2009 15:46:36 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > > > %check_badness.pl | sort -n | tail
> > > > --
> > > >  89924   3938   mixer_applet2
> > > >  90210   3942   tomboy
> > > >  94753   3936   clock-applet
> > > > 101994   3919   pulseaudio
> > > > 113525   4028   gnome-terminal
> > > > 127340      1   init
> > > > 128177   3871   nautilus
> > > > 151003  11515   bash
> > > > 256944  11653   mmap
> > > > 425561   3829   gnome-session
> > > > --
> > > > Sigh, gnome-session has twice value of mmap(1G).
> > > > Of course, gnome-session only uses 6M bytes of anon.
> > > > I wonder this is because gnome-session has many children..but need to
> > > > dig more. Does anyone has idea ?
> > > > (CCed kosaki)
> > >
> > > Following output address the issue.
> > > The fact is, modern desktop application linked pretty many library. it
> > > makes bloat VSS size and increase
> > > OOM score.
> > >
> > > Ideally, We shouldn't account evictable file-backed mappings for oom_score.
> > >
> > Hmm.
> > I wonder why we consider VM size for OOM killing.
> > How about RSS size?
>
> Because, swap out-ed bad body (e.g. fork bomb process) still should
> be killed by oom.
> RSS + swap-entries is acceptable to me.

It's reasonable to me.
As I mentioned by reply of kame, in Vedran case, he didn't use swap.
I think only considering vm is the problem.

--
Kind regards,
Minchan Kim
* Re: Memory overcommit
From: Vedran Furač @ 2009-10-27 17:12 UTC
To: KAMEZAWA Hiroyuki
Cc: linux-mm, linux-kernel, kosaki.motohiro, hugh.dickins, akpm, rientjes

KAMEZAWA Hiroyuki wrote:

> On Mon, 26 Oct 2009 17:16:14 +0100
> Vedran Furač <vedran.furac@gmail.com> wrote:
>>> - Could you show me /var/log/dmesg and /var/log/messages at OOM ?
>> It was catastrophe. :) X crashed (or killed) with all the programs, but
>> my little program was alive for 20 minutes (see timestamps). And for
>> that time computer was completely unusable. Couldn't even get the
>> console via ssh. Really embarrassing for a modern OS to get destroyed by
>> a 5 lines of C run as an ordinary user. Luckily screen was still alive,
>> oomk usually kills it also. See for yourself:
>>
>> dmesg: http://pastebin.com/f3f83738a
>> messages: http://pastebin.com/f2091110a
>>
>> (CCing to lkml again... I just want people to see the logs.)
>>
> Thank you for reporting and your patience. It seems something strange
> that your KDE programs are killed. I agree.

No problem. I want this to be solved as much as you do.

Actually, it is not strange, just a buggy algorithm. Run:

% ps -T -eo pid,ppid,tid,vsz,command

You'll see that ppid of a number of processes is kdeinit, gnome-session,
fvwm or something else depending on what one is using. All of these
processes are started automatically during startup or manually clicking
on a menu item or by some keyboard shortcut. OOM algorithm just sums
memory usage of all of them and adds that to the parent. Just plain wrong.
Also, it seems it's looking at VIRT instead of RES.

> I attached a script for checking oom_score of all existing process.
> (oom_score is a value used for selecting "bad" process.")
> please run if you have time.

  96890  21463  VirtualBox      // OK
 118615  11144  kded4           // WRONG
 127455  11158  knotify4        // WRONG
 132198      1  init            // WRONG
 133940  11151  ksmserver       // WRONG
 134109  11224  audacious2      // Audio player, maybe
 145476  21503  VirtualBox      // OK
 174939  11322  icedove-bin     // thunderbird, maybe
 178015  11223  akregator       // rss reader, maybe
 201043  22672  krusader        // WRONG
 212609  11187  krunner         // WRONG
 256911  24252  test            // culprit, malloced 1GB
1750371  11318  run-mozilla.sh  // tiny, parent of firefox threads
2044902  11141  kdeinit4        // tiny, parent of most KDE apps

> Sigh, gnome-session has twice value of mmap(1G).
> Of course, gnome-session only uses 6M bytes of anon.
> I wonder this is because gnome-session has many children..but need to

Yes it is.

Regards,

Vedran
* Re: Memory overcommit 2009-10-27 17:12 ` Vedran Furač @ 2009-10-27 18:02 ` KOSAKI Motohiro 2009-10-27 18:30 ` Vedran Furač 0 siblings, 1 reply; 77+ messages in thread From: KOSAKI Motohiro @ 2009-10-27 18:02 UTC (permalink / raw) To: vedran.furac Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes [-- Attachment #1: Type: text/plain, Size: 1098 bytes --] >> I attached a scirpt for checking oom_score of all exisiting process. >> (oom_score is a value used for selecting "bad" processs.") >> please run if you have time. > > 96890 21463 VirtualBox // OK > 118615 11144 kded4 // WRONG > 127455 11158 knotify4 // WRONG > 132198 1 init // WRONG > 133940 11151 ksmserver // WRONG > 134109 11224 audacious2 // Audio player, maybe > 145476 21503 VirtualBox // OK > 174939 11322 icedove-bin // thunderbird, maybe > 178015 11223 akregator // rss reader, maybe > 201043 22672 krusader // WRONG > 212609 11187 krunner // WRONG > 256911 24252 test // culprit, malloced 1GB > 1750371 11318 run-mozilla.sh // tiny, parent of firefox threads > 2044902 11141 kdeinit4 // tiny, parent of most KDE apps Verdran, I made alternative improvement idea. Can you please mesure badness score on your system? Maybe your culprit process take biggest badness value. Note: this patch change time related thing. So, please drink a cup of coffee before mesurement. small rest time makes correct test result. [-- Attachment #2: 0001-oom-oom-score-bonus-by-run_time-use-proportional-va.patch --] [-- Type: application/octet-stream, Size: 3025 bytes --] From 047e6647f580a7c9bed2ac547bc9b15154d5da4c Mon Sep 17 00:00:00 2001 From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Date: Wed, 28 Oct 2009 02:25:01 +0900 Subject: [PATCH] oom: oom-score bonus by run_time use proportional value Currently, oom-score bonus by run_time use the fomula of "sqrt(sqrt(runtime / 1024)))". It mean process got 1/3 times oom-score per day. This feature exist for protect sevaral important system daemon. 
However, a typical desktop user reboots the system every day, so that bonus
is too small. The bonus only works well on server systems.

IOW, typical uptime depends strongly on the use case, so it shouldn't be
used as an OOM modifier.

Instead, this patch uses run_time as a proportion of uptime.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/base.c |    1 +
 mm/oom_kill.c  |   26 +++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 837469a..17d6fd4 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -446,6 +446,7 @@ static int proc_oom_score(struct task_struct *task, char *buffer)
 	struct timespec uptime;
 
 	do_posix_clock_monotonic_gettime(&uptime);
+	monotonic_to_bootbased(&uptime);
 	read_lock(&tasklist_lock);
 	points = badness(task->group_leader, uptime.tv_sec);
 	read_unlock(&tasklist_lock);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea2147d..3c1b3a3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -69,10 +69,10 @@ static int has_intersects_mems_allowed(struct task_struct *tsk)
  *    algorithm has been meticulously tuned to meet the principle
  *    of least surprise ... (be careful when you change it)
  */
-
 unsigned long badness(struct task_struct *p, unsigned long uptime)
 {
-	unsigned long points, cpu_time, run_time;
+	unsigned long points, cpu_time;
+	unsigned long run_time = 0;
 	struct mm_struct *mm;
 	struct task_struct *child;
 	int oom_adj = p->signal->oom_adj;
@@ -130,17 +130,20 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	utime = cputime_to_jiffies(task_time.utime);
 	stime = cputime_to_jiffies(task_time.stime);
 	cpu_time = (utime + stime) >> (SHIFT_HZ + 3);
-
-
-	if (uptime >= p->start_time.tv_sec)
-		run_time = (uptime - p->start_time.tv_sec) >> 10;
-	else
-		run_time = 0;
-
 	if (cpu_time)
 		points /= int_sqrt(cpu_time);
-	if (run_time)
-		points /= int_sqrt(int_sqrt(run_time));
+
+	if (uptime <= p->real_start_time.tv_sec) {
+		/* Baby process may be not so important. */
+		points *= 2;
+	} else {
+		run_time = (uptime - p->real_start_time.tv_sec);
+		if (!run_time)
+			run_time = 1;
+
+		run_time = ((run_time * 100) / uptime) + 1;
+		points /= int_sqrt(run_time);
+	}
 
 	/*
 	 * Niced processes are most likely less important, so double
@@ -233,6 +236,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
 	*ppoints = 0;
 
 	do_posix_clock_monotonic_gettime(&uptime);
+	monotonic_to_bootbased(&uptime);
 	for_each_process(p) {
 		unsigned long points;
 
-- 
1.6.2.5
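The run-time adjustment in the patch above can be modelled in user space. The following is a minimal sketch, not kernel code: `scale_by_run_time` and `isqrt_ul` are hypothetical names, `isqrt_ul` is a simple stand-in for the kernel's `int_sqrt()`, and `start` stands in for `p->real_start_time.tv_sec`.

```c
/* Floor integer square root; a simple stand-in for the kernel's int_sqrt(). */
static unsigned long isqrt_ul(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Hypothetical user-space model of the patch's run-time adjustment.
 * uptime and start are in seconds since boot; points is the score so far.
 */
unsigned long scale_by_run_time(unsigned long points,
				unsigned long uptime,
				unsigned long start)
{
	unsigned long run_time;

	/* A process started "now" gets its score doubled. */
	if (uptime <= start)
		return points * 2;

	run_time = uptime - start;
	if (!run_time)		/* mirrors the patch; cannot trigger on this branch */
		run_time = 1;

	/* Percentage of uptime the process has been alive, plus one: 1..101. */
	run_time = (run_time * 100) / uptime + 1;
	return points / isqrt_ul(run_time);
}
```

Under this model the divisor is at most 10 (a process alive for the whole uptime), and it no longer depends on the absolute uptime, only on the fraction of it that the process has been running.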
* Re: Memory overcommit
  2009-10-27 18:02 ` KOSAKI Motohiro
@ 2009-10-27 18:30   ` Vedran Furač
  0 siblings, 0 replies; 77+ messages in thread
From: Vedran Furač @ 2009-10-27 18:30 UTC
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, hugh.dickins, akpm, rientjes

KOSAKI Motohiro wrote:

>>> I attached a script for checking the oom_score of all existing processes.
>>> (oom_score is the value used for selecting "bad" processes.)
>>> Please run it if you have time.
>>   96890 21463 VirtualBox      // OK
>>  118615 11144 kded4           // WRONG
>>  127455 11158 knotify4        // WRONG
>>  132198     1 init            // WRONG
>>  133940 11151 ksmserver       // WRONG
>>  134109 11224 audacious2      // Audio player, maybe
>>  145476 21503 VirtualBox      // OK
>>  174939 11322 icedove-bin     // thunderbird, maybe
>>  178015 11223 akregator       // rss reader, maybe
>>  201043 22672 krusader        // WRONG
>>  212609 11187 krunner         // WRONG
>>  256911 24252 test            // culprit, malloced 1GB
>> 1750371 11318 run-mozilla.sh  // tiny, parent of firefox threads
>> 2044902 11141 kdeinit4        // tiny, parent of most KDE apps
>
> Vedran, I made an alternative improvement. Can you please measure the
> badness score on your system?
> Maybe your culprit process will take the biggest badness value.

Thanks, I'll test it during the week. But note that not every user
reboots their computer every day. I, for example, usually have it up for
days. And when it comes to my laptop, weeks, as I just suspend it when I
don't use it. Maybe the best way is to combine the two patches.

Also, you and others could test these patches too. It is not only my
kernel that behaves strangely. :)

> Note: this patch changes time-related behavior, so please drink a cup of
> coffee before measurement.
> A short rest makes for a correct test result.

OK. :)

Regards,

Vedran
* Re: Memory overcommit 2009-10-27 3:22 ` KAMEZAWA Hiroyuki 2009-10-27 6:10 ` KOSAKI Motohiro 2009-10-27 17:12 ` Vedran Furač @ 2009-10-27 20:44 ` Hugh Dickins 2009-10-27 21:04 ` David Rientjes ` (2 more replies) 2 siblings, 3 replies; 77+ messages in thread From: Hugh Dickins @ 2009-10-27 20:44 UTC (permalink / raw) To: KAMEZAWA Hiroyuki Cc: vedran.furac, linux-mm, linux-kernel, kosaki.motohiro, minchan.kim, akpm, rientjes, aarcange On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote: > Sigh, gnome-session has twice value of mmap(1G). > Of course, gnome-session only uses 6M bytes of anon. > I wonder this is because gnome-session has many children..but need to > dig more. Does anyone has idea ? When preparing KSM unmerge to handle OOM, I looked at how the precedent was handled by running a little program which mmaps an anonymous region of the same size as physical memory, then tries to mlock it. The program was such an obvious candidate to be killed, I was shocked by the poor decisions the OOM killer made. Usually I ran it with mem=512M, with gnome and firefox active. Often the OOM killer killed it right the first time, but went wrong when I tried it a second time (I think that's because of what's already swapped out the first time). I built up a patchset of fixes, but once I came to split them up for submission, not one of them seemed entirely satisfactory; and Andrea's fix to the KSM/mlock deadlock forced me to abandon even the first of the patches (we've since then fixed the way munlocking behaves, so in theory could revisit that; but Andrea disliked what I was trying to do there in KSM for other reasons, so I've not touched it since). I had to get on with KSM, so I set it all aside: none of the issues was a recent regression. I did briefly wonder about the reliance on total_vm which you're now looking into, but didn't touch that at all. 
Let me describe those issues which I did try but fail to fix - I've no
more time to deal with them now than then, but ought at least to mention
them to you.

1. select_bad_process() tries to avoid killing another process while
there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
processes. However, p->mm is set to NULL well before p reaches
exit_mmap() to actually free the memory, and there may be significant
delays in between (I think exit_robust_list() gave me a hang at one
stage). So in practice, even when the OOM killer selects the right
process to kill, there can be lots of collateral damage from it not
waiting long enough for that process to give up its memory.

I tried to deal with that by moving the TIF_MEMDIE test up before
the p->mm test, but adding in a check on p->exit_state:

	if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
	    !p->exit_state)
		return ERR_PTR(-1UL);

But this is then liable to hang the system if there's some reason
why the selected process cannot proceed to free its memory (e.g.
the current KSM unmerge case). It needs to wait "a while", but
give up if no progress is made, instead of hanging: originally
I thought that setting PF_MEMALLOC more widely in page_alloc.c,
and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
would deal with that; but we cannot be sure that waiting for memory
is the only reason for a holdup there (in the KSM unmerge case it's
waiting for an mmap_sem, and there may well be other such cases).

2. I started out running my mlock test program as root (later
switched to use "ulimit -l unlimited" first). But badness() reckons
CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
and CAP_SYS_RAWIO another reason to quarter your points: so running
as root makes you sixteen times less likely to be killed. Quartering
is anyway debatable, but sixteenthing seems utterly excessive to me.
I moved the CAP_SYS_RAWIO test in with the others, so it does no more than quartering; but is quartering appropriate anyway? I did wonder if I was right to be "subverting" the fine-grained CAPs in this way, but have since seen unrelated mail from one who knows better, implying they're something of a fantasy, that su and sudo are indeed what's used in the real world. Maybe this patch was okay. 3. badness() has a comment above it which says: * 5) we try to kill the process the user expects us to kill, this * algorithm has been meticulously tuned to meet the principle * of least surprise ... (be careful when you change it) But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815) adds plenty of surprise there, by trying to factor children into the calculation. Intended to deal with forkbombs, but any reasonable process whose purpose is to fork children (e.g. gnome-session) becomes very vulnerable. And whereas badness() itself goes on to refine the total_vm points by various adjustments peculiar to the process in question, those refinements have been ignored when adding the child's total_vm/2. (Andrea does remark that he'd rather have rewritten badness() from scratch.) I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF) part of the calculation up to select_bad_process(), making a solo_badness() function which makes all those adjustments to total_vm, then badness() itself a simple function adding half the children's solo_badness()es to the process' own solo_badness(). But probably lots more needs doing - Andrea's rewrite? 4. In some cases those children are sharing exactly the same mm, yet its total_vm is being added again and again to the points: I had a nasty inner loop searching back to see if we'd already counted this mm (but then, what if the different tasks sharing the mm deserved different adjustments to the total_vm?). 
I hope these notes help someone towards a better solution (and be
prepared to discover more on the way). I agree with Vedran that the
present behaviour is pretty unimpressive, and I'm puzzled as to how
people can have been tinkering with oom_kill.c down the years without
seeing any of this.

Hugh
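Hugh's item 2 (two independent quarterings adding up to a factor of sixteen for root) can be illustrated with a small model. This is a sketch under stated assumptions, not the kernel's badness(): `struct caps`, `adjust_as_described` and `adjust_merged` are hypothetical names, and the flags stand in for the kernel's capability checks.

```c
/* Hypothetical flags standing in for the capability checks described above. */
struct caps {
	int sys_admin;		/* CAP_SYS_ADMIN */
	int sys_resource;	/* CAP_SYS_RESOURCE */
	int sys_rawio;		/* CAP_SYS_RAWIO */
};

/* Behaviour as described: two separate tests, each quartering the points. */
unsigned long adjust_as_described(unsigned long points, struct caps c)
{
	if (c.sys_admin || c.sys_resource)
		points /= 4;
	if (c.sys_rawio)
		points /= 4;
	return points;
}

/* The variant Hugh tried: all three tests folded into a single quartering. */
unsigned long adjust_merged(unsigned long points, struct caps c)
{
	if (c.sys_admin || c.sys_resource || c.sys_rawio)
		points /= 4;
	return points;
}
```

A process holding all three capabilities, as a root process typically does, drops from 1600 points to 100 in the first function but only to 400 in the second.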
* Re: Memory overcommit 2009-10-27 20:44 ` Hugh Dickins @ 2009-10-27 21:04 ` David Rientjes 2009-10-28 0:08 ` Vedran Furač 2009-10-28 0:43 ` KAMEZAWA Hiroyuki 2009-10-28 2:47 ` KOSAKI Motohiro 2 siblings, 1 reply; 77+ messages in thread From: David Rientjes @ 2009-10-27 21:04 UTC (permalink / raw) To: Hugh Dickins Cc: KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli On Tue, 27 Oct 2009, Hugh Dickins wrote: > When preparing KSM unmerge to handle OOM, I looked at how the precedent > was handled by running a little program which mmaps an anonymous region > of the same size as physical memory, then tries to mlock it. The > program was such an obvious candidate to be killed, I was shocked > by the poor decisions the OOM killer made. Usually I ran it with > mem=512M, with gnome and firefox active. Often the OOM killer killed > it right the first time, but went wrong when I tried it a second time > (I think that's because of what's already swapped out the first time). > The heuristics that the oom killer use in selecting a task seem to get debated quite often. What hasn't been mentioned is that total_vm does do a good job of identifying tasks that are using far more memory than expected. That seems to be the initial target: killing a rogue task that is hogging much more memory than it should, probably because of a memory leak. The latest approach seems to be focused more on killing the task that will free the most resident memory. That certainly is understandable to avoid killing additional tasks later and avoiding subsequent page allocations in the short term, but doesn't help to kill the memory leaker. There's advantages to either approach, but it depends on the contextual goal of the oom killer when it's called: kill a rogue task that is allocating more memory than expected, or kill a task that will free the most memory. > 1. 
select_bad_process() tries to avoid killing another process while > there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm > processes. However, p->mm is set to NULL well before p reaches > exit_mmap() to actually free the memory, and there may be significant > delays in between (I think exit_robust_list() gave me a hang at one > stage). So in practice, even when the OOM killer selects the right > process to kill, there can be lots of collateral damage from it not > waiting long enough for that process to give up its memory. > > I tried to deal with that by moving the TIF_MEMDIE test up before > the p->mm test, but adding in a check on p->exit_state: > if (test_tsk_thread_flag(p, TIF_MEMDIE) && > !p->exit_state) > return ERR_PTR(-1UL); > But this is then liable to hang the system if there's some reason > why the selected process cannot proceed to free its memory (e.g. > the current KSM unmerge case). It needs to wait "a while", but > give up if no progress is made, instead of hanging: originally > I thought that setting PF_MEMALLOC more widely in page_alloc.c, > and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC, > would deal with that; but we cannot be sure that waiting of memory > is the only reason for a holdup there (in the KSM unmerge case it's > waiting for an mmap_sem, and there may well be other such cases). > I've proposed an oom killer timeout in the past which adds a jiffies count to struct task_struct and will defer killing other tasks until the predefined time limit (we use 10*HZ) has been exceeded. The problem is that even if you kill another task, it is highly unlikely that the expired task will ever exit at that point and is still holding a substantial amount of memory since it also had access to memory reserves and has still failed to exit. > 2. I started out running my mlock test program as root (later > switched to use "ulimit -l unlimited" first). 
But badness() reckons > CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points; > and CAP_SYS_RAWIO another reason to quarter your points: so running > as root makes you sixteen times less likely to be killed. Quartering > is anyway debatable, but sixteenthing seems utterly excessive to me. > > I moved the CAP_SYS_RAWIO test in with the others, so it does no > more than quartering; but is quartering appropriate anyway? I did > wonder if I was right to be "subverting" the fine-grained CAPs in > this way, but have since seen unrelated mail from one who knows > better, implying they're something of a fantasy, that su and sudo > are indeed what's used in the real world. Maybe this patch was okay. > I think someone (Nick?) proposed a patch at one time that removed most of the heuristics from select_bad_process() other than total_vm of the task and its children, mems_allowed intersection, and oom_adj. > 4. In some cases those children are sharing exactly the same mm, > yet its total_vm is being added again and again to the points: > I had a nasty inner loop searching back to see if we'd already > counted this mm (but then, what if the different tasks sharing > the mm deserved different adjustments to the total_vm?). > oom_kill_process() may not kill the task selected by select_bad_process(), it will first attempt to kill one of these children with a different mm. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: Memory overcommit
  2009-10-27 21:04 ` David Rientjes
@ 2009-10-28  0:08   ` Vedran Furač
  2009-10-28  0:25     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-28 0:08 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> There's advantages to either approach, but it depends on the contextual
> goal of the oom killer when it's called: kill a rogue task that is
> allocating more memory than expected,

But it is wrong at counting allocated memory!
Come on, it kills /usr/lib/icedove/run-mozilla.sh. Parent, a shell
script, instead of its child(s) which allocated memory. Look, "test"
allocates some (0.1GB) memory, and you have:

% cat test.sh

#!/bin/sh
./test&
./test&
./test&
./test

% perl check_badness.pl|sort -n|g test

26511   7884    test
26511   7885    test
26511   7886    test
26511   7887    test
53994   7883    test.sh

// great, so test.sh "is" the bad ass, ok, emulate OOMK:

% kill -9 7883

// did we kill "a rogue task"

% perl check_badness.pl|sort -n|g test

26511   7884    test
26511   7885    test
26511   7886    test
26511   7887    test

// nooo, they are still alive and eating our memory!

QED by newbie. ;)

> or kill a task that will free the most memory.

.
* Re: Memory overcommit
  2009-10-28  0:08 ` Vedran Furač
@ 2009-10-28  0:25   ` David Rientjes
  2009-10-28  0:39     ` Vedran Furač
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-28 0:25 UTC
To: vedran.furac
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, Vedran Furač wrote:

> But it is wrong at counting allocated memory!
> Come on, it kills /usr/lib/icedove/run-mozilla.sh. Parent, a shell
> script, instead of its child(s) which allocated memory. Look, "test"
> allocates some (0.1GB) memory, and you have:
>
> % cat test.sh
>
> #!/bin/sh
> ./test&
> ./test&
> ./test&
> ./test
>
> % perl check_badness.pl|sort -n|g test
>
> 26511   7884    test
> 26511   7885    test
> 26511   7886    test
> 26511   7887    test
> 53994   7883    test.sh
>
> // great, so test.sh "is" the bad ass, ok, emulate OOMK:
>
> % kill -9 7883
>
> // did we kill "a rogue task"
>
> % perl check_badness.pl|sort -n|g test
>
> 26511   7884    test
> 26511   7885    test
> 26511   7886    test
> 26511   7887    test
>
> // nooo, they are still alive and eating our memory!
>

This is wrong; it doesn't "emulate oom" since oom_kill_process() always
kills a child of the selected process instead if they do not share the
same memory. The chosen task in that case is untouched.
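The rule David describes, where a child with a different mm is killed in place of the selected task, can be sketched as follows. This is a simplified model and not the kernel's oom_kill_process(): `struct task`, `mm_id` and `pick_victim` are hypothetical, and the sketch simply returns the first qualifying child.

```c
#include <stddef.h>

/* Hypothetical, simplified task descriptor for illustration only. */
struct task {
	int pid;
	int mm_id;			/* stand-in for the task's mm pointer */
	const struct task *children;	/* array of direct children */
	size_t nr_children;
};

/*
 * Return the task that would actually be killed under the rule described
 * above: the first child that does not share the selected task's memory,
 * or the selected task itself when no such child exists.
 */
const struct task *pick_victim(const struct task *selected)
{
	size_t i;

	for (i = 0; i < selected->nr_children; i++) {
		if (selected->children[i].mm_id != selected->mm_id)
			return &selected->children[i];
	}
	return selected;
}
```

In Vedran's example, selecting test.sh (pid 7883) under this model would return one of the "test" children, so a manual `kill -9 7883` does not reproduce what the OOM killer does.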
* Re: Memory overcommit
  2009-10-28  0:25 ` David Rientjes
@ 2009-10-28  0:39   ` Vedran Furač
  2009-10-28  4:08     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-28 0:39 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> This is wrong; it doesn't "emulate oom" since oom_kill_process() always
> kills a child of the selected process instead if they do not share the
> same memory. The chosen task in that case is untouched.

OK, I stand corrected then. Thanks! But, while testing this I lost X
once again and "test" survived for some time (check the timestamps):

http://pastebin.com/d5c9d026e

- It started by killing gkrellm(!!!)
- Then I lost X (kdeinit4 I guess)
- Then 103 seconds after the killing started, it killed "test" - the
  real culprit.

I mean... how?!
* Re: Memory overcommit
  2009-10-28  0:39 ` Vedran Furač
@ 2009-10-28  4:08   ` David Rientjes
  2009-10-28  4:55     ` KAMEZAWA Hiroyuki
  2009-10-28 13:28     ` Vedran Furač
  0 siblings, 2 replies; 77+ messages in thread
From: David Rientjes @ 2009-10-28 4:08 UTC
To: vedran.furac
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, Vedran Furac wrote:

> > This is wrong; it doesn't "emulate oom" since oom_kill_process() always
> > kills a child of the selected process instead if they do not share the
> > same memory. The chosen task in that case is untouched.
>
> OK, I stand corrected then. Thanks! But, while testing this I lost X
> once again and "test" survived for some time (check the timestamps):
>
> http://pastebin.com/d5c9d026e
>
> - It started by killing gkrellm(!!!)
> - Then I lost X (kdeinit4 I guess)
> - Then 103 seconds after the killing started, it killed "test" - the
> real culprit.
>
> I mean... how?!
>

Here are the five oom kills that occurred in your log, and notice that the
first four times it kills a child and not the actual task as I explained:

[97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
[97137.725017] Killed process 21503 (VirtualBox)
[97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
[97137.864656] Killed process 11142 (klauncher)
[97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
[97137.888180] Killed process 11151 (ksmserver)
[97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
[97137.972888] Killed process 11224 (audacious2)

Those are practically happening simultaneously with very little memory
being available between each oom kill. Only later is "test" killed:

[97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
[97240.206832] Killed process 5005 (test)

Notice how the badness score is less than 1/4th of the others. So while
you may find it to be hogging a lot of memory, there were others that
consumed much more.

You can get a more detailed understanding of this by doing

	echo 1 > /proc/sys/vm/oom_dump_tasks

before trying your testcase; it will show various information like the
total_vm and oom_adj value for each task at the time of oom (and the
actual badness score is exported per-task via /proc/pid/oom_score in
real-time). This will also include the rss and show what the end result
would be in using that value as part of the heuristic on this particular
workload compared to the current implementation.
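The "Out of memory: kill process ..." lines quoted above have a regular shape, so the selected pid, command name and score can be pulled out for comparison. The following is a minimal sketch: `struct oom_event` and `parse_oom_line` are hypothetical names, and the format string is inferred only from the log lines shown in this thread.

```c
#include <stdio.h>
#include <string.h>

/* One "Out of memory: kill process N (name) score S" selection line. */
struct oom_event {
	int pid;
	char comm[64];
	unsigned long score;
};

/*
 * Parse a selection line in the format quoted in this thread.
 * Returns 1 on success, 0 if the line is not a selection line
 * (for example a "Killed process ..." line).
 */
int parse_oom_line(const char *line, struct oom_event *ev)
{
	const char *p = strstr(line, "Out of memory: kill process ");

	if (!p)
		return 0;
	return sscanf(p, "Out of memory: kill process %d (%63[^)]) score %lu",
		      &ev->pid, ev->comm, &ev->score) == 3;
}
```

Parsing the five selection lines this way gives 256912 for "test" against 1146255 for the lowest of the earlier selections, which is the "less than 1/4th" gap pointed out above.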
* Re: Memory overcommit
  2009-10-28  4:08 ` David Rientjes
@ 2009-10-28  4:55   ` KAMEZAWA Hiroyuki
  2009-10-28  5:13     ` David Rientjes
  2009-10-28 13:28   ` Vedran Furač
  1 sibling, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28 4:55 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 21:08:56 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 28 Oct 2009, Vedran Furac wrote:
>
> > > This is wrong; it doesn't "emulate oom" since oom_kill_process() always
> > > kills a child of the selected process instead if they do not share the
> > > same memory. The chosen task in that case is untouched.
> >
> > OK, I stand corrected then. Thanks! But, while testing this I lost X
> > once again and "test" survived for some time (check the timestamps):
> >
> > http://pastebin.com/d5c9d026e
> >
> > - It started by killing gkrellm(!!!)
> > - Then I lost X (kdeinit4 I guess)
> > - Then 103 seconds after the killing started, it killed "test" - the
> > real culprit.
> >
> > I mean... how?!
> >
>
> Here are the five oom kills that occurred in your log, and notice that the
> first four times it kills a child and not the actual task as I explained:
>
> [97137.724971] Out of memory: kill process 21485 (VBoxSVC) score 1564940 or a child
> [97137.725017] Killed process 21503 (VirtualBox)
> [97137.864622] Out of memory: kill process 11141 (kdeinit4) score 1196178 or a child
> [97137.864656] Killed process 11142 (klauncher)
> [97137.888146] Out of memory: kill process 11141 (kdeinit4) score 1184308 or a child
> [97137.888180] Killed process 11151 (ksmserver)
> [97137.972875] Out of memory: kill process 11141 (kdeinit4) score 1146255 or a child
> [97137.972888] Killed process 11224 (audacious2)
>
> Those are practically happening simultaneously with very little memory
> being available between each oom kill. Only later is "test" killed:
>
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
>
> Notice how the badness score is less than 1/4th of the others. So while
> you may find it to be hogging a lot of memory, there were others that
> consumed much more.

This is not related to the child-parent problem.

Look at these numbers more closely:
==
[97137.709272] Active_anon:671487 active_file:82 inactive_anon:132316
[97137.709273] inactive_file:82 unevictable:50 dirty:0 writeback:0 unstable:0
[97137.709273] free:6122 slab:17179 mapped:30661 pagetables:8052 bounce:0
==

active_file + inactive_file is very low. Almost all pages are anon.
But "mapped" (NR_FILE_MAPPED) is a little high. This implies the remaining
file cache is mapped by many processes, OR some megabytes of shmem are in use.

The number of pagetable pages is 8052, which means
8052 x 4096/8 x 4k bytes = 16 Gbytes of mapped area.

Total available memory is close to active/inactive + slab:
671487+82+132316+82+50+6122+17179+8052 = 835370 x 4k = 3.2 Gbytes?
(This system is swapless.)

Then, considering the pmap Kosaki showed, I guess the killed ones had a
big total_vm but not much real rss, so killing them did not help with
the OOM.

Thanks,
-Kame
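Kame's two estimates above can be checked with a few lines of arithmetic. This is a sketch that assumes 4 KiB pages and 8-byte page-table entries (the values implied by his "4096/8" and "4k"), and treats every pagetable page as a last-level table, so the mapped-area figure is an upper bound. The function and macro names are hypothetical.

```c
/* Assumed sizes, matching the "4096/8" and "4k" in the estimate above. */
#define PAGE_SIZE_BYTES	4096ULL
#define PTE_SIZE_BYTES	8ULL

/*
 * Upper bound on the virtual area mappable by pt_pages last-level
 * page-table pages: each page holds PAGE_SIZE/PTE_SIZE entries and
 * each entry maps one page.
 */
unsigned long long mapped_bytes_from_pagetables(unsigned long long pt_pages)
{
	return pt_pages * (PAGE_SIZE_BYTES / PTE_SIZE_BYTES) * PAGE_SIZE_BYTES;
}

/* Convert a page count from the OOM report into bytes. */
unsigned long long pages_to_bytes(unsigned long long pages)
{
	return pages * PAGE_SIZE_BYTES;
}
```

With 8052 pagetable pages the first function gives 16104 MiB (about 16 Gbytes), and the 835370 pages summed from the report come to about 3263 MiB (about 3.2 Gbytes), so the mapped virtual area is roughly five times the memory actually in use.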
* Re: Memory overcommit
  2009-10-28  4:55 ` KAMEZAWA Hiroyuki
@ 2009-10-28  5:13   ` David Rientjes
  2009-10-28  6:05     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-28 5:13 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> This is not related to the child-parent problem.
>
> Look at these numbers more closely:
> ==
> [97137.709272] Active_anon:671487 active_file:82 inactive_anon:132316
> [97137.709273] inactive_file:82 unevictable:50 dirty:0 writeback:0 unstable:0
> [97137.709273] free:6122 slab:17179 mapped:30661 pagetables:8052 bounce:0
> ==
>
> active_file + inactive_file is very low. Almost all pages are anon.
> But "mapped" (NR_FILE_MAPPED) is a little high. This implies the remaining
> file cache is mapped by many processes, OR some megabytes of shmem are in use.
>
> The number of pagetable pages is 8052, which means
> 8052 x 4096/8 x 4k bytes = 16 Gbytes of mapped area.
>
> Total available memory is close to active/inactive + slab:
> 671487+82+132316+82+50+6122+17179+8052 = 835370 x 4k = 3.2 Gbytes?
> (This system is swapless.)
>

Yep:

[97137.724965] 917504 pages RAM
[97137.724967] 69721 pages reserved

(917504 - 69721) * 4K = ~3.23G

> Then, considering the pmap Kosaki showed, I guess the killed ones had a
> big total_vm but not much real rss, so killing them did not help with
> the OOM.
>

echo 1 > /proc/sys/vm/oom_dump_tasks can confirm that.

The bigger issue is making the distinction between killing a rogue task
that is using much more memory than expected (the supposed current
behavior, influenced from userspace by /proc/pid/oom_adj), and killing the
task with the highest rss. The latter is definitely desired if we are
allocating tons of memory but reduces the ability of the user to influence
the badness score.
* Re: Memory overcommit
  2009-10-28  5:13 ` David Rientjes
@ 2009-10-28  6:05   ` KAMEZAWA Hiroyuki
  2009-10-28  6:17     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28 6:05 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 22:13:44 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> Yep:
>
> [97137.724965] 917504 pages RAM
> [97137.724967] 69721 pages reserved
>
> (917504 - 69721) * 4K = ~3.23G
>
> > Then, considering the pmap kosaki shows,
> > I guess killed ones had big total_vm but has not much real rss,
> > and no helps for oom.
> >
>
> echo 1 > /proc/sys/vm/oom_dump_tasks can confirm that.
>
yes.

> The bigger issue is making the distinction between killing a rogue task
> that is using much more memory than expected (the supposed current
> behavior, influenced from userspace by /proc/pid/oom_adj), and killing the
> task with the highest rss.

All kernel engineers know that the kernel can never know whether usage is
"more than expected" or not. So the oom_adj workaround is used now (by some
special users). The OOM killer itself is also a workaround.
"No kill" is the best thing, but we know there tend to be memory leakers on
bad systems, and not all systems in this world are perfect.

In the kernel's view, there is no difference between a rogue task and the
one with the highest rss. As a heuristic, "time" is used now, but it's not
very trustworthy.

> The latter is definitely desired if we are
> allocating tons of memory but reduces the ability of the user to influence
> the badness score.
>

Yes, some more trustworthy values other than vmsize/rss/time would be
appreciated. I wonder whether recent memory consumption speed could be
another key value.

Anyway, the current behavior of "killing X" is a bad thing.
We need some fixes.

Thanks,
-Kame
* Re: Memory overcommit
  2009-10-28  6:05 ` KAMEZAWA Hiroyuki
@ 2009-10-28  6:17   ` David Rientjes
  2009-10-28  6:20     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-28 6:17 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> All kernel engineers know that the kernel can never know whether usage is
> "more than expected" or not. So the oom_adj workaround is used now (by some
> special users). The OOM killer itself is also a workaround.
> "No kill" is the best thing, but we know there tend to be memory leakers on
> bad systems, and not all systems in this world are perfect.
>

Right, and historically that has been addressed by considering total_vm
and adjusting it with oom_adj so that we can identify memory leaking tasks
through user-defined criteria.

> Yes, some more trustworthy values other than vmsize/rss/time would be
> appreciated. I wonder whether recent memory consumption speed could be
> another key value.
>

Sounds very logical.

> Anyway, the current behavior of "killing X" is a bad thing.
> We need some fixes.
>

You can easily protect X with OOM_DISABLE, as you know. I don't think we
need any X-specific heuristics added to the kernel; it looks like the
special cases have already polluted badness() enough.
* Re: Memory overcommit
  2009-10-28  6:17 ` David Rientjes
@ 2009-10-28  6:20   ` KAMEZAWA Hiroyuki
  2009-10-29  8:38     ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-28 6:20 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009 23:17:41 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:
>
> > All kernel engineers know that the kernel can never know whether usage is
> > "more than expected" or not. So the oom_adj workaround is used now (by some
> > special users). The OOM killer itself is also a workaround.
> > "No kill" is the best thing, but we know there tend to be memory leakers on
> > bad systems, and not all systems in this world are perfect.
> >
>
> Right, and historically that has been addressed by considering total_vm
> and adjusting it with oom_adj so that we can identify memory leaking tasks
> through user-defined criteria.
>
> > Yes, some more trustworthy values other than vmsize/rss/time would be
> > appreciated. I wonder whether recent memory consumption speed could be
> > another key value.
> >
>
> Sounds very logical.
>
> > Anyway, the current behavior of "killing X" is a bad thing.
> > We need some fixes.
> >
>
> You can easily protect X with OOM_DISABLE, as you know. I don't think we
> need any X-specific heuristics added to the kernel, it looks like the
> special cases have already polluted badness() enough.
>
It's _not_ special to X.

Almost all applications which use many dynamic libraries can be affected
by this use of total_vm. And, as I explained to Vedran, a multi-threaded
program like Java can easily increase total_vm without using much anon_rss.
And that's the reason I hate overcommit_memory: the size of the VM doesn't
tell us anything.

Thanks,
-Kame
* Re: Memory overcommit
  2009-10-28 6:20 ` KAMEZAWA Hiroyuki
@ 2009-10-29 8:38 ` David Rientjes
  2009-10-29 11:11 ` Vedran Furač
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-29 8:38 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KAMEZAWA Hiroyuki wrote:

> It's _not_ special to X.
>
> Almost all applications which use many dynamic libraries can be affected by this,
> total_vm. And, as I explained to Vedran, multi-threaded programs like Java can easily
> increase total_vm without using much anon_rss.
> And it's the reason I hate overcommit_memory. Size of VM doesn't tell anything.
>

Right, because in Vedran's latest oom log it shows that Xorg is preferred
more than any other thread other than the memory hogging test program with
your patch than without. I pointed out a clear distinction in the killing
order using both total_vm and rss in that log and in my opinion killing
Xorg as opposed to krunner would be undesirable.
* Re: Memory overcommit
  2009-10-29 8:38 ` David Rientjes
@ 2009-10-29 11:11 ` Vedran Furač
  2009-10-29 19:53 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-29 11:11 UTC
To: David Rientjes
Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> Right, because in Vedran's latest oom log it shows that Xorg is preferred
> more than any other thread other than the memory hogging test program with
> your patch than without. I pointed out a clear distinction in the killing
> order using both total_vm and rss in that log and in my opinion killing
> Xorg as opposed to krunner would be undesireable.

But then you should rename OOM killer to TRIPK:
Totally Random Innocent Process Killer

If you have OOM situation and Xorg is the first, that means it's leaking
memory badly and the system is probably already frozen/FUBAR. Killing
krunner in that situation wouldn't do any good. From a user perspective,
nothing changes, system is still FUBAR and (s)he would probably reboot
cursing linux in the process.
* Re: Memory overcommit
  2009-10-29 11:11 ` Vedran Furač
@ 2009-10-29 19:53 ` David Rientjes
  2009-10-29 23:48 ` KAMEZAWA Hiroyuki
  2009-10-30 13:59 ` Vedran Furač
  0 siblings, 2 replies; 77+ messages in thread
From: David Rientjes @ 2009-10-29 19:53 UTC
To: vedran.furac
Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> But then you should rename OOM killer to TRIPK:
> Totally Random Innocent Process Killer
>

The randomness here is the order of the child list when the oom killer
selects a task, based on the badness score, and then tries to kill a child
with a different mm before the parent.

The problem you identified in http://pastebin.com/f3f9674a0, however, is a
forkbomb issue where the badness score should never have been so high for
kdeinit4 compared to "test". That's directly proportional to adding the
scores of all disjoint child total_vm values into the badness score for
the parent and then killing the children instead.

That's the problem, not using total_vm as a baseline. Replacing that with
rss is not going to solve the issue and reducing the user's ability to
specify a rough oom priority from userspace is simply not an option.

> If you have OOM situation and Xorg is the first, that means it's leaking
> memory badly and the system is probably already frozen/FUBAR. Killing
> krunner in that situation wouldn't do any good. From a user perspective,
> nothing changes, system is still FUBAR and (s)he would probably reboot
> cursing linux in the process.
>

It depends on what you're running; we need to be able to have the option
of protecting very large tasks on production servers. Imagine if "test"
here is actually a critical application that we need to protect: it's
not solely mlocked anonymous memory, but we still want to kill it if it is
leaking memory beyond your approximate 2.5GB. How do you do that when using
rss as the baseline?
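The child accounting described in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct task_model and baseline_points() are hypothetical names, the mm_id field stands in for "has a different mm", and the weighting (half of each disjoint child's total_vm, plus one) is an assumption based on how the 2.6-era badness() is generally understood to behave. It is not the kernel code itself, and it leaves out the CPU-time, runtime, and oom_adj adjustments.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task; not a kernel structure. */
struct task_model {
    unsigned long total_vm;  /* mapped pages */
    int mm_id;               /* tasks sharing an address space share an id */
};

/* Baseline score for a parent: its own total_vm plus a share of every
 * child that does not share the parent's address space. */
unsigned long baseline_points(const struct task_model *parent,
                              const struct task_model *children,
                              size_t nr_children)
{
    unsigned long points = parent->total_vm;
    size_t i;

    for (i = 0; i < nr_children; i++) {
        /* Assumed weighting: half of the child's total_vm, plus one. */
        if (children[i].mm_id != parent->mm_id)
            points += children[i].total_vm / 2 + 1;
    }
    return points;
}
```

In this model a parent mapping 1000 pages with ten disjoint children of 20000 pages each scores 101010, more than a standalone task mapping 100000 pages. That is the kdeinit4-versus-"test" effect described above, and the model shows it comes from the child accumulation rather than from the choice of total_vm as such.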
* Re: Memory overcommit
  2009-10-29 19:53 ` David Rientjes
@ 2009-10-29 23:48 ` KAMEZAWA Hiroyuki
  2009-10-30 9:10 ` David Rientjes
  2009-10-30 13:59 ` Vedran Furač
  1 sibling, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-29 23:48 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009 12:53:42 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> > If you have OOM situation and Xorg is the first, that means it's leaking
> > memory badly and the system is probably already frozen/FUBAR. Killing
> > krunner in that situation wouldn't do any good. From a user perspective,
> > nothing changes, system is still FUBAR and (s)he would probably reboot
> > cursing linux in the process.
> >
>
> It depends on what you're running, we need to be able to have the option
> of protecting very large tasks on production servers. Imagine if "test"
> here is actually a critical application that we need to protect, its
> not solely mlocked anonymous memory, but still kill if it is leaking
> memory beyond your approximate 2.5GB. How do you do that when using rss
> as the baseline?

As I wrote repeatedly,

- OOM-Killer itself is a bad thing, a bad situation.
- The kernel can't know whether the program is bad or not. It just guesses.
- Then, there is no "correct" OOM-Killer other than fork-bomb killer.
- User has a knob as oom_adj. This is very strong.

Then, there is only "reasonable" or "easy-to-understand" OOM-Kill.
"Current biggest memory eater is killed" sounds reasonable, easy to
understand. And if total_vm works well, overcommit_guess should catch it.
Please improve overcommit_guess if you want to stay on total_vm.

Thanks,
-Kame
* Re: Memory overcommit
  2009-10-29 23:48 ` KAMEZAWA Hiroyuki
@ 2009-10-30 9:10 ` David Rientjes
  2009-10-30 9:36 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-30 9:10 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> As I wrote repeatedly,
>
> - OOM-Killer itself is a bad thing, a bad situation.

Not necessarily; the memory controller and cpusets use it quite often to
enforce their policy, and it is standard runtime behavior. We'd like to
imagine that our cpuset will never be too small to run all the attached
jobs, but that happens and we can easily recover from it by killing a
task.

> - The kernel can't know whether the program is bad or not. It just guesses.

Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
can tell the kernel what we'd like the oom killer behavior to be if
the situation arises.

> - Then, there is no "correct" OOM-Killer other than fork-bomb killer.

Well of course there is, you're seeing this in a WAY too simplistic
manner. If we are oom, we want to be able to influence how the oom killer
behaves and respond to that situation. You are proposing that we change
the baseline for how the oom killer selects tasks which we use CONSTANTLY
as part of our normal production environment. I'd appreciate it if you'd
take it a little more seriously.

> - User has a knob as oom_adj. This is very strong.
>

Agreed.

> Then, there is only "reasonable" or "easy-to-understand" OOM-Kill.
> "Current biggest memory eater is killed" sounds reasonable, easy to
> understand. And if total_vm works well, overcommit_guess should catch it.
> Please improve overcommit_guess if you want to stay on total_vm.
>

I don't necessarily want to stay on total_vm, but I also don't want to
move to rss as a baseline, as you would probably agree.

We disagree about a very fundamental principle: you are coming from a
perspective of always wanting to kill the biggest resident memory eater
even for a single order-0 allocation that fails and I'm coming from a
perspective of wanting to ensure that our machines know how the oom killer
will react when it is used. Moving to rss reduces the ability of the user
to specify an expected oom priority other than polarizing it by either
disabling it completely with an oom_adj value of -17 or choosing the
definite next victim with +15. That's my objection to it: the user cannot
possibly be expected to predict what proportion of each application's
memory will be resident at the time of oom.

I understand you want to totally rewrite the oom killer for whatever
reason, but I think you need to spend a lot more time understanding the
needs that the Linux community has for its behavior instead of insisting
on your point of view.
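To make the "polarizing" point concrete, here is a minimal sketch of how an oom_adj value scales a baseline score. The values -17 (disable) and +15 (strongest preference) come from the message above; the bit-shift behavior matches the "shift" wording used in this thread, but the function apply_oom_adj(), the clamping, and the saturation at ULONG_MAX are assumptions made for this illustration rather than a copy of the kernel's code.

```c
#include <limits.h>

#define OOM_DISABLE    (-17)  /* task is never selected */
#define OOM_ADJUST_MIN (-16)
#define OOM_ADJUST_MAX 15

/* Scale a baseline badness score by oom_adj.
 * Positive values multiply by 2^oom_adj, negative values divide. */
unsigned long apply_oom_adj(unsigned long points, int oom_adj)
{
    if (oom_adj == OOM_DISABLE)
        return 0;

    /* Clamp out-of-range input (assumption of this sketch). */
    if (oom_adj > OOM_ADJUST_MAX)
        oom_adj = OOM_ADJUST_MAX;
    if (oom_adj < OOM_ADJUST_MIN)
        oom_adj = OOM_ADJUST_MIN;

    if (oom_adj > 0) {
        if (points == 0)
            points = 1;  /* so that a positive adjustment always counts */
        /* Saturate instead of overflowing (assumption of this sketch). */
        if (points > (ULONG_MAX >> oom_adj))
            return ULONG_MAX;
        return points << oom_adj;
    }
    return points >> -oom_adj;
}
```

Because the adjustment is multiplicative, a middle value such as +3 or -3 only yields a predictable ranking if the baseline it multiplies is itself predictable. With a baseline that swings at runtime, only the extremes (-17 and +15) give a dependable outcome, which is the objection raised above.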
* Re: Memory overcommit
  2009-10-30 9:10 ` David Rientjes
@ 2009-10-30 9:36 ` KAMEZAWA Hiroyuki
  2009-11-03 20:49 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-10-30 9:36 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009 02:10:37 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> > - The kernel can't know whether the program is bad or not. It just guesses.
>
> Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
> can tell the kernel what we'd like the oom killer behavior to be if
> the situation arises.
>

My point is that the server cannot distinguish a memory leak from intentional
memory usage. Nothing other than that.

> > - Then, there is no "correct" OOM-Killer other than fork-bomb killer.
>
> Well of course there is, you're seeing this in a WAY too simplistic
> manner. If we are oom, we want to be able to influence how the oom killer
> behaves and respond to that situation. You are proposing that we change
> the baseline for how the oom killer selects tasks which we use CONSTANTLY
> as part of our normal production environment. I'd appreciate it if you'd
> take it a little more seriously.
>

Yes, I'm serious.

This summer, at lunch with a daily linux user, I was told
"you, enterprise guys, don't consider desktop or laptop problems at all."
Yes, I use only servers. My customers use servers, too. My first priority
is always on server users.
But, for this time, I wrote a reply to Vedran and am trying to fix the desktop problem.
Even if current logic works well for servers, the "KDE/GNOME is killed" problem
seems to be serious. And this may be a problem for EMBEDDED people, I guess.

> > - User has a knob as oom_adj. This is very strong.
> >
>
> Agreed.
>

This and memcg are very useful. But everyone says "bad workaround" ;(
Maybe only servers can use these functions.

> > Then, there is only "reasonable" or "easy-to-understand" OOM-Kill.
> > "Current biggest memory eater is killed" sounds reasonable, easy to
> > understand. And if total_vm works well, overcommit_guess should catch it.
> > Please improve overcommit_guess if you want to stay on total_vm.
> >
>
> I don't necessarily want to stay on total_vm, but I also don't want to
> move to rss as a baseline, as you would probably agree.
>

I'll rewrite all. I'll not rely only on rss. There are several situations and
we need some more information than we have now. I'll have to implement ways to
gather information before changing badness.

> We disagree about a very fundamental principle: you are coming from a
> perspective of always wanting to kill the biggest resident memory eater
> even for a single order-0 allocation that fails and I'm coming from a
> perspective of wanting to ensure that our machines know how the oom killer
> will react when it is used.

yes.

> Moving to rss reduces the ability of the user to specify an expected oom
> priority other than polarizing it by either
> disabling it completely with an oom_adj value of -17 or choosing the
> definite next victim with +15. That's my objection to it: the user cannot
> possibly be expected to predict what proportion of each application's
> memory will be resident at the time of oom.
>

I can say the same thing about total_vm size. total_vm size doesn't include any
good information for the oom situation. And tweaking based on that not-useful
parameter will make things worse.

For the oom_adj tweak, we may need another technique other than "shift".
If I had written oom_adj, I would have written it as

  /proc/<pid>/guarantee_nooom_size

  # echo 3G > /proc/<pid>/guarantee_nooom_size

Then, 3G bytes of this process's memory usage will not be accounted to badness.
I'm not sure I can add a new interface or replace oom_adj, now.
But to do this, the current children's score problem etc. should be fixed.

> I understand you want to totally rewrite the oom killer for whatever
> reason, but I think you need to spend a lot more time understanding the
> needs that the Linux community has for its behavior instead of insisting
> on your point of view.
>

Yes, I'll use more time. I don't think all of the changes can be quick work.

To be honest, this is a part of work to implement a "custom oom handler" cgroup.
Before going further, I'd like to fix the current problem.

Thanks,
-Kame
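The guarantee_nooom_size idea floated in this message can be sketched as a subtraction rather than a shift. Everything here is hypothetical: no such /proc file exists in the source, the function name badness_after_guarantee() is invented for illustration, and the message does not say which usage figure would be compared against the guarantee. The sketch only shows the arithmetic implied by "3G bytes of this process's memory usage will not be accounted to badness".

```c
/* Hypothetical model of the proposed knob: the first guarantee_bytes of a
 * task's usage are exempt, and only the excess counts toward badness. */
unsigned long long badness_after_guarantee(unsigned long long usage_bytes,
                                           unsigned long long guarantee_bytes)
{
    if (usage_bytes <= guarantee_bytes)
        return 0;  /* within its guarantee: contributes nothing */
    return usage_bytes - guarantee_bytes;
}
```

Under this model a task with a 3G guarantee that uses 3.5G scores the same as an unprotected task using 512M. Unlike a shift, the exempted amount is expressed in bytes, so it does not scale with whatever baseline the kernel happens to use.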
* Re: Memory overcommit
  2009-10-30 9:36 ` KAMEZAWA Hiroyuki
@ 2009-11-03 20:49 ` David Rientjes
  2009-11-04 0:50 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-11-03 20:49 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:

> > > - The kernel can't know whether the program is bad or not. It just guesses.
> >
> > Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
> > can tell the kernel what we'd like the oom killer behavior to be if
> > the situation arises.
> >
>
> My point is that the server cannot distinguish a memory leak from intentional
> memory usage. Nothing other than that.
>

That's a different point. Today, we can influence the badness score of
any user thread to prioritize oom killing from userspace and that can be
done regardless of whether there's a memory leaker, a fork bomber, etc.
The priority based oom killing is important to production scenarios and
cannot be replaced by a heuristic that works every time if it cannot be
influenced by userspace.

A spike in memory consumption when a process is initially forked would be
defined as a memory leaker in your quiet_time model.

> This summer, at lunch with a daily linux user, I was told
> "you, enterprise guys, don't consider desktop or laptop problems at all."
> Yes, I use only servers. My customers use servers, too. My first priority
> is always on server users.
> But, for this time, I wrote a reply to Vedran and am trying to fix the desktop problem.
> Even if current logic works well for servers, the "KDE/GNOME is killed" problem
> seems to be serious. And this may be a problem for EMBEDDED people, I guess.
>

You argued before that the problem wasn't specific to X (after I said you
could protect it very trivially with /proc/pid/oom_adj set to
OOM_DISABLE), but that's now your reasoning for rewriting the oom killer
heuristics?

> I can say the same thing about total_vm size. total_vm size doesn't include any
> good information for the oom situation. And tweaking based on that not-useful
> parameter will make things worse.
>

Tweaking on the heuristic will probably make it more convoluted and
overall worse, I agree. But it's a more stable baseline than rss from
which we can set oom killing priorities from userspace.
* Re: Memory overcommit
  2009-11-03 20:49 ` David Rientjes
@ 2009-11-04 0:50 ` KAMEZAWA Hiroyuki
  2009-11-04 1:58 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04 0:50 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 12:49:52 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Fri, 30 Oct 2009, KAMEZAWA Hiroyuki wrote:
>
> > > > - The kernel can't know whether the program is bad or not. It just guesses.
> > >
> > > Totally irrelevant, given your fourth point about /proc/pid/oom_adj. We
> > > can tell the kernel what we'd like the oom killer behavior to be if
> > > the situation arises.
> > >
> >
> > My point is that the server cannot distinguish a memory leak from intentional
> > memory usage. Nothing other than that.
> >
>
> That's a different point. Today, we can influence the badness score of
> any user thread to prioritize oom killing from userspace and that can be
> done regardless of whether there's a memory leaker, a fork bomber, etc.
> The priority based oom killing is important to production scenarios and
> cannot be replaced by a heuristic that works every time if it cannot be
> influenced by userspace.
>

I didn't remove oom_adj...

> A spike in memory consumption when a process is initially forked would be
> defined as a memory leaker in your quiet_time model.
>

I'll rewrite or drop quiet_time.

> > This summer, at lunch with a daily linux user, I was told
> > "you, enterprise guys, don't consider desktop or laptop problems at all."
> > Yes, I use only servers. My customers use servers, too. My first priority
> > is always on server users.
> > But, for this time, I wrote a reply to Vedran and am trying to fix the desktop problem.
> > Even if current logic works well for servers, the "KDE/GNOME is killed" problem
> > seems to be serious. And this may be a problem for EMBEDDED people, I guess.
> >
>
> You argued before that the problem wasn't specific to X (after I said you
> could protect it very trivially with /proc/pid/oom_adj set to
> OOM_DISABLE), but that's now your reasoning for rewriting the oom killer
> heuristics?
>

One of the reasons. My customers always suffer from "OOM-RANDOM-KILLER".
I mentioned "lunch" to say that
"I'm not working _only_ for servers."
ok ?

> > I can say the same thing about total_vm size. total_vm size doesn't include any
> > good information for the oom situation. And tweaking based on that not-useful
> > parameter will make things worse.
> >
>
> Tweaking on the heuristic will probably make it more convoluted and
> overall worse, I agree. But it's a more stable baseline than rss from
> which we can set oom killing priorities from userspace.

- "rss < total_vm_size" always.
- oom_adj calculation is quite strong.
- total_vm of processes which map hugetlb is very big ....but killing them
  is no help for usual oom.

I recommend you add a "stable baseline" knob for user space, as I wrote.
My patch 6 adds a stable baseline bonus as 50% of vm size if run_time is large
enough.

If users can estimate how their process uses memory, it will be a good thing.
I'll add something other than oom_adj (I don't say I'll drop oom_adj).

Thanks,
-Kame
* Re: Memory overcommit
  2009-11-04 0:50 ` KAMEZAWA Hiroyuki
@ 2009-11-04 1:58 ` David Rientjes
  2009-11-04 2:17 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-11-04 1:58 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> > That's a different point. Today, we can influence the badness score of
> > any user thread to prioritize oom killing from userspace and that can be
> > done regardless of whether there's a memory leaker, a fork bomber, etc.
> > The priority based oom killing is important to production scenarios and
> > cannot be replaced by a heuristic that works every time if it cannot be
> > influenced by userspace.
> >
> I didn't remove oom_adj...
>

Right, but we must ensure that we have the same ability to influence a
priority based oom killing scheme from userspace as we currently do with a
relatively static total_vm. total_vm may not be the optimal baseline, but
it does allow users to tune oom_adj specifically to identify tasks that
are using more memory than expected and to be static enough to not depend
on rss, for example, that is really hard to predict at the time of oom.

That's actually my main goal in this discussion: to avoid losing any
ability of userspace to influence the priority of tasks being oom killed
(if you haven't noticed :).

> > Tweaking on the heuristic will probably make it more convoluted and
> > overall worse, I agree. But it's a more stable baseline than rss from
> > which we can set oom killing priorities from userspace.
>
> - "rss < total_vm_size" always.

But rss is much more dynamic than total_vm, that's my point.

> - oom_adj calculation is quite strong.
> - total_vm of processes which map hugetlb is very big ....but killing them
>   is no help for usual oom.
>
> I recommend you add a "stable baseline" knob for user space, as I wrote.
> My patch 6 adds a stable baseline bonus as 50% of vm size if run_time is large
> enough.
>

There's no clear relationship between VM size and runtime. The forkbomb
heuristic itself could easily return a badness of ULONG_MAX if one is
detected using runtime and number of children, as I earlier proposed, but
that doesn't seem helpful to factor into the scoring.
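The forkbomb check mentioned at the end of this message, one that returns ULONG_MAX instead of feeding runtime into the score, might look roughly like the sketch below. The details of the earlier proposal are not reproduced in this part of the thread, so the names (struct child_info, forkbomb_badness()) and both thresholds (young_secs, max_young) are hypothetical parameters chosen for illustration only.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-child record: how long the child has been running. */
struct child_info {
    unsigned long runtime_secs;
};

/* If more than max_young children have run for less than young_secs,
 * treat the parent as a forkbomb and return the maximum badness.
 * Otherwise return the score computed by the ordinary heuristic
 * unchanged, so runtime never enters the normal scoring. */
unsigned long forkbomb_badness(const struct child_info *children,
                               size_t nr_children,
                               unsigned long young_secs,
                               size_t max_young,
                               unsigned long normal_score)
{
    size_t young = 0;
    size_t i;

    for (i = 0; i < nr_children; i++) {
        if (children[i].runtime_secs < young_secs)
            young++;
    }
    return young > max_young ? ULONG_MAX : normal_score;
}
```

The design choice illustrated is the separation argued for above: detection is a yes/no override on top of the score rather than another weight inside it.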
* Re: Memory overcommit
  2009-11-04 1:58 ` David Rientjes
@ 2009-11-04 2:17 ` KAMEZAWA Hiroyuki
  2009-11-04 3:10 ` David Rientjes
  0 siblings, 1 reply; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04 2:17 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 17:58:04 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > > That's a different point. Today, we can influence the badness score of
> > > any user thread to prioritize oom killing from userspace and that can be
> > > done regardless of whether there's a memory leaker, a fork bomber, etc.
> > > The priority based oom killing is important to production scenarios and
> > > cannot be replaced by a heuristic that works every time if it cannot be
> > > influenced by userspace.
> > >
> > I didn't remove oom_adj...
> >
>
> Right, but we must ensure that we have the same ability to influence a
> priority based oom killing scheme from userspace as we currently do with a
> relatively static total_vm. total_vm may not be the optimal baseline, but
> it does allow users to tune oom_adj specifically to identify tasks that
> are using more memory than expected and to be static enough to not depend
> on rss, for example, that is really hard to predict at the time of oom.
>
> That's actually my main goal in this discussion: to avoid losing any
> ability of userspace to influence the priority of tasks being oom killed
> (if you haven't noticed :).
>
> > > Tweaking on the heuristic will probably make it more convoluted and
> > > overall worse, I agree. But it's a more stable baseline than rss from
> > > which we can set oom killing priorities from userspace.
> >
> > - "rss < total_vm_size" always.
>
> But rss is much more dynamic than total_vm, that's my point.
>

My point and your point are different.

1. All my concern is "baseline for heuristics"
2. All your concern is "baseline for knob, as oom_adj"

ok ? For selecting a victim by the kernel, a dynamic value is much more useful.
The current behaviors of "Random kill" and "Kill multiple processes" are too bad.
Considering what the oom-killer is for, I think "1" is more important.

But I know what you want, so I offer a new knob which is not affected by RSS,
as I wrote in my previous mail.

Off-topic:
As memcg is growing better, using OOM-Killer for resource control should be
ended, I think. Maybe Fake-NUMA+cpuset is working well for google system,
but plz consider using memcg.

> > - oom_adj calculation is quite strong.
> > - total_vm of processes which map hugetlb is very big ....but killing them
> >   is no help for usual oom.
> >
> > I recommend you add a "stable baseline" knob for user space, as I wrote.
> > My patch 6 adds a stable baseline bonus as 50% of vm size if run_time is large
> > enough.
> >
>
> There's no clear relationship between VM size and runtime. The forkbomb
> heuristic itself could easily return a badness of ULONG_MAX if one is
> detected using runtime and number of children, as I earlier proposed, but
> that doesn't seem helpful to factor into the scoring.
>

Old processes are important, younger are not. But as I wrote, I'll drop
most of patch "6". So, plz forget about this part.

I'm interested in fork-bomb killer rather than crazy badness calculation, now.

Thanks,
-Kame
* Re: Memory overcommit
  2009-11-04 2:17 ` KAMEZAWA Hiroyuki
@ 2009-11-04 3:10 ` David Rientjes
  2009-11-04 3:19 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-11-04 3:10 UTC
To: KAMEZAWA Hiroyuki
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:

> My point and your point are different.
>
> 1. All my concern is "baseline for heuristics"
> 2. All your concern is "baseline for knob, as oom_adj"
>
> ok ? For selecting a victim by the kernel, a dynamic value is much more useful.
> The current behaviors of "Random kill" and "Kill multiple processes" are too bad.
> Considering what the oom-killer is for, I think "1" is more important.
>
> But I know what you want, so I offer a new knob which is not affected by RSS,
> as I wrote in my previous mail.
>
> Off-topic:
> As memcg is growing better, using OOM-Killer for resource control should be
> ended, I think. Maybe Fake-NUMA+cpuset is working well for google system,
> but plz consider using memcg.
>

I understand what you're trying to do, and I agree with it for most
desktop systems. However, I think that admins should have a very strong
influence in what tasks the oom killer kills. It doesn't really matter if
it's via oom_adj or not, and it's debatable whether an adjustment on a
static heuristic score is in our best interest in the first place. But we
must have an alternative so that our control over oom killing isn't lost.

I'd also like to open another topic for discussion if you're proposing
such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations
to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing
anything? We both agreed that it's not always in the best interest to
kill a task so that an allocation can succeed, so we need to define some
criteria to simply fail the allocation instead.

> Old processes are important, younger are not. But as I wrote, I'll drop
> most of patch "6". So, plz forget about this part.
>
> I'm interested in fork-bomb killer rather than crazy badness calculation, now.
>

Ok, great. Thanks.
* Re: Memory overcommit
  2009-11-04 3:10 ` David Rientjes
@ 2009-11-04 3:19 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 77+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-11-04 3:19 UTC
To: David Rientjes
Cc: vedran.furac, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 3 Nov 2009 19:10:34 -0800 (PST)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 4 Nov 2009, KAMEZAWA Hiroyuki wrote:
>
> > My point and your point are different.
> >
> > 1. All my concern is "baseline for heuristics"
> > 2. All your concern is "baseline for knob, as oom_adj"
> >
> > ok ? For selecting a victim by the kernel, a dynamic value is much more useful.
> > The current behaviors of "Random kill" and "Kill multiple processes" are too bad.
> > Considering what the oom-killer is for, I think "1" is more important.
> >
> > But I know what you want, so I offer a new knob which is not affected by RSS,
> > as I wrote in my previous mail.
> >
> > Off-topic:
> > As memcg is growing better, using OOM-Killer for resource control should be
> > ended, I think. Maybe Fake-NUMA+cpuset is working well for google system,
> > but plz consider using memcg.
> >
>
> I understand what you're trying to do, and I agree with it for most
> desktop systems. However, I think that admins should have a very strong
> influence in what tasks the oom killer kills. It doesn't really matter if
> it's via oom_adj or not, and it's debatable whether an adjustment on a
> static heuristic score is in our best interest in the first place. But we
> must have an alternative so that our control over oom killing isn't lost.
>

I'll not go too quickly, so let's discuss and rewrite the patches more, later.
I'll prepare a new version next week. For this week, I'll post
swap accounting and improve the fork-bomb detector.

> I'd also like to open another topic for discussion if you're proposing
> such sweeping changes: at what point do we allow ~__GFP_NOFAIL allocations
> to fail even if order < PAGE_ALLOC_COSTLY_ORDER and defer killing
> anything? We both agreed that it's not always in the best interest to
> kill a task so that an allocation can succeed, so we need to define some
> criteria to simply fail the allocation instead.
>

Yes, I think allocation itself (> order=0) should fail more before we finally
invoke OOM. It tends to be a soft landing rather than the oom-killer.

Thanks,
-Kame
* Re: Memory overcommit 2009-10-29 19:53 ` David Rientjes 2009-10-29 23:48 ` KAMEZAWA Hiroyuki @ 2009-10-30 13:59 ` Vedran Furač 2009-10-30 19:24 ` David Rientjes 1 sibling, 1 reply; 77+ messages in thread From: Vedran Furač @ 2009-10-30 13:59 UTC (permalink / raw) To: David Rientjes Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli David Rientjes wrote: > On Thu, 29 Oct 2009, Vedran Furac wrote: > >> But then you should rename OOM killer to TRIPK: >> Totally Random Innocent Process Killer >> > > The randomness here is the order of the child list when the oom killer > selects a task, based on the badness score, and then tries to kill a child > with a different mm before the parent. > > The problem you identified in http://pastebin.com/f3f9674a0, however, is a > forkbomb issue where the badness score should never have been so high for > kdeinit4 compared to "test". That's directly proportional to adding the > scores of all disjoint child total_vm values into the badness score for > the parent and then killing the children instead. Could you explain me why ntpd invoked oom killer? Its parent is init. Or syslog-ng? > That's the problem, not using total_vm as a baseline. Replacing that with > rss is not going to solve the issue and reducing the user's ability to > specify a rough oom priority from userspace is simply not an option. OK then, if you have a solution, I would be glad to test your patch. I won't care much if you don't change total_vm as a baseline. Just make random killing history. Regards, Vedran -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: Memory overcommit 2009-10-30 13:59 ` Vedran Furač @ 2009-10-30 19:24 ` David Rientjes 2009-11-02 19:58 ` Vedran Furač 0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-30 19:24 UTC
To: vedran.furac
Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, Vedran Furac wrote:

> > The problem you identified in http://pastebin.com/f3f9674a0, however, is a
> > forkbomb issue where the badness score should never have been so high for
> > kdeinit4 compared to "test". That's directly proportional to adding the
> > scores of all disjoint child total_vm values into the badness score for
> > the parent and then killing the children instead.
>
> Could you explain to me why ntpd invoked oom killer? Its parent is init. Or
> syslog-ng?
>

Because it attempted an order-0 GFP_USER allocation and direct reclaim
could not free any pages.

The task that invoked the oom killer is simply the unlucky task that tried
an allocation that couldn't be satisfied through direct reclaim. It's
usually unrelated to the task chosen for kill unless
/proc/sys/vm/oom_kill_allocating_task is enabled (which SGI requested to
avoid excessively long tasklist scans).

> > That's the problem, not using total_vm as a baseline. Replacing that with
> > rss is not going to solve the issue and reducing the user's ability to
> > specify a rough oom priority from userspace is simply not an option.
>
> OK then, if you have a solution, I would be glad to test your patch. I
> won't care much if you don't change total_vm as a baseline. Just make
> random killing history.
>

The only randomness is in selecting a task that has a different mm from
the parent in the order of its child list. Yes, that can be addressed by
doing a smarter iteration through the children before killing one of them.

Keep in mind that a heuristic as simple as this:

 - kill the task that was started most recently by the same uid, or

 - kill the task that was started most recently on the system if a root
   task calls the oom killer,

would have yielded perfect results for your testcase but isn't necessarily
something that we'd ever want to see.
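The two-rule heuristic described in the message above can be sketched in a few lines. This is only an illustration of the idea, not kernel code: the `Task` record, its field names, and `pick_most_recent` are hypothetical, and the start time is assumed to be any monotonically increasing value (larger means started later).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical record; the kernel's task_struct is not modelled here.
    pid: int
    uid: int
    start_time: int  # larger value means the task was started later
    name: str


def pick_most_recent(tasks: Iterable[Task], caller_uid: int) -> Optional[Task]:
    """Return the newest task owned by caller_uid, or the newest task
    on the whole system when the caller is root (uid 0)."""
    candidates = list(tasks)
    if caller_uid != 0:
        candidates = [t for t in candidates if t.uid == caller_uid]
    return max(candidates, key=lambda t: t.start_time, default=None)
```

For the test case in this thread the newest task of the desktop user is the allocation loop itself, which is why such a rule would look perfect there while saying nothing about memory use in general.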
* Re: Memory overcommit 2009-10-30 19:24 ` David Rientjes @ 2009-11-02 19:58 ` Vedran Furač 0 siblings, 0 replies; 77+ messages in thread From: Vedran Furač @ 2009-11-02 19:58 UTC (permalink / raw) To: David Rientjes Cc: KAMEZAWA Hiroyuki, Hugh Dickins, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli David Rientjes wrote: > On Fri, 30 Oct 2009, Vedran Furac wrote: > >>> The problem you identified in http://pastebin.com/f3f9674a0, however, is a >>> forkbomb issue where the badness score should never have been so high for >>> kdeinit4 compared to "test". That's directly proportional to adding the >>> scores of all disjoint child total_vm values into the badness score for >>> the parent and then killing the children instead. >> Could you explain me why ntpd invoked oom killer? Its parent is init. Or >> syslog-ng? >> > > Because it attempted an order-0 GFP_USER allocation and direct reclaim > could not free any pages. > > The task that invoked the oom killer is simply the unlucky task that tried > an allocation that couldn't be satisified through direct reclaim. It's > usually unrelated to the task chosen for kill unless > /proc/sys/vm/oom_kill_allocating_task is enabled (which SGI requested to > avoid excessively long tasklist scans). Oh, well, I didn't know that. Maybe rephrasing of that part of the output would help eliminating future misinterpretation. >> OK then, if you have a solution, I would be glad to test your patch. I >> won't care much if you don't change total_vm as a baseline. Just make >> random killing history. > > The only randomness is in selecting a task that has a different mm from > the parent in the order of its child list. Yes, that can be addressed by > doing a smarter iteration through the children before killing one of them. 
> > Keep in mind that a heuristic as simple as this: > > - kill the task that was started most recently by the same uid, or > > - kill the task that was started most recently on the system if a root > task calls the oom killer, > > would have yielded perfect results for your testcase but isn't necessarily > something that we'd ever want to see. Of course, I want algorithm that works well in all possible situations. Regards, Vedran -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: Memory overcommit 2009-10-28 4:08 ` David Rientjes 2009-10-28 4:55 ` KAMEZAWA Hiroyuki @ 2009-10-28 13:28 ` Vedran Furač 2009-10-28 20:10 ` David Rientjes 1 sibling, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-28 13:28 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Wed, 28 Oct 2009, Vedran Furac wrote:
>
>>> This is wrong; it doesn't "emulate oom" since oom_kill_process() always
>>> kills a child of the selected process instead if they do not share the
>>> same memory. The chosen task in that case is untouched.
>> OK, I stand corrected then. Thanks! But, while testing this I lost X
>> once again and "test" survived for some time (check the timestamps):
>>
>> http://pastebin.com/d5c9d026e
>>
>> - It started by killing gkrellm(!!!)
>> - Then I lost X (kdeinit4 I guess)
>> - Then 103 seconds after the killing started, it killed "test" - the
>> real culprit.
>>
>> I mean... how?!
>>
>
> Here are the five oom kills that occurred in your log, and notice that the
> first four times it kills a child and not the actual task as I explained:

Yes, but four times wrong.

> Those are practically happening simultaneously with very little memory
> being available between each oom kill. Only later is "test" killed:
>
> [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child
> [97240.206832] Killed process 5005 (test)
>
> Notice how the badness score is less than 1/4th of the others. So while
> you may find it to be hogging a lot of memory, there were others that
> consumed much more.
  ^^^^^^^^^^^^^^^^^^^^^

This is just wrong. I have 3.5GB of RAM, free says that 2GB are empty
(ignoring cache). Culprit then allocates all free memory (2GB). That
means it is using *more* than all other processes *together*. There
cannot be any other "that consumed much more".

> You can get a more detailed understanding of this by doing
>
> echo 1 > /proc/sys/vm/oom_dump_tasks
>
> before trying your testcase; it will show various information like the
> total_vm

Looking at total_vm (VIRT in top/vsize in ps?) is completely wrong. If I
sum up those numbers for every process running I would get:

% ps -eo pid,vsize,command | awk '{ SUM += $2 } END { print SUM/1024/1024 }'
14.7935

14GB. And I only have 3GB. I usually use exmap to get realistic numbers:

http://www.berthels.co.uk/exmap/doc.html

> and oom_adj value for each task at the time of oom (and the
> actual badness score is exported per-task via /proc/pid/oom_score in
> real-time). This will also include the rss and show what the end result
> would be in using that value as part of the heuristic on this particular
> workload compared to the current implementation.

Thanks, I'll try that... but I guess that using rss would yield better
results.

Regards,

Vedran
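The gap between summed virtual size and physical RAM that the ps/awk one-liner above exposes can also be checked against resident size. The following is a minimal sketch, assuming the `VmSize` and `VmRSS` fields (reported in kB) of `/proc/<pid>/status`; the function name and the `proc_root` parameter are hypothetical and exist only so the code can be pointed at a test directory. Note that summed RSS still counts shared pages once per process, so it is itself an overestimate.

```python
import os


def sum_status_field(field: str, proc_root: str = "/proc") -> int:
    """Sum one 'Vm*' field (in kB) from <proc_root>/<pid>/status over all
    numeric (per-process) directories. Processes without the field, such
    as kernel threads, contribute nothing."""
    total_kb = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                for line in status:
                    if line.startswith(field + ":"):
                        total_kb += int(line.split()[1])
                        break
        except OSError:
            continue  # the process exited while we were scanning
    return total_kb
```

On a Linux machine, comparing `sum_status_field("VmSize")` with `sum_status_field("VmRSS")` shows how far address-space size and resident memory can diverge.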
* Re: Memory overcommit 2009-10-28 13:28 ` Vedran Furač @ 2009-10-28 20:10 ` David Rientjes 2009-10-29 3:05 ` Vedran Furač 0 siblings, 1 reply; 77+ messages in thread From: David Rientjes @ 2009-10-28 20:10 UTC (permalink / raw) To: Vedran Furac Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli On Wed, 28 Oct 2009, Vedran Furac wrote: > > Those are practically happening simultaneously with very little memory > > being available between each oom kill. Only later is "test" killed: > > > > [97240.203228] Out of memory: kill process 5005 (test) score 256912 or a child > > [97240.206832] Killed process 5005 (test) > > > > Notice how the badness score is less than 1/4th of the others. So while > > you may find it to be hogging a lot of memory, there were others that > > consumed much more. > ^^^^^^^^^^^^^^^^^^^^^ > > This is just wrong. I have 3.5GB of RAM, free says that 2GB are empty > (ignoring cache). Culprit then allocates all free memory (2GB). That > means it is using *more* than all other processes *together*. There > cannot be any other "that consumed much more". > Just post the oom killer results after using echo 1 > /proc/sys/vm/oom_dump_tasks as requested and it will clarify why those tasks were chosen to kill. It will also show the result of using rss instead of total_vm and allow us to see how such a change would have changed the killing order for your workload. > Thanks, I'll try that... but I guess that using rss would yield better > results. > We would know if you posted the data. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: Memory overcommit 2009-10-28 20:10 ` David Rientjes @ 2009-10-29 3:05 ` Vedran Furač 2009-10-29 8:35 ` David Rientjes 0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-29 3:05 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> We would know if you posted the data.

I need to find some free time to destroy a session on a computer which I
use for work. You could easily test it yourself also as this doesn't
happen only to me.

Anyways, here it is... this time it started with ntpd:

http://pastebin.com/f3f9674a0

Regards,

Vedran
* Re: Memory overcommit 2009-10-29 3:05 ` Vedran Furač @ 2009-10-29 8:35 ` David Rientjes 2009-10-29 11:01 ` Vedran Furač 0 siblings, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-29 8:35 UTC
To: vedran.furac
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Thu, 29 Oct 2009, Vedran Furac wrote:

> > We would know if you posted the data.
>
> I need to find some free time to destroy a session on a computer which I
> use for work. You could easily test it yourself also as this doesn't
> happen only to me.
>
> Anyways, here it is... this time it started with ntpd:
>
> http://pastebin.com/f3f9674a0
>

That oom log shows 12 ooms but no tasks actually appear to be getting
killed (there are no "Killed process 1234 (task)" lines). Do you have any
idea why?

Anyway, as I posted in response to KAMEZAWA-san's patch, the change to
get_mm_rss(mm) prefers Xorg more than the current implementation. From
your log at the link above:

total_vm
669624 test
195695 krunner
187342 krusader
168881 plasma-desktop
130562 ktorrent
127081 knotify4
125881 icedove-bin
123036 akregator

rss
668738 test
42191 Xorg
30761 firefox-bin
13331 icedove-bin
10234 ktorrent
9263 akregator
8864 plasma-desktop
7532 krunner

Can you explain why Xorg is preferred as a baseline to kill rather than
krunner in your example?

Thanks.
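To make the comparison in the message above concrete, the sketch below orders the tasks by each column. It uses only the six tasks that appear in both quoted lists; Xorg, firefox-bin, krusader and knotify4 are left out because the message gives only one of their two values. The units are presumably pages, and the `TASKS` table and `rank` helper are illustrative names, not anything from the kernel.

```python
# (name, total_vm, rss) copied from the two lists in the message above.
# Only tasks present in BOTH lists are included.
TASKS = [
    ("test", 669624, 668738),
    ("krunner", 195695, 7532),
    ("plasma-desktop", 168881, 8864),
    ("ktorrent", 130562, 10234),
    ("icedove-bin", 125881, 13331),
    ("akregator", 123036, 9263),
]


def rank(tasks, column):
    """Return task names ordered from largest to smallest by the given column."""
    index = {"total_vm": 1, "rss": 2}[column]
    return [t[0] for t in sorted(tasks, key=lambda t: t[index], reverse=True)]


if __name__ == "__main__":
    print("by total_vm:", rank(TASKS, "total_vm"))
    print("by rss:     ", rank(TASKS, "rss"))
```

Both orderings put "test" first, but krunner moves from second place under total_vm to last place under rss, which is the disagreement the two sides are arguing about.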
* Re: Memory overcommit 2009-10-29 8:35 ` David Rientjes @ 2009-10-29 11:01 ` Vedran Furač 2009-10-29 19:42 ` David Rientjes 0 siblings, 1 reply; 77+ messages in thread
From: Vedran Furač @ 2009-10-29 11:01 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Thu, 29 Oct 2009, Vedran Furac wrote:
>
>>> We would know if you posted the data.
>> I need to find some free time to destroy a session on a computer which I
>> use for work. You could easily test it yourself also as this doesn't
>> happen only to me.
>>
>> Anyways, here it is... this time it started with ntpd:
>>
>> http://pastebin.com/f3f9674a0
>>
>
> That oom log shows 12 ooms but no tasks actually appear to be getting
> killed (there're no "Killed process 1234 (task)" found). Do you have any
> idea why?

That's /var/log/messages. I posted it and not dmesg because the whole log
didn't fit in the dmesg buffer. Here is what I have (compare timestamps):

% dmesg|grep -i kill
[ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231 or a child
[ 1493.064467] Killed process 6409 (konqueror)
[ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1493.261166] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265 or a child
[ 1493.276538] Killed process 6411 (krusader)
[ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.221178] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593 or a child
[ 1499.236441] Killed process 6412 (irexec)
[ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.370209] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861 or a child
[ 1499.385427] Killed process 6420 (xchm)
[ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1499.458333] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1499.458367] [<ffffffff81120900>] ? d_kill+0x5c/0x7c
[ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690 or a child
[ 1499.473582] Killed process 6425 (kio_file)
[ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.250765] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350 or a child
[ 1500.266196] Killed process 6464 (icedove)
[ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.349371] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864 or a child
[ 1500.364699] Killed process 6477 (kio_http)
[ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.452167] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.452196] [<ffffffff81120900>] ? d_kill+0x5c/0x7c
[ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142 or a child
[ 1500.467316] Killed process 6478 (kio_http)
[ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1500.780239] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331 or a child
[ 1500.796290] Killed process 6484 (kio_http)
[ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1501.065390] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434 or a child
[ 1501.080587] Killed process 6486 (kio_http)
[ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1501.381204] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691 or a child
[ 1501.396346] Killed process 6487 (firefox-bin)
[ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[ 1502.661311] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264
[ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a child
[ 1502.676575] Killed process 7580 (test)

> Can you explain why Xorg is preferred as a baseline to kill rather than
> krunner in your example?

Krunner is a small app for running other apps and doing similar things. It
shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps
and so on. That was the expected result: first Xorg, then firefox and
thunderbird.
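A log like the one above is easier to read once each "Out of memory: kill process ... or a child" line is paired with the "Killed process ..." line that follows it. The sketch below does that with two regular expressions written against the message format shown in this thread; other kernel versions word these lines differently, so the patterns and the `pair_kills` helper are assumptions tied to this log only.

```python
import re

# Patterns match the 2.6.3x-era wording seen in the log above.
SELECTED = re.compile(
    r"Out of memory: kill process (\d+) \(([^)]+)\) score (\d+) or a child")
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def pair_kills(log: str):
    """Pair every 'selected' line with the next 'Killed' line and report
    whether a child was killed instead of the selected task."""
    events = [(m.start(), "selected", m) for m in SELECTED.finditer(log)]
    events += [(m.start(), "killed", m) for m in KILLED.finditer(log)]
    events.sort(key=lambda event: event[0])

    pairs, pending = [], None
    for _, kind, match in events:
        if kind == "selected":
            pending = match
        elif pending is not None:
            pairs.append({
                "selected": pending.group(2),
                "score": int(pending.group(3)),
                "killed": match.group(2),
                "child_killed": pending.group(1) != match.group(1),
            })
            pending = None
    return pairs
```

Run over the dmesg output above, every entry before the last one reports kdeinit4 as selected with a different process killed, and only the final entry has the selected and killed task both equal to "test".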
* Re: Memory overcommit 2009-10-29 11:01 ` Vedran Furač @ 2009-10-29 19:42 ` David Rientjes 2009-10-30 13:53 ` Vedran Furač 0 siblings, 1 reply; 77+ messages in thread From: David Rientjes @ 2009-10-29 19:42 UTC (permalink / raw) To: vedran.furac Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli On Thu, 29 Oct 2009, Vedran Furac wrote: > [ 1493.064458] Out of memory: kill process 6304 (kdeinit4) score 1190231 > or a child > [ 1493.064467] Killed process 6409 (konqueror) > [ 1493.261149] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1493.261166] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1493.276528] Out of memory: kill process 6304 (kdeinit4) score 1161265 > or a child > [ 1493.276538] Killed process 6411 (krusader) > [ 1499.221160] akregator invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1499.221178] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1499.236431] Out of memory: kill process 6304 (kdeinit4) score 1067593 > or a child > [ 1499.236441] Killed process 6412 (irexec) > [ 1499.370192] firefox-bin invoked oom-killer: gfp_mask=0x201da, > order=0, oomkilladj=0 > [ 1499.370209] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1499.385417] Out of memory: kill process 6304 (kdeinit4) score 1066861 > or a child > [ 1499.385427] Killed process 6420 (xchm) > [ 1499.458304] kio_file invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1499.458333] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1499.458367] [<ffffffff81120900>] ? d_kill+0x5c/0x7c > [ 1499.473573] Out of memory: kill process 6304 (kdeinit4) score 1043690 > or a child > [ 1499.473582] Killed process 6425 (kio_file) > [ 1500.250746] korgac invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1500.250765] [<ffffffff810d6dd7>] ? 
oom_kill_process+0x9a/0x264 > [ 1500.266186] Out of memory: kill process 6304 (kdeinit4) score 1020350 > or a child > [ 1500.266196] Killed process 6464 (icedove) > [ 1500.349355] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1500.349371] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1500.364689] Out of memory: kill process 6304 (kdeinit4) score 1019864 > or a child > [ 1500.364699] Killed process 6477 (kio_http) > [ 1500.452151] kded4 invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1500.452167] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1500.452196] [<ffffffff81120900>] ? d_kill+0x5c/0x7c > [ 1500.467307] Out of memory: kill process 6304 (kdeinit4) score 993142 > or a child > [ 1500.467316] Killed process 6478 (kio_http) > [ 1500.780222] akregator invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1500.780239] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1500.796280] Out of memory: kill process 6304 (kdeinit4) score 966331 > or a child > [ 1500.796290] Killed process 6484 (kio_http) > [ 1501.065374] syslog-ng invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1501.065390] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1501.080579] Out of memory: kill process 6304 (kdeinit4) score 939434 > or a child > [ 1501.080587] Killed process 6486 (kio_http) > [ 1501.381188] knotify4 invoked oom-killer: gfp_mask=0x201da, order=0, > oomkilladj=0 > [ 1501.381204] [<ffffffff810d6dd7>] ? oom_kill_process+0x9a/0x264 > [ 1501.396338] Out of memory: kill process 6304 (kdeinit4) score 912691 > or a child > [ 1501.396346] Killed process 6487 (firefox-bin) > [ 1502.661294] icedove-bin invoked oom-killer: gfp_mask=0x201da, > order=0, oomkilladj=0 > [ 1502.661311] [<ffffffff810d6dd7>] ? 
oom_kill_process+0x9a/0x264
> [ 1502.676563] Out of memory: kill process 7580 (test) score 708945 or a
> child
> [ 1502.676575] Killed process 7580 (test)
>

Ok, so this is the forkbomb problem caused by adding half of each child's
total_vm into the badness score of the parent. We should address this
completely separately by addressing that specific part of the heuristic,
not changing what we consider to be a baseline.

The rationale is quite simple: we'll still experience the same problem
with rss as we did with total_vm in the forkbomb scenario above on certain
workloads (maybe not yours, but others). The oom killer always kills a
child first if it has a different mm than the selected parent, so the
amount of memory freed as a result is entirely dependent on the
order of the child list. The killed child may free very little, yet be
chosen because its siblings had large total_vm values.

So instead of focusing on rss, we simply need to find a better heuristic
for the forkbomb issue, for which I've already proposed a very trivial
solution. Then, afterwards, we can debate about how the scoring
heuristic can be changed to select better tasks (and perhaps remove a lot
of the clutter that's there currently!).

> > Can you explain why Xorg is preferred as a baseline to kill rather than
> > krunner in your example?
>
> Krunner is a small app for running other apps and do similar things. It
> shouldn't use a lot of memory. OTOH, Xorg has to hold all the pixmaps
> and so on. That was expected result. First Xorg, then firefox and
> thunderbird.
>

You're making all these claims and assertions based _solely_ on the theory
that killing the application with the most resident RAM is always the
optimal solution. That's just not true, especially if we're just
allocating small numbers of order-0 memory.

Much better is to allow the user to decide at what point, regardless of
swap usage, their application is using much more memory than expected or
required. They can do that right now pretty well with /proc/pid/oom_adj
without this outlandish claim that they should be expected to know the rss
of their applications at the time of oom to effectively tune oom_adj.

What would you suggest? A script that sits in a loop checking each task's
current rss from /proc/pid/stat or their current oom priority through
/proc/pid/oom_score and adjusting oom_adj preemptively just in case the
oom killer is invoked in the next second?

And that "small app" has 30MB of rss which could be freed, if killed, and
utilized for subsequent page allocations.
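The forkbomb effect described in this message can be reproduced with a toy model. The sketch below keeps only the two rules stated in the thread: a parent's score includes half of the total_vm of each child with a different mm, and the killer then kills the first such child in list order rather than the selected parent. Everything else the real badness calculation does (oom_adj and other adjustments) is deliberately left out, and the `Proc` class, both function names, and all numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proc:
    # Hypothetical stand-in for a task; 'mm' is just an address-space id.
    name: str
    total_vm: int
    mm: int
    children: List["Proc"] = field(default_factory=list)


def baseline_score(proc: Proc) -> int:
    """Own total_vm plus half the total_vm of each child with a different mm."""
    score = proc.total_vm
    for child in proc.children:
        if child.mm != proc.mm:
            score += child.total_vm // 2
    return score


def choose_victim(procs: List[Proc]) -> Proc:
    """Select the highest-scoring task, then kill its first child that has a
    different mm; fall back to the selected task itself if there is none."""
    selected = max(procs, key=baseline_score)
    for child in selected.children:
        if child.mm != selected.mm:
            return child
    return selected
```

With a launcher of total_vm 100 that has five children of 400 each, and a single hog of total_vm 900 with no children, the launcher scores 1100 against the hog's 900, so the first child of the launcher is killed while the hog survives. That mirrors the kdeinit4 entries in the log quoted earlier in this thread.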
* Re: Memory overcommit 2009-10-29 19:42 ` David Rientjes @ 2009-10-30 13:53 ` Vedran Furač 2009-10-30 14:08 ` Thomas Fjellstrom ` (2 more replies) 0 siblings, 3 replies; 77+ messages in thread
From: Vedran Furač @ 2009-10-30 13:53 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> Ok, so this is the forkbomb problem by adding half of each child's
> total_vm into the badness score of the parent. We should address this
> completely separately by addressing that specific part of the heuristic,
> not changing what we consider to be a baseline.

> thunderbird.
>
> You're making all these claims and assertions based _solely_ on the theory
> that killing the application with the most resident RAM is always the
> optimal solution. That's just not true, especially if we're just
> allocating small numbers of order-0 memory.

Well, you are a kernel hacker, not me. You know how linux mm works much
better than I do. I just reported what I think is a big problem, which
needs to be solved ASAP (2.6.33). I'm afraid that we'll just talk a lot
and nothing will be done, with a solution/fix postponed indefinitely. Not
sure if you are interested, but I tested this on windowsxp also, and
nothing bad happens there, system continues to function properly.

For 2-3 years I had memory overcommit turned off. I didn't get any OOM,
but sometimes Java didn't work and it seems that because of some kernel
weirdness (or misunderstanding on my part) I couldn't use all the
available memory:

# echo 2 > /proc/sys/vm/overcommit_memory
# echo 95 > /proc/sys/vm/overcommit_ratio

% ./test   /* malloc in loop as before */
malloc: Cannot allocate memory   /* Great, no OOM, but: */

% free -m
             total       used       free     shared    buffers     cached
Mem:          3458       3429         29          0        102       1119
-/+ buffers/cache:       2207       1251

There's plenty of memory available. Shouldn't cache be automatically
dropped (this question was in my original mail, hence the subject)?

All this frustrated not only me, but a great number of users on our
local Croatian linux usenet newsgroup, with some of them pointing to that
as the reason they use solaris. And so on...

> Much better is to allow the user to decide at what point, regardless of
> swap usage, their application is using much more memory than expected or
> required. They can do that right now pretty well with /proc/pid/oom_adj
> without this outlandish claim that they should be expected to know the rss
> of their applications at the time of oom to effectively tune oom_adj.

Believe me, barely a few developers use oom_adj for their applications,
and probably almost none of the end users. What should they do, go to the
console and set oom_adj every time they start an application? You
cannot expect them to do that.

> What would you suggest? A script that sits in a loop checking each task's
> current rss from /proc/pid/stat or their current oom priority though
> /proc/pid/oom_score and adjusting oom_adj preemptively just in case the
> oom killer is invoked in the next second?

:)
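One possible reading of the malloc failure above: with overcommit_memory set to 2, the kernel's overcommit accounting refuses new allocations once the total committed address space would exceed swap plus overcommit_ratio percent of physical RAM. The limit is on what processes have reserved, not on what is resident, so the size of the page cache does not enter into it. The sketch below only evaluates that formula; it ignores huge pages, the helper name is hypothetical, and the swap size is not stated in the thread, so the value 0 used in the usage note is an assumption.

```python
def commit_limit_mb(ram_mb: int, swap_mb: int, overcommit_ratio: int) -> int:
    """Approximate commit limit in strict mode (overcommit_memory=2):
    swap plus overcommit_ratio percent of RAM, huge pages ignored."""
    return swap_mb + ram_mb * overcommit_ratio // 100


if __name__ == "__main__":
    # 3458 MB of RAM and ratio 95 come from the message above;
    # swap_mb=0 is an assumption, the thread does not give the swap size.
    print(commit_limit_mb(3458, 0, 95), "MB")
```

Under these assumptions the test program's 100 MB requests start failing once all processes together have reserved roughly 3285 MB of address space, whether or not the program has touched the memory yet.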
* Re: Memory overcommit
From: Thomas Fjellstrom @ 2009-10-30 14:08 UTC
To: linux-kernel, vedran.furac
Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri October 30 2009, Vedran Furač wrote:
> David Rientjes wrote:
> > Ok, so this is the forkbomb problem by adding half of each child's
> > total_vm into the badness score of the parent. We should address this
> > completely separately by addressing that specific part of the
> > heuristic, not changing what we consider to be a baseline.
> > thunderbird.
> >
> > You're making all these claims and assertions based _solely_ on the
> > theory that killing the application with the most resident RAM is
> > always the optimal solution. That's just not true, especially if we're
> > just allocating small numbers of order-0 memory.
>
> Well, you are kernel hacker, not me. You know how linux mm works much
> more than I do. I just reported what I think is a big problem, which
> needs to be solved ASAP (2.6.33). I'm afraid that we'll just talk much
> and nothing will be done with solution/fix postponed indefinitely. Not
> sure if you are interested, but I tested this on windowsxp also, and
> nothing bad happens there, system continues to function properly.
>
> For 2-3 years I had memory overcommit turned off. I didn't get any OOM,
> but sometimes Java didn't work and it seems that because of some kernel
> weirdness (or misunderstanding on my part) I couldn't use all the
> available memory:
>
> # echo 2 > /proc/sys/vm/overcommit_memory
> # echo 95 > /proc/sys/vm/overcommit_ratio
>
> % ./test /* malloc in loop as before */
> malloc: Cannot allocate memory /* Great, no OOM, but: */
>
> % free -m
>              total       used       free     shared    buffers     cached
> Mem:          3458       3429         29          0        102       1119
> -/+ buffers/cache:       2207       1251
>
> There's plenty of memory available. Shouldn't cache be automatically
> dropped (this question was in my original mail, hence the subject)?
>
> All this frustrated not only me, but a great number of users on our
> local Croatian linux usenet newsgroup with some of them pointing to that as
> the reason they use solaris. And so on...

I think this is the MOST serious issue related to the oom killer. For some
reason it refuses to drop pages before trying to kill, when it should drop
cache, THEN kill if needed.

> > Much better is to allow the user to decide at what point, regardless of
> > swap usage, their application is using much more memory than expected
> > or required. They can do that right now pretty well with
> > /proc/pid/oom_adj without this outlandish claim that they should be
> > expected to know the rss of their applications at the time of oom to
> > effectively tune oom_adj.
>
> Believe me, barely a few developers use oom_adj for their applications,
> and probably almost none of the end users. What should they do, every
> time they start an application, go to console and set the oom_adj? You
> cannot expect them to do that.
>
> > What would you suggest? A script that sits in a loop checking each
> > task's current rss from /proc/pid/stat or their current oom priority
> > through /proc/pid/oom_score and adjusting oom_adj preemptively just in
> > case the oom killer is invoked in the next second?
>
> :)

-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca
* Re: Memory overcommit
From: Vedran Furač @ 2009-10-30 15:13 UTC
To: tfjellstrom
Cc: linux-kernel, David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

Thomas Fjellstrom wrote:

>> malloc: Cannot allocate memory /* Great, no OOM, but: */
>>
>> % free -m
>>              total       used       free     shared    buffers     cached
>> Mem:          3458       3429         29          0        102       1119
>> -/+ buffers/cache:       2207       1251
>>
>> There's plenty of memory available. Shouldn't cache be
>> automatically dropped (this question was in my original mail, hence
>> the subject)?
>>
>
> I think this is the MOST serious issue related to the oom killer. For
> some reason it refuses to drop pages before trying to kill. When it
> should drop cache, THEN kill if needed.

This isn't about OOM, but about the situation when you turn off overcommit.
I was jumping to conclusions here. You can drop caches manually with:

# echo 1 > /proc/sys/vm/drop_caches

but you still get "malloc: Cannot allocate memory" even if almost nothing
is cached:

             total       used       free     shared    buffers     cached
Mem:          3458       2210       1248          0          3         90
-/+ buffers/cache:       2116       1342

As for the kernel not dropping pages before killing, I don't know anything
about it. It happens so fast and I never tried to measure it.
* Re: Memory overcommit
From: Andrea Arcangeli @ 2009-10-30 14:12 UTC
To: Vedran Furač
Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, Oct 30, 2009 at 02:53:33PM +0100, Vedran Furač wrote:
> % free -m
>              total       used       free     shared    buffers     cached
> Mem:          3458       3429         29          0        102       1119
> -/+ buffers/cache:       2207       1251
>
> There's plenty of memory available. Shouldn't cache be automatically
> dropped (this question was in my original mail, hence the subject)?

This is not about cache. The cache amount is physical; this is about the
virtual amount that can only go in ram or swap (at any later time, current
time is irrelevant) vs "ram + swap". In short, add more swap if you don't
like overcommit, and check grep Commit /proc/meminfo in case this is an
accounting bug...
* Re: Memory overcommit
From: Vedran Furač @ 2009-10-30 14:41 UTC
To: Andrea Arcangeli
Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

Andrea Arcangeli wrote:

> On Fri, Oct 30, 2009 at 02:53:33PM +0100, Vedran Furač wrote:
>> % free -m
>>              total       used       free     shared    buffers     cached
>> Mem:          3458       3429         29          0        102       1119
>> -/+ buffers/cache:       2207       1251
>>
>> There's plenty of memory available. Shouldn't cache be automatically
>> dropped (this question was in my original mail, hence the subject)?
>
> This is not about cache, cache amount is physical, this about
> virtual amount that can only go in ram or swap (at any later time,
> current time is irrelevant) vs "ram + swap".

Oh... so this is because apps "reserve" (Committed_AS?) more than they
currently need.

> In short add more swap if
> you don't like overcommit and check grep Commit /proc/meminfo in case
> this is accounting bug...

At the time of "malloc: Cannot allocate memory":

CommitLimit:   3364440 kB
Committed_AS:  3240200 kB

So probably everything is ok (and free is misleading). Overcommit is
unfortunately necessary if I want to be able to use all my memory.

Btw. http://www.redhat.com/advice/tips/meminfo.html says Committed_AS is
a (gu)estimate. Hope it is a good (not too high) guesstimate. :)

Regards,

Vedran
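The check Andrea suggests (grep Commit /proc/meminfo) can also be done from a program. What follows is only a minimal sketch, assuming the usual "Key:   value kB" layout of /proc/meminfo; the helper names meminfo_kb and commit_headroom_kb are made up for this illustration and are not a kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up a "Key:   12345 kB" entry in /proc/meminfo-style text.
 * Returns the value in kB, or -1 if the key is not present.
 * (Hypothetical helper, written for this example only.)
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line && *line) {
        long val;

        /* The key must start the line and be followed directly by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &val) == 1)
            return val;

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/*
 * Store CommitLimit - Committed_AS (in kB) in *out.
 * Returns 0 on success, -1 if either field is missing.
 * The result can be negative when overcommit is allowed.
 */
int commit_headroom_kb(const char *text, long *out)
{
    long limit = meminfo_kb(text, "CommitLimit");
    long used = meminfo_kb(text, "Committed_AS");

    assert(out != NULL);
    if (limit < 0 || used < 0)
        return -1;
    *out = limit - used;
    return 0;
}
```

To use it, read /proc/meminfo into a NUL-terminated buffer (for example with fopen and fread) and pass that buffer in. With the two values quoted above, the difference is 124240 kB.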
* Re: Memory overcommit
From: Andrea Arcangeli @ 2009-10-30 15:15 UTC
To: Vedran Furač
Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, Oct 30, 2009 at 03:41:12PM +0100, Vedran Furač wrote:
> Oh... so this is because apps "reserve" (Committed_AS?) more then they
> currently need.

They don't actually reserve, they end up "reserving" if overcommit is
set to 2 (OVERCOMMIT_NEVER)... Apps aren't reserving, more likely they
simply avoid a flood of mmap when a single one is enough to map a
huge MAP_PRIVATE region like shared libs that you may only execute
partially (this is why total_vm is usually much bigger than real ram
mapped by pagetables represented in rss). But those shared libs are
99% pageable and they don't need to stay in swap or ram, so
overcommit-as greatly overestimates the actual needs even if shared lib
loading wouldn't be 64bit optimized (i.e. large and a single one).

> A the time of "malloc: Cannot allocate memory":
>
> CommitLimit:   3364440 kB
> Committed_AS:  3240200 kB
>
> So probably everything is ok (and free is misleading). Overcommit is
> unfortunately necessary if I want to be able to use all my memory.

Add more swap.

> Btw. http://www.redhat.com/advice/tips/meminfo.html says Committed_AS is
> a (gu)estimate. Hope it is a good (not to high) guesstimate. :)

It is a guess in the sense that, to guarantee no ENOMEM, it has to take
into account the worst possible case, that is, all shared lib MAP_PRIVATE
mappings are cowed, which is very far from reality. Other than that the
overcommit-as should exactly match all mmapped possibly writeable space
that can only fit in ram+swap, so from that point of view it's not a
guessed number (modulo the smp read out of order). The only guess is how
much slab, cache and other stuff is freeable, which doesn't provide true
perfection to OVERCOMMIT_NEVER.
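The "ram + swap" comparison above can be made concrete with a little arithmetic. The sketch below assumes the documented strict-mode rule that the commit limit is swap plus overcommit_ratio percent of RAM; commit_limit_kb is a made-up name for this example, and the kernel's own calculation also accounts for details such as huge pages and per-process reserves, so the real cutoff can be somewhat lower.

```c
#include <assert.h>
#include <limits.h>

/*
 * Approximate CommitLimit in kB for overcommit_memory = 2:
 * swap plus ratio_percent of physical RAM.
 * (Illustrative helper only, not kernel code.)
 */
unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                   unsigned long long swap_kb,
                                   unsigned int ratio_percent)
{
    /* Guard the multiplication against overflow for absurd inputs. */
    assert(ratio_percent == 0 || ram_kb <= ULLONG_MAX / ratio_percent);

    return swap_kb + ram_kb * ratio_percent / 100;
}
```

For the machine in this thread (3458 MB of RAM as reported by free -m, no swap, overcommit_ratio set to 95), commit_limit_kb(3458ULL * 1024, 0, 95) lands close to the reported CommitLimit of 3364440 kB; the small difference is expected because free -m rounds the RAM size. It also shows why "add more swap" raises the limit: swap is added in full, while RAM is scaled by the ratio.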
* Re: Memory overcommit
From: Hugh Dickins @ 2009-10-30 16:24 UTC
To: Andrea Arcangeli
Cc: Vedran Furač, David Rientjes, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

On Fri, 30 Oct 2009, Andrea Arcangeli wrote:
>
> It is a guess in the sense to guarantee no ENOMEM it has to take into
> account the worst possible case, that is all shared lib MAP_PRIVATE
> mappings are cowed, which is very far from reality.

A MAP_PRIVATE area is only counted into Committed_AS when it is or has
in the past been PROT_WRITE. I think it's up to the ELF header of the
shared library whether a section is PROT_WRITE or not; but it looks
like many are not, so Committed_AS should be (a little) nearer reality
than you fear.

Though we do account for Committed_AS, even while allowing overcommit,
we do not at present account for Committed_AS per mm. Seeing David and
KAMEZAWA-san debating over total_vm versus rss versus anon_rss, I wonder
whether such a "commit" count might be a better measure for OOM choices
(but shmem is as usual awkward: though accounted just once in
Committed_AS, it would probably have to be accounted to every mm that
maps it). Just an idea to throw into the mix.

Hugh
* Re: Memory overcommit
From: Vedran Furač @ 2009-11-02 19:56 UTC
To: Andrea Arcangeli
Cc: David Rientjes, Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton

Andrea Arcangeli wrote:

> On Fri, Oct 30, 2009 at 03:41:12PM +0100, Vedran Furač wrote:
>> Oh... so this is because apps "reserve" (Committed_AS?) more then they
>> currently need.
>
> They don't actually reserve, they end up "reserving" if overcommit is
> set to 2 (OVERCOMMIT_NEVER)... Apps aren't reserving, more likely they
> simply avoid a flood of mmap when a single one is enough to map an
> huge MAP_PRIVATE region like shared libs that you may only execute
> partially (this is why total_vm is usually much bigger than real ram
> mapped by pagetables represented in rss). But those shared libs are
> 99% pageable and they don't need to stay in swap or ram, so
> overcommit-as greatly overstimates the actual needs even if shared lib
> loading wouldn't be 64bit optimized (i.e. large and a single one).

Thanks for info!

>> A the time of "malloc: Cannot allocate memory":
>>
>> CommitLimit:   3364440 kB
>> Committed_AS:  3240200 kB
>>
>> So probably everything is ok (and free is misleading). Overcommit is
>> unfortunately necessary if I want to be able to use all my memory.
>
> Add more swap.

I don't use swap. With current prices of RAM, swap is history, at least
for desktops. I hate it when e.g. firefox gets swapped out if I don't use
it for a while. Removing swap decreased desktop latencies drastically.
And I don't care much if I lose 100MB of potential free memory that
could be used for disk cache...

Regards,

Vedran
* Re: Memory overcommit
From: David Rientjes @ 2009-10-30 19:44 UTC
To: vedran.furac
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

On Fri, 30 Oct 2009, Vedran Furac wrote:

> Well, you are kernel hacker, not me. You know how linux mm works much
> more than I do. I just reported a, what I think is a big problem, which
> needs to be solved ASAP (2.6.33).

The oom killer heuristics have not been changed recently, why is this
suddenly a problem that needs to be immediately addressed? The heuristics
you've been referring to have been used for at least three years.

> I'm afraid that we'll just talk much
> and nothing will be done with solution/fix postponed indefinitely. Not
> sure if you are interested, but I tested this on windowsxp also, and
> nothing bad happens there, system continues to function properly.
>

I'm totally sympathetic to testcases such as your own where the oom killer
seems to react in an undesirable way. I agree that it could do a much
better job at targeting "test" and killing it without negatively impacting
other tasks.

However, I don't think we can simply change the baseline (like the rss
change which has been added to -mm (??)) and consider it a major
improvement when it severely impacts how system administrators are able to
tune the badness heuristic from userspace via /proc/pid/oom_adj. I'm sure
you'd agree that user input is important in this matter and so that we
should maximize that ability rather than make it more difficult. That's
my main criticism of the suggestions thus far (and, sorry, but I have to
look out for production server interests here: you can't take away our
ability to influence oom badness scoring just because other simple
heuristics may be more understandable).

> > Much better is to allow the user to decide at what point, regardless of
> > swap usage, their application is using much more memory than expected or
> > required. They can do that right now pretty well with /proc/pid/oom_adj
> > without this outlandish claim that they should be expected to know the rss
> > of their applications at the time of oom to effectively tune oom_adj.
>
> Believe me, barely a few developers use oom_adj for their applications,
> and probably almost none of the end users. What should they do, every
> time they start an application, go to console and set the oom_adj. You
> cannot expect them to do that.
>

oom_adj is an extremely important part of our infrastructure and although
the majority of Linux users may not use it (I know a number of opensource
programs that tune their own, however), we can't let go of our ability to
specify an oom killing priority.

There are no simple solutions to this problem: the model proposed thus
far, which has basically been to acknowledge that oom killer is a bad
thing to encounter (but within that, some rationale was found that we can
react however we want??) and should be extremely easy to understand (just
kill the memory hogger with the most resident RAM) is a non-starter.

What would be better, and what I think we'll end up with, is a root
selectable heuristic so that production servers and desktop machines can
use different heuristics to make oom kill selections. We already have
/proc/sys/vm/oom_kill_allocating_task which I added 1-2 years ago to
address concerns specifically of SGI and their enormously long tasklist
scans. This would be a variation on that idea and would include different
simplistic behaviors (such as always killing the most memory hogging task,
killing the most recently started task by the same uid, etc), and leave
the default heuristic much the same as currently.
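For readers who have not used the interface under discussion, here is a rough sketch of how a program of that era might tune its own /proc/pid/oom_adj at startup. It assumes the oom_adj value range of that time (-16 to 15, with -17 meaning "never kill"); the macro and function names are local to this example. Later kernels added a separate oom_score_adj file with a different range, which this sketch does not cover.

```c
#include <assert.h>
#include <stdio.h>

/* Local names for the oom_adj value range assumed by this example. */
#define EX_OOM_ADJ_DISABLE (-17)  /* exempts the task from the oom killer */
#define EX_OOM_ADJ_MIN     (-16)
#define EX_OOM_ADJ_MAX     15

/* Clamp a requested adjustment into the accepted range. */
int clamp_oom_adj(int adj)
{
    if (adj == EX_OOM_ADJ_DISABLE)
        return adj;
    if (adj < EX_OOM_ADJ_MIN)
        return EX_OOM_ADJ_MIN;
    if (adj > EX_OOM_ADJ_MAX)
        return EX_OOM_ADJ_MAX;
    return adj;
}

/*
 * Write the (clamped) adjustment to an oom_adj file such as
 * "/proc/self/oom_adj". Returns 0 on success, -1 on any error.
 */
int write_oom_adj(const char *path, int adj)
{
    FILE *f;
    int rc;

    assert(path != NULL);

    f = fopen(path, "w");
    if (!f)
        return -1;

    rc = fprintf(f, "%d\n", clamp_oom_adj(adj)) < 0 ? -1 : 0;

    /* The kernel may only report a rejected value when the file is flushed. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

A memory-hungry batch job could call write_oom_adj("/proc/self/oom_adj", 10) early in main to volunteer itself as a preferred victim. Making a task less likely to be killed generally requires privilege, so the return value should be checked.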
* Re: Memory overcommit
From: Vedran Furač @ 2009-11-02 19:56 UTC
To: David Rientjes
Cc: Hugh Dickins, KAMEZAWA Hiroyuki, linux-mm, linux-kernel, KOSAKI Motohiro, minchan.kim, Andrew Morton, Andrea Arcangeli

David Rientjes wrote:

> On Fri, 30 Oct 2009, Vedran Furac wrote:
>
>> Well, you are kernel hacker, not me. You know how linux mm works much
>> more than I do. I just reported a, what I think is a big problem, which
>> needs to be solved ASAP (2.6.33).
>
> The oom killer heuristics have not been changed recently, why is this
> suddenly a problem that needs to be immediately addressed? The heuristics
> you've been referring to have been used for at least three years.

It isn't "suddenly a problem", just a problem, a big long-time problem.
If it is three years old, then it should have been addressed asap three
years ago (and we would not need to talk about it now, hopefully).

> However, I don't think we can simply change the baseline (like the rss
> change which has been added to -mm (??)) and consider it a major
> improvement when it severely impacts how system administrators are able to
> tune the badness heuristic from userspace via /proc/pid/oom_adj. I'm sure
> you'd agree that user input is important in this matter and so that we
> should maximize that ability rather than make it more difficult. That's
> my main criticism of the suggestions thus far (and, sorry, but I have to
> look out for production server interests here: you can't take away our
> ability to influence oom badness scoring just because other simple
> heuristics may be more understandable).
>
> What would be better, and what I think we'll end up with, is a root
> selectable heuristic so that production servers and desktop machines can
> use different heuristics to make oom kill selections. We already have
> /proc/sys/vm/oom_kill_allocating_task which I added 1-2 years ago to
> address concerns specifically of SGI and their enormously long tasklist
> scans. This would be variation on that idea and would include different
> simplistic behaviors (such as always killing the most memory hogging task,
> killing the most recently started task by the same uid, etc), and leave
> the default heuristic much the same as currently.

OK, agreed. Did you take a look at the set of patches Kame sent today?

Regards,

Vedran
* Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-28 0:43 UTC
To: Hugh Dickins
Cc: vedran.furac, linux-mm, linux-kernel, kosaki.motohiro, minchan.kim, akpm, rientjes, aarcange

On Tue, 27 Oct 2009 20:44:16 +0000 (GMT)
Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Tue, 27 Oct 2009, KAMEZAWA Hiroyuki wrote:
> > Sigh, gnome-session has twice value of mmap(1G).
> > Of course, gnome-session only uses 6M bytes of anon.
> > I wonder if this is because gnome-session has many children..but need to
> > dig more. Does anyone have an idea?
>
> When preparing KSM unmerge to handle OOM, I looked at how the precedent
> was handled by running a little program which mmaps an anonymous region
> of the same size as physical memory, then tries to mlock it. The
> program was such an obvious candidate to be killed, I was shocked
> by the poor decisions the OOM killer made. Usually I ran it with
> mem=512M, with gnome and firefox active. Often the OOM killer killed
> it right the first time, but went wrong when I tried it a second time
> (I think that's because of what's already swapped out the first time).
>
> I built up a patchset of fixes, but once I came to split them up for
> submission, not one of them seemed entirely satisfactory; and Andrea's
> fix to the KSM/mlock deadlock forced me to abandon even the first of
> the patches (we've since then fixed the way munlocking behaves, so
> in theory could revisit that; but Andrea disliked what I was trying
> to do there in KSM for other reasons, so I've not touched it since).
> I had to get on with KSM, so I set it all aside: none of the issues
> was a recent regression.
>
> I did briefly wonder about the reliance on total_vm which you're now
> looking into, but didn't touch that at all. Let me describe those
> issues which I did try but fail to fix - I've no more time to deal
> with them now than then, but ought at least to mention them to you.
>

Okay, thank you for the detailed information.

> 1. select_bad_process() tries to avoid killing another process while
> there's still a TIF_MEMDIE, but its loop starts by skipping !p->mm
> processes. However, p->mm is set to NULL well before p reaches
> exit_mmap() to actually free the memory, and there may be significant
> delays in between (I think exit_robust_list() gave me a hang at one
> stage). So in practice, even when the OOM killer selects the right
> process to kill, there can be lots of collateral damage from it not
> waiting long enough for that process to give up its memory.
>

Hmm.

> I tried to deal with that by moving the TIF_MEMDIE test up before
> the p->mm test, but adding in a check on p->exit_state:
> 	if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> 	    !p->exit_state)
> 		return ERR_PTR(-1UL);
> But this is then liable to hang the system if there's some reason
> why the selected process cannot proceed to free its memory (e.g.
> the current KSM unmerge case). It needs to wait "a while", but
> give up if no progress is made, instead of hanging: originally
> I thought that setting PF_MEMALLOC more widely in page_alloc.c,
> and giving up on the TIF_MEMDIE if it was waiting in PF_MEMALLOC,
> would deal with that; but we cannot be sure that waiting for memory
> is the only reason for a holdup there (in the KSM unmerge case it's
> waiting for an mmap_sem, and there may well be other such cases).
>

ok, then easy handling can't help.

> 2. I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first). But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed. Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
>

I can't agree with that part of the heuristics, either.

> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway? I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world. Maybe this patch was okay.
>

ok.

> 3. badness() has a comment above it which says:
>  * 5) we try to kill the process the user expects us to kill, this
>  *    algorithm has been meticulously tuned to meet the principle
>  *    of least surprise ... (be careful when you change it)
> But Andrea's 2.6.11 86a4c6d9e2e43796bb362debd3f73c0e3b198efa (later
> refined by Kurt's 2.6.16 9827b781f20828e5ceb911b879f268f78fe90815)
> adds plenty of surprise there, by trying to factor children into the
> calculation. Intended to deal with forkbombs, but any reasonable
> process whose purpose is to fork children (e.g. gnome-session)
> becomes very vulnerable. And whereas badness() itself goes on to
> refine the total_vm points by various adjustments peculiar to the
> process in question, those refinements have been ignored when
> adding the child's total_vm/2. (Andrea does remark that he'd
> rather have rewritten badness() from scratch.)
>
> I tried to fix this by moving the PF_OOM_ORIGIN (was PF_SWAPOFF)
> part of the calculation up to select_bad_process(), making a
> solo_badness() function which makes all those adjustments to
> total_vm, then badness() itself a simple function adding half
> the children's solo_badness()es to the process' own solo_badness().
> But probably lots more needs doing - Andrea's rewrite?
>
> 4. In some cases those children are sharing exactly the same mm,
> yet its total_vm is being added again and again to the points:
> I had a nasty inner loop searching back to see if we'd already
> counted this mm (but then, what if the different tasks sharing
> the mm deserved different adjustments to the total_vm?).
>
>
> I hope these notes help someone towards a better solution
> (and be prepared to discover more on the way). I agree with
> Vedran that the present behaviour is pretty unimpressive, and
> I'm puzzled as to how people can have been tinkering with
> oom_kill.c down the years without seeing any of this.
>

Sorry, I usually don't use X on servers and almost all my recent OOM
tests were done under memcg ;(

Thank you for your investigation. Maybe I'll need several steps.

Thanks,
-Kame
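Hugh's point 3 is easier to see with numbers. The sketch below is a deliberately simplified userspace model, not the kernel's badness(): it keeps only the total_vm baseline plus half of each child's total_vm, and ignores every other adjustment (CPU time, nice, capabilities, oom_adj, shared mm handling). The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a task: its own mapping size and its children. */
struct task_model {
    unsigned long total_vm;              /* pages mapped by this task */
    const struct task_model *children;   /* direct children, may be NULL */
    size_t nr_children;
};

/*
 * Baseline score in this model: the task's own total_vm plus half of
 * each direct child's total_vm.
 */
unsigned long baseline_points(const struct task_model *t)
{
    unsigned long points;
    size_t i;

    assert(t != NULL);

    points = t->total_vm;
    for (i = 0; i < t->nr_children; i++)
        points += t->children[i].total_vm / 2;
    return points;
}
```

In this model a small parent with a total_vm of 100 pages and four children of 1000 pages each scores 100 + 4 * 500 = 2100, more than double any single child's 1000. That is the shape of the gnome-session surprise described above: a session manager that maps little itself outranks the processes that actually hold the memory.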
* Re: Memory overcommit
From: KOSAKI Motohiro @ 2009-10-28 2:47 UTC
To: Hugh Dickins
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel, minchan.kim, akpm, rientjes, aarcange

> 2. I started out running my mlock test program as root (later
> switched to use "ulimit -l unlimited" first). But badness() reckons
> CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> and CAP_SYS_RAWIO another reason to quarter your points: so running
> as root makes you sixteen times less likely to be killed. Quartering
> is anyway debatable, but sixteenthing seems utterly excessive to me.
>
> I moved the CAP_SYS_RAWIO test in with the others, so it does no
> more than quartering; but is quartering appropriate anyway? I did
> wonder if I was right to be "subverting" the fine-grained CAPs in
> this way, but have since seen unrelated mail from one who knows
> better, implying they're something of a fantasy, that su and sudo
> are indeed what's used in the real world. Maybe this patch was okay.

I agree quartering is debatable.
At least, killing quartering is worthwhile for any user, and it can be
pushed into -stable.


From 27331555366c908a93c2cdd780b77e421869c5af Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 28 Oct 2009 11:28:39 +0900
Subject: [PATCH] oom: Mitigate super-user's bonus of oom-score

Currently, badness calculation code of oom contemplate following bonus.
 - Super-user have quartering oom-score
 - CAP_SYS_RAWIO process (e.g. database) also have quartering oom-score

The problem is, Super-users have CAP_SYS_RAWIO too. Then, they have
sixteenthing bonus. it's obviously too excessive and meaningless.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/oom_kill.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ea2147d..40d323d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	/*
 	 * Superuser processes are usually more important, so we make it
 	 * less likely that we kill those.
-	 */
-	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
-	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
-		points /= 4;
-
-	/*
-	 * We don't want to kill a process with direct hardware access.
+	 *
+	 * Plus, We don't want to kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
 	 * tend to only have this flag set on applications they think
 	 * of as important.
 	 */
-	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
+	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
+	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
+	    has_capability_noaudit(p, CAP_SYS_RAWIO))
 		points /= 4;
 
 	/*
-- 
1.6.2.5
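The effect of the patch above can be checked with a small userspace model. This is only an illustration of the arithmetic: the EX_CAP_* bit flags stand in for has_capability_noaudit() checks and are not real capability constants, and the two functions mirror just the capability part of badness() before and after the change.

```c
#include <assert.h>

/* Stand-in capability bits for this example only. */
#define EX_CAP_ADMIN    0x1u
#define EX_CAP_RESOURCE 0x2u
#define EX_CAP_RAWIO    0x4u
#define EX_CAP_ALL      (EX_CAP_ADMIN | EX_CAP_RESOURCE | EX_CAP_RAWIO)

/* Old behaviour: two independent quarterings, so root gets 1/16. */
unsigned long points_before_patch(unsigned long points, unsigned int caps)
{
    assert((caps & ~EX_CAP_ALL) == 0);

    if (caps & (EX_CAP_ADMIN | EX_CAP_RESOURCE))
        points /= 4;
    if (caps & EX_CAP_RAWIO)
        points /= 4;
    return points;
}

/* Patched behaviour: any of the three capabilities quarters once. */
unsigned long points_after_patch(unsigned long points, unsigned int caps)
{
    assert((caps & ~EX_CAP_ALL) == 0);

    if (caps & EX_CAP_ALL)
        points /= 4;
    return points;
}
```

For a root process holding all three capabilities and a starting score of 1600, the old logic yields 100 and the patched logic yields 400. A process with only the raw I/O capability gets 400 either way, so the change affects only tasks that previously matched both tests.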
* Re: Memory overcommit
From: KAMEZAWA Hiroyuki @ 2009-10-28 3:17 UTC
To: KOSAKI Motohiro
Cc: Hugh Dickins, vedran.furac, linux-mm, linux-kernel, minchan.kim, akpm, rientjes, aarcange

On Wed, 28 Oct 2009 11:47:55 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > 2. I started out running my mlock test program as root (later
> > switched to use "ulimit -l unlimited" first). But badness() reckons
> > CAP_SYS_ADMIN or CAP_SYS_RESOURCE is a reason to quarter your points;
> > and CAP_SYS_RAWIO another reason to quarter your points: so running
> > as root makes you sixteen times less likely to be killed. Quartering
> > is anyway debatable, but sixteenthing seems utterly excessive to me.
> >
> > I moved the CAP_SYS_RAWIO test in with the others, so it does no
> > more than quartering; but is quartering appropriate anyway? I did
> > wonder if I was right to be "subverting" the fine-grained CAPs in
> > this way, but have since seen unrelated mail from one who knows
> > better, implying they're something of a fantasy, that su and sudo
> > are indeed what's used in the real world. Maybe this patch was okay.
>
> I agree quartering is debatable.
> At least, killing quartering is worth for any user, and it can be push into -stable.
>
>
> From 27331555366c908a93c2cdd780b77e421869c5af Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Date: Wed, 28 Oct 2009 11:28:39 +0900
> Subject: [PATCH] oom: Mitigate suer-user's bonus of oom-score
>
> Currently, badness calculation code of oom contemplate following bonus.
>  - Super-user have quartering oom-score
>  - CAP_SYS_RAWIO process (e.g. database) also have quartering oom-score
>
> The problem is, Super-users have CAP_SYS_RAWIO too. Then, they have
> sixteenthing bonus. it's obviously too excessive and meaningless.
>
> This patch fixes it.
>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I'll pick this up to my series.

Thanks,
-Kame

> ---
>  mm/oom_kill.c |   13 +++++--------
>  1 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea2147d..40d323d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
>  	/*
>  	 * Superuser processes are usually more important, so we make it
>  	 * less likely that we kill those.
> -	 */
> -	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> -	    has_capability_noaudit(p, CAP_SYS_RESOURCE))
> -		points /= 4;
> -
> -	/*
> -	 * We don't want to kill a process with direct hardware access.
> +	 *
> +	 * Plus, We don't want to kill a process with direct hardware access.
>  	 * Not only could that mess up the hardware, but usually users
>  	 * tend to only have this flag set on applications they think
>  	 * of as important.
>  	 */
> -	if (has_capability_noaudit(p, CAP_SYS_RAWIO))
> +	if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +	    has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +	    has_capability_noaudit(p, CAP_SYS_RAWIO))
>  		points /= 4;
>
>  	/*
> --
> 1.6.2.5
* Re: Memory overcommit
  2009-10-28  2:47 ` KOSAKI Motohiro
  2009-10-28  3:17 ` KAMEZAWA Hiroyuki
@ 2009-10-28  4:12 ` David Rientjes
  2009-10-28  8:10 ` Hugh Dickins
  1 sibling, 1 reply; 77+ messages in thread
From: David Rientjes @ 2009-10-28 4:12 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel, minchan.kim, Andrew Morton, Andrea Arcangeli

On Wed, 28 Oct 2009, KOSAKI Motohiro wrote:

> I agree quartering is debatable.
> At least, killing quartering is worth for any user, and it can be push into -stable.
>

Not sure where the -stable reference came from, I don't think this is a
candidate.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index ea2147d..40d323d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -152,18 +152,15 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
>      /*
>       * Superuser processes are usually more important, so we make it
>       * less likely that we kill those.
> -     */
> -    if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> -        has_capability_noaudit(p, CAP_SYS_RESOURCE))
> -        points /= 4;
> -
> -    /*
> -     * We don't want to kill a process with direct hardware access.
> +     *
> +     * Plus, We don't want to kill a process with direct hardware access.
>       * Not only could that mess up the hardware, but usually users
>       * tend to only have this flag set on applications they think
>       * of as important.
>       */
> -    if (has_capability_noaudit(p, CAP_SYS_RAWIO))
> +    if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> +        has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> +        has_capability_noaudit(p, CAP_SYS_RAWIO))
>          points /= 4;
>
>      /*

Acked-by: David Rientjes <rientjes@google.com>
* Re: Memory overcommit
  2009-10-28  4:12 ` David Rientjes
@ 2009-10-28  8:10 ` Hugh Dickins
  0 siblings, 0 replies; 77+ messages in thread
From: Hugh Dickins @ 2009-10-28 8:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: KOSAKI Motohiro, KAMEZAWA Hiroyuki, vedran.furac, linux-mm, linux-kernel, minchan.kim, Andrew Morton, Andrea Arcangeli

On Tue, 27 Oct 2009, David Rientjes wrote:
>
> Not sure where the -stable reference came from, I don't think this is a
> candidate.

I agree with David, this is only one little piece of a messy puzzle,
there's no good reason to rush this into -stable.

> > +    if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
> > +        has_capability_noaudit(p, CAP_SYS_RESOURCE) ||
> > +        has_capability_noaudit(p, CAP_SYS_RAWIO))
>
> Acked-by: David Rientjes <rientjes@google.com>

Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>

(as far as it goes: the whole thing of quartering badness here because
"we don't want to kill" and "important" is questionable; but definitely
much more open to argument both ways than sixteenthing).
end of thread, other threads:[~2009-11-05 19:02 UTC | newest]
Thread overview: 77+ messages
[not found] <hav57c$rso$1@ger.gmane.org>
[not found] ` <20091013120840.a844052d.kamezawa.hiroyu@jp.fujitsu.com>
[not found] ` <hb2cfu$r08$2@ger.gmane.org>
[not found] ` <20091014135119.e1baa07f.kamezawa.hiroyu@jp.fujitsu.com>
2009-10-20 21:52 ` Memory overcommit Vedran Furač
2009-10-26 1:55 ` KAMEZAWA Hiroyuki
2009-10-26 16:16 ` Vedran Furač
2009-10-27 3:22 ` KAMEZAWA Hiroyuki
2009-10-27 6:10 ` KOSAKI Motohiro
2009-10-27 6:34 ` Minchan Kim
2009-10-27 6:36 ` KAMEZAWA Hiroyuki
2009-10-27 6:55 ` Minchan Kim
2009-10-27 7:45 ` [RFC][PATCH] oom_kill: avoid depends on total_vm and use real RSS/swap value for oom_score (Re: " KAMEZAWA Hiroyuki
2009-10-27 7:56 ` Minchan Kim
2009-10-27 12:38 ` Andrea Arcangeli
2009-10-28 0:22 ` KAMEZAWA Hiroyuki
2009-10-28 0:45 ` Vedran Furač
2009-10-27 7:56 ` KAMEZAWA Hiroyuki
2009-10-27 8:14 ` Minchan Kim
2009-10-27 8:33 ` KAMEZAWA Hiroyuki
2009-10-27 8:52 ` Minchan Kim
2009-10-27 8:56 ` KAMEZAWA Hiroyuki
2009-10-27 17:41 ` Vedran Furač
2009-10-28 0:13 ` KAMEZAWA Hiroyuki
2009-10-27 18:39 ` Hugh Dickins
2009-10-27 18:47 ` Andrea Arcangeli
2009-10-28 0:32 ` KAMEZAWA Hiroyuki
2009-11-05 19:02 ` Pavel Machek
2009-10-28 0:28 ` KAMEZAWA Hiroyuki
2009-10-27 6:46 ` KOSAKI Motohiro
2009-10-27 6:56 ` Minchan Kim
2009-10-27 17:12 ` Vedran Furač
2009-10-27 18:02 ` KOSAKI Motohiro
2009-10-27 18:30 ` Vedran Furač
2009-10-27 20:44 ` Hugh Dickins
2009-10-27 21:04 ` David Rientjes
2009-10-28 0:08 ` Vedran Furač
2009-10-28 0:25 ` David Rientjes
2009-10-28 0:39 ` Vedran Furač
2009-10-28 4:08 ` David Rientjes
2009-10-28 4:55 ` KAMEZAWA Hiroyuki
2009-10-28 5:13 ` David Rientjes
2009-10-28 6:05 ` KAMEZAWA Hiroyuki
2009-10-28 6:17 ` David Rientjes
2009-10-28 6:20 ` KAMEZAWA Hiroyuki
2009-10-29 8:38 ` David Rientjes
2009-10-29 11:11 ` Vedran Furač
2009-10-29 19:53 ` David Rientjes
2009-10-29 23:48 ` KAMEZAWA Hiroyuki
2009-10-30 9:10 ` David Rientjes
2009-10-30 9:36 ` KAMEZAWA Hiroyuki
2009-11-03 20:49 ` David Rientjes
2009-11-04 0:50 ` KAMEZAWA Hiroyuki
2009-11-04 1:58 ` David Rientjes
2009-11-04 2:17 ` KAMEZAWA Hiroyuki
2009-11-04 3:10 ` David Rientjes
2009-11-04 3:19 ` KAMEZAWA Hiroyuki
2009-10-30 13:59 ` Vedran Furač
2009-10-30 19:24 ` David Rientjes
2009-11-02 19:58 ` Vedran Furač
2009-10-28 13:28 ` Vedran Furač
2009-10-28 20:10 ` David Rientjes
2009-10-29 3:05 ` Vedran Furač
2009-10-29 8:35 ` David Rientjes
2009-10-29 11:01 ` Vedran Furač
2009-10-29 19:42 ` David Rientjes
2009-10-30 13:53 ` Vedran Furač
2009-10-30 14:08 ` Thomas Fjellstrom
2009-10-30 15:13 ` Vedran Furač
2009-10-30 14:12 ` Andrea Arcangeli
2009-10-30 14:41 ` Vedran Furač
2009-10-30 15:15 ` Andrea Arcangeli
2009-10-30 16:24 ` Hugh Dickins
2009-11-02 19:56 ` Vedran Furač
2009-10-30 19:44 ` David Rientjes
2009-11-02 19:56 ` Vedran Furač
2009-10-28 0:43 ` KAMEZAWA Hiroyuki
2009-10-28 2:47 ` KOSAKI Motohiro
2009-10-28 3:17 ` KAMEZAWA Hiroyuki
2009-10-28 4:12 ` David Rientjes
2009-10-28 8:10 ` Hugh Dickins