From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] VM patch 3 for -ac7
References:
Reply-To: zlatko@iskon.hr
From: Zlatko Calusic
Date: 04 Jun 2000 19:46:27 +0200
In-Reply-To: Rik van Riel's message of "Sat, 3 Jun 2000 12:17:16 -0300 (BRST)"
Message-ID: <87wvk5e3v0.fsf@atlas.iskon.hr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Rik van Riel
Cc: linux-mm@kvack.org, linux-kernel@vger.rutgers.edu
List-ID:

Hi, Rik!

I tested all versions of your autotune patch (1-3) and am mostly
satisfied with the direction of the development. But still, I have
some objections and lots of questions. :)

First, something that has been bothering me for a long time now (and
the further we get from 2.3.42, the kernel I've picked as the
reference that does NOT exhibit this particular bad behaviour, the
more it bothers me): bulk I/O performance is terrible. Spurious
swapping is killing us whenever we read big chunks of data from
disk. The ext2 optimizations and anti-fragmentation code are now
officially obsolete, because they never get a chance to take effect.
For example, check the following chunk of "vmstat 1" output:

 0  0  0   6976  85140   232   4772    0    0     0    0  101  469   0   0  99
 0  1  0   6976  74532   244  15284    0    0  2638    7  290  802   1   4  94
 1  0  0   6976  59028   260  30772    0    0  3876    0  347  892   0   5  94
 0  1  0   6976  43012   276  46772    0    0  4004    0  356  779   0   5  95
 1  0  0   6976  26964   292  62900    0    0  4036    0  355  918   0   6  93
 0  1  0   6976  10852   308  78900    0    0  4004    0  355  931   0   5  94
   procs                  memory     swap          io     system        cpu
 r  b  w   swpd   free   buff  cache   si   so    bi   bo   in   cs  us  sy  id
 2  0  0   7304   3128   184  89120    0   56  2978   26  305  780   1  14  85
 1  0  1   8084   2916   156  90320    0  220  2659   55  306  764   0  18  82
 0  2  0   9448   2112   168  92236  104  312  1873   78  281  790   0  11  88
 0  2  0   9916   2656   180  92016  264  212   795   53  199  465   0   4  96
 0  1  1  10340   2956   192  92024    0  288  2175   72  268  727   1  10  89
 0  2  0  10460   1936   204  92928   24  308  2588   77  296  804   1   6  93
 0  1  0  10772   2028   208  93080   16  456  1706  114  252  648   0   8  92
 1  0  1  10824   2900   204  92232    0  556  2402  139  298  784   0   5  94
 0  2  0  10868   2036   192  93124   24  140  2767   35  301  844   0   9  91
 0  2  0  11080   1944   192  93460   16  104  2526   26  286  836   0   6  94
 0  1  0  11620   2604   192  93220    4   88  2553   22  277  760   0  10  90
 0  1  0  11816   2164   196  93844    0  264  2620   66  292  792   0   9  91
 0  2  0  12084   1840   204  94320   80  196  1416   49  232  567   0   5  95
 0  1  0  12084   1708   216  94352  240    0  1467    0  219  676   0   1  98

At time T (top of the output), I started reading a big file from the
4k ext2 FS. The machine is completely idle and, as you can see, has
lots of memory free. Before the memory fills up (first few lines),
you can also see that data is coming in at a 16MB/sec pace (bi ~
4000 blocks/sec, which at the 4KB block size is _exactly_ the
available and expected bandwidth).

And *then* we get into trouble. The VM kicks in and starts swapping
in an' out at will. The disk heads start thrashing, with sounds
similar to the ones you hear when running netscape on a 16MB
machine. Of course, the reading speed drops drastically, and in the
end we finish 10 seconds later than expected. I/O bandwidth is
effectively halved, and why? Because we enlarged the page cache from
a completely satisfying 90MB to 95MB (by 5%!), and to get that pissy
5MB we were swapping out like mad; then processes started
recollecting their pages from disk, and then the whole cycle
repeated...

Now the question is: is such behaviour expected, or will it get
fixed before the final 2.4.0? I'm worried that we are going to
release the ultimate swapping machine and say to people: here is the
new an' great stable kernel! Watch it swap and never stop. :)

What especially bothers me is that nobody else seems to see the
problem. Everybody is talking about how the kernel is getting better
and better, how things are stabilizing and response times are
improving, but I see new releases getting worse and worse, with
performance going down the drain. Tell me that I'm an idiot and that
the system is supposed to swap all the time, and then maybe I'll
stop bitching. But not before. :)

Second thing: around two years ago, IIRC, Linus and the people on
this list decided that we don't need page aging, that it only kills
performance, and so the code was removed, not to be seen again. I
wasn't so sure at the time that it was such a good idea, but when
Andrea's LRU page management went in, I became very satisfied with
our page replacement policies. Obviously, with zoned memory we got
into trouble once again, and now, as a solution, you're putting page
aging back into the kernel. Could you tell us your reasons? What has
changed in the meantime, so that we didn't need page aging two years
ago but need it now?
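Since it has been a while, here is a minimal sketch of the kind of
page aging I'm talking about, just so we're all discussing the same
thing. Every name and constant below is invented for illustration;
this is not the actual kernel code, old or new. The idea: a page's
age is bumped whenever it has been referenced since the last scan,
decayed when it hasn't, and only pages whose age has dropped to zero
are eligible for eviction.

/*
 * Minimal sketch of classic page aging, for illustration only.
 * All names and constants are made up; this is not the actual
 * kernel code, old or new.
 */

#define PAGE_AGE_ADV  3   /* added when the referenced bit was set */
#define PAGE_AGE_MAX 64   /* clamp, so a hot loop can't pin a page forever */

struct page_sketch {
	unsigned char age;        /* 0 means "reclaimable" */
	unsigned char referenced; /* accessed bit, cleared on every scan */
};

/* Called for each page on every pass of the page scanner. */
void age_page(struct page_sketch *page)
{
	if (page->referenced) {
		page->referenced = 0;
		page->age += PAGE_AGE_ADV;
		if (page->age > PAGE_AGE_MAX)
			page->age = PAGE_AGE_MAX;
	} else {
		/* exponential decay: an untouched page loses half its age */
		page->age >>= 1;
	}
}

/* A page becomes an eviction candidate only after several idle scans. */
int can_evict(const struct page_sketch *page)
{
	return page->age == 0;
}

The appeal of such a scheme is that a page has to sit untouched
across several consecutive scans before it gets thrown out; the cost
is precisely the extra scanning logic that now has to be tuned
right.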
I didn't find any improvement (as can be seen from the vmstat output
above). Yes, I'm well aware of what you're _trying_ to achieve, but
in reality you've just added a lot of logic to the kernel and
accomplished nothing. Not to mention that we're back to
2.1.something and history is repeating itself. :( Gratuitously
adding untested code this far into development doesn't look like a
very good idea to me.

In the end, I hope nobody sees this rather long complaint as an
attack, but rather as a call for a debate on how we could improve
the kernel and hopefully get 2.4.0 out sooner. A 2.4.0 we'll be
proud of.

Regards,
--
Zlatko

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM, see:
http://www.linux.eu.org/Linux-MM/