From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Jun 2004 16:31:59 +0200
From: Miquel van Smoorenburg
Subject: Re: Keeping mmap'ed files in core regression in 2.6.7-rc
Message-ID: <20040615143159.GQ19271@traveler.cistron.net>
References: <20040608142918.GA7311@traveler.cistron.net>
	<40CAA904.8080305@yahoo.com.au>
	<20040614140642.GE13422@traveler.cistron.net>
	<40CE66EE.8090903@yahoo.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Nick Piggin
Cc: Andrew Morton, linux-mm@kvack.org
List-ID:

According to Nick Piggin:
> Miquel van Smoorenburg wrote:
> >On 2004.06.12 08:56, Nick Piggin wrote:
> >
> >>Miquel van Smoorenburg wrote:
> >>
> >>
> >>>Now I tried 2.6.7-rc2 and -rc3 (well rc2-bk-latest-before-rc3) and
> >>>with those kernels, performance goes to hell because no matter
> >>>how much I tune, the kernel will throw out the mmap'ed pages first.
> >>>RSS of the innd process hovers around 200-250 MB instead of 600.
> >>>
> >>Can you try the following patch please?
> >
> >The patch below indeed fixes this problem. Now most of the mmap'ed files
> >are actually kept in memory and RSS is around 600 MB again:
>
> OK good. Cc'ing Andrew.

I've built a small test app that creates the same I/O pattern and
ran it on 2.6.6, 2.6.7-rc3 and 2.6.7-rc3+patch. That confirms it,
though not as dramatically as the real-life application.

Now something else that is weird, but might be unrelated. I have
not found a way to reproduce it on a different machine yet, so feel
free to ignore it; I'm just mentioning it in case someone
recognizes this.

The news server process uses /dev/hd[cdg]1 directly for storage
(Cyclic News File System). There's about 12 MB/sec incoming being
stored on those 3 (SATA) disks.
Look at the vmstat output:

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 4  0  22664   5216 277332 496644   28    0  8143    36  9785  2162 12 43 28 16
 1  3  22660 231252  71808 488580   16    0  5947 33856  8868  1633  9 60 11 20
 2  2  22660 273972  40988 489508    0    0  8895 21144  8875  1931 10 43 21 27
 3  0  22660 236412  73620 491148    0    0 10774 10551  9877  1937 10 44 24 22
 1  1  22660 185112 104112 492616    0    0  9677 12354 10216  1863 10 44 28 19
 2  0  22660 148700 138388 494108    0    0 10227 13919  9976  1925 11 44 24 21
 0  2  22660 123432 162032 495012    0    0  6244 15418 10065  1793 11 46 28 16
 3  0  22660  93096 190452 496292    8    0  6548 10293  9860  1975 11 43 31 15
 2  0  22660  51688 218628 497424    0    0  6405    52 10575  2063 13 48 27 12
 3  1  22660  19012 245632 499032    8    0  8108 12400 10136  1892 11 44 24 21
 2  1  22660 249192  42956 490932    0    0  8231 33005  9109  1343 10 60 13 18
 0  1  22660 240396  53764 491956    0    0 10082 18625  9504  1740 10 47 24 19
 2  2  22660 205632  86108 493408    0    0  8305 12368  8941  1775  8 33 32 26
 0  2  22660 164672 119156 494972    0    0  6867    62  9695  1894 11 40 31 18
 1  3  22660 137924 144964 496568    0    0  7099 16440 10388  1878 11 47 26 17
 1  1  22660 101604 176936 498052    0    0  9166 12332 10237  1694 12 44 28 16
 2  1  22660  67816 205376 499176    8    0  6169  6158  9906  1897 11 44 31 15
 1  1  22660  28004 236520 500652   10    0  7418  6202 10289  1744 12 44 30 14
 2  1  22660   7484 259156 492544   12    0  7494 18540 10218  1757 11 49 21 19
 1  4  22660  61664 228360 494004   72    0  6131 14412  9611  2437 10 46 20 23
 3  1  22660  76976 242652 498884   36    0  6927 16558  7560  2219 18 42 13 27
 0  1  22660  62352 267840 501140   14    0  7358 10424  8273  2601 11 32 33 23
 1  1  22660   6880 301056 502528    4    0 11045  2304 10177  2137 12 42 26 20
 0  4  22660 280848  40856 494196    0    0  6583 45092  9379  1505  9 61 13

See how "cache" remains stable, but free/buffers memory is oscillating?
That shouldn't happen, right?
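One way to put numbers on the oscillation in the vmstat output above is to compare the range of each memory column. The following is only a minimal sketch and not part of the original test setup: the helper names (parse_vmstat, spread) are made up for illustration, the column order is taken from the vmstat header, and SAMPLE holds just a handful of rows copied from the output.

```python
# Hypothetical helper, not from the original mail: summarize how far the
# vmstat memory columns swing over a run of samples.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# A few rows copied verbatim from the vmstat output (memory values in kB).
SAMPLE = """\
4 0 22664 5216 277332 496644 28 0 8143 36 9785 2162 12 43 28 16
1 3 22660 231252 71808 488580 16 0 5947 33856 8868 1633 9 60 11 20
2 2 22660 273972 40988 489508 0 0 8895 21144 8875 1931 10 43 21 27
3 1 22660 19012 245632 499032 8 0 8108 12400 10136 1892 11 44 24 21
2 1 22660 249192 42956 490932 0 0 8231 33005 9109 1343 10 60 13 18
1 1 22660 6880 301056 502528 4 0 11045 2304 10177 2137 12 42 26 20
"""


def parse_vmstat(text):
    """Return one dict per complete data row; skip headers and short rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue  # header line, blank line, or truncated row
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def spread(rows, key):
    """Return (min, max) of key(row) over all rows."""
    values = [key(row) for row in rows]
    return min(values), max(values)


rows = parse_vmstat(SAMPLE)
for name in ("free", "buff", "cache"):
    lo, hi = spread(rows, lambda row: row[name])
    print(f"{name:>9}: min={lo:>7} max={hi:>7} swing={hi - lo:>7}")

# If pages only move between "free" and "buff", their sum should swing less
# than either column does on its own.
lo, hi = spread(rows, lambda row: row["free"] + row["buff"])
print(f"free+buff: min={lo:>7} max={hi:>7} swing={hi - lo:>7}")
```

On these sample rows, free and buff each swing by more than 250 MB, while cache moves by about 14 MB. Feeding the full capture through parse_vmstat should give a similar picture, though only the rows listed here were checked.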
I tried to reproduce it on another 2.6.7-rc3 system with

	while :; do dd if=/dev/zero of=/dev/sda8 bs=1M count=10; sleep 1; done

and while I did see it oscillating once or twice, after that it
remained stable (buffers high / free memory low) and I can't seem
to reproduce it again.

Yesterday I ported my rawfs module over to 2.6. It's a minimal
filesystem that shows a blockdevice as a single large file. I'm
letting the newsserver access that instead of the blockdevice
directly, so all access goes through the pagecache instead of the
buffer cache. That runs much more smoothly, though it's harder to
tune 'swappiness' with it: it seems to be much more "all or nothing"
in that case. Anyway, that's what I'm using now.

Mike.