From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <40CFBB75.1010702@yahoo.com.au>
Date: Wed, 16 Jun 2004 13:16:05 +1000
From: Nick Piggin
MIME-Version: 1.0
Subject: Re: Keeping mmap'ed files in core regression in 2.6.7-rc
References: <20040608142918.GA7311@traveler.cistron.net> <40CAA904.8080305@yahoo.com.au> <20040614140642.GE13422@traveler.cistron.net> <40CE66EE.8090903@yahoo.com.au> <20040615143159.GQ19271@traveler.cistron.net>
In-Reply-To: <20040615143159.GQ19271@traveler.cistron.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Miquel van Smoorenburg
Cc: Andrew Morton, linux-mm@kvack.org
List-ID:

Miquel van Smoorenburg wrote:
> According to Nick Piggin:
>
>>Miquel van Smoorenburg wrote:
>>
>>>
>>>The patch below indeed fixes this problem. Now most of the mmap'ed files
>>>are actually kept in memory and RSS is around 600 MB again:
>>
>>OK good. Cc'ing Andrew.
>
>
> I've built a small test app that creates the same I/O pattern and ran it
> on 2.6.6, 2.6.7-rc3 and 2.6.7-rc3+patch and running that confirms it,
> though not as dramatically as the real-life application.
>

Can you send the test app over?
Andrew, do you have any ideas about how to fix this so far?

>
>
> Now something else that is weird, but might be unrelated and I have
> not found a way to reproduce it on a different machine yet, so feel
> free to ignore it, I'm just mentioning it in case someone recognizes
> this.
>
> The news server process uses /dev/hd[cdg]1 directly for storage
> (Cyclic News File System). There's about 12 MB/sec incoming
> being stored on those 3 (SATA) disks.
> Look at the vmstat output:
>
> # vmstat 2
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so     bi     bo     in    cs us sy id wa
>  4  0  22664   5216 277332 496644   28    0   8143     36   9785  2162 12 43 28 16
>  1  3  22660 231252  71808 488580   16    0   5947  33856   8868  1633  9 60 11 20
>  2  2  22660 273972  40988 489508    0    0   8895  21144   8875  1931 10 43 21 27
>  3  0  22660 236412  73620 491148    0    0  10774  10551   9877  1937 10 44 24 22
>  1  1  22660 185112 104112 492616    0    0   9677  12354  10216  1863 10 44 28 19
>  2  0  22660 148700 138388 494108    0    0  10227  13919   9976  1925 11 44 24 21
>  0  2  22660 123432 162032 495012    0    0   6244  15418  10065  1793 11 46 28 16
>  3  0  22660  93096 190452 496292    8    0   6548  10293   9860  1975 11 43 31 15
>  2  0  22660  51688 218628 497424    0    0   6405     52  10575  2063 13 48 27 12
>  3  1  22660  19012 245632 499032    8    0   8108  12400  10136  1892 11 44 24 21
>  2  1  22660 249192  42956 490932    0    0   8231  33005   9109  1343 10 60 13 18
>  0  1  22660 240396  53764 491956    0    0  10082  18625   9504  1740 10 47 24 19
>  2  2  22660 205632  86108 493408    0    0   8305  12368   8941  1775  8 33 32 26
>  0  2  22660 164672 119156 494972    0    0   6867     62   9695  1894 11 40 31 18
>  1  3  22660 137924 144964 496568    0    0   7099  16440  10388  1878 11 47 26 17
>  1  1  22660 101604 176936 498052    0    0   9166  12332  10237  1694 12 44 28 16
>  2  1  22660  67816 205376 499176    8    0   6169   6158   9906  1897 11 44 31 15
>  1  1  22660  28004 236520 500652   10    0   7418   6202  10289  1744 12 44 30 14
>  2  1  22660   7484 259156 492544   12    0   7494  18540  10218  1757 11 49 21 19
>  1  4  22660  61664 228360 494004   72    0   6131  14412   9611  2437 10 46 20 23
>  3  1  22660  76976 242652 498884   36    0   6927  16558   7560  2219 18 42 13 27
>  0  1  22660  62352 267840 501140   14    0   7358  10424   8273  2601 11 32 33 23
>  1  1  22660   6880 301056 502528    4    0  11045   2304  10177  2137 12 42 26 20
>  0  4  22660 280848  40856 494196    0    0   6583  45092   9379  1505  9 61 13
>
> See how "cache" remains stable, but free/buffers memory is oscillating?
> That shouldn't happen, right ?
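One rough way to put numbers on the oscillation in the quoted vmstat output is to compare the max-min spread of the free, buff and cache columns. The sketch below does that in Python for the first five samples only, copied from the output above. The function name `spread` and the choice of five rows are illustrative assumptions, not part of the original exchange.

```python
# First five samples from the quoted vmstat output (memory columns are in KiB).
SAMPLE = """\
4 0 22664 5216 277332 496644 28 0 8143 36 9785 2162 12 43 28 16
1 3 22660 231252 71808 488580 16 0 5947 33856 8868 1633 9 60 11 20
2 2 22660 273972 40988 489508 0 0 8895 21144 8875 1931 10 43 21 27
3 0 22660 236412 73620 491148 0 0 10774 10551 9877 1937 10 44 24 22
1 1 22660 185112 104112 492616 0 0 9677 12354 10216 1863 10 44 28 19
"""

# Column names as printed in the vmstat header line.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def spread(text, names=("free", "buff", "cache")):
    """Return max - min for each named column over all sample rows."""
    rows = [
        dict(zip(COLUMNS, map(int, line.split())))
        for line in text.splitlines()
        if line.strip()
    ]
    return {n: max(r[n] for r in rows) - min(r[n] for r in rows) for n in names}


if __name__ == "__main__":
    for name, delta in spread(SAMPLE).items():
        print(f"{name:>5}: swings by {delta} KiB")
```

On these five samples free and buff each swing by well over 200 MB, while cache moves by about 8 MB, which matches the "cache stable, free/buffers oscillating" description.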
>

If it is doing IO to large regions of mapped memory, the page reclaim
can start getting a bit chunky. Not much you can do about it, but it
shouldn't do any harm.

> I tried to reproduce it on another 2.6.7-rc3 system with
> while :; do dd if=/dev/zero of=/dev/sda8 bs=1M count=10; sleep 1; done
> and while I did see it oscillating once or twice after that it
> remained stable (buffers high / free memory low) and I can't seem
> to be able to reproduce it again.
>

Probably because it isn't doing mmapped IO.

> Yesterday I ported my rawfs module over to 2.6. It's a minimal filesystem that
> shows a blockdevice as a single large file. I'm letting the newsserver access
> that instead of the blockdevice directly so all access goes through the
> pagecache instead of the buffer cache and that runs much more smoothly, though
> it's harder to tune 'swappiness' with it - it seems to be much more "all
> or nothing" in that case. Anyway that's what I'm using now.
>

In 2.6, everything basically should go through the same path I think,
so it really shouldn't make much difference. The fact that swappiness
stops having any effect sounds like the server switched from doing
mapped IO to read/write. Maybe I'm crazy... could you verify?
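The mapped IO versus read/write distinction in this message can be sketched in a few lines. This is a minimal illustration assuming Linux and Python 3, not Miquel's test app: the helper names (`make_file`, `write_mapped`, `write_syscall`, `run_demo`) and the 64-page file size are made up here. Both paths put the same bytes in the file; what differs is whether the process dirties pages through a shared mapping or hands the data to the kernel with pwrite().

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
SIZE = 64 * PAGE  # hypothetical size, kept small so the sketch runs quickly


def make_file(size=SIZE):
    """Create a zero-filled temporary file of the given size."""
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, size)
    os.close(fd)
    return path


def write_mapped(path, size=SIZE):
    """Dirty every page of the file through a shared mapping (mapped IO)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE) as m:
            for off in range(0, size, PAGE):
                m[off:off + 1] = b"x"  # one byte is enough to dirty the page
            m.flush()  # msync() the dirty pages back to the file


def write_syscall(path, size=SIZE):
    """Write the same bytes with pwrite(), without mapping the file."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for off in range(0, size, PAGE):
            os.pwrite(fd, b"x", off)
    finally:
        os.close(fd)


def run_demo():
    """Run both writers and report whether the resulting files are identical."""
    a, b = make_file(), make_file()
    try:
        write_mapped(a)
        write_syscall(b)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read() == fb.read()
    finally:
        os.unlink(a)
        os.unlink(b)


if __name__ == "__main__":
    print("files identical:", run_demo())
```

The sketch only confirms that the two files end up identical. Running each writer against a much larger file while watching vmstat would be one way to see which pattern a given workload resembles.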