From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (from sct@localhost)
	by sisko.scot.redhat.com (8.11.6/8.11.2) id gA6LxmK23126
	for linux-mm@kvack.org; Wed, 6 Nov 2002 21:59:48 GMT
Date: Wed, 6 Nov 2002 21:59:48 GMT
Resent-Message-Id: <200211062159.gA6LxmK23126@sisko.scot.redhat.com>
Message-Id: <200211062159.gA6LxmK23126@sisko.scot.redhat.com>
From: "Stephen C. Tweedie"
Subject: [patch] Buffers pinning inodes in icache forever
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="sHrvAb52M6C8blB9"
Content-Disposition: inline
Resent-To: linux-mm@kvack.org
Sender: owner-linux-mm@kvack.org
Return-Path:
To: Andrew Morton, linux-mm@kvack.org
Cc: Stephen Tweedie
List-ID:

--sHrvAb52M6C8blB9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

In chasing a performance problem on a 2.4.9-based VM (yes, that one!),
we found a case where kswapd was consuming massive CPU time, 97% of
which was in prune_icache (and of that, about 7% was in the
inode_has_buffers() sub-function).  slabinfo showed about 100k inodes
in use.

The hypothesis is that we've got buffers in cache pinning the inodes.
It's not pages doing the pinning, because if the inode page count is
non-zero we never perform the inode_has_buffers() test in the first
place: prune_icache only checks for buffers once it has seen a zero
page count.

On buffer write, the bh goes onto BUF_LOCKED, but never gets removed
from there.  In other testing I've seen several GB of memory in
BUF_LOCKED bh'es during extensive write loads.  That's normally no
problem, except that the lack of a refile_buffer() on those bh'es also
keeps them on the inode's own buffer lists.  If it's metadata that the
buffers back (i.e. it's all in low memory) and the demand on the
system is for highmem pages, then we're not necessarily going to be
doing try_to_release_page() aggressively on the lowmem pages, which is
what would allow the bh'es to be refiled.

Doing the refile really isn't hard, either.  We expect IO completion
to happen in approximately list order on the BUF_LOCKED list, so
simply refiling any unlocked buffers at the head of that list keeps it
under control in O(1) time per buffer.

With the patch below we've not seen this particular pathology recur.

Comments?

--Stephen

--sHrvAb52M6C8blB9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="io-postprocess.patch"

--- linux/fs/buffer.c.orig	Fri Oct 25 09:53:43 2002
+++ linux/fs/buffer.c	Fri Oct 25 10:15:51 2002
@@ -2835,6 +2835,30 @@
 	}
 }
+
+/*
+ * Refile any unlocked buffers at the head of the BUF_LOCKED list.
+ */
+void do_io_postprocessing(void)
+{
+	int i;
+	struct buffer_head *bh, *next;
+
+	spin_lock(&lru_list_lock);
+	bh = lru_list[BUF_LOCKED];
+	if (bh) {
+		for (i = nr_buffers_type[BUF_LOCKED]; i-- > 0; bh = next) {
+			next = bh->b_next_free;
+
+			if (!buffer_locked(bh))
+				__refile_buffer(bh);
+			else
+				break;
+		}
+	}
+	spin_unlock(&lru_list_lock);
+}
+
 /*
  * This is the kernel update daemon. It was used to live in userspace
  * but since it's need to run safely we want it unkillable by mistake.
  */
@@ -2886,6 +2910,7 @@
 #ifdef DEBUG
 	printk(KERN_DEBUG "kupdate() activated...\n");
 #endif
+	do_io_postprocessing();
 	sync_old_buffers();
 	}
 }

--sHrvAb52M6C8blB9--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/