* [patch] Buffers pinning inodes in icache forever
@ 2002-11-06 21:59 Stephen C. Tweedie
2002-11-06 22:40 ` Andrew Morton
2002-11-07 2:16 ` Rik van Riel
0 siblings, 2 replies; 3+ messages in thread
From: Stephen C. Tweedie @ 2002-11-06 21:59 UTC (permalink / raw)
To: Andrew Morton, linux-mm; +Cc: Stephen Tweedie
[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]
Hi,
In chasing a performance problem on a 2.4.9-based VM (yes, that one!),
we found a case where kswapd was consuming massive CPU time, 97% of
which was in prune_icache (and of that, about 7% was in the
inode_has_buffers() sub-function). slabinfo showed about 100k inodes
in use.
The hypothesis is that we've got buffers in cache pinning the inodes.
It's not pages doing the pinning because if the inode page count is
zero we never perform the inode_has_buffers() test.
On buffer write, the bh goes onto BUF_LOCKED, but never gets removed
from there. In other testing I've seen several GB of memory in
BUF_LOCKED bh'es during extensive write loads.
That's normally no problem, except that the lack of a refile_buffer()
on those bh'es also keeps them on the inode's own buffer lists. If
the buffers back metadata (i.e. it's all in low memory) and the demand
on the system is for highmem pages, then we're not necessarily going
to call try_to_release_page() aggressively on the lowmem pages, which
is what would allow the bh'es to be refiled.
Doing the refile really isn't hard, either. We expect IO completion
to be happening in approximately list order on the BUF_LOCKED list, so
simply doing a refile on any unlocked buffers at the head of that list
is going to keep it under control in O(1) time per buffer.
With the patch below we've not seen this particular pathology recur.
Comments?
--Stephen
[-- Attachment #2: io-postprocess.patch --]
[-- Type: text/plain, Size: 844 bytes --]
--- linux/fs/buffer.c.orig Fri Oct 25 09:53:43 2002
+++ linux/fs/buffer.c Fri Oct 25 10:15:51 2002
@@ -2835,6 +2835,30 @@
}
}
+
+/*
+ * IO post-processing: refile any now-unlocked buffers at the head of
+ * the BUF_LOCKED list so they can drop off the inode's buffer lists.
+ */
+void do_io_postprocessing(void)
+{
+ int i;
+ struct buffer_head *bh, *next;
+
+ spin_lock(&lru_list_lock);
+ bh = lru_list[BUF_LOCKED];
+ if (bh) {
+ for (i = nr_buffers_type[BUF_LOCKED]; i-- > 0; bh = next) {
+ next = bh->b_next_free;
+
+ if (!buffer_locked(bh))
+ __refile_buffer(bh);
+ else
+ break;
+ }
+ }
+ spin_unlock(&lru_list_lock);
+}
+
/*
* This is the kernel update daemon. It was used to live in userspace
* but since it's need to run safely we want it unkillable by mistake.
@@ -2886,6 +2910,7 @@
#ifdef DEBUG
printk(KERN_DEBUG "kupdate() activated...\n");
#endif
+ do_io_postprocessing();
sync_old_buffers();
}
}
* Re: [patch] Buffers pinning inodes in icache forever
2002-11-06 21:59 [patch] Buffers pinning inodes in icache forever Stephen C. Tweedie
@ 2002-11-06 22:40 ` Andrew Morton
2002-11-07 2:16 ` Rik van Riel
1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2002-11-06 22:40 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: linux-mm
"Stephen C. Tweedie" wrote:
>
> ...
> Doing the refile really isn't hard, either. We expect IO completion
> to be happening in approximately list order on the BUF_LOCKED list, so
> simply doing a refile on any unlocked buffers at the head of that list
> is going to keep it under control in O(1) time per buffer.
>
> With the patch below we've not seen this particular pathology recur.
> Comments?
See IRC for discussion ;) Seems good.
2.5 will have the same problem, and doesn't have the global
buffer list. So doing it per-inode should work. Untested patch
follows. (This approach would work in 2.4 as well?)
--- 25/fs/buffer.c~remove-inode-buffers Wed Nov 6 14:17:36 2002
+++ 25-akpm/fs/buffer.c Wed Nov 6 14:24:58 2002
@@ -870,6 +870,28 @@ void invalidate_inode_buffers(struct ino
}
/*
+ * Remove any clean buffers from the inode's buffer list. This is called
+ * when we're trying to free the inode itself. Those buffers can pin it.
+ */
+void remove_inode_buffers(struct inode *inode)
+{
+ if (inode_has_buffers(inode)) {
+ struct address_space *mapping = inode->i_mapping;
+ struct list_head *list = &mapping->private_list;
+ struct address_space *buffer_mapping = mapping->assoc_mapping;
+
+ spin_lock(&buffer_mapping->private_lock);
+ while (!list_empty(list)) {
+ struct buffer_head *bh = BH_ENTRY(list->next);
+ if (buffer_dirty(bh))
+ break;
+ __remove_assoc_queue(bh);
+ }
+ spin_unlock(&buffer_mapping->private_lock);
+ }
+}
+
+/*
* Create the appropriate buffers when given a page for data area and
* the size of each buffer.. Use the bh->b_this_page linked list to
* follow the buffers created. Return NULL if unable to create more
--- 25/include/linux/buffer_head.h~remove-inode-buffers Wed Nov 6 14:17:36 2002
+++ 25-akpm/include/linux/buffer_head.h Wed Nov 6 14:17:36 2002
@@ -141,6 +141,7 @@ void buffer_insert_list(spinlock_t *lock
void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode);
int inode_has_buffers(struct inode *);
void invalidate_inode_buffers(struct inode *);
+void remove_inode_buffers(struct inode *inode);
int fsync_buffers_list(spinlock_t *lock, struct list_head *);
int sync_mapping_buffers(struct address_space *mapping);
void unmap_underlying_metadata(struct block_device *bdev, sector_t block);
--- 25/fs/inode.c~remove-inode-buffers Wed Nov 6 14:17:36 2002
+++ 25-akpm/fs/inode.c Wed Nov 6 14:17:36 2002
@@ -371,6 +371,8 @@ static int can_unuse(struct inode *inode
return 0;
if (atomic_read(&inode->i_count))
return 0;
+ if (inode->i_data.nrpages)
+ return 0;
return 1;
}
@@ -399,13 +401,14 @@ static void prune_icache(int nr_to_scan)
inode = list_entry(inode_unused.prev, struct inode, i_list);
- if (!can_unuse(inode)) {
+ if (inode->i_state || atomic_read(&inode->i_count)) {
list_move(&inode->i_list, &inode_unused);
continue;
}
- if (inode->i_data.nrpages) {
+ if (inode_has_buffers(inode) || inode->i_data.nrpages) {
__iget(inode);
spin_unlock(&inode_lock);
+ remove_inode_buffers(inode);
invalidate_inode_pages(&inode->i_data);
iput(inode);
spin_lock(&inode_lock);
@@ -415,8 +418,6 @@ static void prune_icache(int nr_to_scan)
continue; /* wrong inode or list_empty */
if (!can_unuse(inode))
continue;
- if (inode->i_data.nrpages)
- continue;
}
list_del_init(&inode->i_hash);
list_move(&inode->i_list, &freeable);
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
* Re: [patch] Buffers pinning inodes in icache forever
2002-11-06 21:59 [patch] Buffers pinning inodes in icache forever Stephen C. Tweedie
2002-11-06 22:40 ` Andrew Morton
@ 2002-11-07 2:16 ` Rik van Riel
1 sibling, 0 replies; 3+ messages in thread
From: Rik van Riel @ 2002-11-07 2:16 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Andrew Morton, linux-mm
On Wed, 6 Nov 2002, Stephen C. Tweedie wrote:
> With the patch below we've not seen this particular pathology recur.
> Comments?
I like it. This seems like a correct and simple fix.
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Current spamtrap: <october@surriel.com>