* [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
@ 1999-12-22 5:58 Benjamin C.R. LaHaise
1999-12-22 15:08 ` Chuck Lever
0 siblings, 1 reply; 5+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-12-22 5:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-mm
Here's the fix I've got for the mmap/write deadlock. I don't like it, but
the only other fixes I can think of are just as bad or horrendously
complex. Note that the first patch, to fs/buffer.c, fixes a serious problem
in block_write_zero_range: a partial write to a page that is not already
cached, in a file on a file system with more than two blocks per page, could
result in a stack scribble -- eeek!
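To make the overflow concrete, here is a small userspace sketch of the sizing arithmetic. It is not kernel code: the MODEL_* constants and both helper functions are hypothetical, and it assumes (as the new PAGE_CACHE_SIZE / 512 bound in the patch suggests) that the wait list may need one slot per block in the page.

```c
#include <assert.h>

/* Hypothetical constants for illustration only; the real values depend on
 * the architecture (page size) and the file system (block size). */
#define MODEL_PAGE_SIZE     4096u  /* assumed page size */
#define MODEL_MIN_BLOCKSIZE  512u  /* smallest block size, matching the 512 in the patch */
#define MODEL_MAX_BLOCKS    (MODEL_PAGE_SIZE / MODEL_MIN_BLOCKSIZE)

/* Number of blocks that make up one page for a given block size. */
static unsigned blocks_per_page(unsigned blocksize)
{
    assert(blocksize >= MODEL_MIN_BLOCKSIZE && blocksize <= MODEL_PAGE_SIZE);
    return MODEL_PAGE_SIZE / blocksize;
}

/* Returns 1 if a wait list with `capacity` slots can hold one entry per
 * block of a page (the assumed worst case), 0 if it could overflow. */
static int wait_list_fits(unsigned capacity, unsigned blocksize)
{
    return blocks_per_page(blocksize) <= capacity;
}
```

Under these assumptions a two-slot list is only large enough when the block size is at least half a page, while a list sized for the smallest block size covers every case.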
The patch to filemap.c changes filemap_nopage to use __find_page_nolock
rather than __find_get_page, which waits for the page to become unlocked
before returning (maybe __find_get_page was meant to check PageUptodate?).
The wait is unnecessary here, since filemap_nopage checks PageUptodate
itself before proceeding, which is consistent with do_generic_file_read.
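The following userspace sketch models the idea behind the filemap.c and sched.h changes. It is a simplified illustration, not the kernel code: struct model_page, struct model_task, and the model_* functions are hypothetical stand-ins. It assumes the deadlock arises when a task that holds a page lock in the write path faults on that same page while it is not yet up to date.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel structures; all names are hypothetical. */
struct model_page {
    int locked;    /* nonzero while some task holds the page lock */
    int uptodate;  /* nonzero once the page contents are valid */
};

struct model_task {
    /* Page this task locked in the write path, or NULL. */
    struct model_page *write_locked_page;
};

enum fault_result { FAULT_MAPPED, FAULT_LOCK_AND_READ, FAULT_SIGBUS };

/* Models the fault path: an up-to-date page is used without waiting on its
 * lock; a page that is not up to date needs the lock, which the faulting
 * task may already hold from the write path. */
static enum fault_result model_nopage(const struct model_task *task,
                                      const struct model_page *page)
{
    if (page->uptodate)
        return FAULT_MAPPED;
    if (task->write_locked_page == page)
        return FAULT_SIGBUS;        /* waiting would mean waiting on ourselves */
    return FAULT_LOCK_AND_READ;     /* real code would lock the page and read it */
}

/* Models the write path taking the page lock and recording the page. */
static void model_write_begin(struct model_task *task, struct model_page *page)
{
    assert(!page->locked);
    page->locked = 1;
    task->write_locked_page = page;
}

/* Models the write path clearing the record and dropping the page lock. */
static void model_write_end(struct model_task *task, struct model_page *page)
{
    assert(task->write_locked_page == page);
    task->write_locked_page = NULL;
    page->locked = 0;
}
```

The design choice visible here is that the check costs one pointer comparison per slow-path fault, at the price of turning the self-deadlock into a SIGBUS rather than making the write succeed.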
-ben
diff -ur clean/2.3.34-2/fs/buffer.c 2.3.34-2/fs/buffer.c
--- clean/2.3.34-2/fs/buffer.c Thu Dec 9 16:10:18 1999
+++ 2.3.34-2/fs/buffer.c Wed Dec 22 00:46:18 1999
@@ -1386,7 +1386,7 @@
unsigned long block;
int err = 0, partial = 0, need_balance_dirty = 0;
unsigned blocksize, bbits;
- struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
+ struct buffer_head *bh, *head, *wait[PAGE_CACHE_SIZE / 512], **wait_bh=wait;
char *kaddr = (char *)kmap(page);
blocksize = inode->i_sb->s_blocksize;
diff -ur clean/2.3.34-2/include/linux/sched.h 2.3.34-2/include/linux/sched.h
--- clean/2.3.34-2/include/linux/sched.h Mon Dec 20 18:53:12 1999
+++ 2.3.34-2/include/linux/sched.h Wed Dec 22 00:02:06 1999
@@ -349,6 +349,7 @@
/* memory management info */
struct mm_struct *mm, *active_mm;
+ struct page *write_locked_page; /* currently locked page for mmap<->write deadlock test */
/* signal handlers */
spinlock_t sigmask_lock; /* Protects signal and blocked */
@@ -426,7 +427,7 @@
/* thread */ INIT_THREAD, \
/* fs */ &init_fs, \
/* files */ &init_files, \
-/* mm */ NULL, &init_mm, \
+/* mm */ NULL, &init_mm, NULL, \
/* signals */ SPIN_LOCK_UNLOCKED, &init_signals, {{0}}, {{0}}, NULL, &init_task.sigqueue, 0, 0, \
/* exec cts */ 0,0, \
/* exit_sem */ __MUTEX_INITIALIZER(name.exit_sem), \
diff -ur clean/2.3.34-2/mm/filemap.c 2.3.34-2/mm/filemap.c
--- clean/2.3.34-2/mm/filemap.c Mon Dec 20 14:20:06 1999
+++ 2.3.34-2/mm/filemap.c Wed Dec 22 00:21:12 1999
@@ -1325,7 +1325,12 @@
*/
hash = page_hash(&inode->i_data, pgoff);
retry_find:
- page = __find_get_page(&inode->i_data, pgoff, hash);
+ spin_lock(&pagecache_lock);
+ page = __find_page_nolock(&inode->i_data, pgoff, *hash);
+ if (page)
+ get_page(page);
+ spin_unlock(&pagecache_lock);
+
if (!page)
goto no_cached_page;
@@ -1388,6 +1393,9 @@
return NULL;
page_not_uptodate:
+ if (current->write_locked_page == page)
+ return NOPAGE_SIGBUS;
+
lock_page(page);
if (Page_Uptodate(page)) {
UnlockPage(page);
@@ -1917,6 +1925,9 @@
PAGE_BUG(page);
}
+ /* Detect the deadlock */
+ current->write_locked_page = page;
+
status = write_one_page(file, page, offset, bytes, buf);
if (status >= 0) {
@@ -1928,6 +1939,7 @@
inode->i_size = pos;
}
/* Mark it unlocked again and drop the page.. */
+ current->write_locked_page = NULL;
UnlockPage(page);
page_cache_release(page);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.nl.linux.org/Linux-MM/
* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
1999-12-22 5:58 [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range Benjamin C.R. LaHaise
@ 1999-12-22 15:08 ` Chuck Lever
1999-12-22 15:43 ` Benjamin C.R. LaHaise
0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 1999-12-22 15:08 UTC (permalink / raw)
To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm
On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> The patch to filemap.c changes filemap_nopage to use __find_page_nolock
> rather than __find_get_page which waits for the page to become unlocked
> before returning (maybe __find_get_page was meant to check PageUptodate?),
> since filemap_nopage checks PageUptodate before proceeding -- which is
> consistent with do_generic_file_read.
i've tried this before several times. i could never get the system to
perform as well under benchmark load using find_page_nolock as when using
find_get_page. the throughput difference was about 5%, if i recall. i
haven't explained this to myself yet.
perhaps a better fix would be to take out some of the page lock complexity
from filemap_nopage? dunno.
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>
The Linux Scalability project:
http://www.citi.umich.edu/projects/linux-scalability/
* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
1999-12-22 15:08 ` Chuck Lever
@ 1999-12-22 15:43 ` Benjamin C.R. LaHaise
1999-12-22 15:58 ` Chuck Lever
1999-12-23 4:00 ` Chuck Lever
0 siblings, 2 replies; 5+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-12-22 15:43 UTC (permalink / raw)
To: Chuck Lever; +Cc: Linus Torvalds, linux-mm
On Wed, 22 Dec 1999, Chuck Lever wrote:
> On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> > The patch to filemap.c changes filemap_nopage to use __find_page_nolock
> > rather than __find_get_page which waits for the page to become unlocked
> > before returning (maybe __find_get_page was meant to check PageUptodate?),
> > since filemap_nopage checks PageUptodate before proceeding -- which is
> > consistent with do_generic_file_read.
>
> i've tried this before several times. i could never get the system to
> perform as well under benchmark load using find_page_nolock as when using
> find_get_page. the throughput difference was about 5%, if i recall. i
> haven't explained this to myself yet.
>
> perhaps a better fix would be to take out some of the page lock complexity
> from filemap_nopage? dunno.
Well, there certainly is a lot of duplicated code in page_cache_read /
do_generic_file_read / filemap_nopage, and our policies across them are
inconsistent.
Here's my hypothesis about why find_page_nolock vs find_get_page makes a
difference: using find_page_nolock means that we'll never do a
run_task_queue(&tq_disk); to get our async readahead requests run. So, in
theory, doing that in filemap_nopage will restore performance. Isn't
there a way that the choice of when to run tq_disk could be made a bit
less arbitrary?
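A toy userspace model of the hypothesis may help: work queued on a task queue does nothing until someone runs the queue, so a code path that never runs it leaves queued requests pending. Everything below (struct model_task_queue and the model_* functions) is hypothetical and only loosely mirrors run_task_queue(&tq_disk); it is a sketch, not the kernel mechanism.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 16  /* arbitrary capacity for this sketch */

/* A deferred-work queue: callbacks are queued now and run later. */
struct model_task_queue {
    void (*fn[MODEL_QUEUE_MAX])(void *);
    void *arg[MODEL_QUEUE_MAX];
    size_t count;
};

/* Queue one callback; returns 0 on success, -1 if the queue is full. */
static int model_queue_task(struct model_task_queue *q,
                            void (*fn)(void *), void *arg)
{
    if (q->count == MODEL_QUEUE_MAX)
        return -1;
    q->fn[q->count] = fn;
    q->arg[q->count] = arg;
    q->count++;
    return 0;
}

/* Run and remove everything queued so far; returns how many callbacks ran.
 * Callbacks must not queue new work in this simplified version. */
static size_t model_run_task_queue(struct model_task_queue *q)
{
    size_t n = q->count;
    for (size_t i = 0; i < n; i++)
        q->fn[i](q->arg[i]);
    q->count = 0;
    return n;
}

/* Example deferred work: mark a pending read as issued. */
static void model_issue_read(void *arg)
{
    *(int *)arg = 1;
}
```

In this model a queued read stays unissued until model_run_task_queue is called, which is the behaviour the hypothesis attributes to skipping the wait in the lookup.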
-ben
* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
1999-12-22 15:43 ` Benjamin C.R. LaHaise
@ 1999-12-22 15:58 ` Chuck Lever
1999-12-23 4:00 ` Chuck Lever
1 sibling, 0 replies; 5+ messages in thread
From: Chuck Lever @ 1999-12-22 15:58 UTC (permalink / raw)
To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm
On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> On Wed, 22 Dec 1999, Chuck Lever wrote:
> > i've tried this before several times. i could never get the system to
> > perform as well under benchmark load using find_page_nolock as when using
> > find_get_page. the throughput difference was about 5%, if i recall. i
> > haven't explained this to myself yet.
> >
> > perhaps a better fix would be to take out some of the page lock complexity
> > from filemap_nopage? dunno.
>
> Well, there certainly is a lot of code in page_cache_read /
> do_generic_file_read / filemap_nopage that is duplicate, and our policies
> across them are inconsistent.
when i started looking at mmap read-ahead and madvise, i noticed that
there was a lot of inconsistent code duplication, and thought it would be
a good thing to fold this stuff together. that's one reason i created the
"read_cluster_nonblocking" and "page_cache_read" functions. for example,
you can remove 20-40 lines of do_generic_file_read by replacing them with
one call to page_cache_read. or you could easily try clustered reads
there.
but notice you want to do something slightly different in
generic_file_write, so that code will probably need to stay.
> Here's my hypothesis about why find_page_nolock vs find_get_page makes a
> difference: using find_page_nolock means that we'll never do a
> run_task_queue(&tq_disk); to get our async readahead requests run. So, in
> theory, doing that in filemap_nopage will restore performance.
sounds like a reasonable explanation to me, and easy enough to test, even.
i'll give that a shot later today.
> Isn't
> there a way that the choice of when to run tq_disk could be made a bit
> less arbitrary?
i suppose there's a more *efficient* way of doing it, but i think running
the queue while waiting for a page is probably a good idea. in other
words, running the queue in find_get_page seems like a good idea to me.
what did you have in mind?
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>
The Linux Scalability project:
http://www.citi.umich.edu/projects/linux-scalability/
* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
1999-12-22 15:43 ` Benjamin C.R. LaHaise
1999-12-22 15:58 ` Chuck Lever
@ 1999-12-23 4:00 ` Chuck Lever
1 sibling, 0 replies; 5+ messages in thread
From: Chuck Lever @ 1999-12-23 4:00 UTC (permalink / raw)
To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm
On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> On Wed, 22 Dec 1999, Chuck Lever wrote:
> > On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> > i've tried this before several times. i could never get the system to
> > perform as well under benchmark load using find_page_nolock as when using
> > find_get_page. the throughput difference was about 5%, if i recall. i
> > haven't explained this to myself yet.
>
> Here's my hypothesis about why find_page_nolock vs find_get_page makes a
> difference: using find_page_nolock means that we'll never do a
> run_task_queue(&tq_disk); to get our async readahead requests run. So, in
> theory, doing that in filemap_nopage will restore performance. Isn't
> there a way that the choice of when to run tq_disk could be made a bit
> less arbitrary?
with the patch below, which adds a run_task_queue call to filemap_nopage,
benchmark throughput appears essentially unchanged. without the
run_task_queue call, throughput drops.
btw, i notice that a "read_cache_page" function has appeared that looks
similar to "page_cache_read" -- is there a need for both?
--- linux-2.3.34-ref/mm/filemap.c Wed Dec 22 21:23:03 1999
+++ linux/mm/filemap.c Wed Dec 22 22:53:19 1999
@@ -1325,9 +1325,13 @@
*/
hash = page_hash(&inode->i_data, pgoff);
retry_find:
- page = __find_get_page(&inode->i_data, pgoff, hash);
+ spin_lock(&pagecache_lock);
+ page = __find_page_nolock(&inode->i_data, pgoff, *hash);
if (!page)
goto no_cached_page;
+ get_page(page);
+ spin_unlock(&pagecache_lock);
+ run_task_queue(&tq_disk);
/*
* Ok, found a page in the page cache, now we need to check
@@ -1358,6 +1362,8 @@
return old_page;
no_cached_page:
+ spin_unlock(&pagecache_lock);
+
/*
* If the requested offset is within our file, try to read a whole
* cluster of pages at once.
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>
The Linux Scalability project:
http://www.citi.umich.edu/projects/linux-scalability/