[patch] mmap<->write deadlock fix, plus bug in block_write_zero_range

* [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
@ 1999-12-22  5:58 Benjamin C.R. LaHaise
  1999-12-22 15:08 ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-12-22  5:58 UTC
  To: Linus Torvalds; +Cc: linux-mm

Here's the fix I've got for the mmap/write deadlock.  I don't like it, but
the only other fixes I can think of are just as bad, or horrendously
complex.  Note that the first patch, to fs/buffer.c, fixes a serious problem
in block_write_zero_range: a partial write to a page that is not already
cached, on a file system with more than two blocks per page, could
overflow the two-entry wait[] array and scribble on the stack -- eeek!
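
A minimal user-space sketch of the sizing problem, not kernel code.  It
assumes a 4096-byte page and a 512-byte minimum block size (the bound the
patch's PAGE_CACHE_SIZE / 512 implies), and it assumes the worst case in
which every block of the page ends up queued in wait[].  The SKETCH_*
constants and the helpers blocks_per_page() and overflows() are made up
for illustration.

```c
#include <assert.h>

/* Assumed values for illustration only; the real ones come from the kernel. */
#define SKETCH_PAGE_CACHE_SIZE 4096u
#define SKETCH_MIN_BLOCKSIZE    512u

#define OLD_WAIT_SLOTS 2u	/* wait[2] before the patch */
#define NEW_WAIT_SLOTS (SKETCH_PAGE_CACHE_SIZE / SKETCH_MIN_BLOCKSIZE)

/* Number of buffer heads needed to cover one page at a given block size. */
static unsigned blocks_per_page(unsigned blocksize)
{
	assert(blocksize != 0 && SKETCH_PAGE_CACHE_SIZE % blocksize == 0);
	return SKETCH_PAGE_CACHE_SIZE / blocksize;
}

/*
 * Assumed worst case: every block of the page is queued for waiting.
 * Returns nonzero if an array with `slots` entries would be overrun.
 */
static int overflows(unsigned blocksize, unsigned slots)
{
	return blocks_per_page(blocksize) > slots;
}
```

Under these assumptions a 1024-byte block size gives four blocks per
page, which does not fit in two slots, while the patched bound covers
even 512-byte blocks.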

The patch to filemap.c changes filemap_nopage to use __find_page_nolock
rather than __find_get_page which waits for the page to become unlocked
before returning (maybe __find_get_page was meant to check PageUptodate?),
since filemap_nopage checks PageUptodate before proceeding -- which is
consistent with do_generic_file_read.
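
A rough single-task model of the detection logic in the filemap.c hunks,
written as plain user-space C.  It assumes the deadlock arises when the
write path holds a page lock and the copy from the user buffer then
faults on that same page; that reading is inferred from where the patch
sets and tests write_locked_page, not stated outright.  The struct and
function names here (struct task, model_nopage, model_write) are
hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct page { int locked; int uptodate; };

/* Stand-in for the field the patch adds to task_struct. */
struct task { struct page *write_locked_page; };

enum fault_result { FAULT_OK, FAULT_SIGBUS, FAULT_WOULD_DEADLOCK };

/* Model of the page_not_uptodate path in filemap_nopage. */
static enum fault_result model_nopage(struct task *cur, struct page *page,
				      int with_fix)
{
	if (page->uptodate)
		return FAULT_OK;
	if (with_fix && cur->write_locked_page == page)
		return FAULT_SIGBUS;		/* patched: bail out, do not sleep */
	if (page->locked)
		return FAULT_WOULD_DEADLOCK;	/* lock_page() would sleep on our own lock */
	return FAULT_OK;			/* unlocked: the page could be read in */
}

/*
 * Model of the write path: lock the destination page, record it, then
 * copy from a user buffer that may itself be a mapping of that page.
 */
static enum fault_result model_write(struct task *cur, struct page *dst,
				     struct page *src, int with_fix)
{
	enum fault_result r;

	dst->locked = 1;
	cur->write_locked_page = dst;
	r = model_nopage(cur, src, with_fix);	/* the copy faults on src */
	cur->write_locked_page = NULL;
	dst->locked = 0;
	return r;
}
```

In this model, writing from a mapping of the destination page reports a
would-be deadlock without the check and SIGBUS with it; a source page
that is already up to date is unaffected either way.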

		-ben


diff -ur clean/2.3.34-2/fs/buffer.c 2.3.34-2/fs/buffer.c
--- clean/2.3.34-2/fs/buffer.c	Thu Dec  9 16:10:18 1999
+++ 2.3.34-2/fs/buffer.c	Wed Dec 22 00:46:18 1999
@@ -1386,7 +1386,7 @@
 	unsigned long block;
 	int err = 0, partial = 0, need_balance_dirty = 0;
 	unsigned blocksize, bbits;
-	struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
+	struct buffer_head *bh, *head, *wait[PAGE_CACHE_SIZE / 512], **wait_bh=wait;
 	char *kaddr = (char *)kmap(page);
 
 	blocksize = inode->i_sb->s_blocksize;
diff -ur clean/2.3.34-2/include/linux/sched.h 2.3.34-2/include/linux/sched.h
--- clean/2.3.34-2/include/linux/sched.h	Mon Dec 20 18:53:12 1999
+++ 2.3.34-2/include/linux/sched.h	Wed Dec 22 00:02:06 1999
@@ -349,6 +349,7 @@
 
 /* memory management info */
 	struct mm_struct *mm, *active_mm;
+	struct page *write_locked_page;		/* currently locked page for mmap<->write deadlock test */
 
 /* signal handlers */
 	spinlock_t sigmask_lock;	/* Protects signal and blocked */
@@ -426,7 +427,7 @@
 /* thread */	INIT_THREAD, \
 /* fs */	&init_fs, \
 /* files */	&init_files, \
-/* mm */	NULL, &init_mm, \
+/* mm */	NULL, &init_mm, NULL, \
 /* signals */	SPIN_LOCK_UNLOCKED, &init_signals, {{0}}, {{0}}, NULL, &init_task.sigqueue, 0, 0, \
 /* exec cts */	0,0, \
 /* exit_sem */	__MUTEX_INITIALIZER(name.exit_sem),	\
diff -ur clean/2.3.34-2/mm/filemap.c 2.3.34-2/mm/filemap.c
--- clean/2.3.34-2/mm/filemap.c	Mon Dec 20 14:20:06 1999
+++ 2.3.34-2/mm/filemap.c	Wed Dec 22 00:21:12 1999
@@ -1325,7 +1325,12 @@
 	 */
 	hash = page_hash(&inode->i_data, pgoff);
 retry_find:
-	page = __find_get_page(&inode->i_data, pgoff, hash);
+	spin_lock(&pagecache_lock);
+	page = __find_page_nolock(&inode->i_data, pgoff, *hash);
+	if (page)
+		get_page(page);
+	spin_unlock(&pagecache_lock);
+
 	if (!page)
 		goto no_cached_page;
 
@@ -1388,6 +1393,9 @@
 	return NULL;
 
 page_not_uptodate:
+	if (current->write_locked_page == page)
+		return NOPAGE_SIGBUS;
+
 	lock_page(page);
 	if (Page_Uptodate(page)) {
 		UnlockPage(page);
@@ -1917,6 +1925,9 @@
 			PAGE_BUG(page);
 		}
 
+		/* Detect the deadlock */
+		current->write_locked_page = page;
+
 		status = write_one_page(file, page, offset, bytes, buf);
 
 		if (status >= 0) {
@@ -1928,6 +1939,7 @@
 				inode->i_size = pos;
 		}
 		/* Mark it unlocked again and drop the page.. */
+		current->write_locked_page = NULL;
 		UnlockPage(page);
 		page_cache_release(page);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.nl.linux.org/Linux-MM/

* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
  1999-12-22  5:58 [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range Benjamin C.R. LaHaise
@ 1999-12-22 15:08 ` Chuck Lever
  1999-12-22 15:43   ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 1999-12-22 15:08 UTC
  To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm

On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> The patch to filemap.c changes filemap_nopage to use __find_page_nolock
> rather than __find_get_page which waits for the page to become unlocked
> before returning (maybe __find_get_page was meant to check PageUptodate?),
> since filemap_nopage checks PageUptodate before proceeding -- which is
> consistent with do_generic_file_read.

i've tried this before several times.  i could never get the system to
perform as well under benchmark load using find_page_nolock as when using
find_get_page. the throughput difference was about 5%, if i recall.  i
haven't explained this to myself yet.

perhaps a better fix would be to take out some of the page lock complexity
from filemap_nopage?  dunno.

	- Chuck Lever
--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project:
	http://www.citi.umich.edu/projects/linux-scalability/

* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
  1999-12-22 15:08 ` Chuck Lever
@ 1999-12-22 15:43   ` Benjamin C.R. LaHaise
  1999-12-22 15:58     ` Chuck Lever
  1999-12-23  4:00     ` Chuck Lever
  0 siblings, 2 replies; 5+ messages in thread
From: Benjamin C.R. LaHaise @ 1999-12-22 15:43 UTC
  To: Chuck Lever; +Cc: Linus Torvalds, linux-mm

On Wed, 22 Dec 1999, Chuck Lever wrote:

> On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> > The patch to filemap.c changes filemap_nopage to use __find_page_nolock
> > rather than __find_get_page which waits for the page to become unlocked
> > before returning (maybe __find_get_page was meant to check PageUptodate?),
> > since filemap_nopage checks PageUptodate before proceeding -- which is
> > consistent with do_generic_file_read.
> 
> i've tried this before several times.  i could never get the system to
> perform as well under benchmark load using find_page_nolock as when using
> find_get_page. the throughput difference was about 5%, if i recall.  i
> haven't explained this to myself yet.
> 
> perhaps a better fix would be to take out some of the page lock complexity
> from filemap_nopage?  dunno.

Well, there certainly is a lot of code in page_cache_read /
do_generic_file_read / filemap_nopage that is duplicate, and our policies
across them are inconsistent.

Here's my hypothesis about why find_page_nolock vs find_get_page makes a
difference: using find_page_nolock means that we'll never do a
run_task_queue(&tq_disk); to get our async readahead requests run.  So, in
theory, doing that in filemap_nopage will restore performance.  Isn't
there a way that the choice of when to run tq_disk could be made a bit
less arbitrary?
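
A toy user-space model of that hypothesis, assuming (as the paragraph
above proposes, not as a confirmed fact) that the waiting lookup kicks
the disk queue as a side effect and the non-waiting lookup does not.
struct disk_queue and the functions below are invented for illustration
and only mimic the role of run_task_queue(&tq_disk).

```c
#include <assert.h>

/* Toy stand-in for a plugged disk queue: requests sit here until it is run. */
struct disk_queue { unsigned pending; unsigned started; };

/* Queue n asynchronous readahead requests without starting them. */
static void queue_readahead(struct disk_queue *q, unsigned n)
{
	q->pending += n;
}

/* Analogue of run_task_queue(&tq_disk): start everything that is queued. */
static void run_queue(struct disk_queue *q)
{
	q->started += q->pending;
	q->pending = 0;
}

/* Lookup that waits on a locked page; waiting kicks the queue. */
static void lookup_waiting(struct disk_queue *q, int page_locked)
{
	if (page_locked)
		run_queue(q);
}

/* Lookup that never waits, so it never kicks the queue on its own. */
static void lookup_nolock(struct disk_queue *q, int page_locked)
{
	(void)q;
	(void)page_locked;
}
```

With four readahead requests queued, the non-waiting lookup leaves all
four pending, while the waiting lookup on a locked page starts them.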


		-ben

* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
  1999-12-22 15:43   ` Benjamin C.R. LaHaise
@ 1999-12-22 15:58     ` Chuck Lever
  1999-12-23  4:00     ` Chuck Lever
  1 sibling, 0 replies; 5+ messages in thread
From: Chuck Lever @ 1999-12-22 15:58 UTC
  To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm

On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> On Wed, 22 Dec 1999, Chuck Lever wrote:
> > i've tried this before several times.  i could never get the system to
> > perform as well under benchmark load using find_page_nolock as when using
> > find_get_page. the throughput difference was about 5%, if i recall.  i
> > haven't explained this to myself yet.
> > 
> > perhaps a better fix would be to take out some of the page lock complexity
> > from filemap_nopage?  dunno.
> 
> Well, there certainly is a lot of code in page_cache_read /
> do_generic_file_read / filemap_nopage that is duplicate, and our policies
> across them are inconsistent.

when i started looking at mmap read-ahead and madvise, i noticed that
there was a lot of inconsistent code duplication, and thought it would be
a good thing to fold this stuff together.  that's one reason i created the
"read_cluster_nonblocking" and "page_cache_read" functions.  for example,
you can remove 20-40 lines of do_generic_file_read by replacing them with
one call to page_cache_read.  or you could easily try clustered reads
there.

but notice you want to do something slightly different in
generic_file_write, so that code will probably need to stay.

> Here's my hypothesis about why find_page_nolock vs find_get_page makes a
> difference: using find_page_nolock means that we'll never do a
> run_task_queue(&tq_disk); to get our async readahead requests run.  So, in
> theory, doing that in filemap_nopage will restore performance.

sounds like a reasonable explanation to me, and easy enough to test, even.
i'll give that a shot later today.

> Isn't
> there a way that the choice of when to run tq_disk could be made a bit
> less arbitrary?

i suppose there's a more *efficient* way of doing it, but i think running
the queue while waiting for a page is probably a good idea.  in other
words, running the queue in find_get_page seems like a good idea to me.
what did you have in mind?

	- Chuck Lever
--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project:
	http://www.citi.umich.edu/projects/linux-scalability/

* Re: [patch] mmap<->write deadlock fix, plus bug in block_write_zero_range
  1999-12-22 15:43   ` Benjamin C.R. LaHaise
  1999-12-22 15:58     ` Chuck Lever
@ 1999-12-23  4:00     ` Chuck Lever
  1 sibling, 0 replies; 5+ messages in thread
From: Chuck Lever @ 1999-12-23  4:00 UTC
  To: Benjamin C.R. LaHaise; +Cc: Linus Torvalds, linux-mm

On Wed, 22 Dec 1999, Benjamin C.R. LaHaise wrote:
> On Wed, 22 Dec 1999, Chuck Lever wrote:
> > i've tried this before several times.  i could never get the system to
> > perform as well under benchmark load using find_page_nolock as when using
> > find_get_page. the throughput difference was about 5%, if i recall.  i
> > haven't explained this to myself yet.
>
> Here's my hypothesis about why find_page_nolock vs find_get_page makes a
> difference: using find_page_nolock means that we'll never do a
> run_task_queue(&tq_disk); to get our async readahead requests run.  So, in
> theory, doing that in filemap_nopage will restore performance.  Isn't
> there a way that the choice of when to run tq_disk could be made a bit
> less arbitrary?

with the run_task_queue call added, this patch appears to have negligible
effect on benchmark throughput measurements; without the run_task_queue,
throughput drops.

btw, i notice that a "read_cache_page" function has appeared that looks
similar to "page_cache_read" -- is there a need for both?

--- linux-2.3.34-ref/mm/filemap.c	Wed Dec 22 21:23:03 1999
+++ linux/mm/filemap.c	Wed Dec 22 22:53:19 1999
@@ -1325,9 +1325,13 @@
 	 */
 	hash = page_hash(&inode->i_data, pgoff);
 retry_find:
-	page = __find_get_page(&inode->i_data, pgoff, hash);
+	spin_lock(&pagecache_lock);
+	page = __find_page_nolock(&inode->i_data, pgoff, *hash);
 	if (!page)
 		goto no_cached_page;
+	get_page(page);
+	spin_unlock(&pagecache_lock);
+	run_task_queue(&tq_disk);
 
 	/*
 	 * Ok, found a page in the page cache, now we need to check
@@ -1358,6 +1362,8 @@
 	return old_page;
 
 no_cached_page:
+	spin_unlock(&pagecache_lock);
+
 	/*
 	 * If the requested offset is within our file, try to read a whole 
 	 * cluster of pages at once.

	- Chuck Lever
--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project:
	http://www.citi.umich.edu/projects/linux-scalability/
