linux-mm.kvack.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [patch] splice mmap_sem deadlock
Date: Mon, 1 Oct 2007 19:33:51 +0200
Message-ID: <20071001173351.GK5303@kernel.dk>
In-Reply-To: <alpine.LFD.0.999.0710010807360.3579@woody.linux-foundation.org>

On Mon, Oct 01 2007, Linus Torvalds wrote:
> 
> The comment is wrong.
> 
> On Mon, 1 Oct 2007, Jens Axboe wrote:
> >  
> >  /*
> > + * Do a copy-from-user while holding the mmap_semaphore for reading. If we
> > + * have to fault the user page in, we must drop the mmap_sem to avoid a
> > + * deadlock in the page fault handling (it wants to grab mmap_sem too, but for
> > + * writing). This assumes that we will very rarely hit the partial != 0 path,
> > + * or this will not be a win.
> > + */
> 
> Page faulting only grabs it for reading, and having a page fault happen is 
> not problematic in itself. Readers *do* nest.
> 
> What is problematic is:
> 
> 	thread#1			thread#2
> 
> 	get_iovec_page_array
> 	down_read()
> 	.. everything ok so far ..
> 					mmap()
> 					down_write()
> 					.. correctly blocks on the reader ..
> 					.. everything ok so far ..
> 
> 	.. pagefault ..
> 	down_read()
> 	.. fairness code now blocks on the waiting writer! ..
> 	.. oops. We're deadlocked ..
> 
> So the problem is that while readers do nest nicely, they only do so if no 
> potential writers can possibly exist (which of course never happens: an 
> rwlock with no writers is a no-op ;).

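The interleaving quoted above can be restated as a toy, single-threaded model. This is only a sketch: struct fair_rwsem and the model_* helpers are made-up names for illustration, not kernel APIs, and the real rwsem sleeps where this model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a writer-fair rwsem. Hypothetical, not the kernel's code. */
struct fair_rwsem {
	int readers;		/* tasks currently holding the lock for reading */
	int waiting_writers;	/* writers queued behind those readers */
};

/*
 * Fairness rule: a new reader may only enter if no writer is already
 * queued. Returning false stands in for "down_read() would sleep".
 */
static bool model_down_read(struct fair_rwsem *sem)
{
	if (sem->waiting_writers > 0)
		return false;
	sem->readers++;
	return true;
}

/* thread#2: down_write() while a reader holds the lock, so it queues. */
static void model_queue_writer(struct fair_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->waiting_writers++;
}

/*
 * Replay the interleaving: thread#1 takes the lock for reading, thread#2
 * optionally queues as a writer, then thread#1 faults and tries to take
 * the lock for reading again. Returns whether that nested read gets in.
 */
static bool nested_read_succeeds(bool writer_arrives)
{
	struct fair_rwsem sem = { 0, 0 };

	if (!model_down_read(&sem))		/* get_iovec_page_array() */
		return false;
	if (writer_arrives)
		model_queue_writer(&sem);	/* mmap() in thread#2 */
	return model_down_read(&sem);		/* page fault in thread#1 */
}
```

With no writer, nested_read_succeeds(false) is true: readers nest. With the writer queued in between, nested_read_succeeds(true) is false: thread#1 waits on the writer, the writer waits on thread#1's first read hold, and nobody makes progress.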
Ah, I didn't read the explanation well enough, it seems. Better?

diff --git a/fs/splice.c b/fs/splice.c
index c010a72..e95a362 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1224,6 +1224,33 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 }
 
 /*
+ * Do a copy-from-user while holding the mmap_semaphore for reading, in a
+ * manner safe from deadlocking with simultaneous mmap() (grabbing mmap_sem
+ * for writing) and page faulting on the user memory pointed to by src.
+ * This assumes that we will very rarely hit the partial != 0 path, or this
+ * will not be a win.
+ */
+static int copy_from_user_mmap_sem(void *dst, const void __user *src, size_t n)
+{
+	int partial;
+
+	pagefault_disable();
+	partial = __copy_from_user_inatomic(dst, src, n);
+	pagefault_enable();
+
+	/*
+	 * Didn't copy everything, drop the mmap_sem and do a faulting copy
+	 */
+	if (unlikely(partial)) {
+		up_read(&current->mm->mmap_sem);
+		partial = copy_from_user(dst, src, n);
+		down_read(&current->mm->mmap_sem);
+	}
+
+	return partial;
+}
+
+/*
  * Map an iov into an array of pages and offset/length tupples. With the
  * partial_page structure, we can map several non-contiguous ranges into
  * our ones pages[] map instead of splitting that operation into pieces.
@@ -1236,31 +1263,26 @@ static int get_iovec_page_array(const struct iovec __user *iov,
 {
 	int buffers = 0, error = 0;
 
-	/*
-	 * It's ok to take the mmap_sem for reading, even
-	 * across a "get_user()".
-	 */
 	down_read(&current->mm->mmap_sem);
 
 	while (nr_vecs) {
 		unsigned long off, npages;
+		struct iovec entry;
 		void __user *base;
 		size_t len;
 		int i;
 
-		/*
-		 * Get user address base and length for this iovec.
-		 */
-		error = get_user(base, &iov->iov_base);
-		if (unlikely(error))
-			break;
-		error = get_user(len, &iov->iov_len);
-		if (unlikely(error))
+		error = -EFAULT;
+		if (copy_from_user_mmap_sem(&entry, iov, sizeof(entry)))
 			break;
 
+		base = entry.iov_base;
+		len = entry.iov_len;
+
 		/*
 		 * Sanity check this iovec. 0 read succeeds.
 		 */
+		error = 0;
 		if (unlikely(!len))
 			break;
 		error = -EFAULT;

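For anyone who wants to poke at the control flow outside the kernel, here is a rough userspace sketch of what copy_from_user_mmap_sem() above does: try a copy that is not allowed to fault, and only if that comes up short, drop the read lock, do the faulting copy, and retake the lock. Everything here is a stand-in I made up for illustration: struct toy_lock plays mmap_sem, copy_nofault() plays __copy_from_user_inatomic() under pagefault_disable(), copy_may_fault() plays copy_from_user(), and the 'resident' parameter fakes how much of the source is already paged in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mmap_sem: tracks only our own read hold. */
struct toy_lock {
	int read_held;	/* 1 while the caller holds the lock for reading */
	int drops;	/* how many times the slow path released it */
};

/*
 * Stand-in for the non-faulting copy. Copies at most 'resident' bytes
 * and returns the number of bytes NOT copied, like the kernel helper.
 */
static size_t copy_nofault(void *dst, const void *src, size_t n,
			   size_t resident)
{
	size_t ok = n < resident ? n : resident;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Stand-in for the faulting copy. In this model it always succeeds. */
static size_t copy_may_fault(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Same shape as the patch: fast path under the lock, slow path without. */
static size_t copy_under_read_lock(struct toy_lock *lock, void *dst,
				   const void *src, size_t n, size_t resident)
{
	size_t partial;

	assert(lock->read_held);	/* caller must already hold the lock */

	partial = copy_nofault(dst, src, n, resident);
	if (partial) {
		lock->read_held = 0;	/* up_read() */
		lock->drops++;
		partial = copy_may_fault(dst, src, n);
		lock->read_held = 1;	/* down_read() */
	}

	return partial;
}
```

When the whole source is "resident" the lock is never dropped, which is the common case the comment in the patch is betting on. When it is not, the slow path redoes the whole copy from the start rather than resuming where the fast path stopped, same as the patch.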
-- 
Jens Axboe

