From: Linus Torvalds <torvalds@linux-foundation.org>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>,
    Dave Jones <davej@redhat.com>,
    Linux Kernel <linux-kernel@vger.kernel.org>,
    linux-mm <linux-mm@kvack.org>, Christoph Lameter <cl@gentwo.org>,
    Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: bad page state in 3.13-rc4
Date: Fri, 20 Dec 2013 05:02:20 +0900
Message-ID: <CA+55aFy5zg_cJueMZFzuqr06rT-hwnHhvBpM6W9657sxnCzxKg@mail.gmail.com>
In-Reply-To: <20131219195352.GB9228@kvack.org>
[-- Attachment #1: Type: text/plain, Size: 870 bytes --]
On Fri, Dec 20, 2013 at 4:53 AM, Benjamin LaHaise <bcrl@kvack.org> wrote:
>
> Yes, that's what I found when I started looking into this in detail again.
> I think the page reference counting is actually correct. There are 2
> references on each page: the first is from the find_or_create_page() call,
> and the second is from the get_user_pages() (which also makes sure the page
> is populated into the page tables).

Ok, I'm sorry, but that's just pure bullshit then.

So it has the page array in the page cache, then mmap's it in, and
uses get_user_pages() to get the pages back that it *just* created.

This code is pure and utter garbage. It's beyond the pale how crazy it is.

Why not just get rid of the idiotic get_user_pages() crap then?
Something like the attached patch?

Totally untested, but at least it makes *some* amount of sense.

               Linus
[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1655 bytes --]
 fs/aio.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 6efb7f6cb22e..e1b02dd1be9e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -358,6 +358,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 		SetPageUptodate(page);
 		SetPageDirty(page);
 		unlock_page(page);
+
+		ctx->ring_pages[i] = page;
 	}
 	ctx->aio_ring_file = file;
 	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring))
@@ -380,8 +382,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
 				       PROT_READ | PROT_WRITE,
 				       MAP_SHARED | MAP_POPULATE, 0, &populate);
+	up_write(&mm->mmap_sem);
 	if (IS_ERR((void *)ctx->mmap_base)) {
-		up_write(&mm->mmap_sem);
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
 		return -EAGAIN;
@@ -389,22 +391,6 @@ static int aio_setup_ring(struct kioctx *ctx)
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
 
-	/* We must do this while still holding mmap_sem for write, as we
-	 * need to be protected against userspace attempting to mremap()
-	 * or munmap() the ring buffer.
-	 */
-	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
-				       1, 0, ctx->ring_pages, NULL);
-
-	/* Dropping the reference here is safe as the page cache will hold
-	 * onto the pages for us.  It is also required so that page migration
-	 * can unmap the pages and get the right reference count.
-	 */
-	for (i = 0; i < ctx->nr_pages; i++)
-		put_page(ctx->ring_pages[i]);
-
-	up_write(&mm->mmap_sem);
-
 	if (unlikely(ctx->nr_pages != nr_pages)) {
 		aio_free_ring(ctx);
 		return -EAGAIN;