From: Ingo Oeser <ingo.oeser@informatik.tu-chemnitz.de>
To: Andrew Morton <akpm@digeo.com>
Cc: linux-mm@kvack.org
Subject: [RFC PATCH] Re: get_user_pages rewrite (completed, updated for 2.5.46)
Date: Sat, 9 Nov 2002 18:52:42 +0100
Message-ID: <20021109185242.T659@nightmaster.csn.tu-chemnitz.de>
In-Reply-To: <3DCC3E38.29B0ABEF@digeo.com>; from akpm@digeo.com on Fri, Nov 08, 2002 at 02:44:08PM -0800
Hi Andrew,
thanks for your review.
On Fri, Nov 08, 2002 at 02:44:08PM -0800, Andrew Morton wrote:
[custom_page_walker_t locking rules for vma->mm->page_table_lock]
> This locking is rather awkward. Why is it necessary, and can it
> be simplified??
No unlocking is needed in the fast and common cases. That should
reduce bus traffic.
That locking is also needed for follow_page and will not be
dropped, if the page is already faulted into the process space
(should be the common case for get_user_pages).
Under normal operation walk_user_pages is a loop of
follow_page(), which needs that lock, i.e. the while
statement in single_page_walk() will not enter the loop.
The original implementation did NO proper cleanup, if the call
spanned multiple VMAs.
That's why I introduced the case IS_ERR(vma), where the
vma->mm->page_table_lock cannot be unlocked, but cleanup can
happen in case of wrong VMA and the walker having collected some
pages already.
We have four possibilities to simplify locking:
1) Explicit argument, whether the page_table_lock is taken.
- Would simplify usage, but I know that functions of this kind
were eliminated in the past, because Linus and some
other people don't like that kind of magic.
- Would remove the need to do that for huge tlb pages.
- We must check for that flag and restore state at exit and
the error path. Handling the error path is already
complicated, but very visible (the IS_ERR is a good
indicator even to the inexperienced reader).
2) Always unlock before we enter the custom page walker.
- Would cause lock/follow_page/unlock/lock/page_cache_get/unlock
for EVERY page in the normal get_user_pages() case.
3) Always unlock if IS_ERR(page) would trigger.
(Actually the IS_ERR(page) is triggered also, if IS_ERR(vma) is true).
- Removes the unlocking completely from the custom page walker,
if it doesn't need to do that anyway.
- Is no real simplification, since the walker can be entered
with locking or without, as it is now.
- We still require locking for huge tlb pages, but Mr. Irwin
already acked the changes for that.
4) Introduce an explicit "cleaning" function passed additionally
to the walk_user_pages() function.
- This would separate the error handling completely from the
normal case.
- It would be possible to omit the error handling, if not
needed. (Or to forget it, if needed later ;-/ )
- The page walker will ALWAYS be entered with the page_table_lock
taken.
- The cleanup handler will ALWAYS be entered without it and
only the custom_data passed along.
- Function enter/exit overhead is incurred twice, because we
have two functions.
- And we still require locking for huge tlb pages.
Which one do you like most? I would favor 3. I've appended
a patch for that against page-walk-api-2.5.46-mm1-all.patch.bz2
for you to test it.
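
To make the rules of option 3 easier to review, here is a small
user-space model of them. It is only a sketch and not the kernel code:
mock_mm, mock_lock(), mock_unlock(), mock_walk() and collect_walker()
are made-up names, the lock is a plain counter, and the ERR_PTR helpers
are simplified stand-ins for the kernel ones. The walker in this model
returns the negative errno directly, which is an assumption of the
model only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct page { int id; };		/* hypothetical stand-in */
struct mock_mm { int lock_depth; };	/* models page_table_lock, 1 = held */

static void mock_lock(struct mock_mm *mm)
{
	assert(mm->lock_depth == 0);
	mm->lock_depth = 1;
}

static void mock_unlock(struct mock_mm *mm)
{
	assert(mm->lock_depth == 1);
	mm->lock_depth = 0;
}

typedef int (*walker_t)(struct mock_mm *mm, struct page *page, void *data);

struct collect { struct page *pages[4]; int count; int max; };

/* Entered with the lock held if and only if !IS_ERR(page). */
static int collect_walker(struct mock_mm *mm, struct page *page, void *data)
{
	struct collect *c = data;

	(void)mm;	/* this walker never has to touch the lock */
	if (!IS_ERR(page)) {
		c->pages[c->count++] = page;
		return c->count == c->max;	/* 1 = done, 0 = go on */
	}
	c->count = 0;			/* cleanup only, lock already dropped */
	return (int)PTR_ERR(page);
}

/* Driver: lock once, walk, unlock before handing an error page over. */
static int mock_walk(struct mock_mm *mm, struct page **lookup, int n,
		     walker_t walker, void *data)
{
	int i, ret = 0;

	mock_lock(mm);
	for (i = 0; i < n; i++) {
		struct page *page = lookup[i];

		if (IS_ERR(page)) {
			mock_unlock(mm);
			return walker(mm, page, data);
		}
		ret = walker(mm, page, data);
		if (ret < 0)
			return ret;	/* walker had to drop the lock itself */
		if (ret > 0)
			break;
	}
	mock_unlock(mm);
	return ret;
}
```

In the fast path the lock is taken and released exactly once per walk,
and a walker that never fails on a valid page contains no unlock at all.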
I agree that the locking rules are awkward, but they are the best
solution I could come up with while preserving speed and
functionality. Any better rules will be implemented at your request.
> wrt the removal of the vmas arg to get_user_pages(): I assume this
> was because none of the multipage callers were using it?
Yes, that's true. If some caller needs this, it can use a custom
walker.
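
Such a walker could look roughly like the following user-space sketch.
It follows the custom_page_walker_t signature, but struct vma_collect
and vma_collect_walker() are hypothetical names, the structs are reduced
stand-ins for the kernel types, and the negative return value on error
is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the kernel types, enough for the sketch. */
struct vm_area_struct { unsigned long vm_start, vm_end; };
struct page { int id; };

/* Simplified stand-ins for the kernel's IS_ERR()/PTR_ERR(). */
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }

/* Hypothetical customdata: one vma slot per collected page. */
struct vma_collect {
	struct page **pages;
	struct vm_area_struct **vmas;
	int count;
	int max_pages;
};

static int vma_collect_walker(struct vm_area_struct *vma, struct page *page,
			      unsigned long virt_addr, void *customdata)
{
	struct vma_collect *vc = customdata;

	(void)virt_addr;
	if (IS_ERR(page)) {
		/* @vma is undefined here, so it is neither used nor checked */
		vc->count = 0;
		return (int)PTR_ERR(page);
	}
	vc->pages[vc->count] = page;
	vc->vmas[vc->count] = vma;
	vc->count++;
	return vc->count == vc->max_pages;	/* 1 = done, 0 = go on */
}
```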
Single patch against 2.5.46-mm1 is at
http://www.tu-chemnitz.de/~ioe/patches-page_walk/page-walk-api-2.5.46-mm1-all.patch.bz2
All patches with description and diffstat of the whole thing at:
http://www.tu-chemnitz.de/~ioe/patches-page_walk/index.html
Thanks again for your review, I really appreciate your input here.
Regards
Ingo Oeser
diff -u linux-2.5.46-mm1-ioe/include/linux/mm.h linux-2.5.46-mm1-ioe/include/linux/mm.h
--- linux-2.5.46-mm1-ioe/include/linux/mm.h Fri Nov 8 12:55:49 2002
+++ linux-2.5.46-mm1-ioe/include/linux/mm.h Sat Nov 9 18:02:56 2002
@@ -396,14 +396,15 @@
* If this functions gets a page, for which %IS_ERR(@page) is true, than it
* should do it's cleanup of customdata and return -PTR_ERR(@page).
*
- * This function is called with @vma->vm_mm->page_table_lock held,
- * if IS_ERR(@vma) is not true.
+ * If IS_ERR(@page) is NOT TRUE, this function is called with
+ * @vma->vm_mm->page_table_lock held.
*
- * But if IS_ERR(@vma) is true, IS_ERR(@page) is also true, since if we have no
- * vma, then we also have no user space page.
+ * The value of @vma is undefined if IS_ERR(@page) is TRUE.
+ * (So never use or check it if IS_ERR(@page) is TRUE)
*
- * If it returns a negative value, then the page_table_lock must be dropped
- * by this function, if it is held.
+ * If it returns a negative value but got a valid page, then the
+ * page_table_lock must be dropped by this function. (This condition should be
+ * rather rare.)
*/
typedef int (*custom_page_walker_t)(struct vm_area_struct *vma,
struct page *page, unsigned long virt_addr, void *customdata);
diff -u linux-2.5.46-mm1-ioe/mm/memory.c linux-2.5.46-mm1-ioe/mm/memory.c
--- linux-2.5.46-mm1-ioe/mm/memory.c Fri Nov 8 12:55:49 2002
+++ linux-2.5.46-mm1-ioe/mm/memory.c Sat Nov 9 18:15:06 2002
@@ -1158,8 +1158,6 @@
struct gup_add_pages *gup = customdata;
- BUG_ON(!customdata);
-
if (!IS_ERR(page)) {
gup->pages[gup->count++] = page;
flush_dcache_page(page);
@@ -1170,8 +1168,6 @@
return (gup->count == gup->max_pages) ? 1 : 0;
}
- if (!IS_ERR(vma))
- spin_unlock(&vma->vm_mm->page_table_lock);
gup_pages_cleanup(gup);
return -PTR_ERR(page);
}
@@ -1192,7 +1188,6 @@
spin_unlock(&mm->page_table_lock);
fault = handle_mm_fault(mm, vma, start, write);
- spin_lock(&mm->page_table_lock);
switch (fault) {
case VM_FAULT_MINOR:
@@ -1210,8 +1205,13 @@
spin_unlock(&mm->page_table_lock);
BUG();
}
+ spin_lock(&mm->page_table_lock);
}
- return get_page_map(map);
+ map=get_page_map(map);
+ if (IS_ERR(map))
+ spin_unlock(&mm->page_table_lock);
+
+ return map;
}
/* VMA contains already "start".
@@ -1248,10 +1248,14 @@
spin_lock(&mm->page_table_lock);
page = single_page_walk(tsk, mm, vma, start, write);
- if (!(IS_ERR(page) || PageReserved(page)))
+ if (IS_ERR(page))
+ goto out;
+
+ if (!PageReserved(page))
page_cache_get(page);
spin_unlock(&mm->page_table_lock);
+out:
return page;
}
@@ -2101,8 +2105,6 @@
return (*todo) ? 0 : 1;
}
- if (!IS_ERR(vma))
- spin_unlock(&vma->vm_mm->page_table_lock);
return -PTR_ERR(page);
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/