* get_user_pages rewrite rediffed against 2.5.47-mm1
@ 2002-11-12 19:58 Ingo Oeser
2002-11-12 20:27 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Oeser @ 2002-11-12 19:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 316 bytes --]
Hi Andrew,
the get_user_pages rewrite has been rediffed against 2.5.47-mm1
to make it as painless as possible for you and others reviewing
and applying it.
bzip2ed patch attached, because it's 25K uncompressed.
Regards
Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
[-- Attachment #2: page-walk-api-2.5.47-mm1-simple_locking.patch.bz2 --]
[-- Type: application/octet-stream, Size: 6052 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: get_user_pages rewrite rediffed against 2.5.47-mm1
2002-11-12 19:58 get_user_pages rewrite rediffed against 2.5.47-mm1 Ingo Oeser
@ 2002-11-12 20:27 ` Andrew Morton
2002-11-15 7:58 ` Ingo Oeser
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2002-11-12 20:27 UTC (permalink / raw)
To: Ingo Oeser; +Cc: linux-mm
Ingo Oeser wrote:
>
> Hi Andrew,
>
> the get_user_pages rewrite has been rediffed against 2.5.47-mm1
The single diff helps, thanks.
I'm still having indigestion over this:
+/*** Page walking API ***/
+
+/* &custom_page_walker - A custom page walk handler for walk_user_pages().
+ * vma: The vma we walk pages of.
+ * page: The page we found or an %ERR_PTR() value
+ * virt_addr: The virtual address we are at while walking.
+ * customdata: Anything you would like to pass additionally.
+ *
+ * Returns:
+ * Negative values -> ERRNO values.
+ * 0 -> continue page walking.
+ * 1 -> abort page walking.
+ *
+ * If this functions gets a page, for which %IS_ERR(@page) is true, than it
+ * should do it's cleanup of customdata and return -PTR_ERR(@page).
+ *
+ * If IS_ERR(@page) is NOT TRUE, this function is called with
+ * @vma->vm_mm->page_table_lock held.
+ *
+ * The value of @vma is undefined if IS_ERR(@page) is TRUE.
+ * (So never use or check it if IS_ERR(@page) is TRUE)
+ *
+ * If it returns a negative value but got a valid page, then the
+ * page_table_lock must be dropped by this function. (This condition should be
+ * rather rare.)
+ */
+typedef int (*custom_page_walker_t)(struct vm_area_struct *vma,
+ struct page *page, unsigned long virt_addr, void *customdata);
+
I think I see what you're doing now. You've overloaded the callback,
with an IS_ERR value of "page" to mean "something went wrong".
Would that be a correct interpretation?
If so, it would be better (ie: more Linus-friendly) to make that a
separate callback. One which is called outside the lock, and which
has distinctly different semantics from the normal page walker.
Some (all?) callers of walk_user_pages() may not even be interested
in the error-time callout. In fact it may be possible to just leave
the state at time-of-error in the state structure (see below) and just
return an error code to the caller of walk_user_pages()?
I suggest that it's time to fold all these arguments into a structure
which is on the caller's stack, and pass the address of that around.
This will simplify things, but one needs to be careful to think through
the ownership rules of the various parts of that structure.
Please review your ERR_PTR handling. You have lots of these:
+ return ERR_PTR(EFAULT);
which are all wrong. It needs to be `ERR_PTR(-EFAULT)'. (editing
the diff is the easy fix ;))
When you do the above, this becomes wrong too:
return -PTR_ERR(page);
So please check all those too.
Also, please rip everything which is appropriate out of mm/memory.c
and create a new file in mm/ for it.
I cannot guarantee that we can get this merged up, frankly. We need
a *reason* for doing that. The current code is "good enough" for
current callers. So the best I can do is to get it under test, give
Linus a heads-up that it's floating about while you get in there and
start creating reasons for merging it - namely the clients down in
device drivers.
If we don't make it then we can definitely push it for 2.7.
How does that suit?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: get_user_pages rewrite rediffed against 2.5.47-mm1
2002-11-12 20:27 ` Andrew Morton
@ 2002-11-15 7:58 ` Ingo Oeser
2002-11-15 21:08 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Oeser @ 2002-11-15 7:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
Hi Andrew,
On Tue, Nov 12, 2002 at 12:27:22PM -0800, Andrew Morton wrote:
> I think I see what you're doing now. You've overloaded the callback,
> with an IS_ERR value of "page" to mean "something went wrong".
>
> Would that be a correct interpretation?
>
> If so, it would be better (ie: more Linus-friendly) to make that a
> separate callback. One which is called outside the lock, and which
> has distinctly different semantics from the normal page walker.
Ok, did that. It also reduced code. I just thought it will
increase complexity, but it proved to reduce it.
> Some (all?) callers of walk_user_pages() may not even be interested
> in the error-time callout. In fact it may be possible to just leave
> the state at time-of-error in the state structure (see below) and just
> return an error code to the caller of walk_user_pages()?
I fear that users forget the cleanup, if they have to repeat and
duplicate it. Until now they DO the cleanup themselves, after
using the structures, but not in case of error. Providing a
cleanup function factors out the cleanup nicely.
I envision the following usage:
setup(&page_walk,...); /* currently done explicitly on stack */
walk_user_pages(&page_walk);
/* Do fancy stuff with that pages */
cleanup(&page_walk); /* calling internal cleanup function
and free the page array */
How does that sound?
> I suggest that it's time to fold all these arguments into a structure
> which is on the caller's stack, and pass the address of that around.
> This will simplify things, but one needs to be careful to think through
> the ownership rules of the various parts of that structure.
I'm working on this, but that means a new header file, a new *.c
file and exporting all the page walkers introduced by me to the
modules.
It sure will reduce stack usage although we recurse deeper.
That's a good think already.
> Please review your ERR_PTR handling.
Done. Had it otherwise before, but got confused about the code in
linux/err.h.
> Also, please rip everything which is appropriate out of mm/memory.c
> and create a new file in mm/ for it.
Everything regarding page walking, or should I cleanup more?
In fact mm/memory.c really looks like a mm/misc.c ;-)
> I cannot guarantee that we can get this merged up, frankly. We need
> a *reason* for doing that. The current code is "good enough" for
> current callers.
The current code sucks for char devices which have much IO
traffic via DMA. That might not be much, but the number is
increasing and I'm sure many drivers for measuring cards, which
will never make it into the kernel, would benefit from that.
All improvements in that direction have only been with block devices
in mind so far. I even don't see how I could improve the usage in
fs/dio.c, because it might sleep very long, so I can't use a page
walker for it (which needs the mmap_sem).
> So the best I can do is to get it under test, give
> Linus a heads-up that it's floating about while you get in there and
> start creating reasons for merging it - namely the clients down in
> device drivers.
>
> If we don't make it then we can definitely push it for 2.7.
>
> How does that suit?
That sounds great! I'll create reasons. People with 4K-Stack will
love that, I think ;-)
And it will cleanup, shrink and correct every usage. I uncovered
already 2 Bugs on the go, so it's definitly worth it anyway.
Patch follows Saturday or Sunday morning. Its not complete, yet.
Regards
Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: get_user_pages rewrite rediffed against 2.5.47-mm1
2002-11-15 7:58 ` Ingo Oeser
@ 2002-11-15 21:08 ` Andrew Morton
2002-11-17 23:27 ` Ingo Oeser
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2002-11-15 21:08 UTC (permalink / raw)
To: Ingo Oeser; +Cc: linux-mm
Ingo Oeser wrote:
>
> ...
> I envision the following usage:
>
> setup(&page_walk,...); /* currently done explicitly on stack */
> walk_user_pages(&page_walk);
>
> /* Do fancy stuff with that pages */
>
> cleanup(&page_walk); /* calling internal cleanup function
> and free the page array */
>
> How does that sound?
Good. The callers shouldn't have to know how to initialise the
state structure.
> > I suggest that it's time to fold all these arguments into a structure
> > which is on the caller's stack, and pass the address of that around.
> > This will simplify things, but one needs to be careful to think through
> > the ownership rules of the various parts of that structure.
>
> I'm working on this, but that means a new header file, a new *.c
> file and exporting all the page walkers introduced by me to the
> modules.
That's OK.
> It sure will reduce stack usage although we recurse deeper.
> That's a good think already.
>
> > Please review your ERR_PTR handling.
>
> Done. Had it otherwise before, but got confused about the code in
> linux/err.h.
>
> > Also, please rip everything which is appropriate out of mm/memory.c
> > and create a new file in mm/ for it.
>
> Everything regarding page walking, or should I cleanup more?
I'd say just keep it to "pull the user pagetable access code out of
memory.c". As much as you can, but not unrelated things. One
concept per file would be nice.
> In fact mm/memory.c really looks like a mm/misc.c ;-)
Well, with a name like "memory.c", how is anyone to know what it
is supposed to contain? ;)
> > I cannot guarantee that we can get this merged up, frankly. We need
> > a *reason* for doing that. The current code is "good enough" for
> > current callers.
>
> The current code sucks for char devices which have much IO
> traffic via DMA. That might not be much, but the number is
> increasing and I'm sure many drivers for measuring cards, which
> will never make it into the kernel, would benefit from that.
OK.
> All improvements in that direction have only been with block devices
> in mind so far. I even don't see how I could improve the usage in
> fs/dio.c, because it might sleep very long, so I can't use a page
> walker for it (which needs the mmap_sem).
Well I had plans there to reuse the page walker structure. So it
would become a big stateful thing which you just feed into the
walker engine and it spits out pages. With the callback page-processor
being able to return information such as "OK, that's enough pages
for now, let's start some IO".
It would have been rather complex, and just buffering the pages
in struct dio and using the current get_user_pages() was a reasonably
comfortable solution.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: get_user_pages rewrite rediffed against 2.5.47-mm1
2002-11-15 21:08 ` Andrew Morton
@ 2002-11-17 23:27 ` Ingo Oeser
2002-11-18 1:12 ` William Lee Irwin III
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Oeser @ 2002-11-17 23:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: William Lee Irwin III, linux-mm
[-- Attachment #1: Type: text/plain, Size: 2107 bytes --]
Hi Andrew,
Hi William,
On Fri, Nov 15, 2002 at 01:08:38PM -0800, Andrew Morton wrote:
> Ingo Oeser wrote:
> >
> > ...
> > I envision the following usage:
> >
> > setup(&page_walk,...); /* currently done explicitly on stack */
> > walk_user_pages(&page_walk);
> Good. The callers shouldn't have to know how to initialise the
> state structure.
The problem is: Sometimes they know it and duplicate the work.
But I'll tackle that next time.
> I'd say just keep it to "pull the user pagetable access code out of
> memory.c". As much as you can, but not unrelated things. One
> concept per file would be nice.
Ok, so I created {mm/,include/linux/}page_walk.[ch]
> > All improvements in that direction have only been with block devices
> > in mind so far. I even don't see how I could improve the usage in
> > fs/dio.c, because it might sleep very long, so I can't use a page
> > walker for it (which needs the mmap_sem).
>
> Well I had plans there to reuse the page walker structure. So it
> would become a big stateful thing which you just feed into the
> walker engine and it spits out pages. With the callback page-processor
> being able to return information such as "OK, that's enough pages
> for now, let's start some IO".
The walk_user_pages is actually made for this.
Return values:
1 -> The walker was satisfied
0 -> No more pages there
Negative -> some error occured, and we cleaned already up.
So it should be quite simple to use and simplify all the error
handling, too.
The code got shorter by using a struct and stack usage looks smaller.
Most of the page walking users got simpler and also the
hugetlb code looks much better now.
The diffstat shows 554 deletions and 224 additions, which is not
too bad for a cleanup, which also adds a feature.
BTW: make_pages_present needs a rewrite, because it duplicates
work from its callers. The only non-trivial user is mm/mremap.c,
the rest of it I could do.
Comments are VERY welcome. bzip2ed patch attached.
Regards
Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth
[-- Attachment #2: page-walk-api-2.5.47-mm3-cleanup.patch.bz2 --]
[-- Type: application/octet-stream, Size: 7890 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: get_user_pages rewrite rediffed against 2.5.47-mm1
2002-11-17 23:27 ` Ingo Oeser
@ 2002-11-18 1:12 ` William Lee Irwin III
0 siblings, 0 replies; 6+ messages in thread
From: William Lee Irwin III @ 2002-11-18 1:12 UTC (permalink / raw)
To: Ingo Oeser; +Cc: Andrew Morton, linux-mm
On Mon, Nov 18, 2002 at 12:27:10AM +0100, Ingo Oeser wrote:
> Most of the page walking users got simpler and also the
> hugetlb code looks much better now.
> The diffstat shows 554 deletions and 224 additions, which is not
> too bad for a cleanup, which also adds a feature.
> BTW: make_pages_present needs a rewrite, because it duplicates
> work from its callers. The only non-trivial user is mm/mremap.c,
> the rest of it I could do.
> Comments are VERY welcome. bzip2ed patch attached.
> Regards
> Ingo Oeser
Terrific. Bringing the hugetlb code closer to Linux-native idioms/style
is excellent. From my POV the change has no visible deficits remaining.
Bill
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2002-11-18 1:12 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-12 19:58 get_user_pages rewrite rediffed against 2.5.47-mm1 Ingo Oeser
2002-11-12 20:27 ` Andrew Morton
2002-11-15 7:58 ` Ingo Oeser
2002-11-15 21:08 ` Andrew Morton
2002-11-17 23:27 ` Ingo Oeser
2002-11-18 1:12 ` William Lee Irwin III
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox