* How best to bypass the page cache from within a kernel module?
From: Alan Stern @ 2003-09-17 18:24 UTC
To: linux-mm
I'm working on a kernel module driver for Linux 2.6. One of the things
this driver needs to do is perform a VERIFY command, which means checking
to make sure that certain disk sectors within a file actually can be read
without encountering a bad sector or other hardware error. Now, I realize
that there are already issues involved with convincing the disk drive to
read from its media rather than from its cache. But apart from that, my
problem is how to convince Linux to read from the drive rather than from
the page cache.
One suggestion was to use O_DIRECT when opening the file, because that
does cause reads to go directly to the hardware. The problem with this is
that since the direct-I/O routines send file data directly to user
buffers, they must check that the buffer addresses are valid and belong to
the user's address space. But my code runs in a kernel thread so it has
no current->mm (and in any case I would prefer to use my kernel-space
buffers rather than user-space memory). It might be possible to get hold
of an mm_struct, but it's not necessarily easy as mm_alloc() isn't
EXPORTed. Perhaps my thread could keep its original current->mm by
incrementing current->mm->mm_users before calling daemonize() and setting
current->mm back to its original value afterward. Is that legal? Having
done so, perhaps I could use some sort of mmap() call to allocate a
user-space buffer that would be okay for direct-I/O. What's the best way
to do that -- what function would I have to call?
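The userspace side of the O_DIRECT approach can be sketched as follows. This is userspace C, not module code; it assumes the filesystem supports O_DIRECT and that 4096-byte alignment of buffer, offset and length is sufficient (the real requirement depends on the device and filesystem). The name `verify_range_direct` and the `-ENODATA` return for a short read are hypothetical choices, not anything taken from the thread.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for the buffer, file offset and length.  The real
 * O_DIRECT requirement depends on the device and filesystem. */
#define DIO_ALIGN 4096

/* Hypothetical helper: read [off, off + len) of `path` with O_DIRECT so the
 * data is fetched from the device instead of the page cache.  Returns 0 if
 * every block was read, otherwise a negative errno value. */
int verify_range_direct(const char *path, off_t off, size_t len)
{
    assert(path != NULL);
    if (off % DIO_ALIGN != 0 || len % DIO_ALIGN != 0)
        return -EINVAL;             /* O_DIRECT rejects unaligned requests */

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -errno;              /* EINVAL if the fs lacks O_DIRECT */

    void *buf = NULL;
    int rc = posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN);
    if (rc != 0) {
        close(fd);
        return -rc;
    }

    int ret = 0;
    for (size_t done = 0; done < len; done += DIO_ALIGN) {
        ssize_t n = pread(fd, buf, DIO_ALIGN, off + (off_t)done);
        if (n < 0) {
            ret = -errno;           /* a bad sector typically shows up as EIO */
            break;
        }
        if (n != DIO_ALIGN) {
            ret = -ENODATA;         /* short read: range extends past EOF */
            break;
        }
    }

    free(buf);
    close(fd);
    return ret;
}
```

Even when this succeeds it only keeps Linux's page cache out of the path; as noted above, the drive's own cache may still answer the read.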
However, all that seems rather roundabout. An equally acceptable solution
would be simply to invalidate all the entries in the page cache referring
to my file, so that reads would be forced to go to the drive. Can anyone
tell me how to do that?
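From userspace, a rough counterpart to invalidating a file's cached pages is posix_fadvise() with POSIX_FADV_DONTNEED. The sketch below assumes that advisory behaviour is acceptable: the kernel may keep pages that are dirty, locked or mapped, so the helper writes dirty data back with fdatasync() first. It is not the in-kernel call a module would use, and `drop_file_cache` is a hypothetical name.

```c
#define _GNU_SOURCE            /* posix_fadvise(), fdatasync() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to evict the cached pages of the open
 * file `fd` so later buffered reads are more likely to reach the device.
 * Returns 0 on success or a positive error number. */
int drop_file_cache(int fd)
{
    assert(fd >= 0);

    /* DONTNEED leaves dirty pages alone, so write them back first. */
    if (fdatasync(fd) != 0)
        return errno;

    /* offset 0, len 0 means "the whole file".  posix_fadvise() returns
     * the error number directly instead of setting errno. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```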
TIA,
Alan Stern
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
* Re: How best to bypass the page cache from within a kernel module?
From: Dave Hansen @ 2003-09-17 19:44 UTC
To: Alan Stern; +Cc: linux-mm
On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
> However, all that seems rather roundabout. An equally acceptable solution
> would be simply to invalidate all the entries in the page cache referring
> to my file, so that reads would be forced to go to the drive. Can anyone
> tell me how to do that?
Whatever you're trying to do, you probably shouldn't be doing it in the
kernel to begin with. Do it from userspace, it will save you a lot of
pain.
--
Dave Hansen
haveblue@us.ibm.com
* Re: How best to bypass the page cache from within a kernel module?
From: William Lee Irwin III @ 2003-09-17 19:50 UTC
To: Dave Hansen; +Cc: Alan Stern, linux-mm
On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
>> However, all that seems rather roundabout. An equally acceptable solution
>> would be simply to invalidate all the entries in the page cache referring
>> to my file, so that reads would be forced to go to the drive. Can anyone
>> tell me how to do that?
On Wed, Sep 17, 2003 at 12:44:29PM -0700, Dave Hansen wrote:
> Whatever you're trying to do, you probably shouldn't be doing it in the
> kernel to begin with. Do it from userspace, it will save you a lot of
> pain.
If you really want to bypass the pagecache etc. entirely, use raw io and
don't even bother mounting the filesystem, and do it all from userspace.
If you need it simultaneously mounted then you're in somewhat deeper
trouble, though you can probably be rescued by nefarious means like that
bit about shooting down the pagecache so you don't have some incoherent
cache headache.
-- wli
* Re: How best to bypass the page cache from within a kernel module?
From: Alan Stern @ 2003-09-17 20:33 UTC
To: William Lee Irwin III; +Cc: Dave Hansen, linux-mm
> On Wed, 2003-09-17 at 11:24, Alan Stern wrote:
> > However, all that seems rather roundabout. An equally acceptable solution
> > would be simply to invalidate all the entries in the page cache referring
> > to my file, so that reads would be forced to go to the drive. Can anyone
> > tell me how to do that?
On Wed, Sep 17, 2003 at 12:44:29PM -0700, Dave Hansen wrote:
> Whatever you're trying to do, you probably shouldn't be doing it in the
> kernel to begin with. Do it from userspace, it will save you a lot of
> pain.
That's not particularly helpful, especially considering that the entire
driver currently works just fine as a kernel module, with the exception of
this one piece. (This one piece works too; it just doesn't do exactly
what I want.)
On Wed, 17 Sep 2003, William Lee Irwin III wrote:
> If you really want to bypass the pagecache etc. entirely, use raw io and
> don't even bother mounting the filesystem, and do it all from userspace.
> If you need it simultaneously mounted then you're in somewhat deeper
> trouble, though you can probably be rescued by nefarious means like that
> bit about shooting down the pagecache so you don't have some incoherent
> cache headache.
I really want this to work through the filesystem. 99% of what my driver
does involves normal reads and writes. And there are very good reasons
for having it run as a kernel thread rather than a user process. It's
just that this one piece, which is a very minor part of the driver, needs
to avoid the page cache.
So to reiterate my original questions:
1. What's the proper way for a kernel thread running in a module to get
hold of an mm_struct or to keep the one it had before calling daemonize()?
2. What's the proper way for a kernel thread to allocate a region of
userspace memory?
3. What's the proper way to invalidate all entries in the page cache that
refer to a particular file?
Alan Stern
* Re: How best to bypass the page cache from within a kernel module?
From: William Lee Irwin III @ 2003-09-17 20:40 UTC
To: Alan Stern; +Cc: Dave Hansen, linux-mm
On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> I really want this to work through the filesystem. 99% of what my driver
> does involves normal reads and writes. And there are very good reasons
> for having it run as a kernel thread rather than a user process. It's
> just that this one piece, which is a very minor part of the driver, needs
> to avoid the page cache.
> So to reiterate my original questions:
Doesn't sound much like most drivers after all that, but there's some weird
stuff out there.
On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 1. What's the proper way for a kernel thread running in a module to get
> hold of an mm_struct or to keep the one it had before calling daemonize()?
Well, you can get one from the slab allocator, though I expect there will
be a followup question here...
On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 2. What's the proper way for a kernel thread to allocate a region of
> userspace memory?
Hmm. Sounds like you want to grab a user address space and do userspace
stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
everything else outright? The question itself probably wants sys_mmap()
or some such, or handle_mm_fault() depending on what you have in mind
for allocation.
On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> 3. What's the proper way to invalidate all entries in the page cache
> that refer to a particular file?
invalidate_inode_pages().
-- wli
* Re: How best to bypass the page cache from within a kernel module?
From: William Lee Irwin III @ 2003-09-17 20:43 UTC
To: Alan Stern, Dave Hansen, linux-mm
On Wed, Sep 17, 2003 at 01:40:47PM -0700, William Lee Irwin III wrote:
> or some such, or handle_mm_fault() depending on what you have in mind
s/handle_mm_fault/make_pages_present/
-- wli
* Re: How best to bypass the page cache from within a kernel module?
From: Alan Stern @ 2003-09-17 21:30 UTC
To: William Lee Irwin III; +Cc: Dave Hansen, linux-mm
On Wed, 17 Sep 2003, William Lee Irwin III wrote:
> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 1. What's the proper way for a kernel thread running in a module to get
> > hold of an mm_struct or to keep the one it had before calling daemonize()?
>
> Well, you can get one from the slab allocator, though I expect there will
> be a followup question here...
Yes :-) The slab allocator will give me a nice piece of memory, but I
will still need to turn that into a valid mm_struct. I can't call
mm_alloc() and friends because they're not EXPORTed.
Would this work: atomically increment current->mm->mm_users and save the
value of current->mm before calling daemonize(), then re-assign the old
value back to current->mm afterwards?
> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 2. What's the proper way for a kernel thread to allocate a region of
> > userspace memory?
>
> Hmm. Sounds like you want to grab a user address space and do userspace
> stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
> everything else outright? The question itself probably wants sys_mmap()
> or some such, or handle_mm_fault() depending on what you have in mind
> for allocation.
sys_mmap() or something along those lines would be good. But I can't call
it directly because 2.6 doesn't EXPORT the sys_xxx functions. Also, I'm
not clear on whether mmap() lets you create an anonymous mapping -- one
backed by swap space rather than a file -- that's what I would want to do.
> On Wed, Sep 17, 2003 at 04:33:08PM -0400, Alan Stern wrote:
> > 3. What's the proper way to invalidate all entries in the page cache
> > that refer to a particular file?
>
> invalidate_inode_pages().
Great! I'll search through the kernel code for it.
Alan Stern
* Re: How best to bypass the page cache from within a kernel module?
From: William Lee Irwin III @ 2003-09-17 22:44 UTC
To: Alan Stern; +Cc: Dave Hansen, linux-mm
On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> Well, you can get one from the slab allocator, though I expect there will
>> be a followup question here...
On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> Yes :-) The slab allocator will give me a nice piece of memory, but I
> will still need to turn that into a valid mm_struct. I can't call
> mm_alloc() and friends because they're not EXPORTed.
Well, mm_alloc() doesn't really do much, so it should be easily
preppable along the same lines if it absolutely has to be a module. In
truth, the mm slab should be using a ctor (the vma slab too).
On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> Hmm. Sounds like you want to grab a user address space and do userspace
>> stuff inside there. Maybe avoid do_execve() etc. and call sys_*() for
>> everything else outright? The question itself probably wants sys_mmap()
>> or some such, or handle_mm_fault() depending on what you have in mind
>> for allocation.
On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> sys_mmap() or something along those lines would be good. But I can't call
> it directly because 2.6 doesn't EXPORT the sys_xxx functions. Also, I'm
> not clear on whether mmap() lets you create an anonymous mapping -- one
> backed by swap space rather than a file -- that's what I would want to do.
That's a pain. It's probably easier to just compile the driver in, then.
On Wed, 17 Sep 2003, William Lee Irwin III wrote:
>> invalidate_inode_pages().
On Wed, Sep 17, 2003 at 05:30:50PM -0400, Alan Stern wrote:
> Great! I'll search through the kernel code for it.
Should be in mm/filemap.c
-- wli
* Re: How best to bypass the page cache from within a kernel module?
From: Ray Bryant @ 2003-09-22 18:51 UTC
To: Alan Stern; +Cc: linux-mm
Alan Stern wrote:
> I'm working on a kernel module driver for Linux 2.6. One of the things
> this driver needs to do is perform a VERIFY command, which means checking
> to make sure that certain disk sectors within a file actually can be read
> without encountering a bad sector or other hardware error. Now, I realize
> that there are already issues involved with convincing the disk drive to
> read from its media rather than from its cache. But apart from that, my
> problem is how to convince Linux to read from the drive rather than from
> the page cache.
>
> One suggestion was to use O_DIRECT when opening the file, because that
> does cause reads to go directly to the hardware. The problem with this is
> that since the direct-I/O routines send file data directly to user
> buffers, they must check that the buffer addresses are valid and belong to
> the user's address space. But my code runs in a kernel thread so it has
> no current->mm (and in any case I would prefer to use my kernel-space
> buffers rather than user-space memory). It might be possible to get hold
> of an mm_struct, but it's not necessarily easy as mm_alloc() isn't
> EXPORTed. Perhaps my thread could keep its original current->mm by
> incrementing current->mm->mm_users before calling daemonize() and setting
> current->mm back to its original value afterward. Is that legal? Having
> done so, perhaps I could use some sort of mmap() call to allocate a
> user-space buffer that would be okay for direct-I/O. What's the best way
> to do that -- what function would I have to call?
>
> However, all that seems rather roundabout. An equally acceptable solution
> would be simply to invalidate all the entries in the page cache referring
> to my file, so that reads would be forced to go to the drive. Can anyone
> tell me how to do that?
Take a look at invalidate_inode_pages()....
>
> TIA,
>
> Alan Stern
--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
raybry@sgi.com raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------
* Re: How best to bypass the page cache from within a kernel module?
From: Alan Stern @ 2003-09-22 19:09 UTC
To: Ray Bryant; +Cc: linux-mm
On Mon, 22 Sep 2003, Ray Bryant wrote:
> Take a look at invalidate_inode_pages()....
William Lee Irwin made the same suggestion. It turned out to be just what
I needed.
Thanks, guys!
Alan Stern
Thread overview: 10+ messages
2003-09-17 18:24 How best to bypass the page cache from within a kernel module? Alan Stern
2003-09-17 19:44 ` Dave Hansen
2003-09-17 19:50 ` William Lee Irwin III
2003-09-17 20:33 ` Alan Stern
2003-09-17 20:40 ` William Lee Irwin III
2003-09-17 20:43 ` William Lee Irwin III
2003-09-17 21:30 ` Alan Stern
2003-09-17 22:44 ` William Lee Irwin III
2003-09-22 18:51 ` Ray Bryant
2003-09-22 19:09 ` Alan Stern