* Re: PATCH: mark_page_accessed() for read()s on non-page boundaries
[not found] ` <20041207135205.783860cf.akpm@osdl.org>
@ 2004-12-07 22:05 ` Miquel van Smoorenburg
2004-12-07 22:28 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Miquel van Smoorenburg @ 2004-12-07 22:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
On Tue, 07 Dec 2004 22:52:05, Andrew Morton wrote:
>
> You should cc a mailing list.
OK, I've added linux-mm back in. Sorry.
> Miquel van Smoorenburg <miquels@cistron.nl> wrote:
> >
> > When reading a (partial) page from disk using read(), the kernel only
> > marks the page as "accessed" if the read started at a page boundary.
> > This means that files that are accessed randomly at non-page boundaries
> > (usually database style files) will not be cached properly.
>
> Yeah. Touching the page on every read can be expensive for small reads, so
> I left that code out very early in 2.5. Your "is this page different from
> the last one" is a reasonable approach.
>
> > The patch below uses the readahead state instead. If a page is read(),
> > it is marked as "accessed" if the previous read() was for a different
> > page, whatever the offset in the page.
>
> Except a smart database programmer will have done
> posix_fadvise(POSIX_FADV_RANDOM) which turns off readahead. So
> page_cache_readahead() never updates prev_page.
Yes. Blech.
> So you'll need something like this as well:
>
>
> --- 25/mm/readahead.c~a Tue Dec 7 13:50:04 2004
> +++ 25-akpm/mm/readahead.c Tue Dec 7 13:50:58 2004
> @@ -369,8 +369,10 @@ page_cache_readahead(struct address_spac
> goto out; /* Maximally shrunk */
>
> max = get_max_readahead(ra);
> - if (max == 0)
> + if (max == 0) {
> + ra->prev_page = offset; /* For do_generic_mapping_read() */
> goto out; /* No readahead */
> + }
>
> orig_next_size = ra->next_size;
OK, got it. Will go and play with that and posix_fadvise(POSIX_FADV_RANDOM)
some more.
> Have you any testing results to justify the change, btw?
Ok some background info. I'm running the Diablo usenet news server,
which uses a large history database that needs to be fast. Because
of lots of streaming I/O, the history database pages keep getting
thrown out of memory. Many people run the history database on a
laaaarge ramdisk or in /dev/shm to get the system to keep the
database in RAM.
Because of that I developed something called LINUX_FADV_STICKY support
which makes the kernel keep read() pages from a file in core more
aggressively. Like it does when you use mmap() with swappiness
tuned down to 20 or so.
And when developing and testing LINUX_FADV_STICKY I discovered
many pages weren't being marked as accessed in the first place,
hence this patch.
This patch on its own appears to help a bit too, but not enough-
the LINUX_FADV_STICKY patch makes the app scream though, without
using a ramdisk..
So I really need it for the LINUX_FADV_STICKY stuff, but it appeared
to me to be a standalone bug as well, that's why there's this
seperate patch.
If you want me to run some tests, sure - if you have advice on which
ones would be appropriate I'll give it a go.
Thanks,
Mike.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PATCH: mark_page_accessed() for read()s on non-page boundaries
2004-12-07 22:05 ` PATCH: mark_page_accessed() for read()s on non-page boundaries Miquel van Smoorenburg
@ 2004-12-07 22:28 ` Andrew Morton
2004-12-08 23:33 ` Miquel van Smoorenburg
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2004-12-07 22:28 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: linux-mm
Miquel van Smoorenburg <miquels@cistron.nl> wrote:
>
> > --- 25/mm/readahead.c~a Tue Dec 7 13:50:04 2004
> > +++ 25-akpm/mm/readahead.c Tue Dec 7 13:50:58 2004
> > @@ -369,8 +369,10 @@ page_cache_readahead(struct address_spac
> > goto out; /* Maximally shrunk */
> >
> > max = get_max_readahead(ra);
> > - if (max == 0)
> > + if (max == 0) {
> > + ra->prev_page = offset; /* For do_generic_mapping_read() */
> > goto out; /* No readahead */
> > + }
> >
> > orig_next_size = ra->next_size;
>
> OK, got it. Will go and play with that and posix_fadvise(POSIX_FADV_RANDOM)
> some more.
It'll probably need to handle the "Maximally shrunk" case too. But I've
locally merged some readahead rework from Steve Pratt and Ram Pai, and it
looks like the changes will be simpler in that case:
page_cache_readahead(struct address_space *mapping, struct file_ra_state *ra,
struct file *filp, unsigned long offset,
unsigned long req_size)
{
unsigned long max, min;
unsigned long newsize = req_size;
unsigned long block;
/*
* Here we detect the case where the application is performing
* sub-page sized reads. We avoid doing extra work and bogusly
* perturbing the readahead window expansion logic.
* If size is zero, there is no read ahead window so we need one
*/
if (offset == ra->prev_page && req_size == 1 && ra->size != 0)
goto out;
Still. Work against -rc3 and I can fix stuff up.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PATCH: mark_page_accessed() for read()s on non-page boundaries
2004-12-07 22:28 ` Andrew Morton
@ 2004-12-08 23:33 ` Miquel van Smoorenburg
0 siblings, 0 replies; 3+ messages in thread
From: Miquel van Smoorenburg @ 2004-12-08 23:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
According to Andrew Morton:
> It'll probably need to handle the "Maximally shrunk" case too. But I've
> locally merged some readahead rework from Steve Pratt and Ram Pai, and it
> looks like the changes will be simpler in that case:
It looks like I can just set ra->prev_page unconditionally...
I redid the patch against 2.6.10-rc3 and ran some tests:
- Boot kernel with mem=128M
- create a testfile of size 8 MB on a partition. Unmount/mount.
- then generate about 10 MB/sec streaming writes
for i in `seq 1 1000`
do
dd if=/dev/zero of=junkfile.$i bs=1M count=10
sync
cat junkfile.$i > /dev/null
sleep 1
done
- use an application that reads 128 bytes 64000 times from a
random offset in the 64 MB testfile.
1. Linux 2.6.10-rc3 vanilla, no streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.22s system 5% cpu 4.456 total
2. Linux 2.6.10-rc3 vanilla, streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.16s system 2% cpu 7.667 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.37s system 1% cpu 23.294 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.99s system 1% cpu 1:11.52 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.21s system 2% cpu 10.273 total
3. Linux 2.6.10-rc3 with read-page-access.patch , streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.21s system 3% cpu 7.634 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.04s user 0.22s system 2% cpu 9.588 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.12s system 24% cpu 0.563 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.13s system 98% cpu 0.163 total
As expected, with the read-page-access.patch, the kernel keeps
the 8 MB testfile cached as expected, while without it,
it doesn't.
So this is useful for workloads where one smallish (wrt RAM) file
is read randomly over and over again (like heavily used database
indexes), while other I/O is going on. Plain 2.6 caches those files
poorly, if the app uses plain read().
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PATCH: mark_page_accessed() for read()s on non-page boundaries
When reading a (partial) page from disk using read(), the kernel only
marks the page as "accessed" if the read started at a page boundary.
This means that files that are accessed randomly at non-page boundaries
(usually database style files) will not be cached properly.
The patch below uses the readahead state instead. If a page is read(),
it is marked as "accessed" if the previous read() was for a different
page, whatever the offset in the page.
Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>
diff -ruN linux-2.6.10-rc3.orig/mm/filemap.c linux-2.6.10-rc3/mm/filemap.c
--- linux-2.6.10-rc3.orig/mm/filemap.c 2004-12-08 16:05:17.000000000 +0100
+++ linux-2.6.10-rc3/mm/filemap.c 2004-12-08 16:50:17.000000000 +0100
@@ -689,6 +689,7 @@
{
struct inode *inode = mapping->host;
unsigned long index, end_index, offset;
+ unsigned long prev_page;
loff_t isize;
struct page *cached_page;
int error;
@@ -719,6 +720,8 @@
}
nr = nr - offset;
+ prev_page = ra.next_size ? ra.prev_page : -1UL;
+
cond_resched();
page_cache_readahead(mapping, &ra, filp, index);
@@ -726,10 +729,13 @@
page = find_get_page(mapping, index);
if (unlikely(page == NULL)) {
handle_ra_miss(mapping, &ra, index);
+ prev_page = -1UL;
goto no_cached_page;
}
- if (!PageUptodate(page))
+ if (!PageUptodate(page)) {
+ prev_page = -1UL;
goto page_not_up_to_date;
+ }
page_ok:
/* If users can be writing to this page using arbitrary
@@ -740,9 +746,10 @@
flush_dcache_page(page);
/*
- * Mark the page accessed if we read the beginning.
+ * When (part of) the same page is read multiple times
+ * in succession, only mark it as accessed the first time.
*/
- if (!offset)
+ if (prev_page != index)
mark_page_accessed(page);
/*
diff -ruN linux-2.6.10-rc3.orig/mm/readahead.c linux-2.6.10-rc3/mm/readahead.c
--- linux-2.6.10-rc3.orig/mm/readahead.c 2004-10-18 23:53:11.000000000 +0200
+++ linux-2.6.10-rc3/mm/readahead.c 2004-12-08 16:07:40.000000000 +0100
@@ -364,6 +364,7 @@
if (ra->next_size != 0)
goto out;
}
+ ra->prev_page = offset;
if (ra->next_size == -1UL)
goto out; /* Maximally shrunk */
@@ -382,13 +383,10 @@
*/
first_access=1;
ra->next_size = max / 2;
- ra->prev_page = offset;
ra->currnt_wnd_hit++;
goto do_io;
}
- ra->prev_page = offset;
-
if (offset >= ra->start && offset <= (ra->start + ra->size)) {
/*
* A readahead hit. Either inside the window, or one
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2004-12-08 23:33 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20041207213819.GA32537@cistron.nl>
[not found] ` <20041207135205.783860cf.akpm@osdl.org>
2004-12-07 22:05 ` PATCH: mark_page_accessed() for read()s on non-page boundaries Miquel van Smoorenburg
2004-12-07 22:28 ` Andrew Morton
2004-12-08 23:33 ` Miquel van Smoorenburg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox