* [PATCH] mm/readahead.c: fix regression caused by small readahead limit
@ 2015-08-20 16:19 Roman Gushchin
2015-08-20 19:23 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2015-08-20 16:19 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Roman Gushchin, Linus Torvalds, Raghavendra K T,
Jan Kara, Wu Fengguang, David Rientjes, Andrew Morton
This effectively reverts commit 6d2be915e589b58cb11418cbe1f22ff90732b6ac
("mm/readahead.c: fix readahead failure for memoryless NUMA nodes and
limit readahead pages").
That commit causes a significant I/O performance regression on large
RAID arrays. Capping the maximal readahead size at 2 MB is not suitable
for this case, whereas the previous logic (based on the amount of free
memory on the current NUMA node) suits it much better.
To avoid a regression on memoryless NUMA nodes, we can still fall back
to the MAX_READAHEAD constant when the current node has no (or too
little) free memory.
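For reference, the 2 MB figure follows directly from the MAX_READAHEAD
definition in the pre-patch mm/readahead.c (assuming the common 4 KiB
page size):

    #define MAX_READAHEAD ((512*4096)/PAGE_CACHE_SIZE)
    /* with PAGE_CACHE_SIZE == 4096: (512*4096)/4096 = 512 pages,
     * i.e. 512 * 4 KiB = 2 MiB of readahead at most */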
before:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 12.6441 s, 829 MB/s
$ dd if=/dev/md2 of=/dev/null bs=100M count=100 iflag=direct
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 9.49377 s, 1.1 GB/s
after:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 9.18119 s, 1.1 GB/s
$ dd if=/dev/md2 of=/dev/null bs=100M count=100 iflag=direct
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 9.34751 s, 1.1 GB/s
(It's an 8-disk RAID 5 array with a 1024 KiB chunk size.)
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
mm/readahead.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 60cd846..93a00b3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -239,7 +239,12 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
*/
unsigned long max_sane_readahead(unsigned long nr)
{
- return min(nr, MAX_READAHEAD);
+ unsigned long max_sane;
+
+ max_sane = max(MAX_READAHEAD,
+ (node_page_state(numa_node_id(), NR_INACTIVE_FILE) +
+ node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
+ return min(nr, max_sane);
}
/*
--
2.4.3
* Re: [PATCH] mm/readahead.c: fix regression caused by small readahead limit
  2015-08-20 16:19 [PATCH] mm/readahead.c: fix regression caused by small readahead limit Roman Gushchin
@ 2015-08-20 19:23 ` Linus Torvalds
  2015-08-21 17:12   ` [PATCH] mm: use only per-device " Roman Gushchin
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2015-08-20 19:23 UTC (permalink / raw)
To: Roman Gushchin
Cc: linux-mm, Linux Kernel Mailing List, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

On Thu, Aug 20, 2015 at 9:19 AM, Roman Gushchin <klamm@yandex-team.ru> wrote:
> +	max_sane = max(MAX_READAHEAD,
> +		       (node_page_state(numa_node_id(), NR_INACTIVE_FILE) +
> +			node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);

No, we're not re-introducing the idiotic and broken per-node logic.
There was a reason it was killed. There have been other patches
suggested that actually use _valid_ heuristics; this is not one of them.

		Linus
* [PATCH] mm: use only per-device readahead limit
  2015-08-20 19:23 ` Linus Torvalds
@ 2015-08-21 17:12   ` Roman Gushchin
  2015-08-21 18:17     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2015-08-21 17:12 UTC (permalink / raw)
To: linux-mm, Linus Torvalds
Cc: linux-kernel, Roman Gushchin, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

Maximal readahead size is now limited by two values:
1) the global 2 MB constant (MAX_READAHEAD in max_sane_readahead())
2) the configurable per-device value (bdi->ra_pages)

There are devices which require a custom readahead limit.
For instance, for RAID arrays it's calculated as the number of devices
multiplied by the chunk size, times 2.

Readahead size can never be larger than bdi->ra_pages * 2
(POSIX_FADV_SEQUENTIAL doubles the readahead size).

If so, why do we need two limits?
I suggest completely removing this max_sane_readahead() stuff and
using the per-device readahead limit everywhere.

Also, using the right readahead size for RAID disks can significantly
increase I/O performance:

before:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s

after:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s

(It's an 8-disk RAID 5 array.)

This patch doesn't change the sys_readahead and madvise(MADV_WILLNEED)
behavior introduced by commit 6d2be915e589b58cb11418cbe1f22ff90732b6ac
("mm/readahead.c: fix readahead failure for memoryless NUMA nodes and
limit readahead pages").

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/mm.h |  2 --
 mm/filemap.c       |  8 +++-----
 mm/readahead.c     | 14 ++------------
 3 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e872f9..a62abdd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1942,8 +1942,6 @@ void page_cache_async_readahead(struct address_space *mapping,
 			   pgoff_t offset,
 			   unsigned long size);

-unsigned long max_sane_readahead(unsigned long nr);
-
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);

diff --git a/mm/filemap.c b/mm/filemap.c
index 1283fc8..0e1ebef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1807,7 +1807,6 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 				   struct file *file,
 				   pgoff_t offset)
 {
-	unsigned long ra_pages;
 	struct address_space *mapping = file->f_mapping;

 	/* If we don't want any read-ahead, don't bother */
@@ -1836,10 +1835,9 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	/*
 	 * mmap read-around
 	 */
-	ra_pages = max_sane_readahead(ra->ra_pages);
-	ra->start = max_t(long, 0, offset - ra_pages / 2);
-	ra->size = ra_pages;
-	ra->async_size = ra_pages / 4;
+	ra->start = max_t(long, 0, offset - ra->ra_pages / 2);
+	ra->size = ra->ra_pages;
+	ra->async_size = ra->ra_pages / 4;
 	ra_submit(ra, mapping, file);
 }

diff --git a/mm/readahead.c b/mm/readahead.c
index 60cd846..b4937a6 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -213,7 +213,7 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
 		return -EINVAL;

-	nr_to_read = max_sane_readahead(nr_to_read);
+	nr_to_read = min(nr_to_read, inode_to_bdi(mapping->host)->ra_pages);
 	while (nr_to_read) {
 		int err;

@@ -232,16 +232,6 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	return 0;
 }

-#define MAX_READAHEAD ((512*4096)/PAGE_CACHE_SIZE)
-/*
- * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
- * sensible upper limit.
- */
-unsigned long max_sane_readahead(unsigned long nr)
-{
-	return min(nr, MAX_READAHEAD);
-}
-
 /*
  * Set the initial window size, round to next power of 2 and square
  * for small size, x 4 for medium, and x 2 for large
@@ -380,7 +370,7 @@ ondemand_readahead(struct address_space *mapping,
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	unsigned long max = max_sane_readahead(ra->ra_pages);
+	unsigned long max = ra->ra_pages;
 	pgoff_t prev_offset;

 	/*
--
2.4.3
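[The bdi->ra_pages * 2 ceiling mentioned in the commit message comes
from the fadvise path. A sketch of the relevant mm/fadvise.c case,
paraphrased from the kernel source of that era (not a verbatim quote;
surrounding locking details may differ):

    case POSIX_FADV_SEQUENTIAL:
            /* double the per-device default for sequential hints */
            file->f_ra.ra_pages = bdi->ra_pages * 2;
            spin_lock(&file->f_lock);
            file->f_mode &= ~FMODE_RANDOM;
            spin_unlock(&file->f_lock);
            break;
]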
* Re: [PATCH] mm: use only per-device readahead limit
  2015-08-21 17:12 ` [PATCH] mm: use only per-device " Roman Gushchin
@ 2015-08-21 18:17   ` Linus Torvalds
  2015-08-21 20:21     ` Roman Gushchin
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2015-08-21 18:17 UTC (permalink / raw)
To: Roman Gushchin
Cc: linux-mm, Linux Kernel Mailing List, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

On Fri, Aug 21, 2015 at 10:12 AM, Roman Gushchin <klamm@yandex-team.ru> wrote:
>
> There are devices which require a custom readahead limit.
> For instance, for RAID arrays it's calculated as the number of devices
> multiplied by the chunk size, times 2.

So afaik, the default read-ahead size is 128kB, which is actually
smaller than the old 512-page limit.

Which means that you probably changed "ra_pages" somehow. Is it some
system tool that does that automatically, and if so, based on what,
exactly?

I'm also slightly worried about the fact that now the max read-ahead
may actually be zero, and/or basically infinite (there's an ioctl to
set it that only tests that it's not negative). Does everything react
ok to that?

		Linus
* Re: [PATCH] mm: use only per-device readahead limit
  2015-08-21 18:17 ` Linus Torvalds
@ 2015-08-21 20:21   ` Roman Gushchin
  2015-08-21 20:44     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2015-08-21 20:21 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-mm, Linux Kernel Mailing List, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

21.08.2015, 21:17, "Linus Torvalds" <torvalds@linux-foundation.org>:
> On Fri, Aug 21, 2015 at 10:12 AM, Roman Gushchin <klamm@yandex-team.ru> wrote:
>> There are devices which require a custom readahead limit.
>> For instance, for RAID arrays it's calculated as the number of devices
>> multiplied by the chunk size, times 2.
>
> So afaik, the default read-ahead size is 128kB, which is actually
> smaller than the old 512-page limit.
>
> Which means that you probably changed "ra_pages" somehow. Is it some
> system tool that does that automatically, and if so, based on what,
> exactly?

It's just a raid driver. For instance, drivers/md/raid5.c:6898 .

On my setup I even got an unexpected slight performance increase over
the O_DIRECT case and over the old memory-based readahead limit, as you
can see from the numbers in the commit message (1.2 GB/s vs 1.1 GB/s).

So, I like the idea of delegating the readahead limit calculation to
the underlying I/O level.

> I'm also slightly worried about the fact that now the max read-ahead
> may actually be zero,

For "normal" readahead nothing changes. Only the readahead syscall and
madvise(MADV_WILLNEED) cases are affected. I think it's ok to do
nothing if readahead was deliberately disabled.

> and/or basically infinite (there's an ioctl to
> set it that only tests that it's not negative). Does everything react
> ok to that?

It's an open question whether we have to add some checks to avoid
misconfiguration. In any case, we can check the limit on setting rather
than adjust it dynamically.

--
Roman
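[For context, the raid5 code referenced above sizes the readahead
window to cover two full stripes. A sketch approximating the 2015-era
raid5_run() logic — field names are taken from that source from memory
and may differ slightly:

    /* drivers/md/raid5.c: read-ahead should cover two whole stripes */
    int data_disks = conf->previous_raid_disks - conf->max_degraded;
    int stripe = data_disks * ((mddev->chunk_sectors << 9) / PAGE_SIZE);

    if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
            mddev->queue->backing_dev_info.ra_pages = 2 * stripe;

For the 8-disk RAID 5 with a 1024 KiB chunk discussed in this thread,
that is 7 data disks * 256 pages * 2 = 3584 pages (14 MiB), far above
the old 512-page cap.]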
* Re: [PATCH] mm: use only per-device readahead limit
  2015-08-21 20:21 ` Roman Gushchin
@ 2015-08-21 20:44   ` Linus Torvalds
  2015-08-24 11:57     ` [PATCH v2] " Roman Gushchin
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2015-08-21 20:44 UTC (permalink / raw)
To: Roman Gushchin
Cc: linux-mm, Linux Kernel Mailing List, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

On Fri, Aug 21, 2015 at 1:21 PM, Roman Gushchin <klamm@yandex-team.ru> wrote:
>
> It's just a raid driver. For instance, drivers/md/raid5.c:6898 .

Ok. That makes me a bit less nervous. I was worried there was some
admin program out there that just ups the readahead on people's
devices, which would mean that ra_pages is some random value chosen by
crazy user-space people.

> So, I like the idea of delegating the readahead limit calculation to
> the underlying I/O level.

Yeah, I'm not against it either. It's just that historically we've had
some issues with people over-doing readahead (because it often helps
some made-up microbenchmark), and then we end up with latency issues
when somebody does a multi-gigabyte readahead...

Iirc, we had exactly that problem with the readahead() system call at
some point (long ago).

But if it's just the default ra_pages, then that should be ok. I think
the kernel defaults are generally sane, and I hope there isn't some
crazy distro that ends up mucking with this.

		Linus
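[The 128 kB default Linus mentions is set when the request queue is
allocated. A sketch of where it comes from in the kernel of that era —
the constant and call site are approximated from memory and may differ
slightly:

    /* include/linux/mm.h */
    #define VM_MAX_READAHEAD	128	/* kbytes */

    /* block/blk-core.c, blk_alloc_queue_node() */
    q->backing_dev_info.ra_pages =
            (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;	/* 32 pages */
]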
* [PATCH v2] mm: use only per-device readahead limit
  2015-08-21 20:44 ` Linus Torvalds
@ 2015-08-24 11:57   ` Roman Gushchin
  2015-08-24 12:41     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 8+ messages in thread
From: Roman Gushchin @ 2015-08-24 11:57 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Roman Gushchin, Linus Torvalds, Raghavendra K T,
    Jan Kara, Wu Fengguang, David Rientjes, Andrew Morton,
    Konstantin Khlebnikov

Maximal readahead size is now limited by two values:
1) the global 2 MB constant (MAX_READAHEAD in max_sane_readahead())
2) the configurable per-device value (bdi->ra_pages)

There are devices which require a custom readahead limit.
For instance, for RAID arrays it's calculated as the number of devices
multiplied by the chunk size, times 2.

Readahead size can never be larger than bdi->ra_pages * 2
(POSIX_FADV_SEQUENTIAL doubles the readahead size).

If so, why do we need two limits?
I suggest completely removing this max_sane_readahead() stuff and
using the per-device readahead limit everywhere.

Also, using the right readahead size for RAID disks can significantly
increase I/O performance:

before:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s

after:
$ dd if=/dev/md2 of=/dev/null bs=100M count=100
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s

(It's an 8-disk RAID 5 array.)

This patch doesn't change the sys_readahead and madvise(MADV_WILLNEED)
behavior introduced by commit 6d2be915e589b58cb11418cbe1f22ff90732b6ac
("mm/readahead.c: fix readahead failure for memoryless NUMA nodes and
limit readahead pages").

V2:
Konstantin Khlebnikov noticed that if readahead is completely disabled,
force_page_cache_readahead() will not read anything. This function is
used for sync reads (if the FMODE_RANDOM flag is set). So, to guarantee
read progress, it's necessary to read at least 1 page.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/mm.h |  2 --
 mm/filemap.c       |  8 +++-----
 mm/readahead.c     | 18 ++++++------------
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2e872f9..a62abdd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1942,8 +1942,6 @@ void page_cache_async_readahead(struct address_space *mapping,
 			   pgoff_t offset,
 			   unsigned long size);

-unsigned long max_sane_readahead(unsigned long nr);
-
 /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
 extern int expand_stack(struct vm_area_struct *vma, unsigned long address);

diff --git a/mm/filemap.c b/mm/filemap.c
index 1283fc8..0e1ebef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1807,7 +1807,6 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 				   struct file *file,
 				   pgoff_t offset)
 {
-	unsigned long ra_pages;
 	struct address_space *mapping = file->f_mapping;

 	/* If we don't want any read-ahead, don't bother */
@@ -1836,10 +1835,9 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	/*
 	 * mmap read-around
 	 */
-	ra_pages = max_sane_readahead(ra->ra_pages);
-	ra->start = max_t(long, 0, offset - ra_pages / 2);
-	ra->size = ra_pages;
-	ra->async_size = ra_pages / 4;
+	ra->start = max_t(long, 0, offset - ra->ra_pages / 2);
+	ra->size = ra->ra_pages;
+	ra->async_size = ra->ra_pages / 4;
 	ra_submit(ra, mapping, file);
 }

diff --git a/mm/readahead.c b/mm/readahead.c
index 60cd846..7eb844c 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -213,7 +213,11 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
 		return -EINVAL;

-	nr_to_read = max_sane_readahead(nr_to_read);
+	/*
+	 * Read at least 1 page, even if readahead is completely disabled.
+	 */
+	nr_to_read = min(nr_to_read, max(inode_to_bdi(mapping->host)->ra_pages,
+					 1ul));
 	while (nr_to_read) {
 		int err;

@@ -232,16 +236,6 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	return 0;
 }

-#define MAX_READAHEAD ((512*4096)/PAGE_CACHE_SIZE)
-/*
- * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
- * sensible upper limit.
- */
-unsigned long max_sane_readahead(unsigned long nr)
-{
-	return min(nr, MAX_READAHEAD);
-}
-
 /*
  * Set the initial window size, round to next power of 2 and square
  * for small size, x 4 for medium, and x 2 for large
@@ -380,7 +374,7 @@ ondemand_readahead(struct address_space *mapping,
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	unsigned long max = max_sane_readahead(ra->ra_pages);
+	unsigned long max = ra->ra_pages;
 	pgoff_t prev_offset;

 	/*
--
2.4.3
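[To illustrate the V2 clamp in isolation: a stand-alone user-space toy
(not kernel code) showing that the clamped nr_to_read never drops below
one page, even when readahead is disabled (ra_pages == 0):

    #include <stdio.h>

    /* mirrors: min(nr_to_read, max(ra_pages, 1ul)) */
    static unsigned long clamp_nr(unsigned long nr_to_read,
                                  unsigned long ra_pages)
    {
            unsigned long limit = ra_pages > 1 ? ra_pages : 1;
            return nr_to_read < limit ? nr_to_read : limit;
    }

    int main(void)
    {
            unsigned long ra_values[] = { 0, 32, 512 };

            for (int i = 0; i < 3; i++)
                    printf("ra_pages=%lu -> nr_to_read=%lu\n",
                           ra_values[i], clamp_nr(256, ra_values[i]));
            return 0;	/* prints 1, 32, 256 for a 256-page request */
    }
]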
* Re: [PATCH v2] mm: use only per-device readahead limit
  2015-08-24 11:57 ` [PATCH v2] " Roman Gushchin
@ 2015-08-24 12:41   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 8+ messages in thread
From: Konstantin Khlebnikov @ 2015-08-24 12:41 UTC (permalink / raw)
To: Roman Gushchin, linux-mm
Cc: linux-kernel, Linus Torvalds, Raghavendra K T, Jan Kara,
    Wu Fengguang, David Rientjes, Andrew Morton

On 24.08.2015 14:57, Roman Gushchin wrote:
> Maximal readahead size is now limited by two values:
> 1) the global 2 MB constant (MAX_READAHEAD in max_sane_readahead())
> 2) the configurable per-device value (bdi->ra_pages)
>
> There are devices which require a custom readahead limit.
> For instance, for RAID arrays it's calculated as the number of devices
> multiplied by the chunk size, times 2.
>
> Readahead size can never be larger than bdi->ra_pages * 2
> (POSIX_FADV_SEQUENTIAL doubles the readahead size).
>
> If so, why do we need two limits?
> I suggest completely removing this max_sane_readahead() stuff and
> using the per-device readahead limit everywhere.
>
> Also, using the right readahead size for RAID disks can significantly
> increase I/O performance:
>
> before:
> $ dd if=/dev/md2 of=/dev/null bs=100M count=100
> 100+0 records in
> 100+0 records out
> 10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s
>
> after:
> $ dd if=/dev/md2 of=/dev/null bs=100M count=100
> 100+0 records in
> 100+0 records out
> 10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s
>
> (It's an 8-disk RAID 5 array.)
>
> This patch doesn't change the sys_readahead and madvise(MADV_WILLNEED)
> behavior introduced by commit 6d2be915e589b58cb11418cbe1f22ff90732b6ac
> ("mm/readahead.c: fix readahead failure for memoryless NUMA nodes and
> limit readahead pages").
>
> V2:
> Konstantin Khlebnikov noticed that if readahead is completely disabled,
> force_page_cache_readahead() will not read anything. This function is
> used for sync reads (if the FMODE_RANDOM flag is set). So, to guarantee
> read progress, it's necessary to read at least 1 page.

On second thought: this isn't important. V1 is fine.

page_cache_sync_readahead() checks "if (!ra->ra_pages)" first and never
calls force_page_cache_readahead() if readahead is disabled.

Anyway, this function doesn't return references to pages. All users
must be ready to handle non-present or non-uptodate pages. But this has
probably never happened before, so all call sites should be reviewed:
for example, splice always re-looks up pages after ->readpage() (I
guess the page can be truncated here), while some other users keep
using the same page reference.

>
> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  include/linux/mm.h |  2 --
>  mm/filemap.c       |  8 +++-----
>  mm/readahead.c     | 18 ++++++------------
>  3 files changed, 9 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2e872f9..a62abdd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1942,8 +1942,6 @@ void page_cache_async_readahead(struct address_space *mapping,
>  			   pgoff_t offset,
>  			   unsigned long size);
>
> -unsigned long max_sane_readahead(unsigned long nr);
> -
>  /* Generic expand stack which grows the stack according to GROWS{UP,DOWN} */
>  extern int expand_stack(struct vm_area_struct *vma, unsigned long address);
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 1283fc8..0e1ebef 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1807,7 +1807,6 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
>  				   struct file *file,
>  				   pgoff_t offset)
>  {
> -	unsigned long ra_pages;
>  	struct address_space *mapping = file->f_mapping;
>
>  	/* If we don't want any read-ahead, don't bother */
> @@ -1836,10 +1835,9 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
>  	/*
>  	 * mmap read-around
>  	 */
> -	ra_pages = max_sane_readahead(ra->ra_pages);
> -	ra->start = max_t(long, 0, offset - ra_pages / 2);
> -	ra->size = ra_pages;
> -	ra->async_size = ra_pages / 4;
> +	ra->start = max_t(long, 0, offset - ra->ra_pages / 2);
> +	ra->size = ra->ra_pages;
> +	ra->async_size = ra->ra_pages / 4;
>  	ra_submit(ra, mapping, file);
>  }
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 60cd846..7eb844c 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -213,7 +213,11 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
>  		return -EINVAL;
>
> -	nr_to_read = max_sane_readahead(nr_to_read);
> +	/*
> +	 * Read at least 1 page, even if readahead is completely disabled.
> +	 */
> +	nr_to_read = min(nr_to_read, max(inode_to_bdi(mapping->host)->ra_pages,
> +					 1ul));
>  	while (nr_to_read) {
>  		int err;
>
> @@ -232,16 +236,6 @@ int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	return 0;
>  }
>
> -#define MAX_READAHEAD ((512*4096)/PAGE_CACHE_SIZE)
> -/*
> - * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> - * sensible upper limit.
> - */
> -unsigned long max_sane_readahead(unsigned long nr)
> -{
> -	return min(nr, MAX_READAHEAD);
> -}
> -
>  /*
>   * Set the initial window size, round to next power of 2 and square
>   * for small size, x 4 for medium, and x 2 for large
> @@ -380,7 +374,7 @@ ondemand_readahead(struct address_space *mapping,
>  		   bool hit_readahead_marker, pgoff_t offset,
>  		   unsigned long req_size)
>  {
> -	unsigned long max = max_sane_readahead(ra->ra_pages);
> +	unsigned long max = ra->ra_pages;
>  	pgoff_t prev_offset;
>
>  	/*

--
Konstantin
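[For reference, the check Konstantin points to: page_cache_sync_readahead()
bails out before the FMODE_RANDOM force path ever runs when readahead is
disabled. A sketch paraphrased from the era's mm/readahead.c (structure
recalled from that source, not a verbatim quote):

    void page_cache_sync_readahead(struct address_space *mapping,
                                   struct file_ra_state *ra, struct file *filp,
                                   pgoff_t offset, unsigned long req_size)
    {
            /* no read-ahead */
            if (!ra->ra_pages)
                    return;

            /* be dumb */
            if (filp && (filp->f_mode & FMODE_RANDOM)) {
                    force_page_cache_readahead(mapping, filp, offset, req_size);
                    return;
            }

            /* do read-ahead */
            ondemand_readahead(mapping, ra, filp, false, offset, req_size);
    }
]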
end of thread, other threads:[~2015-08-24 12:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-20 16:19 [PATCH] mm/readahead.c: fix regression caused by small readahead limit Roman Gushchin
2015-08-20 19:23 ` Linus Torvalds
2015-08-21 17:12   ` [PATCH] mm: use only per-device " Roman Gushchin
2015-08-21 18:17     ` Linus Torvalds
2015-08-21 20:21       ` Roman Gushchin
2015-08-21 20:44         ` Linus Torvalds
2015-08-24 11:57           ` [PATCH v2] " Roman Gushchin
2015-08-24 12:41             ` Konstantin Khlebnikov