linux-mm.kvack.org archive mirror
* readahead for strided IO
@ 2015-04-29  7:52 Scheffenegger, Richard
  0 siblings, 0 replies; only message in thread
From: Scheffenegger, Richard @ 2015-04-29  7:52 UTC (permalink / raw)
  To: linux-mm; +Cc: trond



Hi,

I hope you can help me out. We are currently investigating a performance issue involving an NFSv3 server (our appliance) and a Linux application doing IO against it.

The IO pattern is strictly sequential, but strided: the application reads 4k, skips 4k, reads 4k, skips 4k, ... at monotonically increasing offsets, apparently using blocking read() calls. Unfortunately, I don't know exactly whether the file handle was created with O_RDONLY or O_RDWR, or whether O_DIRECT or O_SYNC was specified.
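
To illustrate, a minimal sketch of the access pattern as I understand it (the file name, loop count and use of pread() are my assumptions; the application may just as well use read() plus lseek()):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* placeholder path for the file on the NFS mount */
    int fd = open("/mnt/nfs/datafile", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    off_t off = 0;
    for (int i = 0; i < 1000; i++) {              /* loop count is arbitrary */
        ssize_t n = pread(fd, buf, sizeof(buf), off);  /* blocking 4k read */
        if (n <= 0)
            break;
        off += 2 * 4096;                          /* skip the next 4k block */
    }
    close(fd);
    return 0;
}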

As you can imagine, the RTT overhead (tens of microseconds per IO) of the individual 4k NFS reads, which are issued by the NFS client only once the application actually requests them, is a severe limitation in terms of IOPS (bandwidth is around 25-30 MB/s, IOPS around 7000), even though the storage system / NFS server detects the strided reads and serves them directly from its prefetch cache (a few microseconds of latency there).

Complicating the issue, the application behaving so inefficiently is closed source. The best approaches would obviously be for the application to request larger blocks of data and, once they are in application memory, discard about half of them (the strides are broken every ~20-30 IOs and interspersed with 16k reads, followed by strided reads aligned to the other odd/even 4k block offsets in the file), or to explicitly use the readahead() facility of Linux.
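
For reference, a hedged sketch of what such an explicit readahead() call could look like, e.g. from an LD_PRELOAD wrapper around the application; the 1 MB window size is just an assumption:

#define _GNU_SOURCE
#include <fcntl.h>

/* Best-effort, non-blocking prefetch of a window ahead of the next
 * strided reads; the return value can be ignored for this purpose. */
static void prefetch_window(int fd, off64_t offset)
{
    readahead(fd, offset, 1024 * 1024);
}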


The reason I write this is my curiosity whether there is any way to configure the Linux readahead facility to be really aggressive on a particular NFS mount. We checked the /sys/class/bdi settings for the mount in question and increased read_ahead_kb, but that didn't change anything; I guess what would be needed is a flag to have mm/readahead kick in for every read, regardless of whether it is considered a sequential read or not...
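
For completeness, a small sketch of how the read_ahead_kb knob for a given mount can be located (the mount point is passed as argv[1]); for NFS the bdi directory is named after the anonymous major:minor device number of the mount:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2 || stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* e.g. /sys/class/bdi/0:42/read_ahead_kb for an NFS mount */
    printf("/sys/class/bdi/%u:%u/read_ahead_kb\n",
           major(st.st_dev), minor(st.st_dev));
    return 0;
}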

Finally, are there ways to extract statistical information from mm/readahead, i.e. whether it was actually invoked (and not completely bypassed to begin with due to some flags used by the application), and when/why/how it decided to issue (or not issue) the IO it does?

Thanks a lot!



Richard Scheffenegger
Storage Infrastructure Architect

NetApp Austria GmbH
+43 676 6543146 Tel
+43 1 3676811-3100 Fax
rs@netapp.com
www.netapp.at
