Message-ID: <20020405195240.22435.qmail@london.rubylane.com>
From: jim@rubylane.com
Subject: Re: 2.2.20 suspends everything then recovers during heavy I/O
Date: Fri, 5 Apr 2002 11:52:40 -0800 (PST)
In-Reply-To: <1648866003.1018003647@[10.10.2.3]> from "Martin J. Bligh" at Apr 05, 2002 10:47:28 AM
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Martin.Bligh@us.ibm.com
Cc: jim@rubylane.com, Andrew Morton, linux-mm@kvack.org

But tar & rsync don't work on raw partitions.

There are lots of times when individual file data has to be processed,
and lots of it: running stats on large web server logs, compressing
those logs, copying a DB backup to a remote machine for offsite
storage, sorting a huge file, etc. In all of those cases, putting the
file in the buffer cache is a waste. Even in the case of a sort, where
you are going to go back and reference the data again, sorts often
work by reading sequentially through the data once, sorting the keys,
and then reordering the file. The initial sequential scan won't
benefit from the buffer cache unless the whole file fits in memory.
Only the reorder pass would benefit.

An idea I had a while back was to keep track of whether a file has
been randomly positioned or not. If it hasn't, and more than a certain
amount of the file is already in memory, start reusing the buffers
holding the early parts of the file instead of hogging more. To me
this is not as good a solution, because there are likely many cases
where it will hurt performance, like repeatedly running fgrep over a
file larger than the threshold. A manual tweak, on the other hand,
would be guaranteed to be used only in the right places.

If tar used the flag, I guess it's theoretically possible someone
would do repeated tars of the same data, but that seems improbable.
And even if they did, and it took longer, that's still probably better
than hogging buffers. Who cares how long a tar takes?

Jim

> > What would be really great is some way to indicate, maybe with an
> > O_SEQ flag or something, that an application is going to sequentially
> > access a file, so caching it is a no-win proposition. Production
> > servers do have situations where lots of data has to be copied or
> > accessed, for example, to do a backup, but doing a backup shouldn't
> > mean that all of the important stuff gets continuously thrown out of
> > memory while the backup is running. Saving metadata during a backup
> > is useful. Saving file data isn't. It seems hard to do this without
> > an application hint, because I may scan a database sequentially but
> > I'd still want those buffers to stay resident.
>
> Doesn't the raw IO stuff do this, effectively?
>
> M.
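
P.S. For concreteness, here is roughly what the application side could
look like. O_SEQ is purely hypothetical (the flag value below is a
placeholder, and nothing in 2.2.20 implements it); the loop itself is
the same sequential read any backup tool already does:

    #include <fcntl.h>
    #include <unistd.h>

    #define O_SEQ 0x10000000  /* hypothetical "read once, sequentially" hint */

    /* Copy a file out (e.g. to a backup stream) without asking the
     * kernel to keep its pages resident in the buffer cache. */
    int copy_out(const char *path, int out_fd)
    {
        char buf[65536];
        ssize_t n;
        int fd = open(path, O_RDONLY | O_SEQ);  /* the only change is the flag */
        if (fd < 0)
            return -1;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            if (write(out_fd, buf, n) != n)
                break;
        close(fd);
        return 0;
    }

The point is that the application change is one flag at open() time;
everything else stays the same, so it's easy to drop into tar, rsync,
log crunchers, and the like.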
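
P.P.S. And the automatic version I mentioned, as a sketch of the
bookkeeping only (all names are made up; nothing like this exists in
the kernel):

    /* Hypothetical per-file state, consulted on every read. */
    struct seq_state {
        unsigned long next_expected;  /* offset a sequential read would hit next */
        int           sequential;     /* cleared once the file is randomly positioned */
        unsigned long cached_pages;   /* pages of this file now in the buffer cache */
    };

    #define SEQ_CACHE_LIMIT 1024  /* hypothetical: max pages a sequential reader may hog */

    void reuse_oldest_buffers(struct seq_state *s);  /* hypothetical drop-behind hook */

    void note_read(struct seq_state *s, unsigned long pos, unsigned long count)
    {
        if (pos != s->next_expected)
            s->sequential = 0;        /* random access: cache normally from now on */
        s->next_expected = pos + count;

        /* Still sequential and already holding plenty? Recycle this
         * file's oldest buffers instead of stealing pages from others. */
        if (s->sequential && s->cached_pages > SEQ_CACHE_LIMIT)
            reuse_oldest_buffers(s);
    }

This is exactly where the fgrep problem bites, though: the second
fgrep over a big file looks sequential too, so its buffers get
recycled even though keeping them would have helped.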