linux-mm.kvack.org archive mirror
From: jim@rubylane.com
To: Martin.Bligh@us.ibm.com
Cc: jim@rubylane.com, Andrew Morton <akpm@zip.com.au>, linux-mm@kvack.org
Subject: Re: 2.2.20 suspends everything then recovers during heavy I/O
Date: Fri, 5 Apr 2002 11:52:40 -0800 (PST)
Message-ID: <20020405195240.22435.qmail@london.rubylane.com>
In-Reply-To: <1648866003.1018003647@[10.10.2.3]> from "Martin J. Bligh" at Apr 05, 2002 10:47:28 AM

But tar & rsync don't work on raw partitions.  There are lots of times
when individual file data has to be processed, and lots of it, like
running stats on large web server logs, compressing the logs, copying
a DB backup to a remote machine for offsite backup, sorting a huge
file, etc. where putting the file in the buffer cache is a waste.
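Nothing like the O_SEQ flag suggested in the quoted message below exists in 2.2. As a sketch only, the same "read once, don't keep it" intent can be expressed from user space on later kernels and C libraries that provide posix_fadvise(2): declare sequential access, then drop each chunk from the page cache once it has been consumed. The advice is purely a hint and the kernel may ignore it. The function name `stream_file_nocache` and the 1 MB chunk size are hypothetical choices for illustration; this assumes Linux with glibc.

```c
#define _XOPEN_SOURCE 600   /* exposes posix_fadvise() in glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (1 << 20)     /* hypothetical read size: 1 MB */

/* Hypothetical helper: read `path` front to back once, asking the kernel
   not to keep the file data cached afterwards. Returns bytes read or -1. */
static long long stream_file_nocache(const char *path)
{
    static char buf[CHUNK];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Hint: whole file (len 0 means "to EOF") is read sequentially.
       Return value ignored on purpose: the advice is optional. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    long long total = 0;
    off_t done = 0;                 /* start of the range not yet dropped */
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                  /* EOF */

        /* ... process buf[0..n): gzip a log, checksum, copy offsite ... */
        total += n;

        /* Hint: the pages just consumed won't be needed again. */
        posix_fadvise(fd, done, (off_t)n, POSIX_FADV_DONTNEED);
        done += n;
    }
    close(fd);
    return total;
}
```

The per-chunk DONTNEED is what keeps a big log compression or backup copy from pushing everything else out of memory; a tool like tar or rsync would need the same two calls added to its read loop.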

Even in the case of a sort, where you are going to go back and
reference the data again, the sort often works by reading sequentially
through the data once, sorting the keys, then reordering the file.
The initial sequential scan won't benefit from the buffer cache unless
the whole file fits in memory.  Only the reorder pass would benefit.

An idea I had a while back was to keep track of whether a file has
been randomly positioned (seeked) or not.  If not, and more than a
certain amount of the file is already in memory, start reusing the
buffers holding early parts of the file instead of hogging more.  To
me this is not as good a solution, because there are likely many
cases where it will hurt performance, like repeatedly running fgrep
on a file larger than the threshold.  A manual flag, on the other
hand, would only be used in the places where it is known to help.  If
tar used the flag, I guess it's theoretically possible someone would
do repeated tars of the same data, but that seems improbable.  And if
they do and it takes longer, that's still probably better than hogging
buffers.  Who cares how long a tar takes?
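To make the automatic heuristic above concrete, here is a toy user-level model of it, not kernel code. Each file tracks whether it has ever been seeked; once an unseeked file holds `threshold` pages, a new page recycles the buffer holding the oldest (earliest-read) page of that same file instead of taking a fresh one from the shared pool. All names (`struct file_cache`, `fc_read_page`, `fc_note_seek`, `MAX_PAGES`) and the FIFO replacement order are hypothetical choices for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64  /* model limit on pages tracked per file (hypothetical) */

/* Hypothetical per-file cache state. */
struct file_cache {
    bool   seeked;            /* true once the file was randomly positioned */
    size_t threshold;         /* cached pages allowed before self-recycling */
    long   pages[MAX_PAGES];  /* cached page indices, FIFO: oldest at head */
    size_t head, count;
    size_t pool_allocs;       /* buffers taken from the shared pool */
};

/* A non-sequential reposition turns the heuristic off for this file. */
static void fc_note_seek(struct file_cache *fc) { fc->seeked = true; }

/* Record that page `idx` was read and had to be brought into memory. */
static void fc_read_page(struct file_cache *fc, long idx)
{
    size_t tail = (fc->head + fc->count) % MAX_PAGES;

    /* Unseeked file at its threshold: reuse the buffer of the oldest page.
       The count == MAX_PAGES case is only this model's size limit; a real
       kernel would fall back to normal global reclaim there. */
    if ((!fc->seeked && fc->count >= fc->threshold) || fc->count == MAX_PAGES) {
        fc->pages[tail] = idx;            /* tail == head when ring is full */
        fc->head = (fc->head + 1) % MAX_PAGES;
        return;
    }

    /* Otherwise grow the file's share of memory from the shared pool. */
    fc->pages[tail] = idx;
    fc->count++;
    fc->pool_allocs++;
}
```

A purely sequential pass over 100 pages with a threshold of 8 takes only 8 buffers from the pool and ends up holding pages 92 through 99, while a file that has been seeked keeps growing normally. The repeated-fgrep problem is visible here too: on a second pass, none of the early pages are still cached.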

Jim

> 
> > What would be really great is some way to indicate, maybe with an
> > O_SEQ flag or something, that an application is going to sequentially
> > access a file, so caching it is a no-win proposition.  Production
> > servers do have situations where lots of data has to be copied or
> > accessed, for example, to do a backup, but doing a backup shouldn't
> > mean that all of the important stuff gets continuously thrown out of
> > memory while the backup is running.  Saving metadata during a backup
> > is useful.  Saving file data isn't.  It seems hard to do this
> > without an application hint because I may scan a database
> > sequentially but I'd still want those buffers to stay resident.
> 
> Doesn't the raw IO stuff do this, effectively?
> 
> M.
> 
> 

