From: Linus Torvalds <torvalds@transmeta.com>
To: Rik van Riel <H.H.vanRiel@fys.ruu.nl>
Cc: "Stephen C. Tweedie" <sct@dcs.ed.ac.uk>, linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH] pre3 corrections!
Date: Tue, 17 Mar 1998 11:09:52 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.3.95.980317104435.5051E-100000@penguin.transmeta.com> (raw)
In-Reply-To: <Pine.LNX.3.91.980317105548.385B-100000@mirkwood.dummy.home>
[ Cc'd to Stephen and the mm-list because I'm explaining what I really
think should happen and why I've rejected some patches: sorry for the
lack of explanations but I was fairly busy last week.. ]
On Tue, 17 Mar 1998, Rik van Riel wrote:
>
> apparently friday 13th has struck you very badly,
> since some _important_ pieces of kswapd got lost...
> (and page_alloc.c had an obvious bug).
page_alloc.c had an obvious bug, but no, the changes did not "get lost".
I decided that it was time to stop with the band-aid patches, and just
wait for the problem to be fixed _correctly_, which I didn't think this
patch does:
> --- vmscan.c.pre3 Tue Mar 17 10:47:44 1998
> +++ vmscan.c Tue Mar 17 10:52:05 1998
> @@ -568,8 +568,13 @@
> while (tries--) {
> int gfp_mask;
>
> - if (free_memory_available())
> - break;
> + if (BUFFER_MEM < buffer_mem.max * num_physpages / 100) {
> + if (free_memory_available() && nr_free_pages +
> + atomic_read(&nr_async_pages) > freepages.high)
> + break;
> + if( nr_free_pages > freepages.high * 4)
> + break;
> + }
> gfp_mask = __GFP_IO;
> try_to_free_page(gfp_mask);
> /*
Basically, I consider any patch that adds another "nr_free_pages"
occurrence to be buggy.
Why? Because I have 512 MB (yes, that's half a gig) of memory, and I don't
think it is valid to compare the number of free pages against anything,
because they have so little relevance when they may not be the basic
reason for why an allocation failed. I may have 8MB worth of free memory
(aka "a lot"), but if all those twothousand pages are single pages (or
even dual pages) then NFS won't work correctly because NFS needs to
allocate about 9000 bytes for incoming full-sized packets.
That is why I want to have the "free_memory_available()" approach of
checking that there are free large-page areas still, and continuing to
swap out IN THE BACKGROUND when this isn't true.
What I _think_ the patch should look like is roughly something like
do {
if (free_memory_available())
break;
gfp_mask = __GFP_IO;
if (!try_to_free_page(gfp_mask))
break;
run_task_queue(&tq_disk); /* or whatever */
} while (--tries);
AND then "swap_tick()" should also be changed to not look at nr_free_pages
at all, but only at whether we can easily allocate new memory (ie
"free_memory_available()")
The plan would be that
- kswapd should wake up every so often until we have large areas
(swap_tick())
- kswapd would never /ever/ run for too long ("tries"), even when low on
memory. So the solution would be to make "tries" have a low enough
value that kswapd never hogs the machine, and "swap_tick()" would make
sure that while we don't hog the machine we do keep running
occasionally until everything is dandy again..
- "nr_free_pages" should go away: we end up just spending time
maintaining it, yet it doesn't really ever tell us enough about the
actul state of the machine due to fragmentation.
I could do this myself, but I also know that my particular machine usage
isn't interesting enough to guarantee that I get the tweaking anywhere
close to reasonable, which is why I've been so happy that others (ie you)
have been looking into this - you probably have more real-world usage than
I do.
At the same time I _really_ hate the "arbitrary" kinds of tests that you
cannot actually explain _why_ they are there, and the only explanation for
them is that they hide some basic problem. This is why I want to have
"free_memory_available()" be important: because that function very clearly
states whether we can allocate things atomically or not. It's not a case
of "somebody feels that this is the right thing", but a case of "when the
free_memory_available() function returns true, we can _prove_ that some
specific condition is fine (ie the ability to allocate memory)".
(Btw, I think my original "free_memory_available()" function that only
tested the highest memory order was probably a better one: the only reason
it was downgraded was due to the interactive issues due to swap_tick() and
the pageout loop disagreeing about when things should be done).
One other thing that should probably be more aggressively looked into: the
buffer cache. It used to be that the buffer cache was of supreme
importance for performance, and we needed to keep the buffer cache big
because it was also our source of shared pages. That is no longer true.
These days we should penalize the buffer cache _heavily_: _especially_
dirty data pages that have been written out should generally be thrown
away as quickly as possible instead of leaving them in memory. Not
immediately, because re-writing parts of some file is fairly common, but
they should be aged much more aggressively (but we should not age the
metadata pages all that quickly - only the pages we have used for
write-outs).
I've too often seen my machine with 200MB worth of ex-dirty buffers (ie
they are clean now and have been synched, but they still lay around just
in case) when I've written out a large file, and I just know that that is
just all wasted memory.
Again, this is something that needs to be tested on more "normal" machines
than my particular machine is - I doubt my use is even close to what most
people tend to do..
Linus
next parent reply other threads:[~1998-03-17 19:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <Pine.LNX.3.91.980317105548.385B-100000@mirkwood.dummy.home>
1998-03-17 19:09 ` Linus Torvalds [this message]
1998-03-17 20:20 ` Zlatko Calusic
1998-03-17 20:20 ` Rik van Riel
1998-03-18 21:16 ` Stephen C. Tweedie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.3.95.980317104435.5051E-100000@penguin.transmeta.com \
--to=torvalds@transmeta.com \
--cc=H.H.vanRiel@fys.ruu.nl \
--cc=linux-mm@kvack.org \
--cc=sct@dcs.ed.ac.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox