From: Andrew Morton <akpm@zip.com.au>
To: Scott Kaplan <sfkaplan@cs.amherst.edu>
Cc: linux-mm@kvack.org
Subject: Re: About the free page pool
Date: Mon, 02 Sep 2002 18:11:17 -0700
Message-ID: <3D740C35.9E190D04@zip.com.au>
In-Reply-To: <218D9232-BEBF-11D6-A3BE-000393829FA4@cs.amherst.edu>
Scott Kaplan wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Monday, September 2, 2002, at 04:33 PM, Andrew Morton wrote:
>
> > Scott Kaplan wrote:
> >> How important is it to maintain a list of free pages? That is, how
> >> critical is it that there be some pool of free pages from which the only
> >> bookkeeping required is the removal of that page from the free list.
> >
> > There are several reasons, all messy.
> >
> > - We need to be able to allocate pages at interrupt time. Mainly
> > for networking receive.
>
> Okay, this actually seems pretty important, and I suspected that it would
> be a critical issue. I suppose interrupts really do need to be as quick
> as possible, so doing the reclamation work during non-interrupt times is a
> good trade off. That's a sufficient argument for me.
>
> > - We sometimes need to allocate memory from *within* the context of
> > page reclaim: find a dirty page on the LRU, need to write it out,
> > need to allocate some memory to start the IO. Where does that
> > memory come from?
>
> That part could be handled without too much trouble, I believe. If we're
> ensuring that some trailing portion of the inactive list is clean and
> ready for reclamation, then when the situation above arises, just allocate
> space by taking it from the end of the inactive list. There should be no
> problem in doing that.
Yes. But there are the latency issues as well. We'll have a cpu-local
pool of pages with which to satisfy most of these allocations anyway,
I guess.
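Those cpu-local pools can be sketched in miniature. Everything below (`pcp_pool`, `pcp_alloc`, the batch size, the toy "global" pool) is invented for illustration and is not the kernel's actual interface; the point is only that the common-case allocation touches no shared state, which is what keeps interrupt-time allocation cheap:

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES    64            /* size of the toy "global" pool */
#define PCP_BATCH  8            /* pages refilled per slow-path trip */

struct page { struct page *next; };

static struct page mem[NPAGES];        /* stand-in for physical page frames */
static struct page *global_free;       /* stand-in for the buddy free lists */

static void global_init(void)
{
    for (int i = 0; i < NPAGES; i++) {
        mem[i].next = global_free;
        global_free = &mem[i];
    }
}

/* Slow path: in the real kernel this would take the zone lock. */
static struct page *global_alloc_page(void)
{
    struct page *p = global_free;
    if (p)
        global_free = p->next;
    return p;
}

struct pcp_pool { struct page *head; int count; };

/* Fast path: private to one CPU, no lock, so it is cheap enough
 * to satisfy most interrupt-time allocations. */
struct page *pcp_alloc(struct pcp_pool *pool)
{
    if (pool->count == 0) {
        /* Pool empty: refill a whole batch from the global allocator
         * so the next PCP_BATCH-1 allocations stay on the fast path. */
        for (int i = 0; i < PCP_BATCH; i++) {
            struct page *p = global_alloc_page();
            if (!p)
                break;
            p->next = pool->head;
            pool->head = p;
            pool->count++;
        }
    }
    struct page *p = pool->head;
    if (p) {
        pool->head = p->next;
        pool->count--;
    }
    return p;
}
```

Batching the refill amortizes the (conceptual) lock acquisition over several allocations, which is the latency argument in a nutshell.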
> > - The kernel frequently needs to perform higher-order allocations:
> > two or more physically-contiguous pages. The way we agglomerate
> > 0-order pages into higher-order pages is by coalescing them in the
> > buddy. If _all_ "free" pages are out on an LRU somewhere, we don't
> > have a higher-order pool to draw from.
>
> What is the current approach to this problem? Does the buddy allocator
> interact with the existing VM replacement policy so that, at times, the
> page occupying some particular page frame will be evicted not because it's
> the LRU page, but rather because its page frame is physically adjacent to
> some other free page? In other words, I see the need to allocate
> physically contiguous groups of pages, and that the buddy allocator is
> used for that purpose, but what influence does the buddy allocator have to
> ensure that it can fulfill those higher-order allocations?
The current approach is guess-and-giggle. In practice it works out
that there are enough physically contiguous pages available.
The most important are 1-order allocations (8k, for kernel stacks).
The memory allocator will retry these allocations indefinitely, so
they end up succeeding, somehow.
I think there's a bug in there, actually. If all zones have enough
free memory but there are no 1-order pages available, then the 1-order
allocator tries to run page reclaim, which will say "nope, nothing
needs doing". Eventually, someone else returns some memory and coalescing
happens. It's not a very glorious part of the kernel design.
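The coalescing mentioned above hinges on the buddy trick: the buddy of a free block is found by flipping one bit of its frame number, so two free 0-order buddies merge into one 1-order block, and so on upward. A toy version, with invented names and a flat array standing in for the real per-order free lists:

```c
#include <assert.h>

#define NFRAMES   16
#define MAX_ORDER  4

/* free_order[pfn] == -1 means "in use"; otherwise pfn heads a free
 * block of that order. Purely illustrative bookkeeping. */
static int free_order[NFRAMES];

static unsigned buddy_of(unsigned pfn, unsigned order)
{
    return pfn ^ (1u << order);      /* flip the order-th bit */
}

void buddy_init(void)
{
    for (unsigned i = 0; i < NFRAMES; i++)
        free_order[i] = -1;          /* everything starts allocated */
}

/* Free one block and coalesce upward while the buddy is also free. */
void buddy_free(unsigned pfn, unsigned order)
{
    while (order + 1 < MAX_ORDER) {
        unsigned b = buddy_of(pfn, order);
        if (free_order[b] != (int)order)
            break;                   /* buddy busy or split: stop merging */
        free_order[b] = -1;          /* absorb the buddy */
        if (b < pfn)
            pfn = b;                 /* merged block starts at the lower pfn */
        order++;
    }
    free_order[pfn] = (int)order;
}
```

This also shows the failure mode Andrew describes: if every even/odd pair has one member out on the LRU, no merge ever fires and the 1-order pool stays empty even though half of memory is "free".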
> > It's a ratio of the zone size, and there are a few thresholds in there,
> > for hysteresis, for emergency allocations, etc. See
> > free_area_init_core().
>
> I took a look, and if I'm calculating things correctly, pages_high seems
> to be set so that the free list is at most about 0.8% of the total number
> of pages in the zone. For larger memories (above about 128 MB), that
> percentage decreases. So we're keeping a modest pool of a few hundred
> pages -- not too big a deal.
Free memory seems to bottom out at about 2.2M on a 2.5G machine.
Note that the kernel statically allocates about 10M when it boots. This
is basically a bug, and fixing it is a matter of running around shouting
at people. This will happen ;) This is the low-hanging fruit.
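For reference, the arithmetic behind that roughly-0.8% figure looks like the 2.4-era free_area_init_core() calculation: the base threshold is zone_size/128 (about 0.78% of the zone), clamped between a floor and a ceiling, which is why the fraction shrinks on zones above about 128MB. The constants below mirror the old defaults but should be treated as illustrative; check the actual source for any given kernel:

```c
#include <assert.h>

/* Rough sketch of the 2.4-era watermark arithmetic; the low and
 * high watermarks were then derived as small multiples of this. */
#define ZONE_BALANCE_RATIO 128
#define ZONE_BALANCE_MIN    20
#define ZONE_BALANCE_MAX   255

unsigned long zone_pages_min(unsigned long zone_pages)
{
    unsigned long mask = zone_pages / ZONE_BALANCE_RATIO;
    if (mask < ZONE_BALANCE_MIN)
        mask = ZONE_BALANCE_MIN;     /* tiny zones keep a minimum reserve */
    if (mask > ZONE_BALANCE_MAX)
        mask = ZONE_BALANCE_MAX;     /* big zones are capped at ~1MB of 4k pages */
    return mask;        /* e.g. pages_low = 2*mask, pages_high = 3*mask */
}
```

With 4k pages, a 128MB zone is 32768 frames, so 32768/128 = 256 hits the 255-page ceiling; beyond that the reserve is constant and its percentage of the zone only falls.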
> [From a later email:]
> > Well, I'm at a bit of a loss to understand what the objective
> > of all this is. Is it so that we can effectively increase the
> > cache size, by not "wasting" all that free memory?
>
> While I suppose it would be to keep those few hundred pages mapped and
> re-usable by the VM system, it would only make a difference in the miss
> rate under very tense and unlikely circumstances. A few pages can make a
> big difference in the miss rate, but only if those few pages would allow
> the replacement policy to *just barely* keep the pages cached for long
> enough before they are referenced again.
See 10M, above.
> My goal was a different one: I just wanted some further simplification of
> the replacement mechanism. When a free page is allocated, it gets mapped
> into some address space and inserted into the active list (right?).
Inactive, initially. It changes with the vm-of-the-minute though.
> If we
> wanted the active and inactive lists to remain a constant size (and for
> the movement of pages through those lists to be really simple), we could
> immediately evict a page from the active list into the inactive list, and
> then evict some other page from the inactive list to the free list. If we
> did that, though, the use of a free list would be superfluous.
>
> Since the approach I'm describing performs the VM bookkeeping during
> allocation (and, thus, potentially, interrupt) time, it would be a poor
> choice. Evictions from the active and inactive lists must be performed at
> some other time. Doing so is a tad more complicated, and makes the
> behavior of the replacement policy harder to model. It seems, however,
> that to keep allocation fast, that bit of added complexity is necessary.
>
Well, we never evict from the active list - just from the tail of the
inactive list. But yes.
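The movement described above (new pages enter the inactive list, referenced pages get promoted, reclaim takes only the inactive tail) can be modeled with a toy two-list LRU. All names and the array representation are made up for illustration:

```c
#include <assert.h>

#define CAP 16

/* Two LRU lists as simple arrays: index 0 is the head (most
 * recently added), index n-1 is the tail (oldest). */
struct lru {
    int active[CAP],   nactive;
    int inactive[CAP], ninactive;
};

static void push_head(int *list, int *n, int page)
{
    for (int i = *n; i > 0; i--)
        list[i] = list[i - 1];
    list[0] = page;
    (*n)++;
}

/* A freshly allocated page starts on the inactive list. */
void lru_add(struct lru *l, int page)
{
    push_head(l->inactive, &l->ninactive, page);
}

/* A page seen to be referenced is promoted to the active head. */
void lru_promote_tail(struct lru *l)
{
    int page = l->inactive[--l->ninactive];
    push_head(l->active, &l->nactive, page);
}

/* Reclaim only ever takes the inactive tail; the active list is
 * never reclaimed from directly, matching the text above. */
int lru_reclaim(struct lru *l)
{
    return l->inactive[--l->ninactive];
}
```

Deactivation (active tail back to inactive head) would be a third move in the same style; it is the only route by which an active page can eventually be reclaimed.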
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/