From: William Lee Irwin III <wli@holomorphy.com>
To: Andrew Morton <akpm@zip.com.au>
Cc: Rik van Riel <riel@conectiva.com.br>,
Dave McCracken <dmccr@us.ibm.com>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [PATCH] Optimize out pte_chain take three
Date: Wed, 10 Jul 2002 18:51:02 -0700
Message-ID: <20020711015102.GV25360@holomorphy.com>
In-Reply-To: <3D2CD3D3.B43E0E1F@zip.com.au>
William Lee Irwin III wrote:
>> Phenomenally harsh.
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> No offence I hope. Just venting three years' VM frustration.
None taken. I believe it's a strong advisory to change direction,
so I have.
William Lee Irwin III wrote:
>> Your criteria are quantitative. I can't immediately measure all
>> of them but can go about collecting missing data immediately and post
>> as I go, then. Perhaps I'll even have helpers. =)
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> A lot of it should be fairly simple. We have tons of pagecache-intensive
> workloads. But we have gaps when it comes to the VM. In the area of
> page replacement.
> Can we fill those gaps with a reasonable amount of effort?
I'm not entirely sure, but I do have ideas of what I think would
exercise specific (sub)functions of the VM.
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> example 1: "run thirty processes which mmap a common 50000 page file and
> touch its pages in a random-but-always-the-same pattern at fifty pages
> per second. Then run (dbench|tiobench|kernel build|slocate|foo). Then
> see how many pages the thirty processes actually managed to touch."
Are we looking for "doesn't evict long-lived stuff" or "figures out
when the long-lived stuff finally died?" Maybe both would be good.
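Something like this untested C sketch is roughly what I'd start from
for example 1 -- "bigfile" and the knobs are made up, and the file has
to exist at NPAGES pages beforehand:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define NPROC         30
#define NPAGES        50000
#define PAGES_PER_SEC 50

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int fd = open("bigfile", O_RDONLY);  /* pre-made, NPAGES pages */
        volatile char sink;
        char *map;
        int i;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        map = mmap(NULL, (size_t)NPAGES * pagesize, PROT_READ,
                   MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        for (i = 0; i < NPROC; i++) {
                if (fork() == 0) {
                        srand(42);  /* same seed => same pattern */
                        for (;;) {
                                sink = map[(long)(rand() % NPAGES) * pagesize];
                                usleep(1000000 / PAGES_PER_SEC);
                        }
                }
        }
        while (wait(NULL) > 0)
                ;  /* children run until the harness kills them */
        return 0;
}

Counting distinct pages per child (say, a bitmap dumped when the
harness sends SIGTERM) would give the "how many did they manage to
touch" number.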
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> example 2: "run a process which mallocs 60% of physical memory and
> touches it randomly at 1000 pages/sec. run a second process
> which mallocs 60% of physical memory and touches it randomly at
> 500 pages/sec. Measure aggregate throughput for both processes".
> example 3: "example 2, but do some pagecache stuff as well".
I think 2 and 3 should be merged; this would basically show how the
parallel pagecache stuff disturbs the prior results.
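For the merged 2+3 I'd run two instances of an untested hog like the
one below at different rates, then add dbench or the like alongside
for the pagecache disturbance. I pass the arena size in as bytes
rather than trying to autodetect 60% of RAM:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        long rate, npages, touched = 0;
        size_t bytes;
        char *arena;
        time_t start;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <bytes> <pages/sec>\n", argv[0]);
                return 1;
        }
        bytes = strtoul(argv[1], NULL, 0);
        rate = atol(argv[2]);
        npages = bytes / pagesize;
        arena = malloc(bytes);
        if (!arena) {
                perror("malloc");
                return 1;
        }
        memset(arena, 1, bytes);  /* fault it all in once */
        srand(getpid());
        start = time(NULL);
        for (;;) {
                arena[(rand() % npages) * pagesize]++;
                if (++touched % rate == 0) {
                        sleep(1);  /* crude rate limiting */
                        printf("%ld pages in %ld sec\n", touched,
                               (long)(time(NULL) - start));
                }
        }
}

The printout is the achieved throughput; when the box starts
thrashing, it falls below the asked-for rate on its own.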
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> example 4: "run a process which mmaps 60%-worth of memory readonly,
> another which mmaps 60%-worth of memory MAP_SHARED. First process
> touches 1000 pages/sec. Second process modifies 1000 pages/sec.
> Optimise for throughput"
> Scriptable things. Things which we can optimise for with a
> reasonable expectation that this will improve real workloads.
Sequential/random access is another useful variable here.
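Example 4 with the sequential/random knob bolted on might look like
this untested sketch ("datafile" is a placeholder, pre-made at the
right size):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

#define NPAGES 50000L

int main(int argc, char **argv)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int writing = argc > 1 && !strcmp(argv[1], "write");
        int seq = argc > 2 && !strcmp(argv[2], "seq");
        int fd = open("datafile", writing ? O_RDWR : O_RDONLY);
        volatile char sink;
        long i = 0, idx;
        char *map;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        map = mmap(NULL, NPAGES * pagesize,
                   writing ? PROT_READ | PROT_WRITE : PROT_READ,
                   MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        srand(getpid());
        for (;;) {
                idx = seq ? i++ % NPAGES : rand() % NPAGES;
                if (writing)
                        map[idx * pagesize]++;  /* dirties the page */
                else
                        sink = map[idx * pagesize];
                usleep(1000);  /* ~1000 pages/sec */
        }
}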
William Lee Irwin III wrote:
>> I've already gone about asking for help benchmarking dmc's pte_chain
>> space optimization, and I envision the following list of TODO items
>> being things you're more interested in:
>> What other missing data are you after and which of these should
>> be chucked?
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> Well there are two phases to this. One is to run workloads,
> and the other is to analyse them. I think workloads such as my
> lame ones above are worth thinking about and setting up first,
> don't you? (And a lot of this will come from me picking your
> brains ;). I can do the coding, although my /bin/sh skills
> are woeful.)
> After that comes the analysis. Looks like rmap will be merged in
> the next few days for test-and-eval, so we don't need to go through
> some great beforehand-justification exercise. But we do need
> a permanent toolkit and we do need a way of optimising the VM.
Okay, this is relatively high priority then.
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> I would suggest that the toolkit consist of two things:
> 1: A set of scenarios and associated scripts/tools such as my
> examples above (except more real-worldy) and
> 2: Permanent in-kernel instrumentation which allows us (and
> remote testers) to understand what is happening in there.
At long last!
William Lee Irwin III wrote:
>> As far as operating regions for page replacement go I see 3 obvious ones:
>> (1) lots of writeback with no swap
>> (2) churning clean pages with no swap
>> (3) swapping
>> And each of these with several proportions of memory sharing.
>> Sound reasonable?
On Wed, Jul 10, 2002 at 05:39:47PM -0700, Andrew Morton wrote:
> Sounds ideal. Care to flesh these out into real use cases?
(1) would be a system dedicated to data collection in real life.
Lots of writeback without swap is also a common database
scenario. At any rate, it's intended to measure how effective
and/or efficient replacement is when writeback is required to
satisfy allocations.
(2) would be a system dedicated to distributing data in real life.
This would be a read-only database scenario or (ignoring
single vs. multiple files) webserving. It's intended to
measure how effective page replacement is when clean pages
must be discarded to satisfy allocations.
These two testcases would be constellations of sibling processes
mmapping a shared file-backed memory block larger than memory
(where it departs from database-land) and stomping all over it
in various patterns. Random, forward sequential, backward
sequential, mixtures of different patterns across processes,
mixtures of different patterns over time, varying frequencies
of access to different regions of the arena and checking to see
that the proportions of memory being reclaimed correspond to
frequency of access, and sudden shifts between (1) and (2) would
all be good things to measure, but the combinatorial explosion
of options here hurts.
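As a first cut at the frequency-vs-reclaim check, an untested
fragment with an arbitrary 90/10 hot/cold split ("arena" is a
placeholder file larger than memory):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

#define NPAGES 100000L  /* sized larger than memory */

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int fd = open("arena", O_RDWR);
        long hot = NPAGES / 10, idx;
        char *map;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        map = mmap(NULL, NPAGES * pagesize, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        srand(getpid());
        for (;;) {
                /* 90% of touches land in the first 10% of pages */
                if (rand() % 10)
                        idx = rand() % hot;
                else
                        idx = hot + rand() % (NPAGES - hot);
                map[idx * pagesize]++;  /* write => case (1) */
        }
}

Reading instead of writing turns it from case (1) into case (2), and
mincore() over the two halves would show where reclaim actually hit.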
The measurable criterion is of course throughput while shoveling data.
But there are many useful bits to measure here, e.g. the amount
of cpu consumed by replacement, how much actually gets scanned
for that cpu cost, and how right or wrong the guesses were when
variable frequency access is used.
(3) this is going to be a "slightly overburdened desktop" workload.
Here it's difficult to get a notion of how to appropriately
simulate the workload, but I have a number of things in mind
as to how to measure some useful things for one known case:
(A) there are a number (300?) of tasks that start up, fault in
some crud, and sleep forever -- in fact the majority.
They're just there in the background to chew up some
space with per-task and per-mm pinned pages.
(B) there are a number of tasks that are spawned when a timer
goes off, then they stomp on a metric buttload of stuff
and exit until spawned again at the next interval, e.g.
updatedb, and of course none of the data they use is
useful for anything else and it's all used just once.
This can be dcache, buffer cache, pagecache, or anything.
(C) there are a number of tasks that get prodded and fault on
minor amounts of stuff and have mostly clean mappings,
but occasionally they'll all sleep for a long time
(e.g. end-users sleeping through updatedb runs).
Basically, get a pool of processes, let them sleep,
shoot a random process with signals at random times,
and when shot, a process stomps over some clean data
with basically random access before going back to
sleep (untested sketch below).
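The class (C) task might be as dumb as this; the driver forks a pool
of them and kill()s a random one every so often ("cleandata" is a
placeholder file):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>

#define NPAGES 10000L

static volatile sig_atomic_t poked;

static void poke(int sig)
{
        poked = 1;
}

int main(void)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        int fd = open("cleandata", O_RDONLY);
        struct sigaction sa;
        volatile char sink;
        char *map;
        long i;

        sa.sa_handler = poke;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR1, &sa, NULL);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        map = mmap(NULL, NPAGES * pagesize, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        srand(getpid());
        for (;;) {
                pause();  /* sleep until shot */
                if (!poked)
                        continue;
                poked = 0;
                /* minor stomp over clean data, then back to sleep */
                for (i = 0; i < 1000; i++)
                        sink = map[(rand() % NPAGES) * pagesize];
        }
}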
The goal is to throw (A) out the window, control how much is
ever taken away from (C) and given to (B), and keep (C), who
is the real user of the thing, from seeing horribly bad worst
cases like after updatedb runs at 3AM. Or, perhaps, figuring
out who the real users are is the whole problem...
Okay, even though the updatedb stuff can (and must) be solved
with dcache-specific measures, the general problem remains, as
different kinds of memory can be consumed in the same way.
Trying to fool the VM with different kinds of one-shot
allocations is probably the best variable here; specific timings
aren't really very interesting.
The number that comes out of this is of course the peak
pagefault rate of the tasks in class (C).
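Each class (C) task could sample its own fault counters with
getrusage() to produce that number; untested:

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Run this loop inside (or forked from) each class (C) task. */
int main(void)
{
        struct rusage ru;
        long prev = 0, cur, peak = 0;

        for (;;) {
                getrusage(RUSAGE_SELF, &ru);
                cur = ru.ru_minflt + ru.ru_majflt;
                if (cur - prev > peak)
                        peak = cur - prev;
                printf("faults/sec: %ld (peak %ld)\n", cur - prev, peak);
                prev = cur;
                sleep(1);
        }
}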
The other swapping case is Netscape vs. xmms:
One huge big fat memory hog is so bloated its working set alone
drives the machine to swapping. This is poked and prodded at
repeatedly, at which time it of course faults in a bunch more
garbage from its inordinately and unreasonably bloated beyond
belief working set. Now poor little innocent xmms is running
in what's essentially a fixed amount of memory and cpu but
can't lose the cpu or the disk for too long or MP3s will skip.
This test is meant to exercise the VM's control over how much cpu
and -disk bandwidth- it chews at one time so that well-behaved
users don't see unreasonable latencies as the VM swamps their
needed resources.
I suspect the way to automate it is to generate signals to wake
the bloated mem hog at random intervals, and that mem hog then
randomly stomps over a bunch of memory. The innocent victim
keeps a fixed-size arena where it sequentially slurps in fresh
files, generates gibberish from their contents into a dedicated
piece of its fixed-size arena, and then squirts out the data at
what it wants to be a fixed rate to some character device. So
the write buffer will always be dirty and the read buffer clean,
except it doesn't really mmap. Oh, and of course, it takes a
substantial but not overwhelming amount of cpu (25-40% for
ancient p200's or something?) to generate the stuff it writes.
Then the metric used is the variability in the read and write
rates and %cpu used of the victim.
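The victim might be something like this untested sketch, with
/dev/null standing in for the character device and arbitrary buffer
sizes and rates:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

#define BUFPAGES 256

int main(int argc, char **argv)
{
        long pagesize = sysconf(_SC_PAGESIZE);
        size_t sz = BUFPAGES * pagesize;
        char *rbuf = malloc(sz), *wbuf = malloc(sz);
        int f, in, out = open("/dev/null", O_WRONLY);
        struct timeval t0, t1;
        long i, lag;

        if (argc < 2 || !rbuf || !wbuf || out < 0) {
                fprintf(stderr, "usage: %s file...\n", argv[0]);
                return 1;
        }
        for (;;) {
                for (f = 1; f < argc; f++) {  /* fresh file each round */
                        in = open(argv[f], O_RDONLY);
                        if (in < 0)
                                continue;
                        gettimeofday(&t0, NULL);
                        if (read(in, rbuf, sz) <= 0) {
                                close(in);
                                continue;
                        }
                        close(in);
                        for (i = 0; i < (long)sz; i++)  /* burn some cpu */
                                wbuf[i] = rbuf[i] ^ 0x5a;
                        write(out, wbuf, sz);
                        gettimeofday(&t1, NULL);
                        lag = (t1.tv_sec - t0.tv_sec) * 1000000
                                + (t1.tv_usec - t0.tv_usec);
                        printf("interval took %ld usec\n", lag);
                        usleep(100000);  /* aim for ten squirts/sec */
                }
        }
}

Post-processing the logged interval times gives the variability
number.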
Uh-oh, this stuff might take a while to write... any userspace helpers
around?
Cheers,
Bill