* Long time spent in swap_out &co
@ 2000-06-04 0:40 John Fremlin
2000-06-04 1:28 ` Rik van Riel
0 siblings, 1 reply; 4+ messages in thread
From: John Fremlin @ 2000-06-04 0:40 UTC (permalink / raw)
To: linux-mm
I had a look at vmscan.c and noticed that the swap_out process
selection procedure looks suboptimal (this is 2.4.0-test1-ac7 with
Rik's mm patch rev 3). If I make a mistake, please correct it gently
(I am a clueless newbie).
(a) The entire list of processes is scanned through each time
at least once. (Slow, and holding a lock.)
(b) The biggest rss is chosen. Admittedly the swap_cnt
heuristics help a bit but it means that a large process that
is on touching its pages will keep distracting attention from
more smaller processes that may or may not be more wasteful.
Suggestions
Guess a reasonable minimum size process to look at (say, twice
the average of the first couple of size_cnts) so the entire
list isn't scanned through so often and different processes
will be targeted first when all the size_cnts are reset.
Are we just dealing with the running processes? (If not, why
not first try to swap out the sleeping ones?)
Or, target processes with fewest page faults.
[I'm basically unconvinced of the idea of size_cnt]
Hard evidence
I set up a lot of processes to run, more than my box can
handle and a large proportion of SysReq-Ps had EIPs in
swap_out. (Waiting for lock? Not checked).
--
http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: Long time spent in swap_out &co 2000-06-04 0:40 Long time spent in swap_out &co John Fremlin @ 2000-06-04 1:28 ` Rik van Riel 2000-06-04 13:54 ` John Fremlin 0 siblings, 1 reply; 4+ messages in thread From: Rik van Riel @ 2000-06-04 1:28 UTC (permalink / raw) To: John Fremlin; +Cc: linux-mm On 4 Jun 2000, John Fremlin wrote: > (a) The entire list of processes is scanned through each time > at least once. (Slow, and holding a lock.) This is not very slow, since it only looks at something like 3 or 4 numbers and flags per process. > (b) The biggest rss is chosen. Admittedly the swap_cnt > heuristics help a bit but it means that a large process that > is on touching its pages will keep distracting attention from > more smaller processes that may or may not be more wasteful. Please look at the 'assign' variable. We will chose the process with the biggest swap_cnt until swap_cnt for *all* processes is 0. Then we will reassign swap_cnt. This ensures that all processes get scanned fairly. Also, note the counter variable, we'll only scan up to a few processes, and we'll return after we have freed just one page. regards, Rik -- The Internet is not a network of computers. It is a network of people. That is its real strength. Wanna talk about the kernel? irc.openprojects.net / #kernelnewbies http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Long time spent in swap_out &co 2000-06-04 1:28 ` Rik van Riel @ 2000-06-04 13:54 ` John Fremlin 2000-06-04 15:43 ` PATCH: swap_out mega (100+ times) speedboost John Fremlin 0 siblings, 1 reply; 4+ messages in thread From: John Fremlin @ 2000-06-04 13:54 UTC (permalink / raw) To: linux-mm Rik van Riel <riel@conectiva.com.br> writes: > On 4 Jun 2000, John Fremlin wrote: > > > (a) The entire list of processes is scanned through each time > > at least once. (Slow, and holding a lock.) > > This is not very slow, since it only looks at something like > 3 or 4 numbers and flags per process. Perhaps, but you're traversing a linked list. That means that the task_struct entries will probably be widely dispersed, so that each one has to be fetched from main RAM, Then you look at the mm_struct (another miss?). According to "Modern Compiler Implementation in ML" (Andrew W. Appel, Cambridge University Press, 1998) a secondary cache miss is typically 100-200 cycles. So if we say around 300-400 cycles per iteration of the loop (assuming that the needed data in the two structs are fetched completely for each miss penalty), everything taken together. I'd say that's quite slow, but I guess assembly programming skews your outlook considerably ;-) Supposing 1000 tasks, that's a millisecond to do this scan once on a 300 MHz machine. (That is, the process is O(N) or worse due to cache hierarchies and so doesn't scale well). You could keep an list of 1st biggest, 2nd biggest, 3rd biggest, etc. up to "count" biggest so that you don't scan the list for each count. > > > (b) The biggest rss is chosen. Admittedly the swap_cnt > > heuristics help a bit but it means that a large process that > > is on touching its pages will keep distracting attention from > > more smaller processes that may or may not be more wasteful. > > Please look at the 'assign' variable. We will chose the process > with the biggest swap_cnt until swap_cnt for *all* processes is > 0. Yes, I saw that. But always picking on the very biggest first once the swap_cnts hit 0 seems unfair and wasteful. > Then we will reassign swap_cnt. This ensures that all processes > get scanned fairly. But why bother always picking the biggest first? Just pick the first that's got a non-zero swap_cnt would be faster and no less fair. I'll see if I can get a noticable improvement. > Also, note the counter variable, we'll only scan up to a few Supposing priority nr_threads counter 64 100 0 => 1 32 100 1 => 1 So that it is unlikely that we iterate more than a single time. Have I got my priorities wrong? ;-) If I choose 16: 16 100 25 => 25 > processes, and we'll return after we have freed just one page. Hmm. Seems a false speed economy (quicker to do a lot in one place that to do a little in many). We are scanning through all the running threads or all the threads? -- http://altern.org/vii -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 4+ messages in thread
* PATCH: swap_out mega (100+ times) speedboost 2000-06-04 13:54 ` John Fremlin @ 2000-06-04 15:43 ` John Fremlin 0 siblings, 0 replies; 4+ messages in thread From: John Fremlin @ 2000-06-04 15:43 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm The following patch improves responsiveness (by an immodestly ridiculous ammount) when there are hundreds of small processes running on a machine. Without it, it takes a few minutes for even a SysReq SAK to get through, with it, a few seconds (AMD K6-2 64Mb)!!! I am somewhat excited as this is my first kernel patch (I *can't resist* posting it to the kernel list). It is my thesis that 1) there is no point in finding the largest process (modified by size_cnt) in vmscan.c:swap_out as this is unfair and a bad heuristic; and 2) spending ages searching through the task list to find the very biggest is a waste of time and severely impacts performance; and 3) the size_cnt heuristic should be done someother way. My patch proves that searching through the task list for the largest size_cnt severely impacts performance when there are many (100s of) threads. As I explained (on linux-mm): > Perhaps, but you're traversing a linked list. That means that the > task_struct entries will probably be widely dispersed, so that each > one has to be fetched from main RAM, then you look at the mm_struct > (another miss?). According to "Modern Compiler Implementation in ML" > (Andrew W. Appel, Cambridge University Press, 1998) a secondary cache > miss is typically 100-200 cycles. So if we say around 300-400 cycles > per iteration of the loop (assuming that the needed data in the two > structs are fetched completely for each miss penalty), everything > taken together. I'd say that's quite slow, but I guess assembly > programming skews your outlook considerably ;-) The patch is against 2.4.0test1-ac7 with Rik's mmpatch version 3. It stops trying to get the biggest size_cnt on the task list (vmscan.c:swap_out), instead just picking the first possible one to swap out. That is, size_cnt is being (ab)used as a boolean. IMHO, I think it should be rethoughtout, so take this patch as a technology demonstration ;-) Normal system performance does not seem to be affected, but responsiveness under heavy load is increased considerably. Reports of performance slowdowns in any situations of course welcome! Hope I didn't break anything :=) --- linux-2.4.0t1a7m3/mm/vmscan.c Sat Jun 3 17:10:15 2000 +++ kernel-hacking/mm/vmscan.c Sun Jun 4 16:35:31 2000 @@ -361,23 +361,24 @@ /* * We make one or two passes through the task list, indexed by * assign = {0, 1}: - * Pass 1: select the swappable task with maximal RSS that has - * not yet been swapped out. + * + * Pass 1: select the first swappable task that has not yet + * been swapped out. + * * Pass 2: re-assign rss swap_cnt values, then select as above. * * With this approach, there's no need to remember the last task * swapped out. If the swap-out fails, we clear swap_cnt so the * task won't be selected again until all others have been tried. * - * Think of swap_cnt as a "shadow rss" - it tells us which process - * we want to page out (always try largest first). - */ + * Think of swap_cnt as a "shadow rss" - it tells us which + * process we want to page out (always try largest first). */ + counter = (nr_threads << 2) >> (priority >> 2); if (counter < 1) counter = 1; for (; counter >= 0; counter--) { - unsigned long max_cnt = 0; struct mm_struct *best = NULL; int pid = 0; int assign = 0; @@ -391,13 +392,14 @@ if (mm->rss <= 0) continue; /* Refresh swap_cnt? */ - if (assign == 1) + best = mm; + pid = p->pid; + + if (assign == 1){ mm->swap_cnt = mm->rss; - if (mm->swap_cnt > max_cnt) { - max_cnt = mm->swap_cnt; - best = mm; - pid = p->pid; } + else + break; } read_unlock(&tasklist_lock); if (!best) { -- http://altern.org/vii -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/ ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2000-06-04 15:43 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2000-06-04 0:40 Long time spent in swap_out &co John Fremlin 2000-06-04 1:28 ` Rik van Riel 2000-06-04 13:54 ` John Fremlin 2000-06-04 15:43 ` PATCH: swap_out mega (100+ times) speedboost John Fremlin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox