Long time spent in swap

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Long time spent in swap_out &co
@ 2000-06-04  0:40 John Fremlin
  2000-06-04  1:28 ` Rik van Riel
  0 siblings, 1 reply; 4+ messages in thread
From: John Fremlin @ 2000-06-04  0:40 UTC (permalink / raw)
  To: linux-mm

I had a look at vmscan.c and noticed that the swap_out process
selection procedure looks suboptimal (this is 2.4.0-test1-ac7 with
Rik's mm patch rev 3). If I make a mistake, please correct it gently
(I am a clueless newbie).

        (a) The entire list of processes is scanned through each time
        at least once. (Slow, and holding a lock.)

        (b) The biggest rss is chosen. Admittedly the swap_cnt
        heuristics help a bit but it means that a large process that
        is on touching its pages will keep distracting attention from
        more smaller processes that may or may not be more wasteful.

Suggestions

        Guess a reasonable minimum size process to look at (say, twice
        the average of the first couple of size_cnts) so the entire
        list isn't scanned through so often and different processes
        will be targeted first when all the size_cnts are reset.

        Are we just dealing with the running processes? (If not, why
        not first try to swap out the sleeping ones?)

        Or, target processes with fewest page faults.

        [I'm basically unconvinced of the idea of size_cnt]

Hard evidence

        I set up a lot of processes to run, more than my box can
        handle and a large proportion of SysReq-Ps had EIPs in
        swap_out. (Waiting for lock? Not checked).

-- 

	http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Long time spent in swap_out &co
  2000-06-04  0:40 Long time spent in swap_out &co John Fremlin
@ 2000-06-04  1:28 ` Rik van Riel
  2000-06-04 13:54   ` John Fremlin
  0 siblings, 1 reply; 4+ messages in thread
From: Rik van Riel @ 2000-06-04  1:28 UTC (permalink / raw)
  To: John Fremlin; +Cc: linux-mm

On 4 Jun 2000, John Fremlin wrote:

>         (a) The entire list of processes is scanned through each time
>         at least once. (Slow, and holding a lock.)

This is not very slow, since it only looks at something like
3 or 4 numbers and flags per process.

>         (b) The biggest rss is chosen. Admittedly the swap_cnt
>         heuristics help a bit but it means that a large process that
>         is on touching its pages will keep distracting attention from
>         more smaller processes that may or may not be more wasteful.

Please look at the 'assign' variable. We will chose the process
with the biggest swap_cnt until swap_cnt for *all* processes is
0.

Then we will reassign swap_cnt. This ensures that all processes
get scanned fairly.

Also, note the counter variable, we'll only scan up to a few
processes, and we'll return after we have freed just one page.

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.

Wanna talk about the kernel?  irc.openprojects.net / #kernelnewbies
http://www.conectiva.com/		http://www.surriel.com/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Long time spent in swap_out &co
  2000-06-04  1:28 ` Rik van Riel
@ 2000-06-04 13:54   ` John Fremlin
  2000-06-04 15:43     ` PATCH: swap_out mega (100+ times) speedboost John Fremlin
  0 siblings, 1 reply; 4+ messages in thread
From: John Fremlin @ 2000-06-04 13:54 UTC (permalink / raw)
  To: linux-mm

Rik van Riel <riel@conectiva.com.br> writes:

> On 4 Jun 2000, John Fremlin wrote:
> 
> >         (a) The entire list of processes is scanned through each time
> >         at least once. (Slow, and holding a lock.)
> 
> This is not very slow, since it only looks at something like
> 3 or 4 numbers and flags per process.

Perhaps, but you're traversing a linked list. That means that the
task_struct entries will probably be widely dispersed, so that each
one has to be fetched from main RAM, Then you look at the mm_struct
(another miss?). According to "Modern Compiler Implementation in ML"
(Andrew W. Appel, Cambridge University Press, 1998) a secondary cache
miss is typically 100-200 cycles.  So if we say around 300-400 cycles
per iteration of the loop (assuming that the needed data in the two
structs are fetched completely for each miss penalty), everything
taken together. I'd say that's quite slow, but I guess assembly
programming skews your outlook considerably ;-)

Supposing 1000 tasks, that's a millisecond to do this scan once on a
300 MHz machine. (That is, the process is O(N) or worse due to cache
hierarchies and so doesn't scale well).

You could keep an list of 1st biggest, 2nd biggest, 3rd biggest,
etc. up to "count" biggest so that you don't scan the list for each
count.
> 
> >         (b) The biggest rss is chosen. Admittedly the swap_cnt
> >         heuristics help a bit but it means that a large process that
> >         is on touching its pages will keep distracting attention from
> >         more smaller processes that may or may not be more wasteful.
> 
> Please look at the 'assign' variable. We will chose the process
> with the biggest swap_cnt until swap_cnt for *all* processes is
> 0.

Yes, I saw that. But always picking on the very biggest first once the
swap_cnts hit 0 seems unfair and wasteful. 

> Then we will reassign swap_cnt. This ensures that all processes
> get scanned fairly.

But why bother always picking the biggest first? Just pick the first
that's got a non-zero swap_cnt would be faster and no less fair. I'll
see if I can get a noticable improvement.

> Also, note the counter variable, we'll only scan up to a few

Supposing 
        priority        nr_threads              counter
        64              100                     0 => 1
        32              100                     1 => 1

So that it is unlikely that we iterate more than a single time. Have I
got my priorities wrong? ;-) 

If I choose 16:

        16              100                     25 => 25

> processes, and we'll return after we have freed just one page.

Hmm. Seems a false speed economy (quicker to do a lot in one place
that to do a little in many).

We are scanning through all the running threads or all the threads?

-- 

	http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* PATCH: swap_out mega (100+ times) speedboost
  2000-06-04 13:54   ` John Fremlin
@ 2000-06-04 15:43     ` John Fremlin
  0 siblings, 0 replies; 4+ messages in thread
From: John Fremlin @ 2000-06-04 15:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-mm

The following patch improves responsiveness (by an immodestly
ridiculous ammount) when there are hundreds of small processes running
on a machine. Without it, it takes a few minutes for even a SysReq SAK
to get through, with it, a few seconds (AMD K6-2 64Mb)!!!  I am
somewhat excited as this is my first kernel patch (I *can't resist*
posting it to the kernel list).

It is my thesis that
        1) there is no point in finding the largest process (modified
by size_cnt) in vmscan.c:swap_out as this is unfair and a bad
heuristic; and
        2) spending ages searching through the task list to find the
very biggest is a waste of time and severely impacts performance; and
        3) the size_cnt heuristic should be done someother way.

My patch proves that searching through the task list for the largest
size_cnt severely impacts performance when there are many (100s of)
threads.

As I explained (on linux-mm):
> Perhaps, but you're traversing a linked list. That means that the
> task_struct entries will probably be widely dispersed, so that each
> one has to be fetched from main RAM, then you look at the mm_struct
> (another miss?). According to "Modern Compiler Implementation in ML"
> (Andrew W. Appel, Cambridge University Press, 1998) a secondary cache
> miss is typically 100-200 cycles.  So if we say around 300-400 cycles
> per iteration of the loop (assuming that the needed data in the two
> structs are fetched completely for each miss penalty), everything
> taken together. I'd say that's quite slow, but I guess assembly
> programming skews your outlook considerably ;-)

The patch is against 2.4.0test1-ac7 with Rik's mmpatch version 3. It
stops trying to get the biggest size_cnt on the task list
(vmscan.c:swap_out), instead just picking the first possible one to
swap out. That is, size_cnt is being (ab)used as a boolean. IMHO, I
think it should be rethoughtout, so take this patch as a technology
demonstration ;-)

Normal system performance does not seem to be affected, but
responsiveness under heavy load is increased considerably. Reports of
performance slowdowns in any situations of course welcome!

Hope I didn't break anything :=)

--- linux-2.4.0t1a7m3/mm/vmscan.c	Sat Jun  3 17:10:15 2000
+++ kernel-hacking/mm/vmscan.c	Sun Jun  4 16:35:31 2000
@@ -361,23 +361,24 @@
 	/* 
 	 * We make one or two passes through the task list, indexed by 
 	 * assign = {0, 1}:
-	 *   Pass 1: select the swappable task with maximal RSS that has
-	 *         not yet been swapped out. 
+	 *
+	 *   Pass 1: select the first swappable task that has not yet
+	 *   been swapped out.
+	 *
 	 *   Pass 2: re-assign rss swap_cnt values, then select as above.
 	 *
 	 * With this approach, there's no need to remember the last task
 	 * swapped out.  If the swap-out fails, we clear swap_cnt so the 
 	 * task won't be selected again until all others have been tried.
 	 *
-	 * Think of swap_cnt as a "shadow rss" - it tells us which process
-	 * we want to page out (always try largest first).
-	 */
+	 * Think of swap_cnt as a "shadow rss" - it tells us which
+	 * process we want to page out (always try largest first).  */
+	
 	counter = (nr_threads << 2) >> (priority >> 2);
 	if (counter < 1)
 		counter = 1;
 
 	for (; counter >= 0; counter--) {
-		unsigned long max_cnt = 0;
 		struct mm_struct *best = NULL;
 		int pid = 0;
 		int assign = 0;
@@ -391,13 +392,14 @@
 	 		if (mm->rss <= 0)
 				continue;
 			/* Refresh swap_cnt? */
-			if (assign == 1)
+			best = mm;
+			pid = p->pid;
+
+			if (assign == 1){
 				mm->swap_cnt = mm->rss;
-			if (mm->swap_cnt > max_cnt) {
-				max_cnt = mm->swap_cnt;
-				best = mm;
-				pid = p->pid;
 			}
+			else
+				break;
 		}
 		read_unlock(&tasklist_lock);
 		if (!best) {


-- 

	http://altern.org/vii
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2000-06-04 15:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-06-04  0:40 Long time spent in swap_out &co John Fremlin
2000-06-04  1:28 ` Rik van Riel
2000-06-04 13:54   ` John Fremlin
2000-06-04 15:43     ` PATCH: swap_out mega (100+ times) speedboost John Fremlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox