Date: Tue, 08 May 2001 08:25:47 -0400 (EDT)
From: "Scott F. Kaplan"
Subject: Re: on load control / process swapping
To: linux-mm@kvack.org

Okay, in responding to this topic, I will issue a warning: I'm looking
at this from an academic point of view, and probably won't give as much
attention to what is reasonable to engineer as some people might like.
That said, I think I might have some useful thoughts... y'all can be
the judge of that.

On Mon, 7 May 2001, Rik van Riel wrote:

> In short, the process suspension / wake up code only does
> load control in the sense that system load is reduced, but
> absolutely no effort is made to ensure that individual
> programs can run without thrashing.  This, of course, kind of
> defeats the purpose of doing load control in the first place.

First, I agree: to suspend a process without any calculation indicating
that the suspension will reduce the page fault rate is to operate
blindly.  Performing such a calculation, though, requires some
information about the locality characteristics of each process, based
on recent reference behavior.  What would be really nice is some
indication of how much additional space would reduce paging for each of
the processes that will remain active.  For some, a little extra space
won't help much; for others, a little extra space is just what they
need for a significant reduction.  Determining which processes are
which, and just how much "a little extra" needs to be, seems important
in this context.

Second, a nit pick: we're using the term "thrashing" in so many ways
that it would be nice to standardize on something so that we understand
one another.  As I understand it, the textbook definition of thrashing
is the point at which CPU utilization falls because all active
processes are I/O bound.  That is, thrashing is a system-wide
characteristic, not applicable to individual processes.  That's why
some people have pointed out that "thrashing" and "heavy paging" aren't
the same thing: a single process can cause heavy paging while the CPU
is still fully loaded with the work of other processes.

So, given the paragraph above, are you talking about a single process
that may still be paging heavily, in spite of the additional free space
created by process suspension?  (Like I said, it was a nit pick.)  I'm
assuming that's what you mean.

> Any solution will have to address the following points:
>
> 1) allow the resident processes to stay resident long
>    enough to make progress

Seems reasonable.

> 2) make sure the resident processes aren't thrashing,
>    that is, don't let new processes back in memory if
>    none of the currently resident processes is "ready"
>    to be suspended

What does it mean to be ready to be suspended?  I'm confused by this
one.

> 3) have a mechanism to detect thrashing in a VM
>    subsystem which isn't rate-limited (hard?)

What's your definition of "thrashing" here?  If it's the system-wide
version, detection doesn't seem to be too difficult: when all processes
are stalled on page faults, and that situation has obtained over some
recent window of time, then the system is thrashing.
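That rule is simple enough to sketch.  What follows is purely an
illustration, not kernel code; the sampling hook, the function names,
and the window size are all made up, and you'd feed it from whatever
instrumentation tells you "every runnable process is blocked on a
fault right now":

    #include <stdbool.h>

    #define WINDOW 64            /* samples kept, e.g. one per second */

    static bool window_buf[WINDOW];  /* 1 = "all processes fault-stalled" */
    static int  window_pos;
    static int  stalled_count;       /* how many samples in buf are true */

    /* Called once per sampling interval with the instantaneous state:
     * was every runnable process blocked on a page fault just now? */
    void record_sample(bool all_stalled)
    {
        stalled_count -= window_buf[window_pos];    /* drop oldest */
        window_buf[window_pos] = all_stalled;
        stalled_count += all_stalled;
        window_pos = (window_pos + 1) % WINDOW;
    }

    /* Thrashing = the stalled condition obtained over most of the
     * recent window, not merely at one instant. */
    bool system_is_thrashing(void)
    {
        return stalled_count * 10 >= WINDOW * 9;    /* ~90% of samples */
    }

The window is exactly the "over time" clause: one instant in which
everything happens to be faulting shouldn't trigger load control.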
Detecting whether or not a single process is thrashing (paging
hopelessly) is a different matter.  You could deactivate this process
(or some other, in the hope of helping this process), but it could be
the case that a reallocation of space could stop this process from
paging so heavily while not substantially increasing the paging rate
of any other process.

> and, for extra brownie points:
> 4) fairness, small processes can be paged in and out
>    faster, so we can suspend&resume them faster; this
>    has the side effect of leaving the proverbial root
>    shell more usable

I think this point should have greater significance.  The very issue
at hand is that fairness and throughput are at odds when there is
contention for memory.  The central question (I think) is, "Given
paging sufficiently detrimental to progress, *how* unfair should the
system be in order to restore progress and increase throughput?"  Note
that if we want increased throughput, we can easily come up with a
scheme that almost completely throws fairness to the wind, and we'll
get great reductions in total paging and increases in process
throughput.  For a time-sharing system, though, there should probably
be a limit to the unfairness.

There has never been a really good solution to this kind of problem,
and there seem to be two important sides to it:

1) Given a level of fairness that you want to maintain, how can you
   keep the paging as low as possible?

2) Given the unfairness you're willing to use, how can you select
   eligible processes intelligently so as to maximize the reduction
   in total paging?

Question 1 is a related and important problem, but not part of the
issue here.  Question 2 seems to be the central question, and a hard
one.  I have trouble believing that any solution to Question 2 will
make sense if it does not refer directly to the reference behavior of
both the suspended process and the remaining active processes.

I also have trouble with any solution to Question 2 that doesn't take
into account the cost associated with the deactivation and
reactivation steps.  When a process is reactivated, it's going to
cause substantial paging activity, and so it must not happen too
frequently.  If you're going to be unfair, then leave the deactivated
process out long enough that the cost of paging it back in will be a
small fraction of the total time spent on the deactivation/
reactivation cycle.  (See the postscript below for the
back-of-the-envelope version of that calculation.)

I hope these are useful thoughts.  Despite all of my complaining here,
I think this problem has been insufficiently addressed for a long
time.  The Working Set policy counted on it, but there was never a
study that showed a good strategy for deactivation/reactivation, in
spite of the fact that different choices could significantly affect
the results.  I'd like very much to see a solution to this particular
problem.

Scott
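P.S.  The amortization argument is just arithmetic, but it's worth
seeing the shape of it.  The numbers below are invented for
illustration (a 2048-page resident set, 10 ms per disk read, reload
capped at 5% of the cycle); only the formula matters:

    #include <stdio.h>

    int main(void)
    {
        double rss_pages   = 2048.0;   /* resident pages to reload   */
        double ms_per_page = 10.0;     /* one disk read, circa 2001  */
        double max_frac    = 0.05;     /* reload <= 5% of the cycle  */

        double reload_ms = rss_pages * ms_per_page;

        /* We want:  reload <= max_frac * (suspend + reload),
         * which rearranges to:
         *   suspend >= reload * (1 - max_frac) / max_frac */
        double min_suspend_ms = reload_ms * (1.0 - max_frac) / max_frac;

        printf("reload cost:        %6.1f s\n", reload_ms / 1000.0);
        printf("minimum suspension: %6.1f s\n", min_suspend_ms / 1000.0);
        return 0;
    }

With those numbers, a 20-second reload forces a suspension of more
than six minutes.  That's the sense in which a deactivation choice is
a commitment: make it rarely, and make it count.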