Dear Peter,

Thanks for your quick response. Our patch is indeed an extension of the
LRU-token approach. I am a friend of Song Jiang, who wrote the LRU
token, and I have continued his approach; however, unlike his approach,
we suggest blocking some processes when the system thrashes. Blocking
processes on IO, as you mentioned in your email, can sometimes help,
but not always, so it is not enough.

The LRU-token approach works on multiprocessors, so I do not see why
our approach cannot work there as well. The change in the scheduler
takes effect only when the calculation is done. The working set is
estimated from a process's resident set size (RSS), which is quite
close to the working set when the system is under memory pressure.

I am attaching my paper again.

Thanks again for your comments,
-Yair.

-------------------------------------------------------------------------
Dr. Yair Wiseman, Ph.D.
Computer Science Department
Bar-Ilan University
Ramat-Gan 52900
Israel
Tel: 972-3-5317015
Fax: 972-3-7384056
http://www.cs.biu.ac.il/~wiseman

From the keyboard of Peter Zijlstra:

> On Thu, 2009-08-06 at 11:14 +0300, Yair Wiseman wrote:
>> Dear Peter Zijlstra,
>>
>> We would like to suggest our patch to the Linux scheduler. I am
>> attaching our paper about it. Can you please consider it?
>
> You seem to have based your work on 2.6.9, which is completely
> unrepresentative of a current (2.6.30) Linux kernel. Both the
> scheduler and the page-reclaim infrastructure have undergone
> significant rewrites since.
>
> The scheduler is now a proportional fair scheduler, and the page
> reclaim bits have been split into two, using SEQ for anonymous pages
> and the old double-CLOCK-like LRU for file pages.
>
> Still, none of that seems to invalidate your proposal in a conceptual
> way.
>
> However, your proposal seems flawed in a significant way: it assumes
> global state -- being able to bin-pack all processes' working-set
> sizes into memory.
>
> [ with the minor nit that it's terribly hard to determine the working
>   set size of a process ]
>
> Linux needs to support large CC-NUMA machines (the currently largest
> known single-system-image machine to run Linux had 4096 CPUs), which
> renders any such scheme pointless.
>
> [ I might have misunderstood or missed a detail, for I only read the
>   paper briefly -- please correct me where I seem to misrepresent
>   your ideas ]
>
> Now, work on anti-thrashing is interesting, and I do encourage you to
> continue. However, I would urge you to look at ways that scale
> per-cpu or per-node. That is, look at distributed algorithms that
> work on local state but converge to this global goal without the
> severe synchronization costs.
>
> Personally, I don't think we need to modify the task scheduler.
> Processes blocked on (swap) IO aren't runnable, so by managing the
> contention on the IO devices and simply not serving pages for some
> processes, you'll end up with the same result.
>
> This could be an extension of the LRU-token approach.
>
> Note that this doesn't restrict itself to swap: it can also be
> regular file-based paging, which is not always considered but can be
> the predominant thrashing cause for many workloads (databases).
>
> I thank you for sharing your research, and maybe we can continue this
> discussion. If you are indeed interested in further communication
> with me, or, I would urge, with the Linux community at large, I would
> recommend including the linux-mm (Memory Management) mailing list,
> Rik van Riel, and Johannes Weiner.
>
> Kind regards,
>
> Peter
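
Both messages lean on RSS as the working-set proxy. As a concrete
illustration (not code from the patch or from the kernel), a minimal
user-space C sketch of how that estimate could be sampled: field 2 of
/proc/<pid>/statm is the resident page count. The helper name
rss_bytes is an assumption made for illustration.

/* Illustrative sketch: read a process's resident set size (RSS)
 * from /proc/<pid>/statm, the figure used here as a working-set
 * approximation under memory pressure. Field 2 is resident pages. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return RSS in bytes for `pid`, or -1 on error. */
static long long rss_bytes(pid_t pid)
{
    char path[64];
    unsigned long size, resident;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/statm", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%lu %lu", &size, &resident) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (long long)resident * sysconf(_SC_PAGESIZE);
}

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : getpid();
    long long rss = rss_bytes(pid);

    if (rss < 0) {
        perror("statm");
        return 1;
    }
    printf("pid %d: RSS %lld bytes\n", (int)pid, rss);
    return 0;
}

As Yair notes, sampling this is close to the true working set only
under memory pressure; without pressure, RSS also counts resident
pages the process no longer touches.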
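
To make Peter's global-state objection concrete, a deliberately naive
sketch, under the assumption that the proposal's admission test
amounts to suspending the largest consumers until the summed working
sets fit in RAM. Every name here (struct task, admit) is an
illustrative assumption, not taken from the patch. The point is the
single global sum `total`: on a 4096-CPU single-system-image machine,
maintaining it serializes all CPUs, which is why Peter suggests
per-cpu/per-node state that only converges to this global goal.

/* Naive sketch of a global bin-packing admission test (one reading
 * of the proposal, not its actual code): block the biggest consumers
 * until the remaining working sets fit into RAM. */
#include <stdio.h>

struct task {
    long long wss;       /* working-set estimate (MB here)          */
    int suspended;       /* 1 if blocked by the anti-thrash policy  */
};

/* Suspend the largest consumers until the rest fit in ram_mb.
 * Note the global sum over all tasks: the state that cannot scale. */
static void admit(struct task *t, int n, long long ram_mb)
{
    long long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        t[i].suspended = 0;
        total += t[i].wss;
    }
    while (total > ram_mb) {
        int victim = -1;
        for (i = 0; i < n; i++)
            if (!t[i].suspended &&
                (victim < 0 || t[i].wss > t[victim].wss))
                victim = i;
        if (victim < 0)
            break;                  /* everyone already suspended  */
        t[victim].suspended = 1;    /* block it until memory frees */
        total -= t[victim].wss;
    }
}

int main(void)
{
    struct task t[] = { { 400, 0 }, { 300, 0 }, { 200, 0 } };
    int i;

    admit(t, 3, 512);               /* 512 MB of RAM */
    for (i = 0; i < 3; i++)
        printf("task %d: wss=%lld MB, %s\n", i, t[i].wss,
               t[i].suspended ? "suspended" : "running");
    return 0;
}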