From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from localhost (riel@localhost) by duckman.distro.conectiva (8.9.3/8.8.7) with ESMTP id CAA30868 for ; Sun, 20 Aug 2000 02:18:44 -0300 Date: Sun, 20 Aug 2000 02:18:44 -0300 (BRST) From: Rik van Riel Subject: differential throttling & local page replacement Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org List-ID: Hi, I think I have worked out two (related) things to make the system (more) resistant against thrashing memory hogs, more interactive during heavy IO load and resistant against heavy disk writes. The basic idea revolves around one statistic, which is kept locally per process and globally, lets call it fault_rate. On every page fault (in handle_mm_fault()) and on write() calls, we do a current->fault_rate++, global_fault_rate++; The value will be decayed on a regular basis. (that this may count write faults twice is probably a beneficial effect) This will mean that for every process we will know the percentage of page faults in the system are for this process (current->fault_rate * 100 / global_fault_rate) and we can use this ratio for 2 different, but related, things. First there's write throttling. The current write throttling code has the disadvantage that one heavily writing thread can stall the others too. The formula (in balance_dirty_state()) is as follows: if (nr_dirty * 100 > total * max_dirty_percentage/2) wake_up_bdflush(NOWAIT); if (nr_dirty * 100 > total * max_dirty_percentage) wake_up_bdflush(WAIT); Since waiting on a disk write is rather expensive, it may well be worth it to change the second formula a bit to take into account the percentage of VM activity of the current process and put VM or disk bandwidth hogs to sleep a bit earlier so they won't impact the performance of lower activity tasks. local_max_percentage = (max_dirty_percentage - \ ((current->fault_rate * max_dirty_percentage/3) / \ global_fault_rate); if (nr_dirty * 100 > total * local_max_percentage) wake_up_bdflush(WAIT); This will still allow disk bandwidth eaters to run at full disk speed, but without having any of the less IO intensive processes stalled on IO. The second place where we can use these statistics is in making sure VM hogs (where a hog is defined as a process that has lots of page faults and causes the VM subsystem to be busy) don't impact the rest of the system too much. The idea here is to make the VM hog do local page replacement some percentage of the time, the percentage of course being its own (current->fault_rate * 100 / global_fault_rate). To prevent a task from doing only local page replacement while there are idle pages sitting around, this should only happen when the amount of free+freeable memory is _lower_ than the amount of memory kswapd tries to keep free, so the local page replacement only kicks in at the moment where our task truly starts to burden the rest of the system. So if: if (nr_free_pages() + nr_inactive_clean_pages() < kswapd_goal / 2) { do_local_fault_rate_percentage_of_the_time() swap_out_mm(current->mm); } Or, alternatively: if (nr_free_pages() + nr_inactive_clean_pages() < kswapd_goal * local_fault_ratio) swap_out_mm(current->mm); This way the memory hog (say, mmap002, tar, ...) will do some page replacement on itself to protect the rest of the system. Of course we need to do something like that for the page cache too, but that should be trivial. regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/